
TurboDiffusion: Bringing Video Diffusion into the Seconds Era

Built by Tsinghua University's ML group, TurboDiffusion combines attention acceleration, step distillation, and low-bit quantization to achieve 100-200x end-to-end speedups on a single RTX 5090 while preserving video quality.

See the Difference

[Side-by-side video comparison: Wan2.2 original, ~27 minutes generation time, vs. Wan2.2 + TurboDiffusion, ~9 seconds generation time]

100-200x end-to-end speedup
3-4 steps with high-quality distillation
W8A8 low-bit quantization
1.3B-14B model scale

Project Overview

TurboDiffusion is an acceleration framework for video generation models. It targets practical deployment, supporting both text-to-video and image-to-video pipelines built on the Wan2.1 and Wan2.2 families, and has already landed in multiple production platforms.

Research Team

The authors span Tsinghua University, UC Berkeley, and industry partners, with the team led by Jun Zhu.

License

Apache License 2.0, friendly to commercial and non-commercial usage.

Community Impact

In developer communities, TurboDiffusion is often called the "DeepSeek Moment" for video diffusion.

Core Technology Stack

TurboDiffusion delivers full-pipeline acceleration through attention speedups, step distillation, and low-bit quantization.

Attention Acceleration

SageAttention2++ provides low-bit attention acceleration, with SLA sparse attention layered on top.
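
In practice, SageAttention is published as a drop-in replacement for PyTorch's scaled-dot-product attention. The sketch below shows that swap pattern; it assumes the sageattention package's sageattn entry point with fp16/bf16 CUDA inputs, and illustrates the integration style rather than TurboDiffusion's internal wiring.

# Minimal sketch: swap PyTorch SDPA for SageAttention's low-bit kernel.
# Assumes the `sageattention` package; q, k, v are (batch, heads, seq, head_dim)
# fp16/bf16 CUDA tensors when tensor_layout="HND".
import torch
import torch.nn.functional as F

try:
    from sageattention import sageattn
except ImportError:
    sageattn = None

def attention(q, k, v):
    if sageattn is not None and q.is_cuda:
        # Low-bit (INT8 QK) attention; video DiTs attend over all tokens,
        # so the call is non-causal.
        return sageattn(q, k, v, tensor_layout="HND", is_causal=False)
    # Full-precision fallback.
    return F.scaled_dot_product_attention(q, k, v)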

Step Distillation

rCM distillation enables high-quality video in just 3-4 steps.
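
To make the step count concrete, the loop below sketches few-step sampling with a distilled model: each step is a single forward pass that predicts a clean latent, which is then re-noised to the next level. The model signature and noise schedule are hypothetical placeholders, not rCM's actual sampler.

# Illustrative few-step sampler in the consistency-distillation style.
# `distilled_model` and the `sigmas` schedule are hypothetical.
import torch

@torch.no_grad()
def sample_few_step(distilled_model, shape, sigmas=(80.0, 24.0, 5.0, 0.5)):
    x = torch.randn(shape) * sigmas[0]        # start from pure noise
    for i, sigma in enumerate(sigmas):        # 4 entries -> 4 network calls
        x0 = distilled_model(x, sigma)        # one-shot clean-latent prediction
        if i + 1 < len(sigmas):
            # Re-noise to the next (lower) level and refine again.
            x = x0 + torch.randn_like(x0) * sigmas[i + 1]
        else:
            x = x0
    return x

Against a conventional sampler that needs tens of steps, this alone removes roughly an order of magnitude of network evaluations.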

W8A8 Quantization

8-bit weights and activations boost linear layer throughput and reduce memory.
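
A minimal sketch of the W8A8 numerics, assuming symmetric per-tensor scales; production kernels use fused INT8 GEMMs and typically finer-grained scaling.

# Symmetric per-tensor W8A8 linear layer, numerics only.
# Real kernels run INT8xINT8 GEMM with INT32 accumulation; float32
# emulates that here for portability.
import torch

def quantize_int8(t):
    scale = t.abs().amax().float().clamp(min=1e-8) / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def w8a8_linear(x, weight, bias=None):
    xq, sx = quantize_int8(x)          # activations quantized at runtime
    wq, sw = quantize_int8(weight)     # weights quantized once, offline
    acc = xq.float() @ wq.float().t()  # stands in for the INT32 accumulator
    out = acc * (sx * sw)              # dequantize with the combined scale
    return (out if bias is None else out + bias).to(x.dtype)

Halving the bytes per weight and activation relative to FP16 roughly halves memory traffic, which is where the linear-layer gains come from.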

SLA Sparse Attention

SLA adds a further 17-20x speedup from sparsity, orthogonal to the low-bit acceleration.
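
The mechanism behind such sparsity can be sketched as block-level selection: pool queries and keys into blocks, score block pairs, and compute attention only for the strongest ones. The block size and keep ratio below are illustrative assumptions, not SLA's actual selection rule.

# Illustrative block top-k selection for sparse attention.
# Block size and keep_ratio are assumptions for the sketch.
import torch

def block_topk_mask(q, k, block=64, keep_ratio=0.1):
    B, H, N, D = q.shape                              # N must be divisible by `block`
    nb = N // block
    q_blk = q.view(B, H, nb, block, D).mean(dim=3)    # pooled query blocks
    k_blk = k.view(B, H, nb, block, D).mean(dim=3)    # pooled key blocks
    scores = q_blk @ k_blk.transpose(-1, -2)          # (B, H, nb, nb) block affinities
    keep = max(1, int(nb * keep_ratio))
    idx = scores.topk(keep, dim=-1).indices
    mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    mask.scatter_(-1, idx, True)                      # True = compute this block pair
    return mask

A real sparse kernel skips the masked blocks outright rather than materializing full scores; computing only a few percent of block pairs is what makes speedups in the reported range possible.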

Performance Benchmarks

On a single RTX 5090, end-to-end tests show 100-200x acceleration when generating 5-8 second high-quality videos.

Model                   Original Time   TurboDiffusion Time   Speedup
Wan2.1-T2V-1.3B-480P    ~166s           1.8s                  ~92x
Wan2.1-T2V-14B          ~1635s          9.4s                  ~174x
Vidu 1080p 8s           ~900s           ~8s                   ~112x

Supported Model Matrix

TurboWan2.1-T2V-1.3B-480P

Optimized for lightweight, real-time generation workflows.

TurboWan2.1-T2V-14B-720P

High-fidelity outputs for commercial-grade video generation.

TurboWan2.2-I2V-A14B-720P

Image-to-video support for storyboard-driven pipelines.

Installation & Quick Start

Recommended: Python ≥ 3.9 and PyTorch ≥ 2.7.0. GPUs with 40GB+ VRAM can use the unquantized checkpoints.

Quick Install

conda create -n turbodiffusion python=3.12
conda activate turbodiffusion
pip install turbodiffusion --no-build-isolation

Build from Source

git clone https://github.com/thu-ml/TurboDiffusion.git
cd TurboDiffusion
git submodule update --init --recursive
pip install -e . --no-build-isolation
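
After either install path, a quick sanity check of the environment (the version floor comes from the recommendation above):

# Post-install environment check; the low-bit kernels require a CUDA GPU.
import torch

print(torch.__version__)              # recommended: >= 2.7.0 (see above)
print(torch.cuda.is_available())      # must be True to use the CUDA kernels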

Ecosystem & Integrations

ComfyUI Plugin

The community-driven Comfyui_turbodiffusion plugin brings the 100-200x video generation speedup to ComfyUI workflows.

Industry Adoption

Adopted by leading teams including Tencent, ByteDance, Alibaba, and Baidu.

Inference Engines

SageAttention is integrated into TensorRT and multiple accelerator platforms.

Roadmap

The team is expanding support for additional video generation paradigms, with ongoing work on parallelism optimization, vLLM-Omni integration, and broader model coverage.

Citation

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and
          Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}