CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers

Abstract

Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. Existing acceleration techniques either require extensive model retraining or compromise significantly on sample quality.

This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. Our framework views multi-core diffusion sampling as an ODE solver pipeline, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core communication mechanism. This motivates our multi-core training-free diffusion sampling accelerator, CHORDS, which is compatible with various diffusion samplers, model architectures, and modalities.

Through extensive experiments, CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to a 2.1x speedup with four cores (a 50% improvement over baselines) and a 2.9x speedup with eight cores, all without quality degradation. These results establish CHORDS as a solid foundation for real-time, high-fidelity diffusion generation.

Method

At the core of our approach is an operation called "Multi-core Rectification" (depicted in the left figure), which refines the latents of the faster cores (those with a larger core index) using the slower but more accurate ones (those with a smaller core index), with theoretical justification. With this technique, we substantially accelerate diffusion generation through multi-core parallelism without measurable quality degradation. We also establish a general pipeline recipe that avoids bubbles and permits efficient information propagation across cores. See Algorithm 1 for more details.
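To make the idea of a slow solver rectifying a fast one concrete, here is a minimal toy sketch. It is not the CHORDS update rule from Algorithm 1: it uses a generic parareal-style correction on a scalar ODE, and every name in it (`drift`, `euler`, `rectified_fast_core`, the step counts) is hypothetical and chosen only for illustration. The "fast core" integrates with one large Euler step per interval, the "slow core" integrates the first interval with many small steps, and the fast core's final state is shifted by the difference the more accurate intermediate state makes.

```python
import math


def drift(x, t):
    # Toy ODE (assumption): dx/dt = -x, with exact solution x(t) = x0 * exp(-t).
    return -x


def euler(x, t0, t1, n_steps):
    """Integrate dx/dt = drift(x, t) from t0 to t1 with explicit Euler steps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + h * drift(x, t)
        t += h
    return x


def rectified_fast_core(x0, t_mid, t_end, coarse_steps=1, fine_steps=50):
    """Return (unrectified, rectified) fast-core states at t_end."""
    # Fast core: coarse solver over both intervals, cheap but inaccurate.
    fast_mid = euler(x0, 0.0, t_mid, coarse_steps)
    fast_end = euler(fast_mid, t_mid, t_end, coarse_steps)

    # Slow core: fine solver over the first interval, accurate but late.
    slow_mid = euler(x0, 0.0, t_mid, fine_steps)

    # Parareal-style rectification (illustrative, not the paper's rule):
    # add the change in the coarse propagation caused by swapping the fast
    # core's intermediate state for the slow core's more accurate one.
    correction = (
        euler(slow_mid, t_mid, t_end, coarse_steps)
        - euler(fast_mid, t_mid, t_end, coarse_steps)
    )
    return fast_end, fast_end + correction


unrectified, rectified = rectified_fast_core(1.0, t_mid=0.5, t_end=1.0)
exact = math.exp(-1.0)
print("unrectified error:", abs(unrectified - exact))
print("rectified error:  ", abs(rectified - exact))
```

In this toy setting the rectified state lands closer to the exact solution than the unrectified one, which is the qualitative behavior the rectification step aims for; the actual inter-core communication scheme and its guarantees are given in the paper.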

Main Results

CHORDS delivers significant sampling speedups across a diverse range of video diffusion (Table 1) and image diffusion (Table 2) models, consistently from 4 to 8 cores, without sacrificing sample quality as measured by VBench Quality Score and CLIP Score. Please refer to our paper for more detailed results and ablation studies.
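As a quick way to read the headline numbers from the abstract (up to 2.1x with four cores and 2.9x with eight), the sketch below computes parallel efficiency, defined here as speedup divided by core count. This metric and the helper name `parallel_efficiency` are our own illustrative additions, not quantities reported in the tables.

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal linear scaling achieved: speedup / number of cores."""
    return speedup / n_cores


# Headline speedups quoted in the abstract, keyed by core count.
reported = {4: 2.1, 8: 2.9}

for n_cores, speedup in reported.items():
    eff = parallel_efficiency(speedup, n_cores)
    print(f"{n_cores} cores: {speedup}x speedup, efficiency {eff:.4f}")
```

Efficiency below 1 is expected here, since the denoising steps of a single sample are inherently sequential and the extra cores are spent on running solvers of different speeds in parallel.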

BibTeX

@inproceedings{han2025chords,
  author    = {Han, Jiaqi and Ye, Haotian and Li, Puheng and Xu, Minkai and Zou, James and Ermon, Stefano},
  title     = {CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}


CHORDS applied to HunyuanVideo accelerates text-to-video generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the early-terminated fast core.

CHORDS applied to CogVideoX1.5-5B accelerates text-to-video generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the early-terminated fast core.

Qualitative comparisons on CogVideoX1.5 with the same initial latent.
Left: CHORDS (Ours); Middle: SRDS; Right: ParaDIGMS.
