CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers

ICCV 2025
Stanford University
*Equal contribution

CHORDS is a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism for diffusion sampling. CHORDS initializes a sequence of cores hierarchically from slow to fast, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core rectification mechanism. Extensive experiments demonstrate that CHORDS significantly accelerates diffusion sampling across various modern image and video diffusion models.

CHORDS applied to HunyuanVideo accelerates text-to-video generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the fastest core for each case.

CHORDS applied to CogVideoX1.5-5B accelerates text-to-video generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the fastest core for each case.

CHORDS applied to Flux accelerates text-to-image generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the fastest core for each case.

CHORDS applied to Stable Diffusion 3.5 Large accelerates text-to-image generation by 2-3x using 4-8 cores without quality degradation. Samples are produced by the fastest core for each case.

Qualitative comparisons on CogVideoX1.5-5B with the same initial latent.
Left: CHORDS (Ours); Middle: SRDS; Right: ParaDIGMS.

Abstract

Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. Existing acceleration techniques either require extensive model retraining or compromise significantly on sample quality.

This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. Our framework views multi-core diffusion sampling as an ODE solver pipeline, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core communication mechanism. This motivates our multi-core training-free diffusion sampling accelerator, CHORDS, which is compatible with various diffusion samplers, model architectures, and modalities.

Through extensive experiments, CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation. This advancement enables CHORDS to establish a solid foundation for real-time, high-fidelity diffusion generation.

Method

Method Overview

At the core of our approach is an operation named "Multi-core Rectification" (shown in the left figure), which refines the latents of the faster cores (those with larger core index) using the slower but more accurate ones (those with smaller core index), and which we justify theoretically. With this technique, we substantially speed up diffusion generation through multi-core parallelism without measurable quality degradation. We also establish a general pipeline recipe that optimally avoids bubbles and permits efficient information propagation across cores. See Algorithm 1 for more details.
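To make the idea of a slow, accurate solver correcting a fast one concrete, here is a minimal toy sketch. It is not the CHORDS algorithm: the exact rectification rule is given in Algorithm 1 of the paper and is not spelled out on this page, so the sketch substitutes a generic Parareal-style correction as an assumption. The test ODE `dx/dt = -x`, the Euler solver, the step counts, and all function names (`euler`, `drift`, `rectify`) are hypothetical choices for illustration only.

```python
import math


def euler(f, x, t0, t1, n_steps):
    """Integrate dx/dt = f(x, t) from t0 to t1 with n_steps explicit Euler steps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x


def drift(x, t):
    """Toy drift with known solution x(t) = x(0) * exp(-t) (assumption, not a diffusion model)."""
    return -x


def rectify(x_fast_end, x_fast_mid, x_slow_mid, t_mid, t_end):
    """Parareal-style correction (assumed stand-in for the paper's rule).

    The fast core reached t_end by a coarse jump from its own, less accurate
    state at t_mid. Once the slow core delivers a more accurate state at t_mid,
    the fast core's result is shifted by the difference between the coarse jump
    taken from the slow state and the coarse jump taken from the fast state.
    """
    coarse_from_slow = euler(drift, x_slow_mid, t_mid, t_end, 1)
    coarse_from_fast = euler(drift, x_fast_mid, t_mid, t_end, 1)
    return x_fast_end + (coarse_from_slow - coarse_from_fast)


x0, t_mid, t_end = 1.0, 0.5, 1.0

# Fast core: two large Euler steps, so it reaches t_end early but inaccurately.
x_fast_mid = euler(drift, x0, 0.0, t_mid, 1)
x_fast_end = euler(drift, x_fast_mid, t_mid, t_end, 1)

# Slow core: many small steps, so its state at t_mid is much more accurate.
x_slow_mid = euler(drift, x0, 0.0, t_mid, 50)

# Rectify the fast core's output using the slow core's state at t_mid.
x_rectified = rectify(x_fast_end, x_fast_mid, x_slow_mid, t_mid, t_end)

exact = math.exp(-t_end)
err_before = abs(x_fast_end - exact)
err_after = abs(x_rectified - exact)
print(f"fast core error before rectification: {err_before:.4f}")
print(f"fast core error after rectification:  {err_after:.4f}")
```

In this toy setting the correction moves the fast core's output closer to the exact solution without rerunning the fast core from scratch. That is the intuition behind letting accurate cores rectify fast ones, whatever the precise update rule is.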

Main Results


CHORDS delivers significant sampling speedups across a diverse range of video diffusion (Table 1) and image diffusion (Table 2) models, consistently from 4 to 8 cores, without sacrificing sample quality as measured by VBench Quality Score and CLIP Score. Please refer to our paper for more detailed results and ablation studies.
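As a quick way to read the headline numbers quoted in the abstract (up to 2.1x with four cores and 2.9x with eight), the short sketch below converts them into per-core parallel efficiency and the fraction of the original sampling latency that remains. The helper names are hypothetical, and the figures are only the "up to" values from the abstract, not per-model results from Table 1 or Table 2.

```python
# Headline speedups quoted in the abstract, keyed by number of cores.
reported = {4: 2.1, 8: 2.9}


def parallel_efficiency(speedup, cores):
    """Speedup divided by core count; 1.0 would mean perfectly linear scaling."""
    return speedup / cores


def latency_fraction(speedup):
    """Fraction of the single-core wall-clock time that remains after acceleration."""
    return 1.0 / speedup


for cores, speedup in reported.items():
    eff = parallel_efficiency(speedup, cores)
    frac = latency_fraction(speedup)
    print(f"{cores} cores: efficiency {eff:.4f}, remaining latency fraction {frac:.3f}")
```

The efficiencies come out to 0.525 for four cores and 0.3625 for eight. Extra cores keep reducing latency, but with diminishing returns per core, as expected when the sequential ODE solve is only partly parallelizable.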

BibTeX

@inproceedings{han2025chords,
  author    = {Han, Jiaqi and Ye, Haotian and Li, Puheng and Xu, Minkai and Zou, James and Ermon, Stefano},
  title     = {CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}