OASIS

Towards Redundancy Reduction in Diffusion Models
for Efficient Video Super-Resolution

Jinpei Guo1,2,   Yifei Ji2,   Zheng Chen2,   Yufei Wang3,   Sizhuo Ma3,   Yong Guo4,   Yulun Zhang2*†,   Jian Wang3†
1Carnegie Mellon University, 2Shanghai Jiao Tong University, 3Snap Inc., 4South China University of Technology
*Corresponding author,   †Equal advising

Real-World Degradations (upscale ×4)

Synthetic Degradations (upscale ×4)


OASIS is an efficient one-step diffusion model with attention
specialization for real-world video super-resolution.

Abstract

Diffusion models have recently shown promising results for video super-resolution (VSR). However, directly adapting generative diffusion models to VSR can result in redundancy, since low-quality videos already preserve substantial content information. Such redundancy leads to increased computational overhead and learning burden, as the model performs superfluous operations and must learn to filter out irrelevant information. To address this problem, we propose OASIS, an efficient one-step diffusion model with attention specialization for real-world video super-resolution. OASIS incorporates an attention specialization routing that assigns attention heads to different patterns according to their intrinsic behaviors. This routing mitigates redundancy while effectively preserving pretrained knowledge, allowing diffusion models to better adapt to VSR and achieve stronger performance. Moreover, we propose a simple yet effective progressive training strategy, which starts with temporally consistent degradations and then shifts to inconsistent settings. This strategy facilitates learning under complex degradations. Extensive experiments demonstrate that OASIS achieves state-of-the-art performance on both synthetic and real-world datasets. OASIS also provides superior inference speed, offering a 6.2× speedup over one-step diffusion baselines such as SeedVR2.
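To make the progressive training strategy concrete, here is a minimal, unofficial PyTorch sketch. The degradation model, parameter ranges, and the switch_step schedule are illustrative assumptions; only the idea of sharing one degradation across a clip early in training and re-sampling it per frame later comes from the description above.

import random
import torch
import torch.nn.functional as F

def sample_degradation_params():
    # Illustrative parameter ranges; a real pipeline uses a richer degradation model.
    return {"noise_std": random.uniform(0.01, 0.1),
            "downscale": random.choice([2, 4])}

def apply_degradation(frame, params):
    # Toy degradation: bicubic downscaling followed by additive Gaussian noise.
    # frame: (C, H, W) tensor with values in [0, 1].
    lr = F.interpolate(frame.unsqueeze(0),
                       scale_factor=1.0 / params["downscale"],
                       mode="bicubic", align_corners=False).squeeze(0)
    return (lr + params["noise_std"] * torch.randn_like(lr)).clamp(0.0, 1.0)

def degrade_clip(frames, step, switch_step=20_000):
    # Progressive schedule: before switch_step, every frame in the clip shares one
    # parameter set (temporally consistent); afterwards, parameters are re-sampled
    # per frame (temporally inconsistent), which is harder to restore consistently.
    if step < switch_step:
        shared = sample_degradation_params()
        return [apply_degradation(f, shared) for f in frames]
    return [apply_degradation(f, sample_degradation_params()) for f in frames]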

Method

OASIS incorporates an attention specialization routing (ASR) to mitigate redundancy while improving performance. Given an input low-resolution video, we first map it into the latent space via pixel-unshuffle and then process it with a diffusion transformer equipped with ASR. ASR divides attention heads into global, intra-frame, and window groups to capture complementary contexts, with the grouping determined by the KL divergence between each head's localized and global attention distributions. The group outputs are concatenated into an aggregated feature, and a VAE decoder reconstructs the video from the restored latent.
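The head grouping can be illustrated with a small, unofficial sketch (shapes, the threshold tau, and helper names such as assign_heads are assumptions, not from the paper): for each head, attention restricted to a local window or to its own frame is compared against the head's full global attention via KL divergence, and heads whose localized attention loses little information are routed to the cheaper window or intra-frame pattern.

import torch
import torch.nn.functional as F

def attn_probs(q, k, mask=None):
    # Softmax attention distribution for one head; q, k: (tokens, dim).
    logits = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
    if mask is not None:
        logits = logits.masked_fill(~mask, float("-inf"))
    return F.softmax(logits, dim=-1)

def kl_to_global(p_local, p_global, eps=1e-8):
    # Mean KL(local || global) over query tokens; a small value means the head's
    # behavior is already well captured by the restricted attention pattern.
    return (p_local * ((p_local + eps).log() - (p_global + eps).log())).sum(-1).mean()

def assign_heads(q, k, frame_mask, window_mask, tau=0.1):
    # q, k: (heads, tokens, dim); masks: (tokens, tokens) booleans marking which key
    # tokens each query may attend to (same frame, or a local spatio-temporal window).
    groups = []
    for h in range(q.shape[0]):
        p_global = attn_probs(q[h], k[h])
        kl_window = kl_to_global(attn_probs(q[h], k[h], window_mask), p_global)
        kl_frame = kl_to_global(attn_probs(q[h], k[h], frame_mask), p_global)
        if kl_window < tau:          # a local window already explains this head
            groups.append("window")
        elif kl_frame < tau:         # intra-frame attention suffices
            groups.append("frame")
        else:                        # head genuinely needs global context
            groups.append("global")
    return groups

In a setup like this, the assignment could be computed once from the pretrained model's attention statistics and then fixed, so most heads run the cheaper localized attention at inference.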

Comparison with SOTA

Qualitative Comparison

Qualitative comparison across five example slides: LQ input, VEnhancer, STAR, SeedVR, SeedVR2, and OASIS (ours).

BibTeX

@article{guo2025towards,
    title={Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution},
    author={Guo, Jinpei and Ji, Yifei and Chen, Zheng and Wang, Yufei and Ma, Sizhuo and Guo, Yong and Zhang, Yulun and Wang, Jian},
    journal={arXiv preprint arXiv:2509.23980},
    year={2025}
}