I2SB: Image-to-Image Schrödinger Bridge

Guan-Horng Liu1
Arash Vahdat2
De-An Huang2
Evangelos A. Theodorou1
Weili Nie†,2
Anima Anandkumar†,2,3
1 Georgia Tech 2 NVIDIA Corporation 3 Caltech
† Equal advising
Paper (arXiv) Code (coming soon)

I2SB is a new class of conditional diffusion models that directly construct diffusion bridges between two given distributions. It yields interpretable generation, enjoys better sampling efficiency, and sets new records on many image restoration tasks.

Abstract

We propose Image-to-Image Schrödinger Bridge (I2SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. These diffusion bridges are particularly useful for image restoration, as the degraded images are structurally informative priors for reconstructing the clean images. I2SB belongs to a tractable class of Schrödinger bridges, the nonlinear extension of score-based models, whose marginal distributions can be computed analytically given boundary pairs. This yields a simulation-free framework for nonlinear diffusions, in which I2SB training scales by adopting practical techniques used in standard diffusion models. We validate I2SB on various image restoration tasks, including inpainting, super-resolution, deblurring, and JPEG restoration on ImageNet 256×256, and show that I2SB surpasses standard conditional diffusion models while offering more interpretable generative processes. Moreover, I2SB matches the performance of inverse methods that additionally require knowledge of the corruption operators. Our work opens up new algorithmic opportunities for developing efficient nonlinear diffusion models on a large scale.
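
To make the "analytic marginals" and "simulation-free" claims concrete, below is a minimal PyTorch-style sketch of one training step, assuming paired clean/degraded images, a network `net(x_t, t)` that regresses a noise-like residual, and a discrete `betas` schedule. All names are illustrative assumptions rather than the official codebase; see the paper for the exact posterior and loss.

```python
# Minimal sketch of a simulation-free I2SB training step (illustrative, not the
# official implementation). Assumes a PyTorch `net(x_t, t)` and a discrete
# noise schedule `betas` of shape [T].
import torch

def sample_bridge(x0, x1, t, betas):
    """Sample x_t from the analytic Gaussian posterior of the bridge given the
    boundary pair (x0 = clean image, x1 = degraded image) at timesteps t."""
    betas = betas.to(x0.device)
    # Discrete analogues of sigma_t^2 (accumulated from the clean side) and
    # sigma_bar_t^2 (accumulated from the degraded side); exact indexing
    # conventions are glossed over in this sketch.
    var_fwd = torch.cumsum(betas, dim=0)
    var_bwd = torch.flip(torch.cumsum(torch.flip(betas, [0]), dim=0), [0])
    s2 = var_fwd[t].view(-1, 1, 1, 1)
    sb2 = var_bwd[t].view(-1, 1, 1, 1)
    # Gaussian posterior: interpolate the two boundaries and add matched noise.
    mu = (sb2 / (s2 + sb2)) * x0 + (s2 / (s2 + sb2)) * x1
    std = (s2 * sb2 / (s2 + sb2)).sqrt()
    return mu + std * torch.randn_like(x0), s2.sqrt()

def training_step(net, x0, x1, betas, opt):
    """One training step: no simulation of the diffusion process is needed."""
    t = torch.randint(0, len(betas), (x0.shape[0],), device=x0.device)
    x_t, sigma_t = sample_bridge(x0, x1, t, betas)
    # Regress the scaled displacement toward the clean image
    # (a score-matching-style target).
    target = (x_t - x0) / sigma_t
    loss = ((net(x_t, t) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Because the intermediate state x_t is sampled in closed form from the boundary pair, training looks just like a standard denoising diffusion step, which is what makes the framework scalable.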


Schrödinger Bridge as Interpretable Conditional Diffusion Model

Rather than generating images from random noise as in prior conditional or inverse-guided diffusion models, I2SB directly learns the diffusion bridges between two given distributions, e.g., degraded and clean image distributions, yielding more interpretable generation that is effective for image restoration.


Model diagrams comparing I2SB with prior diffusion models.

Interpretable generative processes of I2SB.


Results on Image Restoration

We validate I2SB on various image restoration problems on ImageNet 256×256. I2SB surpasses standard conditional diffusion models on many tasks and matches the performance of diffusion-based inverse methods (e.g., DDRM) without requiring knowledge of the corruption operators during either training or generation. See our paper for the quantitative results.

Click the tabs below to visualize the reconstruction results for each restoration task.

[Image panels for each restoration task: Input · I2SB output · Reference]

Interpretable and Efficient Generative Processes

Interpretable generation implies sampling efficiency. Since the clean and degraded images are typically close to each other, I2SB's generation starts from a much more structurally informative prior than random noise. Consequently, I2SB suffers little or no performance drop as the number of function evaluations (NFEs) decreases during sampling. In contrast, standard conditional diffusion models such as Palette (Saharia et al., 2022) tend to generate unnatural images with noisy repainting or contrast shifts at small NFEs.
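
As a rough illustration of why a small NFE budget can suffice, here is a hypothetical sampling loop in the same style as the training sketch above: generation starts from the degraded image and takes a handful of DDPM-like posterior steps toward the network's predicted clean image. Again, the function and variable names are assumptions, not the released API.

```python
# Rough sketch of I2SB generation with a small NFE budget, reusing the `net`
# and `betas` conventions from the training sketch above (names are
# assumptions, not the released API).
import torch

@torch.no_grad()
def generate(net, x1, betas, nfe=10):
    """Start from the degraded image x1 and take `nfe` posterior steps toward
    the predicted clean image."""
    betas = betas.to(x1.device)
    var = torch.cumsum(betas, dim=0)                      # sigma_t^2 from the clean side
    steps = torch.linspace(len(betas) - 1, 0, nfe + 1).long().tolist()
    x = x1
    for t_cur, t_nxt in zip(steps[:-1], steps[1:]):
        t_batch = torch.full((x.shape[0],), t_cur, device=x.device, dtype=torch.long)
        x0_hat = x - var[t_cur].sqrt() * net(x, t_batch)  # predicted clean image
        if t_nxt <= 0:
            x = x0_hat
            break
        # Posterior over the sub-interval [0, t_cur]: interpolate between the
        # predicted clean image and the current state, then add matched noise.
        s2, sb2 = var[t_nxt], var[t_cur] - var[t_nxt]
        mu = (sb2 / (s2 + sb2)) * x0_hat + (s2 / (s2 + sb2)) * x
        x = mu + (s2 * sb2 / (s2 + sb2)).sqrt() * torch.randn_like(x)
    return x
```

Since every step refines an already image-like state rather than pure noise, coarsening the time grid (lowering `nfe`) degrades the output far more gracefully than in noise-to-image diffusion models.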

Click the tabs below to visualize the comparison between Palette and I2SB across different NFEs.


Publications

arXiv preprint: https://arxiv.org/abs/2302.05872

code: coming soon

BibTeX:

@article{liu2023i2sb,
  title={I{$^2$}SB: Image-to-Image Schr{\"o}dinger Bridge},
  author={Liu, Guan-Horng and Vahdat, Arash and Huang, De-An and Theodorou, Evangelos A and Nie, Weili and Anandkumar, Anima},
  journal={arXiv preprint arXiv:2302.05872},
  year={2023},
}
