In Part A of this project, we explore the capabilities of diffusion models. We implement diffusion sampling loops and apply them to tasks such as inpainting and creating optical illusions.
Test Image at Noise Levels [250, 500, 750]
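For reference, a minimal sketch of the forward (noising) process used to produce these images, assuming `alphas_cumprod` is the pretrained model's cumulative noise schedule stored as a tensor:

```python
import torch

def forward(im, t, alphas_cumprod):
    """Produce the noisy image x_t from the clean image x_0 = im:
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps,  eps ~ N(0, I)."""
    a_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)
    return a_bar.sqrt() * im + (1 - a_bar).sqrt() * eps
```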
As a classical baseline, we applied a Gaussian blur to each noisy image in an attempt to denoise it.
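This baseline simply low-pass filters each noisy image; a sketch using torchvision (the kernel size and sigma here are illustrative guesses, not tuned values):

```python
from torchvision.transforms.functional import gaussian_blur

def blur_denoise(noisy_im, kernel_size=5, sigma=2.0):
    # A Gaussian blur suppresses some of the high-frequency noise,
    # but it blurs away image detail along with it.
    return gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=sigma)
```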
We then used the pretrained UNet to estimate the noise in each noisy image and remove it in a single step.
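A sketch of the one-step estimate, assuming `unet_eps(x_t, t, emb)` is a hypothetical wrapper around the pretrained UNet that returns its noise prediction for prompt embedding `emb`:

```python
def one_step_denoise(x_t, t, emb, alphas_cumprod, unet_eps):
    """Estimate the clean image x_0 from x_t in a single step."""
    a_bar = alphas_cumprod[t]
    eps_hat = unet_eps(x_t, t, emb)  # predicted noise
    # Invert the forward process: x_0 ~= (x_t - sqrt(1 - a_bar) * eps) / sqrt(a_bar)
    return (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
```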
Denoising Process (Every 5th Image)
Comparison with One-Step Denoising and Gaussian Blurring
• One-Step Denoised Image
• Gaussian Blurred Image
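For reference, a sketch of the iterative loop used above, which steps through a strided list of timesteps (largest first, ending near 0) instead of jumping straight to x_0; the per-step noise/variance term of the full DDPM update is omitted for brevity:

```python
def iterative_denoise(x_t, timesteps, emb, alphas_cumprod, unet_eps):
    """Denoise x_t over a strided, decreasing list of timesteps."""
    for i in range(len(timesteps) - 1):
        t, t_prev = timesteps[i], timesteps[i + 1]
        a_bar, a_bar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = a_bar / a_bar_prev
        beta = 1 - alpha

        eps_hat = unet_eps(x_t, t, emb)
        # Current clean-image estimate (same formula as one-step denoising).
        x0_hat = (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
        # Move toward x0_hat while keeping part of the current noisy image.
        x_t = (a_bar_prev.sqrt() * beta / (1 - a_bar)) * x0_hat \
            + (alpha.sqrt() * (1 - a_bar_prev) / (1 - a_bar)) * x_t
    return x_t
```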
We generated images from random noise using the iterative denoising process.
Sampled Images
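Sampling is the same loop started from pure Gaussian noise, e.g. (reusing `iterative_denoise` and the other names from the sketches above; the 64×64 resolution and the generic prompt are assumptions about the stage-1 model):

```python
x_T = torch.randn(1, 3, 64, 64)  # pure noise at the highest timestep
# emb: embedding of a generic prompt such as "a high quality photo"
sample = iterative_denoise(x_T, timesteps, emb, alphas_cumprod, unet_eps)
```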
Generated Images with CFG Scale = 7
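Classifier-free guidance runs the UNet twice per step, once with the text prompt and once with the empty (null) prompt, and extrapolates past the conditional estimate; a sketch of the combined noise estimate:

```python
def cfg_eps(x_t, t, cond_emb, uncond_emb, unet_eps, scale=7.0):
    """Classifier-free guidance: push the estimate away from the unconditional one."""
    eps_cond = unet_eps(x_t, t, cond_emb)
    eps_uncond = unet_eps(x_t, t, uncond_emb)
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Plugging `cfg_eps` in place of the raw noise estimate inside the denoising loop gives the CFG samples shown above.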
We then applied the iterative denoising process to noisy versions of existing images: the more noise we add before denoising, the more freedom the model has to change the image.
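A sketch of this SDEdit-style edit, reusing `forward` and `iterative_denoise` from above; the starting index controls how much noise, and therefore how much freedom, the model gets:

```python
def edit_image(im, i_start, timesteps, emb, alphas_cumprod, unet_eps):
    """Noise a clean image to timesteps[i_start], then iteratively denoise it."""
    t_start = timesteps[i_start]
    x_t = forward(im, t_start, alphas_cumprod)  # partially destroy the image
    return iterative_denoise(x_t, timesteps[i_start:], emb, alphas_cumprod, unet_eps)
```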
Edits of Test Image at Noise Levels [1, 3, 5, 7, 10, 20]
Edits of Own Test Images
Inpainted Test Image
Inpainting of Own Images
Image 1
• Original
• Mask
• Inpainted
If we reverse the mask:
Image 2
• Original
• Mask
• Inpainted
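For reference, the inpainting above reuses the same denoising loop but, after every step, forces the pixels outside the mask back to the (re-noised) original image, so new content is generated only inside the mask; a sketch with `mask == 1` marking the region to fill, reusing the helpers above:

```python
def inpaint(im, mask, timesteps, emb, alphas_cumprod, unet_eps):
    """Generate new content where mask == 1; keep the original image elsewhere."""
    x_t = torch.randn_like(im)  # timesteps[0] is near the end of the schedule, so start from noise
    for i in range(len(timesteps) - 1):
        t_prev = timesteps[i + 1]
        # One denoising step (same update as in iterative_denoise).
        x_t = iterative_denoise(x_t, timesteps[i:i + 2], emb, alphas_cumprod, unet_eps)
        # Re-impose the known pixels, noised to the new timestep.
        x_t = mask * x_t + (1 - mask) * forward(im, t_prev, alphas_cumprod)
    return x_t
```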
Edits of Test Image with Given Prompt
Edits of Own Test Images
• Image 1
• Image 2
Visual Anagram: “Old Man” and “People Around a Campfire”
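The anagram comes from averaging two noise estimates at every step: one for the upright image with the first prompt, and one for the vertically flipped image with the second prompt (flipped back before averaging). A sketch of the combined estimate, using the same hypothetical `unet_eps` wrapper as above:

```python
def anagram_eps(x_t, t, emb_upright, emb_flipped, unet_eps):
    """Noise estimate whose result matches emb_upright right-side up and emb_flipped upside down."""
    eps_up = unet_eps(x_t, t, emb_upright)
    x_flip = torch.flip(x_t, dims=[-2])                      # turn the image upside down
    eps_down = torch.flip(unet_eps(x_flip, t, emb_flipped), dims=[-2])
    return (eps_up + eps_down) / 2
```

Running the usual denoising loop with this estimate yields an image that reads differently when flipped.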
Additional Illusions
• Illusion 1: ["a lithograph of waterfalls"], ["a photo of a man"]
• Image
• Flipped
• Illusion 2: ["a photo of the amalfi cost"], ["a photo of a dog"]
• Image
• Flipped
Hybrid Image: “Skull” and “Waterfall”
• Resulting Image
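The hybrid image uses a factorized noise estimate: at every step the low frequencies come from one prompt's estimate and the high frequencies from the other's, so the first prompt dominates from far away and the second up close. A sketch using torchvision's `gaussian_blur` from above as the low-pass filter (the kernel size and sigma are illustrative guesses):

```python
def hybrid_eps(x_t, t, emb_far, emb_near, unet_eps, kernel_size=33, sigma=2.0):
    """Low frequencies follow emb_far, high frequencies follow emb_near."""
    eps_far = unet_eps(x_t, t, emb_far)
    eps_near = unet_eps(x_t, t, emb_near)
    low = gaussian_blur(eps_far, kernel_size=kernel_size, sigma=sigma)
    high = eps_near - gaussian_blur(eps_near, kernel_size=kernel_size, sigma=sigma)
    return low + high
```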
Additional Hybrid Images
• Hybrid Image 1: 'a lithograph of a forest scene', 'a lithograph of hulk's face'
• Hybrid Image 2: 'a lithograph of a skull', 'an oil painting of a snowy mountain village'
In this project, we explored the capabilities of diffusion models through various implementations and applications. We observed how iterative denoising improves image quality over single-step methods, and how techniques like CFG enhance the results further. By experimenting with image translation, inpainting, visual anagrams, and hybrid images, we demonstrated the versatility and power of diffusion models in generating and manipulating images.