Showing preview only (951K chars total). Download the full file or copy to clipboard to get everything.
Repository: OpenGVLab/Diffree
Branch: main
Commit: 29aee635bc56
Files: 106
Total size: 906.8 KB
Directory structure:
gitextract_fktbw4g3/
├── LICENSE
├── README.md
├── app.py
├── config/
│ ├── generate.yaml
│ └── train.yaml
├── dataset_diffree.py
├── main.py
├── requirements.txt
└── stable_diffusion/
├── LICENSE
├── README.md
├── Stable_Diffusion_v1_Model_Card.md
├── assets/
│ ├── results.gif.REMOVED.git-id
│ ├── stable-samples/
│ │ ├── img2img/
│ │ │ ├── upscaling-in.png.REMOVED.git-id
│ │ │ └── upscaling-out.png.REMOVED.git-id
│ │ └── txt2img/
│ │ ├── merged-0005.png.REMOVED.git-id
│ │ ├── merged-0006.png.REMOVED.git-id
│ │ └── merged-0007.png.REMOVED.git-id
│ └── txt2img-preview.png.REMOVED.git-id
├── configs/
│ ├── autoencoder/
│ │ ├── autoencoder_kl_16x16x16.yaml
│ │ ├── autoencoder_kl_32x32x4.yaml
│ │ ├── autoencoder_kl_64x64x3.yaml
│ │ └── autoencoder_kl_8x8x64.yaml
│ ├── latent-diffusion/
│ │ ├── celebahq-ldm-vq-4.yaml
│ │ ├── cin-ldm-vq-f8.yaml
│ │ ├── cin256-v2.yaml
│ │ ├── ffhq-ldm-vq-4.yaml
│ │ ├── lsun_bedrooms-ldm-vq-4.yaml
│ │ ├── lsun_churches-ldm-kl-8.yaml
│ │ └── txt2img-1p4B-eval.yaml
│ ├── retrieval-augmented-diffusion/
│ │ └── 768x768.yaml
│ └── stable-diffusion/
│ └── v1-inference.yaml
├── data/
│ ├── example_conditioning/
│ │ └── text_conditional/
│ │ └── sample_0.txt
│ ├── imagenet_clsidx_to_label.txt
│ ├── imagenet_train_hr_indices.p.REMOVED.git-id
│ ├── imagenet_val_hr_indices.p
│ └── index_synset.yaml
├── environment.yaml
├── ldm/
│ ├── data/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── imagenet.py
│ │ └── lsun.py
│ ├── lr_scheduler.py
│ ├── models/
│ │ ├── autoencoder.py
│ │ └── diffusion/
│ │ ├── __init__.py
│ │ ├── classifier.py
│ │ ├── ddim.py
│ │ ├── ddpm.py
│ │ ├── ddpm_diffree.py
│ │ ├── ddpm_edit.py
│ │ ├── dpm_solver/
│ │ │ ├── __init__.py
│ │ │ ├── dpm_solver.py
│ │ │ └── sampler.py
│ │ └── plms.py
│ ├── modules/
│ │ ├── attention.py
│ │ ├── diffusionmodules/
│ │ │ ├── __init__.py
│ │ │ ├── model.py
│ │ │ ├── openaimodel.py
│ │ │ ├── openaimodel_diffree.py
│ │ │ └── util.py
│ │ ├── distributions/
│ │ │ ├── __init__.py
│ │ │ └── distributions.py
│ │ ├── ema.py
│ │ ├── encoders/
│ │ │ ├── __init__.py
│ │ │ └── modules.py
│ │ ├── image_degradation/
│ │ │ ├── __init__.py
│ │ │ ├── bsrgan.py
│ │ │ ├── bsrgan_light.py
│ │ │ └── utils_image.py
│ │ ├── losses/
│ │ │ ├── __init__.py
│ │ │ ├── contperceptual.py
│ │ │ └── vqperceptual.py
│ │ └── x_transformer.py
│ └── util.py
├── main.py
├── models/
│ ├── first_stage_models/
│ │ ├── kl-f16/
│ │ │ └── config.yaml
│ │ ├── kl-f32/
│ │ │ └── config.yaml
│ │ ├── kl-f4/
│ │ │ └── config.yaml
│ │ ├── kl-f8/
│ │ │ └── config.yaml
│ │ ├── vq-f16/
│ │ │ └── config.yaml
│ │ ├── vq-f4/
│ │ │ └── config.yaml
│ │ ├── vq-f4-noattn/
│ │ │ └── config.yaml
│ │ ├── vq-f8/
│ │ │ └── config.yaml
│ │ └── vq-f8-n256/
│ │ └── config.yaml
│ └── ldm/
│ ├── bsr_sr/
│ │ └── config.yaml
│ ├── celeba256/
│ │ └── config.yaml
│ ├── cin256/
│ │ └── config.yaml
│ ├── ffhq256/
│ │ └── config.yaml
│ ├── inpainting_big/
│ │ └── config.yaml
│ ├── layout2img-openimages256/
│ │ └── config.yaml
│ ├── lsun_beds256/
│ │ └── config.yaml
│ ├── lsun_churches256/
│ │ └── config.yaml
│ ├── semantic_synthesis256/
│ │ └── config.yaml
│ ├── semantic_synthesis512/
│ │ └── config.yaml
│ └── text2img256/
│ └── config.yaml
├── notebook_helpers.py
├── scripts/
│ ├── download_first_stages.sh
│ ├── download_models.sh
│ ├── img2img.py
│ ├── inpaint.py
│ ├── knn2img.py
│ ├── latent_imagenet_diffusion.ipynb.REMOVED.git-id
│ ├── sample_diffusion.py
│ ├── tests/
│ │ └── test_watermark.py
│ ├── train_searcher.py
│ └── txt2img.py
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# Diffree
Official PyTorch implement of paper "Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model"
<p align="center">
<a href="https://opengvlab.github.io/Diffree/"><u>[🌐 Project Page]</u></a>
<a href="https://huggingface.co/datasets/LiruiZhao/OABench"><u>[🗞️ Dataset]</u></a>
<a href="https://drive.google.com/file/d/1AdIPA5TK5LB1tnqqZuZ9GsJ6Zzqo2ua6/view"><u>[🎥 Video]</u></a>
<a href="https://arxiv.org/pdf/2407.16982"><u>[📜 Arxiv]</u></a>
<a href="https://huggingface.co/spaces/LiruiZhao/Diffree"><u>[🤗 Hugging Face Demo]</u></a>
</p>
## Abstract
<details><summary>CLICK for the full abstract</summary>
> This paper addresses an important problem of object addition for images with only text guidance. It is challenging because the new object must be integrated seamlessly into the image with consistent visual context, such as lighting, texture, and spatial location. While existing text-guided image inpainting methods can add objects, they either fail to preserve the background consistency or involve cumbersome human intervention in specifying bounding boxes or user-scribbled masks. To tackle this challenge, we introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control. To this end, we curate OABench, an exquisite synthetic dataset by removing objects with advanced image inpainting techniques. OABench comprises 74K real-world tuples of an original image, an inpainted image with the object removed, an object mask, and object descriptions. Trained on OABench using the Stable Diffusion model with an additional mask prediction module, Diffree uniquely predicts the position of the new object and achieves object addition with guidance from only text. Extensive experiments demonstrate that Diffree excels in adding new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
> </details>
We are open to any suggestions and discussions and feel free to contact us through [liruizhao@stu.xmu.edu.cn](mailto:liruizhao@stu.xmu.edu.cn).
## News
- [2024/07] Release inference code and <a href="https://huggingface.co/LiruiZhao/Diffree">checkpoint</a>
- [2024/07] Release <a href="https://huggingface.co/spaces/LiruiZhao/Diffree">🤗 Hugging Face Demo</a>
- [2024/08] Release ConfyUI demo. Thanks [smthemex](https://github.com/smthemex) ([ComfyUI_Diffree](https://github.com/smthemex/ComfyUI_Diffree)) for helping!
- [2024/08] Release [training dataset OABench](https://huggingface.co/datasets/LiruiZhao/OABench) in Hugging Face
- [2024/08] Release training code
- [2024/08] Update <a href="https://huggingface.co/spaces/LiruiZhao/Diffree">🤗 Demo</a>, now support iterative generation through a text list
## Contents
- [Install](#install)
- [Inference](#inference)
- [Data Download](#data-download)
- [Training](#training)
- [Citation](#citation)
## Install
1. Clone this repository and navigate to Diffree folder
```
git clone https://github.com/OpenGVLab/Diffree.git
cd Diffree
```
2. Install package
```
conda create -n diffree python=3.8.5
conda activate diffree
pip install -r requirements.txt
```
## Inference
1. Download the Diffree model from Huggingface.
```
pip install huggingface_hub
huggingface-cli download LiruiZhao/Diffree --local-dir ./checkpoints
```
2. You can inference with the script:
```
python app.py
```
Specifically, `--resolution` defines the maximum size for both the resized input image and output image. For our <a href="https://huggingface.co/spaces/LiruiZhao/Diffree">Hugging Face Demo</a>, we set the `--resolution` to `512` to enhance the user experience with higher-resolution results. While during the training process of Diffree, `--resolution` is set to `256`. Therefore, reducing `--resolution` might improve results (e.g., consider trying `320` as a potential value).
## Data Download
You can download the OABench here, which are used for training the Diffree.
1. Download the OABench dataset from Huggingface.
```
huggingface-cli download --repo-type dataset LiruiZhao/OABench --local-dir ./dataset --local-dir-use-symlinks False
```
2. Find and extract all compressed files in the dataset directory
```
cd dataset
ls *.tar.gz | xargs -n1 tar xvf
```
The data structure should be like:
```
|-- dataset
|-- original_images
|-- 58134.jpg
|-- 235791.jpg
|-- ...
|-- inpainted_images
|-- 58134
|-- 634757.jpg
|-- 634761.jpg
|-- ...
|-- 235791
|-- ...
|-- mask_images
|-- 58134
|-- 634757.png
|-- 634761.png
|-- ...
|-- 235791
|-- ...
|-- annotations.json
```
In the `inpainted_images` and `mask_images` directories, the top-level folders correspond to the original images, and the contents of each folder are the inpainted images and masks for those images.
## Training
Diffree is trained by fine-tuning from an initial StableDiffusion checkpoint.
1. Download a Stable Diffusion checkpoint and move it to the `checkpoints` directory. For our trained models, we used [the v1.5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt) as the starting point. You can also use the following command:
```
curl -L https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt -o checkpoints/v1-5-pruned-emaonly.ckpt
```
2. Next, you can start training.
```
python main.py --name diffree --base config/train.yaml --train --gpus 0,1,2,3
```
All configurations are stored in the YAML file. If you need to use custom configuration settings, you can modify the `--base` to point to your custom config file.
## Citation
If you found this work useful, please consider citing:
```
@article{zhao2024diffree,
title={Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model},
author={Zhao, Lirui and Yang, Tianshuo and Shao, Wenqi and Zhang, Yuxin and Qiao, Yu and Luo, Ping and Zhang, Kaipeng and Ji, Rongrong},
journal={arXiv preprint arXiv:2407.16982},
year={2024}
}
```
================================================
FILE: app.py
================================================
from __future__ import annotations
import math
import random
import sys
from argparse import ArgumentParser
from tqdm.auto import trange
import einops
import gradio as gr
import k_diffusion as K
import numpy as np
import torch
import torch.nn as nn
from einops import rearrange
from omegaconf import OmegaConf
from PIL import Image, ImageOps, ImageFilter
from torch import autocast
import cv2
import imageio
sys.path.append("./stable_diffusion")
from stable_diffusion.ldm.util import instantiate_from_config
class CFGDenoiser(nn.Module):
def __init__(self, model):
super().__init__()
self.inner_model = model
def forward(self, z_0, z_1, sigma, cond, uncond, text_cfg_scale, image_cfg_scale):
cfg_z_0 = einops.repeat(z_0, "1 ... -> n ...", n=3)
cfg_z_1 = einops.repeat(z_1, "1 ... -> n ...", n=3)
cfg_sigma = einops.repeat(sigma, "1 ... -> n ...", n=3)
cfg_cond = {
"c_crossattn": [torch.cat([cond["c_crossattn"][0], uncond["c_crossattn"][0], uncond["c_crossattn"][0]])],
"c_concat": [torch.cat([cond["c_concat"][0], cond["c_concat"][0], uncond["c_concat"][0]])],
}
output_0, output_1 = self.inner_model(cfg_z_0, cfg_z_1, cfg_sigma, cond=cfg_cond)
out_cond_0, out_img_cond_0, out_uncond_0 = output_0.chunk(3)
out_cond_1, _, _ = output_1.chunk(3)
return out_uncond_0 + text_cfg_scale * (out_cond_0 - out_img_cond_0) + image_cfg_scale * (out_img_cond_0 - out_uncond_0), \
out_cond_1
def load_model_from_config(config, ckpt, vae_ckpt=None, verbose=False):
print(f"Loading model from {ckpt}")
pl_sd = torch.load(ckpt, map_location="cpu")
if "global_step" in pl_sd:
print(f"Global Step: {pl_sd['global_step']}")
sd = pl_sd["state_dict"]
if vae_ckpt is not None:
print(f"Loading VAE from {vae_ckpt}")
vae_sd = torch.load(vae_ckpt, map_location="cpu")["state_dict"]
sd = {
k: vae_sd[k[len("first_stage_model.") :]] if k.startswith("first_stage_model.") else v
for k, v in sd.items()
}
model = instantiate_from_config(config.model)
m, u = model.load_state_dict(sd, strict=True)
if len(m) > 0 and verbose:
print("missing keys:")
print(m)
if len(u) > 0 and verbose:
print("unexpected keys:")
print(u)
return model
def append_dims(x, target_dims):
"""Appends dimensions to the end of a tensor until it has target_dims dimensions."""
dims_to_append = target_dims - x.ndim
if dims_to_append < 0:
raise ValueError(f'input has {x.ndim} dims but target_dims is {target_dims}, which is less')
return x[(...,) + (None,) * dims_to_append]
class CompVisDenoiser(K.external.CompVisDenoiser):
def __init__(self, model, quantize=False, device='cpu'):
super().__init__(model, quantize, device)
def get_eps(self, *args, **kwargs):
return self.inner_model.apply_model(*args, **kwargs)
def forward(self, input_0, input_1, sigma, **kwargs):
c_out, c_in = [append_dims(x, input_0.ndim) for x in self.get_scalings(sigma)]
# eps_0, eps_1 = self.get_eps(input_0 * c_in, input_1 * c_in, self.sigma_to_t(sigma), **kwargs)
eps_0, eps_1 = self.get_eps(input_0 * c_in, self.sigma_to_t(sigma).cuda(), **kwargs)
return input_0 + eps_0 * c_out, eps_1
def to_d(x, sigma, denoised):
"""Converts a denoiser output to a Karras ODE derivative."""
return (x - denoised) / append_dims(sigma, x.ndim)
def default_noise_sampler(x):
return lambda sigma, sigma_next: torch.randn_like(x)
def get_ancestral_step(sigma_from, sigma_to, eta=1.):
"""Calculates the noise level (sigma_down) to step down to and the amount
of noise to add (sigma_up) when doing an ancestral sampling step."""
if not eta:
return sigma_to, 0.
sigma_up = min(sigma_to, eta * (sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2) ** 0.5)
sigma_down = (sigma_to ** 2 - sigma_up ** 2) ** 0.5
return sigma_down, sigma_up
def decode_mask(mask, height = 256, width = 256):
mask = nn.functional.interpolate(mask, size=(height, width), mode="bilinear", align_corners=False)
mask = torch.where(mask > 0, 1, -1) # Thresholding step
mask = torch.clamp((mask + 1.0) / 2.0, min=0.0, max=1.0)
mask = 255.0 * rearrange(mask, "1 c h w -> h w c")
mask = torch.cat([mask, mask, mask], dim=-1)
mask = mask.type(torch.uint8).cpu().numpy()
return mask
def sample_euler_ancestral(model, x_0, x_1, sigmas, height, width, extra_args=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
"""Ancestral sampling with Euler method steps."""
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x_0) if noise_sampler is None else noise_sampler
s_in = x_0.new_ones([x_0.shape[0]])
mask_list = []
image_list = []
for i in trange(len(sigmas) - 1, disable=disable):
denoised_0, denoised_1 = model(x_0, x_1, sigmas[i] * s_in, **extra_args)
image_list.append(denoised_0)
sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
d_0 = to_d(x_0, sigmas[i], denoised_0)
# Euler method
dt = sigma_down - sigmas[i]
x_0 = x_0 + d_0 * dt
if sigmas[i + 1] > 0:
x_0 = x_0 + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up
x_1 = denoised_1
mask_list.append(decode_mask(x_1, height, width))
image_list = torch.cat(image_list, dim=0)
return x_0, x_1, image_list, mask_list
parser = ArgumentParser()
parser.add_argument("--resolution", default=512, type=int)
parser.add_argument("--config", default="config/generate.yaml", type=str)
parser.add_argument("--ckpt", default="checkpoints/diffree-step=000010999.ckpt", type=str)
parser.add_argument("--vae-ckpt", default=None, type=str)
args = parser.parse_args()
config = OmegaConf.load(args.config)
model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
model.eval().cuda()
model_wrap = CompVisDenoiser(model)
model_wrap_cfg = CFGDenoiser(model_wrap)
null_token = model.get_learned_conditioning([""])
def generate(
input_image: Image.Image,
instruction: str,
steps: int,
randomize_seed: bool,
seed: int,
randomize_cfg: bool,
text_cfg_scale: float,
image_cfg_scale: float,
weather_close_video: bool,
decode_image_batch: int
):
seed = random.randint(0, 100000) if randomize_seed else seed
text_cfg_scale = round(random.uniform(6.0, 9.0), ndigits=2) if randomize_cfg else text_cfg_scale
image_cfg_scale = round(random.uniform(1.2, 1.8), ndigits=2) if randomize_cfg else image_cfg_scale
width, height = input_image.size
factor = args.resolution / max(width, height)
factor = math.ceil(min(width, height) * factor / 64) * 64 / min(width, height)
width = int((width * factor) // 64) * 64
height = int((height * factor) // 64) * 64
input_image = ImageOps.fit(input_image, (width, height), method=Image.Resampling.LANCZOS)
input_image_copy = input_image.convert("RGB")
if instruction == "":
return [input_image, seed]
model.cuda()
with torch.no_grad(), autocast("cuda"), model.ema_scope():
cond = {}
cond["c_crossattn"] = [model.get_learned_conditioning([instruction]).to(model.device)]
input_image = 2 * torch.tensor(np.array(input_image)).float() / 255 - 1
input_image = rearrange(input_image, "h w c -> 1 c h w").to(model.device)
cond["c_concat"] = [model.encode_first_stage(input_image).mode().to(model.device)]
uncond = {}
uncond["c_crossattn"] = [null_token.to(model.device)]
uncond["c_concat"] = [torch.zeros_like(cond["c_concat"][0])]
sigmas = model_wrap.get_sigmas(steps).to(model.device)
extra_args = {
"cond": cond,
"uncond": uncond,
"text_cfg_scale": text_cfg_scale,
"image_cfg_scale": image_cfg_scale,
}
torch.manual_seed(seed)
z_0 = torch.randn_like(cond["c_concat"][0]).to(model.device) * sigmas[0]
z_1 = torch.randn_like(cond["c_concat"][0]).to(model.device) * sigmas[0]
z_0, z_1, image_list, mask_list = sample_euler_ancestral(model_wrap_cfg, z_0, z_1, sigmas, height, width, extra_args=extra_args)
x_0 = model.decode_first_stage(z_0)
x_1 = nn.functional.interpolate(z_1, size=(height, width), mode="bilinear", align_corners=False)
x_1 = torch.where(x_1 > 0, 1, -1) # Thresholding step
x_0 = torch.clamp((x_0 + 1.0) / 2.0, min=0.0, max=1.0)
x_1 = torch.clamp((x_1 + 1.0) / 2.0, min=0.0, max=1.0)
x_0 = 255.0 * rearrange(x_0, "1 c h w -> h w c")
x_1 = 255.0 * rearrange(x_1, "1 c h w -> h w c")
x_1 = torch.cat([x_1, x_1, x_1], dim=-1)
edited_image = Image.fromarray(x_0.type(torch.uint8).cpu().numpy())
edited_mask = Image.fromarray(x_1.type(torch.uint8).cpu().numpy())
image_video_path = None
if not weather_close_video:
image_video = []
for i in range(0, len(image_list), decode_image_batch):
if i + decode_image_batch < len(image_list):
tmp_image_list = image_list[i:i+decode_image_batch]
else:
tmp_image_list = image_list[i:]
tmp_image_list = model.decode_first_stage(tmp_image_list)
tmp_image_list = torch.clamp((tmp_image_list + 1.0) / 2.0, min=0.0, max=1.0)
tmp_image_list = 255.0 * rearrange(tmp_image_list, "b c h w -> b h w c")
tmp_image_list = tmp_image_list.type(torch.uint8).cpu().numpy()
# image list to image
for image in tmp_image_list:
image_video.append(image)
image_video_path = "image.mp4"
fps = 30
with imageio.get_writer(image_video_path, fps=fps) as video:
for image in image_video:
video.append_data(image)
# 对edited_mask做膨胀
edited_mask_copy = edited_mask.copy()
kernel = np.ones((3, 3), np.uint8)
edited_mask = cv2.dilate(np.array(edited_mask), kernel, iterations=3)
edited_mask = Image.fromarray(edited_mask)
m_img = edited_mask.filter(ImageFilter.GaussianBlur(radius=3))
m_img = np.asarray(m_img).astype('float') / 255.0
img_np = np.asarray(input_image_copy).astype('float') / 255.0
ours_np = np.asarray(edited_image).astype('float') / 255.0
mix_image_np = m_img * ours_np + (1 - m_img) * img_np
mix_image = Image.fromarray((mix_image_np * 255).astype(np.uint8)).convert('RGB')
red = np.array(mix_image).astype('float') * 1
red[:, :, 0] = 180.0
red[:, :, 2] = 0
red[:, :, 1] = 0
mix_result_with_red_mask = np.array(mix_image)
mix_result_with_red_mask = Image.fromarray(
(mix_result_with_red_mask.astype('float') * (1 - m_img.astype('float') / 2.0) +
m_img.astype('float') / 2.0 * red).astype('uint8'))
mask_video_path = "mask.mp4"
fps = 30
with imageio.get_writer(mask_video_path, fps=fps) as video:
for image in mask_list:
video.append_data(image)
return [int(seed), text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask_copy, mask_video_path, image_video_path, input_image_copy, mix_result_with_red_mask]
def generate_list(
input_image: Image.Image,
generate_list: str,
steps: int,
randomize_seed: bool,
seed: int,
randomize_cfg: bool,
text_cfg_scale: float,
image_cfg_scale: float,
weather_close_video: bool,
decode_image_batch: int
):
generate_list = generate_list.split('\n')
# Remove the empty element
generate_list = [element for element in generate_list if element]
seed = random.randint(0, 100000) if randomize_seed else seed
text_cfg_scale = round(random.uniform(6.0, 9.0), ndigits=2) if randomize_cfg else text_cfg_scale
image_cfg_scale = round(random.uniform(1.2, 1.8), ndigits=2) if randomize_cfg else image_cfg_scale
width, height = input_image.size
factor = args.resolution / max(width, height)
factor = math.ceil(min(width, height) * factor / 64) * 64 / min(width, height)
width = int((width * factor) // 64) * 64
height = int((height * factor) // 64) * 64
input_image = ImageOps.fit(input_image, (width, height), method=Image.Resampling.LANCZOS)
if len(generate_list) == 0:
return [input_image, seed]
model.cuda()
image_video = [np.array(input_image).astype(np.uint8)]
generate_index = 0
retry_number = 0
max_retry = 10
input_image_copy = input_image.convert("RGB")
while generate_index < len(generate_list):
print(f'generate_index: {str(generate_index)}')
instruction = generate_list[generate_index]
with torch.no_grad(), autocast("cuda"), model.ema_scope():
cond = {}
input_image_torch = 2 * torch.tensor(np.array(input_image_copy.copy())).float() / 255 - 1
input_image_torch = rearrange(input_image_torch, "h w c -> 1 c h w").to(model.device)
cond["c_crossattn"] = [model.get_learned_conditioning([instruction]).to(model.device)]
cond["c_concat"] = [model.encode_first_stage(input_image_torch).mode().to(model.device)]
uncond = {}
uncond["c_crossattn"] = [null_token.to(model.device)]
uncond["c_concat"] = [torch.zeros_like(cond["c_concat"][0])]
sigmas = model_wrap.get_sigmas(steps).to(model.device)
extra_args = {
"cond": cond,
"uncond": uncond,
"text_cfg_scale": text_cfg_scale,
"image_cfg_scale": image_cfg_scale,
}
torch.manual_seed(seed)
z_0 = torch.randn_like(cond["c_concat"][0]).to(model.device) * sigmas[0]
z_1 = torch.randn_like(cond["c_concat"][0]).to(model.device) * sigmas[0]
z_0, z_1, _, _ = sample_euler_ancestral(model_wrap_cfg, z_0, z_1, sigmas, height, width, extra_args=extra_args)
x_0 = model.decode_first_stage(z_0)
x_1 = nn.functional.interpolate(z_1, size=(height, width), mode="bilinear", align_corners=False)
x_1 = torch.where(x_1 > 0, 1, -1) # Thresholding step
x_1_mean = torch.sum(x_1).item()/x_1.numel()
if x_1_mean < -0.99:
seed += 1
retry_number +=1
if retry_number > max_retry:
generate_index += 1
continue
else:
generate_index += 1
x_0 = torch.clamp((x_0 + 1.0) / 2.0, min=0.0, max=1.0)
x_1 = torch.clamp((x_1 + 1.0) / 2.0, min=0.0, max=1.0)
x_0 = 255.0 * rearrange(x_0, "1 c h w -> h w c")
x_1 = 255.0 * rearrange(x_1, "1 c h w -> h w c")
x_1 = torch.cat([x_1, x_1, x_1], dim=-1)
edited_image = Image.fromarray(x_0.type(torch.uint8).cpu().numpy())
edited_mask = Image.fromarray(x_1.type(torch.uint8).cpu().numpy())
# 对edited_mask做膨胀
edited_mask_copy = edited_mask.copy()
kernel = np.ones((3, 3), np.uint8)
edited_mask = cv2.dilate(np.array(edited_mask), kernel, iterations=3)
edited_mask = Image.fromarray(edited_mask)
m_img = edited_mask.filter(ImageFilter.GaussianBlur(radius=3))
m_img = np.asarray(m_img).astype('float') / 255.0
img_np = np.asarray(input_image_copy).astype('float') / 255.0
ours_np = np.asarray(edited_image).astype('float') / 255.0
mix_image_np = m_img * ours_np + (1 - m_img) * img_np
image_video.append((mix_image_np * 255).astype(np.uint8))
mix_image = Image.fromarray((mix_image_np * 255).astype(np.uint8)).convert('RGB')
mix_result_with_red_mask = None
mask_video_path = None
image_video_path = None
edited_mask_copy = None
if generate_index == len(generate_list):
image_video_path = "image.mp4"
fps = 2
with imageio.get_writer(image_video_path, fps=fps) as video:
for image in image_video:
video.append_data(image)
yield [int(seed), text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask_copy, mask_video_path, image_video_path, input_image, mix_result_with_red_mask]
input_image_copy = mix_image
# mix_result_with_red_mask = None
# mask_video_path = None
# edited_mask_copy = None
# return [int(seed), text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask_copy, mask_video_path, image_video_path, input_image, mix_result_with_red_mask]
def reset():
return [100, "Randomize Seed", 1372, "Fix CFG", 7.5, 1.5, None, None, None, None, None, None, None, "Close Image Video", 10]
def get_example():
return [
["example_images/dufu.png", "", "black and white suit\nsunglasses\nblue medical mask\nyellow schoolbag\nred bow tie\nbrown high-top hat", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "", "reflective sunglasses\nshiny golden crown\ndiamond necklace\ngorgeous yellow gown", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/dufu.png", "black and white suit", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "reflective sunglasses", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/road_sign.png", "stop sign", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/dufu.png", "blue medical mask", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/people_standing.png", "dark green pleated skirt", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "shiny golden crown", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/dufu.png", "sunglasses", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "diamond necklace", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/iron_man.jpg", "sunglasses", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "the queen's crown", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
["example_images/girl.jpeg", "gorgeous yellow gown", "", 100, "Fix Seed", 1372, "Fix CFG", 7.5, 1.5],
]
with gr.Blocks(css="footer {visibility: hidden}") as demo:
with gr.Row():
gr.Markdown(
"<div align='center'><font size='14'>Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model</font></div>" # noqa
)
with gr.Row():
with gr.Column(scale=1, min_width=100):
with gr.Row():
input_image = gr.Image(label="Input Image", type="pil", interactive=True)
with gr.Row():
instruction = gr.Textbox(lines=1, label="Single object description", interactive=True)
with gr.Row():
reset_button = gr.Button("Reset")
generate_button = gr.Button("Generate")
with gr.Row():
list_input = gr.Textbox(label="Input List", placeholder="Enter one item per line", lines=10)
with gr.Row():
list_generate_button = gr.Button("List Generate")
with gr.Row():
steps = gr.Number(value=100, precision=0, label="Steps", interactive=True)
randomize_seed = gr.Radio(
["Fix Seed", "Randomize Seed"],
value="Randomize Seed",
type="index",
label="Seed Selection",
show_label=False,
interactive=True,
)
seed = gr.Number(value=1372, precision=0, label="Seed", interactive=True)
randomize_cfg = gr.Radio(
["Fix CFG", "Randomize CFG"],
value="Fix CFG",
type="index",
label="CFG Selection",
show_label=False,
interactive=True,
)
text_cfg_scale = gr.Number(value=7.5, label=f"Text CFG", interactive=True)
image_cfg_scale = gr.Number(value=1.5, label=f"Image CFG", interactive=True)
with gr.Column(scale=1, min_width=100):
with gr.Column():
mix_image = gr.Image(label=f"Mix Image", type="pil", interactive=False)
with gr.Column():
edited_mask = gr.Image(label=f"Output Mask", type="pil", interactive=False)
with gr.Accordion('Click to see more (includes generation process per object for list generation and per step for single generation)', open=False):
with gr.Row():
weather_close_video = gr.Radio(
["Show Image Video", "Close Image Video"],
value="Close Image Video",
type="index",
label="Image Generation Process Selection (close for faster generation)",
interactive=True,
)
decode_image_batch = gr.Number(value=10, precision=0, label="Decode Image Batch (<steps)", interactive=True)
with gr.Row():
image_video = gr.Video(label="Image Video of Generation Process")
mask_video = gr.Video(label="Mask Video of Generation Process")
with gr.Row():
original_image = gr.Image(label=f"Original Image", type="pil", interactive=False)
edited_image = gr.Image(label=f"Output Image", type="pil", interactive=False)
mix_result_with_red_mask = gr.Image(label=f"Mix Image With Red Mask", type="pil", interactive=False)
with gr.Row():
gr.Examples(
examples=get_example(),
inputs=[input_image, instruction, list_input, steps, randomize_seed, seed, randomize_cfg, text_cfg_scale, image_cfg_scale],
)
generate_button.click(
fn=generate,
inputs=[
input_image,
instruction,
steps,
randomize_seed,
seed,
randomize_cfg,
text_cfg_scale,
image_cfg_scale,
weather_close_video,
decode_image_batch
],
outputs=[seed, text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask, mask_video, image_video, original_image, mix_result_with_red_mask],
)
list_generate_button.click(
fn=generate_list,
inputs=[
input_image,
list_input,
steps,
randomize_seed,
seed,
randomize_cfg,
text_cfg_scale,
image_cfg_scale,
weather_close_video,
decode_image_batch
],
outputs=[seed, text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask, mask_video, image_video, original_image, mix_result_with_red_mask],
)
reset_button.click(
fn=reset,
inputs=[],
outputs=[steps, randomize_seed, seed, randomize_cfg, text_cfg_scale, image_cfg_scale, edited_image, mix_image, edited_mask, mask_video, image_video, original_image, mix_result_with_red_mask, weather_close_video, decode_image_batch],
)
# demo.queue(concurrency_count=1)
# demo.launch(share=True)
demo.queue().launch(enable_queue=True)
================================================
FILE: config/generate.yaml
================================================
model:
base_learning_rate: 5.0e-05
target: ldm.models.diffusion.ddpm_diffree.LatentDiffusion
params:
linear_start: 0.00085
linear_end: 0.0120
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: edited
cond_stage_key: edit
image_size: 16
channels: 4
cond_stage_trainable: false # Note: different from the one we trained before
conditioning_key: hybrid
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: true
load_ema: true
scheduler_config: # 10000 warmup steps
target: ldm.lr_scheduler.LambdaLinearScheduler
params:
warm_up_steps: [ 0 ]
cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
f_start: [ 1.e-6 ]
f_max: [ 1. ]
f_min: [ 1. ]
unet_config:
target: ldm.modules.diffusionmodules.openaimodel_diffree.UNetModel
params:
image_size: 32 # unused
in_channels: 8
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
omp_config:
target: ldm.modules.diffusionmodules.openaimodel_diffree.OMPModule
params:
image_size: 32
in_channels: 8
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
data:
target: main.DataModuleFromConfig
params:
batch_size: 128
num_workers: 1
wrap: false
validation:
target: edit_dataset_pam.EditDatasetMask
params:
path: data/clip-filtered-dataset
cache_dir: data/
cache_name: data_10k
split: val
min_text_sim: 0.2
min_image_sim: 0.75
min_direction_sim: 0.2
max_samples_per_prompt: 1
min_resize_res: 512
max_resize_res: 512
crop_res: 512
output_as_edit: False
real_input: True
================================================
FILE: config/train.yaml
================================================
model:
base_learning_rate: 2.0e-05
target: ldm.models.diffusion.ddpm_diffree.LatentDiffusion
params:
ckpt_path: checkpoints/v1-5-pruned-emaonly.ckpt
linear_start: 0.00085
linear_end: 0.0120
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: [edited, mask]
cond_stage_key: edit
image_size: 32
channels: 4
cond_stage_trainable: false # Note: different from the one we trained before
conditioning_key: hybrid
mask_loss_factor: 1.0
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: true
load_ema: false
scheduler_config: # 10000 warmup steps
target: ldm.lr_scheduler.LambdaLinearScheduler
params:
warm_up_steps: [ 0 ]
cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
f_start: [ 1.e-6 ]
f_max: [ 1. ]
f_min: [ 1. ]
unet_config:
target: ldm.modules.diffusionmodules.openaimodel_diffree.UNetModel
params:
image_size: 32 # unused
in_channels: 8
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
# independent_blocks_num: 1
omp_config:
target: ldm.modules.diffusionmodules.openaimodel_diffree.OMPModule
params:
image_size: 32 # unused
in_channels: 8
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
data:
target: main.DataModuleFromConfig
params:
batch_size: 16
num_workers: 8
train:
target: dataset_diffree.Dataset
params:
path: dataset
split: train
min_resize_res: 256
max_resize_res: 256
crop_res: 256
flip_prob: 0.5
validation:
target: dataset_diffree.Dataset
params:
path: dataset
split: val
min_resize_res: 256
max_resize_res: 256
crop_res: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 2000
max_images: 2
increase_log_steps: False
trainer:
max_epochs: 100
benchmark: True
accumulate_grad_batches: 1
check_val_every_n_epoch: 4
================================================
FILE: dataset_diffree.py
================================================
from __future__ import annotations
import os
import json
import math
from pathlib import Path
from typing import Any
import numpy as np
import torch
import torchvision
from tqdm import tqdm
from einops import rearrange
from PIL import Image
from torch.utils.data import Dataset
class Dataset(Dataset):
def __init__(
self,
path: str,
split: str = "train",
splits: tuple[float, float, float] = (0.9, 0.05, 0.05),
min_resize_res: int = 256,
max_resize_res: int = 256,
crop_res: int = 256,
flip_prob: float = 0.0,
):
assert split in ("train", "val", "test")
assert sum(splits) == 1
self.path = path
self.min_resize_res = min_resize_res
self.max_resize_res = max_resize_res
self.crop_res = crop_res
self.flip_prob = flip_prob
self.annotation_path = os.path.join(self.path, "annotations.json")
if not os.path.exists(self.annotation_path):
raise FileNotFoundError(f"Annotation file not found at {self.annotation_path}")
with open(self.annotation_path) as f:
annotations = json.load(f)
original_dir_path = os.path.join(self.path, "original_images")
inpainted_dir_path = os.path.join(self.path, "inpainted_images")
mask_dir_path = os.path.join(self.path, "mask_images")
self.dataset = []
for annotation in tqdm(annotations):
original_image_path = os.path.join(original_dir_path, f'{annotation["image_id"]}.jpg')
inpainted_image_path = os.path.join(inpainted_dir_path, annotation["image_id"], f'{annotation["mask_id"]}.jpg')
mask_image_path = os.path.join(mask_dir_path, annotation["image_id"], f'{annotation["mask_id"]}.png')
category_name = annotation["category_name"]
self.dataset.append((original_image_path, inpainted_image_path, mask_image_path, category_name))
split_0, split_1 = {
"train": (0.0, splits[0]),
"val": (splits[0], splits[0] + splits[1]),
"test": (splits[0] + splits[1], 1.0),
}[split]
idx_0 = math.floor(split_0 * len(self.dataset))
idx_1 = math.floor(split_1 * len(self.dataset))
self.dataset = self.dataset[idx_0:idx_1]
def __len__(self) -> int:
return len(self.dataset)
def __getitem__(self, i: int) -> dict[str, Any]:
original_image_path, inpainted_image_path, mask_image_path, category_name = self.dataset[i]
prompt = category_name
inpainted_image = Image.open(inpainted_image_path).convert("RGB")
original_image = Image.open(original_image_path).convert("RGB")
mask_image = Image.open(mask_image_path).convert("L")
reize_res = torch.randint(self.min_resize_res, self.max_resize_res + 1, ()).item()
inpainted_image = inpainted_image.resize((reize_res, reize_res), Image.Resampling.LANCZOS)
original_image = original_image.resize((reize_res, reize_res), Image.Resampling.LANCZOS)
mask_image = mask_image.resize((reize_res, reize_res), Image.Resampling.NEAREST)
inpainted_image = rearrange(2 * torch.tensor(np.array(inpainted_image)).float() / 255 - 1, "h w c -> c h w")
original_image = rearrange(2 * torch.tensor(np.array(original_image)).float() / 255 - 1, "h w c -> c h w")
mask_image = torch.tensor(np.array(mask_image) / 255).int().unsqueeze(0)
mask_image = mask_image.repeat(3, 1, 1)
crop = torchvision.transforms.RandomCrop(self.crop_res)
flip = torchvision.transforms.RandomHorizontalFlip(float(self.flip_prob))
inpainted_image, original_image, mask_image = flip(crop(torch.cat((inpainted_image, original_image, mask_image)))).chunk(3)
mask_image = mask_image[0].unsqueeze(0)
return dict(edited=original_image, mask=mask_image, edit=dict(c_concat=inpainted_image, c_crossattn=prompt))
================================================
FILE: main.py
================================================
import argparse, os, sys, datetime, glob
import numpy as np
import time
import torch
import torchvision
import pytorch_lightning as pl
import json
import pickle
from packaging import version
from omegaconf import OmegaConf
from torch.utils.data import DataLoader, Dataset
from functools import partial
from PIL import Image
import torch.distributed as dist
from pytorch_lightning import seed_everything
from pytorch_lightning.trainer import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint, Callback, LearningRateMonitor
from pytorch_lightning.utilities.distributed import rank_zero_only
from pytorch_lightning.utilities import rank_zero_info
from pytorch_lightning.plugins import DDPPlugin
sys.path.append("./stable_diffusion")
from ldm.data.base import Txt2ImgIterableBaseDataset
from ldm.util import instantiate_from_config
def get_parser(**parser_kwargs):
def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
elif v.lower() in ("no", "false", "f", "n", "0"):
return False
else:
raise argparse.ArgumentTypeError("Boolean value expected.")
parser = argparse.ArgumentParser(**parser_kwargs)
parser.add_argument(
"-n",
"--name",
type=str,
const=True,
default="",
nargs="?",
help="postfix for logdir",
)
parser.add_argument(
"-r",
"--resume",
type=str,
const=True,
default="",
nargs="?",
help="resume from logdir or checkpoint in logdir",
)
parser.add_argument(
"-b",
"--base",
nargs="*",
metavar="base_config.yaml",
help="paths to base configs. Loaded from left-to-right. "
"Parameters can be overwritten or added with command-line options of the form `--key value`.",
default=list(),
)
parser.add_argument(
"-t",
"--train",
type=str2bool,
const=True,
default=False,
nargs="?",
help="train",
)
parser.add_argument(
"--no-test",
type=str2bool,
const=True,
default=False,
nargs="?",
help="disable test",
)
parser.add_argument(
"-p",
"--project",
help="name of new or path to existing project"
)
parser.add_argument(
"-d",
"--debug",
type=str2bool,
nargs="?",
const=True,
default=False,
help="enable post-mortem debugging",
)
parser.add_argument(
"-s",
"--seed",
type=int,
default=23,
help="seed for seed_everything",
)
parser.add_argument(
"-f",
"--postfix",
type=str,
default="",
help="post-postfix for default name",
)
parser.add_argument(
"-l",
"--logdir",
type=str,
default="logs",
help="directory for logging dat shit",
)
parser.add_argument(
"--scale_lr",
action="store_true",
default=False,
help="scale base-lr by ngpu * batch_size * n_accumulate",
)
return parser
def nondefault_trainer_args(opt):
parser = argparse.ArgumentParser()
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args([])
return sorted(k for k in vars(args) if getattr(opt, k) != getattr(args, k))
class WrappedDataset(Dataset):
"""Wraps an arbitrary object with __len__ and __getitem__ into a pytorch dataset"""
def __init__(self, dataset):
self.data = dataset
def __len__(self):
return len(self.data)
def __getitem__(self, idx):
return self.data[idx]
def worker_init_fn(_):
worker_info = torch.utils.data.get_worker_info()
dataset = worker_info.dataset
worker_id = worker_info.id
if isinstance(dataset, Txt2ImgIterableBaseDataset):
split_size = dataset.num_records // worker_info.num_workers
# reset num_records to the true number to retain reliable length information
dataset.sample_ids = dataset.valid_ids[worker_id * split_size:(worker_id + 1) * split_size]
current_id = np.random.choice(len(np.random.get_state()[1]), 1)
return np.random.seed(np.random.get_state()[1][current_id] + worker_id)
else:
return np.random.seed(np.random.get_state()[1][0] + worker_id)
class DataModuleFromConfig(pl.LightningDataModule):
def __init__(self, batch_size, train=None, validation=None, test=None, predict=None,
wrap=False, num_workers=None, shuffle_test_loader=False, use_worker_init_fn=False,
shuffle_val_dataloader=False):
super().__init__()
self.batch_size = batch_size
self.dataset_configs = dict()
self.num_workers = num_workers if num_workers is not None else batch_size * 2
self.use_worker_init_fn = use_worker_init_fn
if train is not None:
self.dataset_configs["train"] = train
self.train_dataloader = self._train_dataloader
if validation is not None:
self.dataset_configs["validation"] = validation
self.val_dataloader = partial(self._val_dataloader, shuffle=shuffle_val_dataloader)
if test is not None:
self.dataset_configs["test"] = test
self.test_dataloader = partial(self._test_dataloader, shuffle=shuffle_test_loader)
if predict is not None:
self.dataset_configs["predict"] = predict
self.predict_dataloader = self._predict_dataloader
self.wrap = wrap
def prepare_data(self):
for data_cfg in self.dataset_configs.values():
instantiate_from_config(data_cfg)
def setup(self, stage=None):
self.datasets = dict(
(k, instantiate_from_config(self.dataset_configs[k]))
for k in self.dataset_configs)
if self.wrap:
for k in self.datasets:
self.datasets[k] = WrappedDataset(self.datasets[k])
def _train_dataloader(self):
is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)
if is_iterable_dataset or self.use_worker_init_fn:
init_fn = worker_init_fn
else:
init_fn = None
return DataLoader(self.datasets["train"], batch_size=self.batch_size,
num_workers=self.num_workers, shuffle=False if is_iterable_dataset else True,
worker_init_fn=init_fn, persistent_workers=True)
def _val_dataloader(self, shuffle=False):
if isinstance(self.datasets['validation'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:
init_fn = worker_init_fn
else:
init_fn = None
return DataLoader(self.datasets["validation"],
batch_size=self.batch_size,
num_workers=self.num_workers,
worker_init_fn=init_fn,
shuffle=shuffle, persistent_workers=True)
def _test_dataloader(self, shuffle=False):
is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)
if is_iterable_dataset or self.use_worker_init_fn:
init_fn = worker_init_fn
else:
init_fn = None
# do not shuffle dataloader for iterable dataset
shuffle = shuffle and (not is_iterable_dataset)
return DataLoader(self.datasets["test"], batch_size=self.batch_size,
num_workers=self.num_workers, worker_init_fn=init_fn, shuffle=shuffle, persistent_workers=True)
def _predict_dataloader(self, shuffle=False):
if isinstance(self.datasets['predict'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:
init_fn = worker_init_fn
else:
init_fn = None
return DataLoader(self.datasets["predict"], batch_size=self.batch_size,
num_workers=self.num_workers, worker_init_fn=init_fn, persistent_workers=True)
class SetupCallback(Callback):
def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, lightning_config):
super().__init__()
self.resume = resume
self.now = now
self.logdir = logdir
self.ckptdir = ckptdir
self.cfgdir = cfgdir
self.config = config
self.lightning_config = lightning_config
def on_keyboard_interrupt(self, trainer, pl_module):
if trainer.global_rank == 0:
print("Summoning checkpoint.")
ckpt_path = os.path.join(self.ckptdir, "last.ckpt")
trainer.save_checkpoint(ckpt_path)
def on_pretrain_routine_start(self, trainer, pl_module):
if trainer.global_rank == 0:
# Create logdirs and save configs
# os.makedirs(self.logdir, exist_ok=True)
# os.makedirs(self.ckptdir, exist_ok=True)
# os.makedirs(self.cfgdir, exist_ok=True)
if "callbacks" in self.lightning_config:
if 'metrics_over_trainsteps_checkpoint' in self.lightning_config['callbacks']:
os.makedirs(os.path.join(self.ckptdir, 'trainstep_checkpoints'), exist_ok=True)
print("Project config")
print(OmegaConf.to_yaml(self.config))
OmegaConf.save(self.config,
os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)))
print("Lightning config")
print(OmegaConf.to_yaml(self.lightning_config))
OmegaConf.save(OmegaConf.create({"lightning": self.lightning_config}),
os.path.join(self.cfgdir, "{}-lightning.yaml".format(self.now)))
def get_world_size():
if not dist.is_available():
return 1
if not dist.is_initialized():
return 1
return dist.get_world_size()
def all_gather(data):
"""
Run all_gather on arbitrary picklable data (not necessarily tensors)
Args:
data: any picklable object
Returns:
list[data]: list of data gathered from each rank
"""
world_size = get_world_size()
if world_size == 1:
return [data]
# serialized to a Tensor
origin_size = None
if not isinstance(data, torch.Tensor):
buffer = pickle.dumps(data)
storage = torch.ByteStorage.from_buffer(buffer)
tensor = torch.ByteTensor(storage).to("cuda")
else:
origin_size = data.size()
tensor = data.reshape(-1)
tensor_type = tensor.dtype
# obtain Tensor size of each rank
local_size = torch.LongTensor([tensor.numel()]).to("cuda")
size_list = [torch.LongTensor([0]).to("cuda") for _ in range(world_size)]
dist.all_gather(size_list, local_size)
size_list = [int(size.item()) for size in size_list]
max_size = max(size_list)
# receiving Tensor from all ranks
# we pad the tensor because torch all_gather does not support
# gathering tensors of different shapes
tensor_list = []
for _ in size_list:
tensor_list.append(torch.FloatTensor(size=(max_size,)).cuda().to(tensor_type))
if local_size != max_size:
padding = torch.FloatTensor(size=(max_size - local_size,)).cuda().to(tensor_type)
tensor = torch.cat((tensor, padding), dim=0)
dist.all_gather(tensor_list, tensor)
data_list = []
for size, tensor in zip(size_list, tensor_list):
if origin_size is None:
buffer = tensor.cpu().numpy().tobytes()[:size]
data_list.append(pickle.loads(buffer))
else:
buffer = tensor[:size]
data_list.append(buffer)
if origin_size is not None:
new_shape = [-1] + list(origin_size[1:])
resized_list = []
for data in data_list:
# suppose the difference of tensor size exist in first dimension
data = data.reshape(new_shape)
resized_list.append(data)
return resized_list
else:
return data_list
class ImageLogger(Callback):
def __init__(self, batch_frequency, max_images, clamp=True, increase_log_steps=True,
rescale=True, disabled=False, log_on_batch_idx=False, log_first_step=False,
log_images_kwargs=None):
super().__init__()
self.rescale = rescale
self.batch_freq = batch_frequency
self.max_images = max_images
self.logger_log_images = {
pl.loggers.TestTubeLogger: self._testtube,
}
self.log_steps = [2 ** n for n in range(6, int(np.log2(self.batch_freq)) + 1)]
if not increase_log_steps:
self.log_steps = [self.batch_freq]
self.clamp = clamp
self.disabled = disabled
self.log_on_batch_idx = log_on_batch_idx
self.log_images_kwargs = log_images_kwargs if log_images_kwargs else {}
self.log_first_step = log_first_step
@rank_zero_only
def _testtube(self, pl_module, images, batch_idx, split):
for k in images:
grid = torchvision.utils.make_grid(images[k])
grid = (grid + 1.0) / 2.0 # -1,1 -> 0,1; c,h,w
tag = f"{split}/{k}"
pl_module.logger.experiment.add_image(
tag, grid,
global_step=pl_module.global_step)
@rank_zero_only
def log_local(self, save_dir, split, images, prompts,
global_step, current_epoch, batch_idx):
root = os.path.join(save_dir, "images", split)
names = {"reals": "before", "inputs": "after", "reconstruction": "before-vq", "samples": "after-gen"}
# print(root)
for k in images:
grid = torchvision.utils.make_grid(images[k], nrow=8)
if self.rescale:
grid = (grid + 1.0) / 2.0 # -1,1 -> 0,1; c,h,w
grid = grid.transpose(0, 1).transpose(1, 2).squeeze(-1)
grid = grid.numpy()
grid = (grid * 255).astype(np.uint8)
filename = "gs-{:06}_e-{:06}_b-{:06}_{}.png".format(
global_step,
current_epoch,
batch_idx,
names[k])
path = os.path.join(root, filename)
os.makedirs(os.path.split(path)[0], exist_ok=True)
# print(path)
Image.fromarray(grid).save(path)
filename = "gs-{:06}_e-{:06}_b-{:06}_prompt.json".format(
global_step,
current_epoch,
batch_idx)
path = os.path.join(root, filename)
with open(path, "w") as f:
for p in prompts:
f.write(f"{json.dumps(p)}\n")
def log_img(self, pl_module, batch, batch_idx, split="train"):
check_idx = batch_idx if self.log_on_batch_idx else pl_module.global_step
if (self.check_frequency(check_idx) and # batch_idx % self.batch_freq == 0
hasattr(pl_module, "log_images") and
callable(pl_module.log_images) and
self.max_images > 0) or (split == "val" and batch_idx == 0):
logger = type(pl_module.logger)
is_train = pl_module.training
if is_train:
pl_module.eval()
with torch.no_grad():
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
prompts = batch["edit"]["c_crossattn"][:self.max_images]
prompts = [p for ps in all_gather(prompts) for p in ps]
for k in images:
N = min(images[k].shape[0], self.max_images)
images[k] = images[k][:N]
images[k] = torch.cat(all_gather(images[k][:N]))
if isinstance(images[k], torch.Tensor):
images[k] = images[k].detach().cpu()
if self.clamp:
images[k] = torch.clamp(images[k], -1., 1.)
self.log_local(pl_module.logger.save_dir, split, images, prompts,
pl_module.global_step, pl_module.current_epoch, batch_idx)
logger_log_images = self.logger_log_images.get(logger, lambda *args, **kwargs: None)
logger_log_images(pl_module, images, pl_module.global_step, split)
if is_train:
pl_module.train()
def check_frequency(self, check_idx):
if ((check_idx % self.batch_freq) == 0 or (check_idx in self.log_steps)) and (
check_idx > 0 or self.log_first_step):
if len(self.log_steps) > 0:
self.log_steps.pop(0)
return True
return False
def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
if not self.disabled and (pl_module.global_step > 0 or self.log_first_step):
self.log_img(pl_module, batch, batch_idx, split="train")
def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
if not self.disabled and pl_module.global_step > 0:
self.log_img(pl_module, batch, batch_idx, split="val")
if hasattr(pl_module, 'calibrate_grad_norm'):
if (pl_module.calibrate_grad_norm and batch_idx % 25 == 0) and batch_idx > 0:
self.log_gradients(trainer, pl_module, batch_idx=batch_idx)
class CUDACallback(Callback):
# see https://github.com/SeanNaren/minGPT/blob/master/mingpt/callback.py
def on_train_epoch_start(self, trainer, pl_module):
# Reset the memory use counter
torch.cuda.reset_peak_memory_stats(trainer.root_gpu)
torch.cuda.synchronize(trainer.root_gpu)
self.start_time = time.time()
def on_train_epoch_end(self, trainer, pl_module, outputs):
torch.cuda.synchronize(trainer.root_gpu)
max_memory = torch.cuda.max_memory_allocated(trainer.root_gpu) / 2 ** 20
epoch_time = time.time() - self.start_time
try:
max_memory = trainer.training_type_plugin.reduce(max_memory)
epoch_time = trainer.training_type_plugin.reduce(epoch_time)
rank_zero_info(f"Average Epoch time: {epoch_time:.2f} seconds")
rank_zero_info(f"Average Peak memory {max_memory:.2f}MiB")
except AttributeError:
pass
if __name__ == "__main__":
# custom parser to specify config files, train, test and debug mode,
# postfix, resume.
# `--key value` arguments are interpreted as arguments to the trainer.
# `nested.key=value` arguments are interpreted as config parameters.
# configs are merged from left-to-right followed by command line parameters.
# model:
# base_learning_rate: float
# target: path to lightning module
# params:
# key: value
# data:
# target: main.DataModuleFromConfig
# params:
# batch_size: int
# wrap: bool
# train:
# target: path to train dataset
# params:
# key: value
# validation:
# target: path to validation dataset
# params:
# key: value
# test:
# target: path to test dataset
# params:
# key: value
# lightning: (optional, has sane defaults and can be specified on cmdline)
# trainer:
# additional arguments to trainer
# logger:
# logger to instantiate
# modelcheckpoint:
# modelcheckpoint to instantiate
# callbacks:
# callback1:
# target: importpath
# params:
# key: value
now = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
# add cwd for convenience and to make classes in this file available when
# running as `python main.py`
# (in particular `main.DataModuleFromConfig`)
sys.path.append(os.getcwd())
parser = get_parser()
parser = Trainer.add_argparse_args(parser)
opt, unknown = parser.parse_known_args()
assert opt.name
cfg_fname = os.path.split(opt.base[0])[-1]
cfg_name = os.path.splitext(cfg_fname)[0]
nowname = f"{cfg_name}_{opt.name}"
logdir = os.path.join(opt.logdir, nowname)
ckpt = os.path.join(logdir, "checkpoints", "last.ckpt")
resume = False
if os.path.isfile(ckpt):
opt.resume_from_checkpoint = ckpt
base_configs = sorted(glob.glob(os.path.join(logdir, "configs/*.yaml")))
opt.base = base_configs + opt.base
_tmp = logdir.split("/")
nowname = _tmp[-1]
resume = True
ckptdir = os.path.join(logdir, "checkpoints")
cfgdir = os.path.join(logdir, "configs")
os.makedirs(logdir, exist_ok=True)
os.makedirs(ckptdir, exist_ok=True)
os.makedirs(cfgdir, exist_ok=True)
try:
# init and save configs
configs = [OmegaConf.load(cfg) for cfg in opt.base]
cli = OmegaConf.from_dotlist(unknown)
config = OmegaConf.merge(*configs, cli)
if resume:
# By default, when finetuning from Stable Diffusion, we load the EMA-only checkpoint to initialize all weights.
config.model.params.load_ema = True
lightning_config = config.pop("lightning", OmegaConf.create())
# merge trainer cli with config
trainer_config = lightning_config.get("trainer", OmegaConf.create())
# default to ddp
trainer_config["accelerator"] = "ddp"
for k in nondefault_trainer_args(opt):
trainer_config[k] = getattr(opt, k)
if not "gpus" in trainer_config:
del trainer_config["accelerator"]
cpu = True
else:
gpuinfo = trainer_config["gpus"]
print(f"Running on GPUs {gpuinfo}")
cpu = False
trainer_opt = argparse.Namespace(**trainer_config)
lightning_config.trainer = trainer_config
# model
model = instantiate_from_config(config.model)
# trainer and callbacks
trainer_kwargs = dict()
# default logger configs
default_logger_cfgs = {
"wandb": {
"target": "pytorch_lightning.loggers.WandbLogger",
"params": {
"name": nowname,
"save_dir": logdir,
"id": nowname,
}
},
"testtube": {
"target": "pytorch_lightning.loggers.TestTubeLogger",
"params": {
"name": "testtube",
"save_dir": logdir,
}
},
}
default_logger_cfg = default_logger_cfgs["wandb"]
if "logger" in lightning_config:
logger_cfg = lightning_config.logger
else:
logger_cfg = OmegaConf.create()
logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)
trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)
# modelcheckpoint - use TrainResult/EvalResult(checkpoint_on=metric) to
# specify which metric is used to determine best models
default_modelckpt_cfg = {
"target": "pytorch_lightning.callbacks.ModelCheckpoint",
"params": {
"dirpath": ckptdir,
"filename": "{epoch:06}",
"verbose": True,
"save_last": True,
}
}
if "modelcheckpoint" in lightning_config:
modelckpt_cfg = lightning_config.modelcheckpoint
else:
modelckpt_cfg = OmegaConf.create()
modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
print(f"Merged modelckpt-cfg: \n{modelckpt_cfg}")
if version.parse(pl.__version__) < version.parse('1.4.0'):
trainer_kwargs["checkpoint_callback"] = instantiate_from_config(modelckpt_cfg)
# add callback which sets up log directory
default_callbacks_cfg = {
"setup_callback": {
"target": "main.SetupCallback",
"params": {
"resume": opt.resume,
"now": now,
"logdir": logdir,
"ckptdir": ckptdir,
"cfgdir": cfgdir,
"config": config,
"lightning_config": lightning_config,
}
},
"image_logger": {
"target": "main.ImageLogger",
"params": {
"batch_frequency": 750,
"max_images": 4,
"clamp": True
}
},
"learning_rate_logger": {
"target": "main.LearningRateMonitor",
"params": {
"logging_interval": "step",
# "log_momentum": True
}
},
"cuda_callback": {
"target": "main.CUDACallback"
},
}
if version.parse(pl.__version__) >= version.parse('1.4.0'):
default_callbacks_cfg.update({'checkpoint_callback': modelckpt_cfg})
if "callbacks" in lightning_config:
callbacks_cfg = lightning_config.callbacks
else:
callbacks_cfg = OmegaConf.create()
print(
'Caution: Saving checkpoints every n train steps without deleting. This might require some free space.')
default_metrics_over_trainsteps_ckpt_dict = {
'metrics_over_trainsteps_checkpoint': {
"target": 'pytorch_lightning.callbacks.ModelCheckpoint',
'params': {
"dirpath": os.path.join(ckptdir, 'trainstep_checkpoints'),
"filename": "{epoch:06}-{step:09}",
"verbose": True,
'save_top_k': -1,
'every_n_train_steps': 1000,
'save_weights_only': True
}
}
}
default_callbacks_cfg.update(default_metrics_over_trainsteps_ckpt_dict)
callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
if 'ignore_keys_callback' in callbacks_cfg and hasattr(trainer_opt, 'resume_from_checkpoint'):
callbacks_cfg.ignore_keys_callback.params['ckpt_path'] = trainer_opt.resume_from_checkpoint
elif 'ignore_keys_callback' in callbacks_cfg:
del callbacks_cfg['ignore_keys_callback']
trainer_kwargs["callbacks"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
trainer = Trainer.from_argparse_args(trainer_opt, plugins=DDPPlugin(find_unused_parameters=False), **trainer_kwargs)
trainer.logdir = logdir ###
# data
data = instantiate_from_config(config.data)
# NOTE according to https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html
# calling these ourselves should not be necessary but it is.
# lightning still takes care of proper multiprocessing though
data.prepare_data()
data.setup()
print("#### Data #####")
for k in data.datasets:
print(f"{k}, {data.datasets[k].__class__.__name__}, {len(data.datasets[k])}")
# configure learning rate
bs, base_lr = config.data.params.batch_size, config.model.base_learning_rate
if not cpu:
ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
else:
ngpu = 1
if 'accumulate_grad_batches' in lightning_config.trainer:
accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches
else:
accumulate_grad_batches = 1
print(f"accumulate_grad_batches = {accumulate_grad_batches}")
lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
if opt.scale_lr:
model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
print(
"Setting learning rate to {:.2e} = {} (accumulate_grad_batches) * {} (num_gpus) * {} (batchsize) * {:.2e} (base_lr)".format(
model.learning_rate, accumulate_grad_batches, ngpu, bs, base_lr))
else:
model.learning_rate = base_lr
print("++++ NOT USING LR SCALING ++++")
print(f"Setting learning rate to {model.learning_rate:.2e}")
# allow checkpointing via USR1
def melk(*args, **kwargs):
# run all checkpoint hooks
if trainer.global_rank == 0:
print("Summoning checkpoint.")
ckpt_path = os.path.join(ckptdir, "last.ckpt")
trainer.save_checkpoint(ckpt_path)
def divein(*args, **kwargs):
if trainer.global_rank == 0:
import pudb;
pudb.set_trace()
import signal
signal.signal(signal.SIGUSR1, melk)
signal.signal(signal.SIGUSR2, divein)
# run
if opt.train:
try:
trainer.fit(model, data)
except Exception:
melk()
raise
if not opt.no_test and not trainer.interrupted:
trainer.test(model, data)
except Exception:
if opt.debug and trainer.global_rank == 0:
try:
import pudb as debugger
except ImportError:
import pdb as debugger
debugger.post_mortem()
raise
finally:
# move newly created debug project to debug_runs
if opt.debug and not opt.resume and trainer.global_rank == 0:
dst, name = os.path.split(logdir)
dst = os.path.join(dst, "debug_runs", name)
os.makedirs(os.path.split(dst)[0], exist_ok=True)
os.rename(logdir, dst)
if trainer.global_rank == 0:
print(trainer.profiler.summary())
================================================
FILE: requirements.txt
================================================
--extra-index-url https://download.pytorch.org/whl/cu117
numpy==1.24.4
torch==2.0.0
torchvision==0.15.1
torchmetrics==0.6.0
pytorch-lightning==1.4.2
transformers==4.19.2
tqdm==4.66.2
gradio==3.50.2
openai==1.12.0
opencv-python
einops==0.3.0
omegaconf==2.1.1
-e git+https://github.com/crowsonkb/k-diffusion.git@v0.0.16#egg=k-diffusion
-e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
imageio==2.9.0
imageio-ffmpeg==0.4.2
================================================
FILE: stable_diffusion/LICENSE
================================================
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors
CreativeML Open RAIL-M
dated August 22, 2022
Section I: PREAMBLE
Multimodal generative models are being widely adopted and used, and have the potential to transform the way artists, among other individuals, conceive and benefit from AI or ML technologies as a tool for content creation.
Notwithstanding the current and potential benefits that these artifacts can bring to society at large, there are also concerns about potential misuses of them, either due to their technical limitations or ethical considerations.
In short, this license strives for both the open and responsible downstream use of the accompanying model. When it comes to the open character, we took inspiration from open source permissive licenses regarding the grant of IP rights. Referring to the downstream responsible use, we added use-based restrictions not permitting the use of the Model in very specific scenarios, in order for the licensor to be able to enforce the license in case potential misuses of the Model may occur. At the same time, we strive to promote open and responsible research on generative models for art and content generation.
Even though downstream derivative versions of the model could be released under different licensing terms, the latter will always have to include - at minimum - the same use-based restrictions as the ones in the original license (this license). We believe in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI.
This License governs the use of the model (and its derivatives) and is informed by the model card associated with the model.
NOW THEREFORE, You and Licensor agree as follows:
1. Definitions
- "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
- "Data" means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
- "Output" means the results of operating a Model as embodied in informational content resulting therefrom.
- "Model" means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
- "Derivatives of the Model" means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
- "Complementary Material" means the accompanying source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, etc, if any.
- "Distribution" means any transmission, reproduction, publication or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means - e.g. API-based or web access.
- "Licensor" means the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
- "You" (or "Your") means an individual or Legal Entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application - e.g. chatbot, translator, image generator.
- "Third Parties" means individuals or legal entities that are not under common control with Licensor or You.
- "Contribution" means any work of authorship, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
- "Contributor" means Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.
Section II: INTELLECTUAL PROPERTY RIGHTS
Both copyright and patent grants apply to the Model, Derivatives of the Model and Complementary Material. The Model and Derivatives of the Model are subject to additional terms as described in Section III.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.
Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
4. Distribution and Redistribution. You may host for Third Party remote access purposes (e.g. software-as-a-service), reproduce and distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the following conditions:
Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply to the use of Complementary Material.
You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;
You must cause any modified files to carry prominent notices stating that You changed the files;
You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model, Derivatives of the Model.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions - respecting paragraph 4.a. - for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Therefore You cannot use the Model and the Derivatives of the Model for the specified restricted uses. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph (paragraph 5).
6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are accountable for the Output you generate and its subsequent uses. No use of the output can contravene any provision as stated in the License.
Section IV: OTHER PROVISIONS
7. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model through electronic means, or modify the Output of the Model based on updates. You shall undertake reasonable efforts to use the latest version of the Model.
8. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
9. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model and the Complementary Material (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model, Derivatives of the Model, and the Complementary Material and assume any risks associated with Your exercise of permissions under this License.
10. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model and the Complementary Material (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
11. Accepting Warranty or Additional Liability. While redistributing the Model, Derivatives of the Model and the Complementary Material thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
12. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
END OF TERMS AND CONDITIONS
Attachment A
Use Restrictions
You agree not to use the Model or Derivatives of the Model:
- In any way that violates any applicable national, federal, state, local or international law or regulation;
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
- To generate or disseminate verifiably false information and/or content with the purpose of harming others;
- To generate or disseminate personal identifiable information that can be used to harm an individual;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
- To provide medical advice and medical results interpretation;
- To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
================================================
FILE: stable_diffusion/README.md
================================================
# Stable Diffusion
*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp)\,
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
_[CVPR '22 Oral](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html) |
[GitHub](https://github.com/CompVis/latent-diffusion) | [arXiv](https://arxiv.org/abs/2112.10752) | [Project page](https://ommer-lab.com/research/latent-diffusion-models/)_

[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:
```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
```
## Stable Diffusion v1
Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
then finetuned on 512x512 images.
*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data.
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](Stable_Diffusion_v1_Model_Card.md).*
The weights are available via [the CompVis organization at Hugging Face](https://huggingface.co/CompVis) under [a license which contains specific use-based restrictions to prevent misuse and harm as informed by the model card, but otherwise remains permissive](LICENSE). While commercial use is permitted under the terms of the license, **we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations**, since there are [known limitations and biases](Stable_Diffusion_v1_Model_Card.md#limitations-and-bias) of the weights, and research on safe and ethical deployment of general text-to-image models is an ongoing effort. **The weights are research artifacts and should be treated as such.**
[The CreativeML OpenRAIL M license](LICENSE) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
### Weights
We currently provide the following checkpoints:
- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
515k steps at resolution `512x512` on [laion-aesthetics v2 5+](https://laion.ai/blog/laion-aesthetics/) (a subset of laion2B-en with estimated aesthetics score `> 5.0`, and additionally
filtered to images with an original size `>= 512x512`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the [LAION-5B](https://laion.ai/blog/laion-5b/) metadata, the aesthetics score is estimated using the [LAION-Aesthetics Predictor V2](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- `sd-v1-4.ckpt`: Resumed from `sd-v1-2.ckpt`. 225k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:

### Text-to-Image with Stable Diffusion


Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
We provide a [reference script for sampling](#reference-sampling-script), but
there also exists a [diffusers integration](#diffusers-integration), which we
expect to see more active community development.
#### Reference Sampling Script
We provide a reference sampling script, which incorporates
- a [Safety Checker Module](https://github.com/CompVis/stable-diffusion/pull/36),
to reduce the probability of explicit outputs,
- an [invisible watermarking](https://github.com/ShieldMnt/invisible-watermark)
of the outputs, to help viewers [identify the images as machine-generated](scripts/tests/test_watermark.py).
After [obtaining the `stable-diffusion-v1-*-original` weights](#weights), link them
```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```
and sample with
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler,
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).
```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
[--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
[--seed SEED] [--precision {full,autocast}]
optional arguments:
-h, --help show this help message and exit
--prompt [PROMPT] the prompt to render
--outdir [OUTDIR] dir to write results to
--skip_grid do not save a grid, only individual samples. Helpful when evaluating lots of samples
--skip_save do not save individual samples. For speed measurements.
--ddim_steps DDIM_STEPS
number of ddim sampling steps
--plms use plms sampling
--laion400m uses the LAION400M model
--fixed_code if enabled, uses the same starting code across samples
--ddim_eta DDIM_ETA ddim eta (eta=0.0 corresponds to deterministic sampling
--n_iter N_ITER sample this often
--H H image height, in pixel space
--W W image width, in pixel space
--C C latent channels
--f F downsampling factor
--n_samples N_SAMPLES
how many samples to produce for each given prompt. A.k.a. batch size
--n_rows N_ROWS rows in the grid (default: n_samples)
--scale SCALE unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
--from-file FROM_FILE
if specified, load prompts from this file
--config CONFIG path to config which constructs model
--ckpt CKPT path to checkpoint of model
--seed SEED the seed (for reproducible sampling)
--precision {full,autocast}
evaluate at this precision
```
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
For this reason `use_ema=False` is set in the configuration, otherwise the code will try to switch from
non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
#### Diffusers Integration
A simple way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers):
```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
use_auth_token=True
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")
```
### Image Modification with Stable Diffusion
By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script,
we provide a script to perform image modification with Stable Diffusion.
The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
```
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
```
Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
**Input**

**Outputs**


This procedure can, for example, also be used to upscale samples from the base model.
## Comments
- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
## BibTeX
```
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
================================================
FILE: stable_diffusion/Stable_Diffusion_v1_Model_Card.md
================================================
# Stable Diffusion v1 Model Card
This model card focuses on the model associated with the Stable Diffusion model, available [here](https://github.com/CompVis/stable-diffusion).
## Model Details
- **Developed by:** Robin Rombach, Patrick Esser
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [Proprietary](LICENSE)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).
- **Resources for more information:** [GitHub Repository](https://github.com/CompVis/stable-diffusion), [Paper](https://arxiv.org/abs/2112.10752).
- **Cite as:**
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
# Uses
## Direct Use
The model is intended for research purposes only. Possible research areas and
tasks include
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
Excluded uses are described below.
### Misuse, Malicious Use, and Out-of-Scope Use
_Note: This section is taken from the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini), but applies in the same way to Stable Diffusion v1_.
The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
#### Out-of-Scope Use
The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.
#### Misuse and Malicious Use
Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation
- Representations of egregious violence and gore
- Sharing of copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.
## Limitations and Bias
### Limitations
- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
- Faces and people in general may not be generated properly.
- The model was trained mainly with English captions and will not work as well in other languages.
- The autoencoding part of the model is lossy
- The model was trained on a large-scale dataset
[LAION-5B](https://laion.ai/blog/laion-5b/) which contains adult material
and is not fit for product use without additional safety mechanisms and
considerations.
- No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data.
The training data can be searched at [https://rom1504.github.io/clip-retrieval/](https://rom1504.github.io/clip-retrieval/) to possibly assist in the detection of memorized images.
### Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
Stable Diffusion v1 was primarily trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/),
which consists of images that are limited to English descriptions.
Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
Stable Diffusion v1 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.
## Training
**Training Data**
The model developers used the following dataset for training the model:
- LAION-5B and subsets thereof (see next section)
**Training Procedure**
Stable Diffusion v1 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,
- Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4
- Text prompts are encoded through a ViT-L/14 text-encoder.
- The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
We currently provide the following checkpoints:
- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
515k steps at resolution `512x512` on [laion-aesthetics v2 5+](https://laion.ai/blog/laion-aesthetics/) (a subset of laion2B-en with estimated aesthetics score `> 5.0`, and additionally
filtered to images with an original size `>= 512x512`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the [LAION-5B](https://laion.ai/blog/laion-5b/) metadata, the aesthetics score is estimated using the [LAION-Aesthetics Predictor V2](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- `sd-v1-4.ckpt`: Resumed from `sd-v1-2.ckpt`. 225k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- **Hardware:** 32 x 8 x A100 GPUs
- **Optimizer:** AdamW
- **Gradient Accumulations**: 2
- **Batch:** 32 x 8 x 2 x 4 = 2048
- **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant
## Evaluation Results
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:

Evaluated using 50 PLMS steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores.
## Environmental Impact
**Stable Diffusion v1** **Estimated Emissions**
Based on that information, we estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.
- **Hardware Type:** A100 PCIe 40GB
- **Hours used:** 150000
- **Cloud Provider:** AWS
- **Compute Region:** US-east
- **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq.
## Citation
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
*This model card was written by: Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*
================================================
FILE: stable_diffusion/assets/results.gif.REMOVED.git-id
================================================
82b6590e670a32196093cc6333ea19e6547d07de
================================================
FILE: stable_diffusion/assets/stable-samples/img2img/upscaling-in.png.REMOVED.git-id
================================================
501c31c21751664957e69ce52cad1818b6d2f4ce
================================================
FILE: stable_diffusion/assets/stable-samples/img2img/upscaling-out.png.REMOVED.git-id
================================================
1c4bb25a779f34d86b2d90e584ac67af91bb1303
================================================
FILE: stable_diffusion/assets/stable-samples/txt2img/merged-0005.png.REMOVED.git-id
================================================
ca0a1af206555f0f208a1ab879e95efedc1b1c5b
================================================
FILE: stable_diffusion/assets/stable-samples/txt2img/merged-0006.png.REMOVED.git-id
================================================
999f3703230580e8c89e9081abd6a1f8f50896d4
================================================
FILE: stable_diffusion/assets/stable-samples/txt2img/merged-0007.png.REMOVED.git-id
================================================
af390acaf601283782d6f479d4cade4d78e30b26
================================================
FILE: stable_diffusion/assets/txt2img-preview.png.REMOVED.git-id
================================================
51ee1c235dfdc63d4c41de7d303d03730e43c33c
================================================
FILE: stable_diffusion/configs/autoencoder/autoencoder_kl_16x16x16.yaml
================================================
model:
base_learning_rate: 4.5e-6
target: ldm.models.autoencoder.AutoencoderKL
params:
monitor: "val/rec_loss"
embed_dim: 16
lossconfig:
target: ldm.modules.losses.LPIPSWithDiscriminator
params:
disc_start: 50001
kl_weight: 0.000001
disc_weight: 0.5
ddconfig:
double_z: True
z_channels: 16
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1,1,2,2,4] # num_down = len(ch_mult)-1
num_res_blocks: 2
attn_resolutions: [16]
dropout: 0.0
data:
target: main.DataModuleFromConfig
params:
batch_size: 12
wrap: True
train:
target: ldm.data.imagenet.ImageNetSRTrain
params:
size: 256
degradation: pil_nearest
validation:
target: ldm.data.imagenet.ImageNetSRValidation
params:
size: 256
degradation: pil_nearest
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 1000
max_images: 8
increase_log_steps: True
trainer:
benchmark: True
accumulate_grad_batches: 2
================================================
FILE: stable_diffusion/configs/autoencoder/autoencoder_kl_32x32x4.yaml
================================================
model:
base_learning_rate: 4.5e-6
target: ldm.models.autoencoder.AutoencoderKL
params:
monitor: "val/rec_loss"
embed_dim: 4
lossconfig:
target: ldm.modules.losses.LPIPSWithDiscriminator
params:
disc_start: 50001
kl_weight: 0.000001
disc_weight: 0.5
ddconfig:
double_z: True
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1,2,4,4 ] # num_down = len(ch_mult)-1
num_res_blocks: 2
attn_resolutions: [ ]
dropout: 0.0
data:
target: main.DataModuleFromConfig
params:
batch_size: 12
wrap: True
train:
target: ldm.data.imagenet.ImageNetSRTrain
params:
size: 256
degradation: pil_nearest
validation:
target: ldm.data.imagenet.ImageNetSRValidation
params:
size: 256
degradation: pil_nearest
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 1000
max_images: 8
increase_log_steps: True
trainer:
benchmark: True
accumulate_grad_batches: 2
================================================
FILE: stable_diffusion/configs/autoencoder/autoencoder_kl_64x64x3.yaml
================================================
model:
base_learning_rate: 4.5e-6
target: ldm.models.autoencoder.AutoencoderKL
params:
monitor: "val/rec_loss"
embed_dim: 3
lossconfig:
target: ldm.modules.losses.LPIPSWithDiscriminator
params:
disc_start: 50001
kl_weight: 0.000001
disc_weight: 0.5
ddconfig:
double_z: True
z_channels: 3
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1,2,4 ] # num_down = len(ch_mult)-1
num_res_blocks: 2
attn_resolutions: [ ]
dropout: 0.0
data:
target: main.DataModuleFromConfig
params:
batch_size: 12
wrap: True
train:
target: ldm.data.imagenet.ImageNetSRTrain
params:
size: 256
degradation: pil_nearest
validation:
target: ldm.data.imagenet.ImageNetSRValidation
params:
size: 256
degradation: pil_nearest
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 1000
max_images: 8
increase_log_steps: True
trainer:
benchmark: True
accumulate_grad_batches: 2
================================================
FILE: stable_diffusion/configs/autoencoder/autoencoder_kl_8x8x64.yaml
================================================
model:
base_learning_rate: 4.5e-6
target: ldm.models.autoencoder.AutoencoderKL
params:
monitor: "val/rec_loss"
embed_dim: 64
lossconfig:
target: ldm.modules.losses.LPIPSWithDiscriminator
params:
disc_start: 50001
kl_weight: 0.000001
disc_weight: 0.5
ddconfig:
double_z: True
z_channels: 64
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1,1,2,2,4,4] # num_down = len(ch_mult)-1
num_res_blocks: 2
attn_resolutions: [16,8]
dropout: 0.0
data:
target: main.DataModuleFromConfig
params:
batch_size: 12
wrap: True
train:
target: ldm.data.imagenet.ImageNetSRTrain
params:
size: 256
degradation: pil_nearest
validation:
target: ldm.data.imagenet.ImageNetSRValidation
params:
size: 256
degradation: pil_nearest
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 1000
max_images: 8
increase_log_steps: True
trainer:
benchmark: True
accumulate_grad_batches: 2
================================================
FILE: stable_diffusion/configs/latent-diffusion/celebahq-ldm-vq-4.yaml
================================================
model:
base_learning_rate: 2.0e-06
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0195
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
image_size: 64
channels: 3
monitor: val/loss_simple_ema
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 64
in_channels: 3
out_channels: 3
model_channels: 224
attention_resolutions:
# note: this isn\t actually the resolution but
# the downsampling factor, i.e. this corresnponds to
# attention on spatial resolution 8,16,32, as the
# spatial reolution of the latents is 64 for f4
- 8
- 4
- 2
num_res_blocks: 2
channel_mult:
- 1
- 2
- 3
- 4
num_head_channels: 32
first_stage_config:
target: ldm.models.autoencoder.VQModelInterface
params:
embed_dim: 3
n_embed: 8192
ckpt_path: models/first_stage_models/vq-f4/model.ckpt
ddconfig:
double_z: false
z_channels: 3
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config: __is_unconditional__
data:
target: main.DataModuleFromConfig
params:
batch_size: 48
num_workers: 5
wrap: false
train:
target: taming.data.faceshq.CelebAHQTrain
params:
size: 256
validation:
target: taming.data.faceshq.CelebAHQValidation
params:
size: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 5000
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
================================================
FILE: stable_diffusion/configs/latent-diffusion/cin-ldm-vq-f8.yaml
================================================
model:
base_learning_rate: 1.0e-06
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0195
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
cond_stage_key: class_label
image_size: 32
channels: 4
cond_stage_trainable: true
conditioning_key: crossattn
monitor: val/loss_simple_ema
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32
in_channels: 4
out_channels: 4
model_channels: 256
attention_resolutions:
#note: this isn\t actually the resolution but
# the downsampling factor, i.e. this corresnponds to
# attention on spatial resolution 8,16,32, as the
# spatial reolution of the latents is 32 for f8
- 4
- 2
- 1
num_res_blocks: 2
channel_mult:
- 1
- 2
- 4
num_head_channels: 32
use_spatial_transformer: true
transformer_depth: 1
context_dim: 512
first_stage_config:
target: ldm.models.autoencoder.VQModelInterface
params:
embed_dim: 4
n_embed: 16384
ckpt_path: configs/first_stage_models/vq-f8/model.yaml
ddconfig:
double_z: false
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 2
- 4
num_res_blocks: 2
attn_resolutions:
- 32
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.ClassEmbedder
params:
embed_dim: 512
key: class_label
data:
target: main.DataModuleFromConfig
params:
batch_size: 64
num_workers: 12
wrap: false
train:
target: ldm.data.imagenet.ImageNetTrain
params:
config:
size: 256
validation:
target: ldm.data.imagenet.ImageNetValidation
params:
config:
size: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 5000
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
================================================
FILE: stable_diffusion/configs/latent-diffusion/cin256-v2.yaml
================================================
model:
base_learning_rate: 0.0001
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0195
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
cond_stage_key: class_label
image_size: 64
channels: 3
cond_stage_trainable: true
conditioning_key: crossattn
monitor: val/loss
use_ema: False
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 64
in_channels: 3
out_channels: 3
model_channels: 192
attention_resolutions:
- 8
- 4
- 2
num_res_blocks: 2
channel_mult:
- 1
- 2
- 3
- 5
num_heads: 1
use_spatial_transformer: true
transformer_depth: 1
context_dim: 512
first_stage_config:
target: ldm.models.autoencoder.VQModelInterface
params:
embed_dim: 3
n_embed: 8192
ddconfig:
double_z: false
z_channels: 3
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.ClassEmbedder
params:
n_classes: 1001
embed_dim: 512
key: class_label
================================================
FILE: stable_diffusion/configs/latent-diffusion/ffhq-ldm-vq-4.yaml
================================================
model:
base_learning_rate: 2.0e-06
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0195
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
image_size: 64
channels: 3
monitor: val/loss_simple_ema
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 64
in_channels: 3
out_channels: 3
model_channels: 224
attention_resolutions:
# note: this isn\t actually the resolution but
# the downsampling factor, i.e. this corresnponds to
# attention on spatial resolution 8,16,32, as the
# spatial reolution of the latents is 64 for f4
- 8
- 4
- 2
num_res_blocks: 2
channel_mult:
- 1
- 2
- 3
- 4
num_head_channels: 32
first_stage_config:
target: ldm.models.autoencoder.VQModelInterface
params:
embed_dim: 3
n_embed: 8192
ckpt_path: configs/first_stage_models/vq-f4/model.yaml
ddconfig:
double_z: false
z_channels: 3
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config: __is_unconditional__
data:
target: main.DataModuleFromConfig
params:
batch_size: 42
num_workers: 5
wrap: false
train:
target: taming.data.faceshq.FFHQTrain
params:
size: 256
validation:
target: taming.data.faceshq.FFHQValidation
params:
size: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 5000
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
================================================
FILE: stable_diffusion/configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml
================================================
model:
base_learning_rate: 2.0e-06
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0195
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
image_size: 64
channels: 3
monitor: val/loss_simple_ema
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 64
in_channels: 3
out_channels: 3
model_channels: 224
attention_resolutions:
# note: this isn\t actually the resolution but
# the downsampling factor, i.e. this corresnponds to
# attention on spatial resolution 8,16,32, as the
# spatial reolution of the latents is 64 for f4
- 8
- 4
- 2
num_res_blocks: 2
channel_mult:
- 1
- 2
- 3
- 4
num_head_channels: 32
first_stage_config:
target: ldm.models.autoencoder.VQModelInterface
params:
ckpt_path: configs/first_stage_models/vq-f4/model.yaml
embed_dim: 3
n_embed: 8192
ddconfig:
double_z: false
z_channels: 3
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config: __is_unconditional__
data:
target: main.DataModuleFromConfig
params:
batch_size: 48
num_workers: 5
wrap: false
train:
target: ldm.data.lsun.LSUNBedroomsTrain
params:
size: 256
validation:
target: ldm.data.lsun.LSUNBedroomsValidation
params:
size: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 5000
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
================================================
FILE: stable_diffusion/configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml
================================================
model:
base_learning_rate: 5.0e-5 # set to target_lr by starting main.py with '--scale_lr False'
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.0155
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
loss_type: l1
first_stage_key: "image"
cond_stage_key: "image"
image_size: 32
channels: 4
cond_stage_trainable: False
concat_mode: False
scale_by_std: True
monitor: 'val/loss_simple_ema'
scheduler_config: # 10000 warmup steps
target: ldm.lr_scheduler.LambdaLinearScheduler
params:
warm_up_steps: [10000]
cycle_lengths: [10000000000000]
f_start: [1.e-6]
f_max: [1.]
f_min: [ 1.]
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32
in_channels: 4
out_channels: 4
model_channels: 192
attention_resolutions: [ 1, 2, 4, 8 ] # 32, 16, 8, 4
num_res_blocks: 2
channel_mult: [ 1,2,2,4,4 ] # 32, 16, 8, 4, 2
num_heads: 8
use_scale_shift_norm: True
resblock_updown: True
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: "val/rec_loss"
ckpt_path: "models/first_stage_models/kl-f8/model.ckpt"
ddconfig:
double_z: True
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1,2,4,4 ] # num_down = len(ch_mult)-1
num_res_blocks: 2
attn_resolutions: [ ]
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config: "__is_unconditional__"
data:
target: main.DataModuleFromConfig
params:
batch_size: 96
num_workers: 5
wrap: False
train:
target: ldm.data.lsun.LSUNChurchesTrain
params:
size: 256
validation:
target: ldm.data.lsun.LSUNChurchesValidation
params:
size: 256
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 5000
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
================================================
FILE: stable_diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml
================================================
model:
base_learning_rate: 5.0e-05
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.00085
linear_end: 0.012
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
cond_stage_key: caption
image_size: 32
channels: 4
cond_stage_trainable: true
conditioning_key: crossattn
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: False
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32
in_channels: 4
out_channels: 4
model_channels: 320
attention_resolutions:
- 4
- 2
- 1
num_res_blocks: 2
channel_mult:
- 1
- 2
- 4
- 4
num_heads: 8
use_spatial_transformer: true
transformer_depth: 1
context_dim: 1280
use_checkpoint: true
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.BERTEmbedder
params:
n_embed: 1280
n_layer: 32
================================================
FILE: stable_diffusion/configs/retrieval-augmented-diffusion/768x768.yaml
================================================
model:
base_learning_rate: 0.0001
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.0015
linear_end: 0.015
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: jpg
cond_stage_key: nix
image_size: 48
channels: 16
cond_stage_trainable: false
conditioning_key: crossattn
monitor: val/loss_simple_ema
scale_by_std: false
scale_factor: 0.22765929
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 48
in_channels: 16
out_channels: 16
model_channels: 448
attention_resolutions:
- 4
- 2
- 1
num_res_blocks: 2
channel_mult:
- 1
- 2
- 3
- 4
use_scale_shift_norm: false
resblock_updown: false
num_head_channels: 32
use_spatial_transformer: true
transformer_depth: 1
context_dim: 768
use_checkpoint: true
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
monitor: val/rec_loss
embed_dim: 16
ddconfig:
double_z: true
z_channels: 16
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 1
- 2
- 2
- 4
num_res_blocks: 2
attn_resolutions:
- 16
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: torch.nn.Identity
================================================
FILE: stable_diffusion/configs/stable-diffusion/v1-inference.yaml
================================================
model:
base_learning_rate: 1.0e-04
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.00085
linear_end: 0.0120
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: "jpg"
cond_stage_key: "txt"
image_size: 64
channels: 4
cond_stage_trainable: false # Note: different from the one we trained before
conditioning_key: crossattn
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: False
scheduler_config: # 10000 warmup steps
target: ldm.lr_scheduler.LambdaLinearScheduler
params:
warm_up_steps: [ 10000 ]
cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
f_start: [ 1.e-6 ]
f_max: [ 1. ]
f_min: [ 1. ]
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32 # unused
in_channels: 4
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
================================================
FILE: stable_diffusion/data/example_conditioning/text_conditional/sample_0.txt
================================================
A basket of cerries
================================================
FILE: stable_diffusion/data/imagenet_clsidx_to_label.txt
================================================
0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
3: 'tiger shark, Galeocerdo cuvieri',
4: 'hammerhead, hammerhead shark',
5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',
11: 'goldfinch, Carduelis carduelis',
12: 'house finch, linnet, Carpodacus mexicanus',
13: 'junco, snowbird',
14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
15: 'robin, American robin, Turdus migratorius',
16: 'bulbul',
17: 'jay',
18: 'magpie',
19: 'chickadee',
20: 'water ouzel, dipper',
21: 'kite',
22: 'bald eagle, American eagle, Haliaeetus leucocephalus',
23: 'vulture',
24: 'great grey owl, great gray owl, Strix nebulosa',
25: 'European fire salamander, Salamandra salamandra',
26: 'common newt, Triturus vulgaris',
27: 'eft',
28: 'spotted salamander, Ambystoma maculatum',
29: 'axolotl, mud puppy, Ambystoma mexicanum',
30: 'bullfrog, Rana catesbeiana',
31: 'tree frog, tree-frog',
32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
33: 'loggerhead, loggerhead turtle, Caretta caretta',
34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
35: 'mud turtle',
36: 'terrapin',
37: 'box turtle, box tortoise',
38: 'banded gecko',
39: 'common iguana, iguana, Iguana iguana',
40: 'American chameleon, anole, Anolis carolinensis',
41: 'whiptail, whiptail lizard',
42: 'agama',
43: 'frilled lizard, Chlamydosaurus kingi',
44: 'alligator lizard',
45: 'Gila monster, Heloderma suspectum',
46: 'green lizard, Lacerta viridis',
47: 'African chameleon, Chamaeleo chamaeleon',
48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
49: 'African crocodile, Nile crocodile, Crocodylus niloticus',
50: 'American alligator, Alligator mississipiensis',
51: 'triceratops',
52: 'thunder snake, worm snake, Carphophis amoenus',
53: 'ringneck snake, ring-necked snake, ring snake',
54: 'hognose snake, puff adder, sand viper',
55: 'green snake, grass snake',
56: 'king snake, kingsnake',
57: 'garter snake, grass snake',
58: 'water snake',
59: 'vine snake',
60: 'night snake, Hypsiglena torquata',
61: 'boa constrictor, Constrictor constrictor',
62: 'rock python, rock snake, Python sebae',
63: 'Indian cobra, Naja naja',
64: 'green mamba',
65: 'sea snake',
66: 'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus',
68: 'sidewinder, horned rattlesnake, Crotalus cerastes',
69: 'trilobite',
70: 'harvestman, daddy longlegs, Phalangium opilio',
71: 'scorpion',
72: 'black and gold garden spider, Argiope aurantia',
73: 'barn spider, Araneus cavaticus',
74: 'garden spider, Aranea diademata',
75: 'black widow, Latrodectus mactans',
76: 'tarantula',
77: 'wolf spider, hunting spider',
78: 'tick',
79: 'centipede',
80: 'black grouse',
81: 'ptarmigan',
82: 'ruffed grouse, partridge, Bonasa umbellus',
83: 'prairie chicken, prairie grouse, prairie fowl',
84: 'peacock',
85: 'quail',
86: 'partridge',
87: 'African grey, African gray, Psittacus erithacus',
88: 'macaw',
89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
90: 'lorikeet',
91: 'coucal',
92: 'bee eater',
93: 'hornbill',
94: 'hummingbird',
95: 'jacamar',
96: 'toucan',
97: 'drake',
98: 'red-breasted merganser, Mergus serrator',
99: 'goose',
100: 'black swan, Cygnus atratus',
101: 'tusker',
102: 'echidna, spiny anteater, anteater',
103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
104: 'wallaby, brush kangaroo',
105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
106: 'wombat',
107: 'jellyfish',
108: 'sea anemone, anemone',
109: 'brain coral',
110: 'flatworm, platyhelminth',
111: 'nematode, nematode worm, roundworm',
112: 'conch',
113: 'snail',
114: 'slug',
115: 'sea slug, nudibranch',
116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore',
117: 'chambered nautilus, pearly nautilus, nautilus',
118: 'Dungeness crab, Cancer magister',
119: 'rock crab, Cancer irroratus',
120: 'fiddler crab',
121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus',
123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
124: 'crayfish, crawfish, crawdad, crawdaddy',
125: 'hermit crab',
126: 'isopod',
127: 'white stork, Ciconia ciconia',
128: 'black stork, Ciconia nigra',
129: 'spoonbill',
130: 'flamingo',
131: 'little blue heron, Egretta caerulea',
132: 'American egret, great white heron, Egretta albus',
133: 'bittern',
134: 'crane',
135: 'limpkin, Aramus pictus',
136: 'European gallinule, Porphyrio porphyrio',
137: 'American coot, marsh hen, mud hen, water hen, Fulica americana',
138: 'bustard',
139: 'ruddy turnstone, Arenaria interpres',
140: 'red-backed sandpiper, dunlin, Erolia alpina',
141: 'redshank, Tringa totanus',
142: 'dowitcher',
143: 'oystercatcher, oyster catcher',
144: 'pelican',
145: 'king penguin, Aptenodytes patagonica',
146: 'albatross, mollymawk',
147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
149: 'dugong, Dugong dugon',
150: 'sea lion',
151: 'Chihuahua',
152: 'Japanese spaniel',
153: 'Maltese dog, Maltese terrier, Maltese',
154: 'Pekinese, Pekingese, Peke',
155: 'Shih-Tzu',
156: 'Blenheim spaniel',
157: 'papillon',
158: 'toy terrier',
159: 'Rhodesian ridgeback',
160: 'Afghan hound, Afghan',
161: 'basset, basset hound',
162: 'beagle',
163: 'bloodhound, sleuthhound',
164: 'bluetick',
165: 'black-and-tan coonhound',
166: 'Walker hound, Walker foxhound',
167: 'English foxhound',
168: 'redbone',
169: 'borzoi, Russian wolfhound',
170: 'Irish wolfhound',
171: 'Italian greyhound',
172: 'whippet',
173: 'Ibizan hound, Ibizan Podenco',
174: 'Norwegian elkhound, elkhound',
175: 'otterhound, otter hound',
176: 'Saluki, gazelle hound',
177: 'Scottish deerhound, deerhound',
178: 'Weimaraner',
179: 'Staffordshire bullterrier, Staffordshire bull terrier',
180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
181: 'Bedlington terrier',
182: 'Border terrier',
183: 'Kerry blue terrier',
184: 'Irish terrier',
185: 'Norfolk terrier',
186: 'Norwich terrier',
187: 'Yorkshire terrier',
188: 'wire-haired fox terrier',
189: 'Lakeland terrier',
190: 'Sealyham terrier, Sealyham',
191: 'Airedale, Airedale terrier',
192: 'cairn, cairn terrier',
193: 'Australian terrier',
194: 'Dandie Dinmont, Dandie Dinmont terrier',
195: 'Boston bull, Boston terrier',
196: 'miniature schnauzer',
197: 'giant schnauzer',
198: 'standard schnauzer',
199: 'Scotch terrier, Scottish terrier, Scottie',
200: 'Tibetan terrier, chrysanthemum dog',
201: 'silky terrier, Sydney silky',
202: 'soft-coated wheaten terrier',
203: 'West Highland white terrier',
204: 'Lhasa, Lhasa apso',
205: 'flat-coated retriever',
206: 'curly-coated retriever',
207: 'golden retriever',
208: 'Labrador retriever',
209: 'Chesapeake Bay retriever',
210: 'German short-haired pointer',
211: 'vizsla, Hungarian pointer',
212: 'English setter',
213: 'Irish setter, red setter',
214: 'Gordon setter',
215: 'Brittany spaniel',
216: 'clumber, clumber spaniel',
217: 'English springer, English springer spaniel',
218: 'Welsh springer spaniel',
219: 'cocker spaniel, English cocker spaniel, cocker',
220: 'Sussex spaniel',
221: 'Irish water spaniel',
222: 'kuvasz',
223: 'schipperke',
224: 'groenendael',
225: 'malinois',
226: 'briard',
227: 'kelpie',
228: 'komondor',
229: 'Old English sheepdog, bobtail',
230: 'Shetland sheepdog, Shetland sheep dog, Shetland',
231: 'collie',
232: 'Border collie',
233: 'Bouvier des Flandres, Bouviers des Flandres',
234: 'Rottweiler',
235: 'German shepherd, German shepherd dog, German police dog, alsatian',
236: 'Doberman, Doberman pinscher',
237: 'miniature pinscher',
238: 'Greater Swiss Mountain dog',
239: 'Bernese mountain dog',
240: 'Appenzeller',
241: 'EntleBucher',
242: 'boxer',
243: 'bull mastiff',
244: 'Tibetan mastiff',
245: 'French bulldog',
246: 'Great Dane',
247: 'Saint Bernard, St Bernard',
248: 'Eskimo dog, husky',
249: 'malamute, malemute, Alaskan malamute',
250: 'Siberian husky',
251: 'dalmatian, coach dog, carriage dog',
252: 'affenpinscher, monkey pinscher, monkey dog',
253: 'basenji',
254: 'pug, pug-dog',
255: 'Leonberg',
256: 'Newfoundland, Newfoundland dog',
257: 'Great Pyrenees',
258: 'Samoyed, Samoyede',
259: 'Pomeranian',
260: 'chow, chow chow',
261: 'keeshond',
262: 'Brabancon griffon',
263: 'Pembroke, Pembroke Welsh corgi',
264: 'Cardigan, Cardigan Welsh corgi',
265: 'toy poodle',
266: 'miniature poodle',
267: 'standard poodle',
268: 'Mexican hairless',
269: 'timber wolf, grey wolf, gray wolf, Canis lupus',
270: 'white wolf, Arctic wolf, Canis lupus tundrarum',
271: 'red wolf, maned wolf, Canis rufus, Canis niger',
272: 'coyote, prairie wolf, brush wolf, Canis latrans',
273: 'dingo, warrigal, warragal, Canis dingo',
274: 'dhole, Cuon alpinus',
275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
276: 'hyena, hyaena',
277: 'red fox, Vulpes vulpes',
278: 'kit fox, Vulpes macrotis',
279: 'Arctic fox, white fox, Alopex lagopus',
280: 'grey fox, gray fox, Urocyon cinereoargenteus',
281: 'tabby, tabby cat',
282: 'tiger cat',
283: 'Persian cat',
284: 'Siamese cat, Siamese',
285: 'Egyptian cat',
286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
287: 'lynx, catamount',
288: 'leopard, Panthera pardus',
289: 'snow leopard, ounce, Panthera uncia',
290: 'jaguar, panther, Panthera onca, Felis onca',
291: 'lion, king of beasts, Panthera leo',
292: 'tiger, Panthera tigris',
293: 'cheetah, chetah, Acinonyx jubatus',
294: 'brown bear, bruin, Ursus arctos',
295: 'American black bear, black bear, Ursus americanus, Euarctos americanus',
296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
297: 'sloth bear, Melursus ursinus, Ursus ursinus',
298: 'mongoose',
299: 'meerkat, mierkat',
300: 'tiger beetle',
301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
302: 'ground beetle, carabid beetle',
303: 'long-horned beetle, longicorn, longicorn beetle',
304: 'leaf beetle, chrysomelid',
305: 'dung beetle',
306: 'rhinoceros beetle',
307: 'weevil',
308: 'fly',
309: 'bee',
310: 'ant, emmet, pismire',
311: 'grasshopper, hopper',
312: 'cricket',
313: 'walking stick, walkingstick, stick insect',
314: 'cockroach, roach',
315: 'mantis, mantid',
316: 'cicada, cicala',
317: 'leafhopper',
318: 'lacewing, lacewing fly',
319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
320: 'damselfly',
321: 'admiral',
322: 'ringlet, ringlet butterfly',
323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
324: 'cabbage butterfly',
325: 'sulphur butterfly, sulfur butterfly',
326: 'lycaenid, lycaenid butterfly',
327: 'starfish, sea star',
328: 'sea urchin',
329: 'sea cucumber, holothurian',
330: 'wood rabbit, cottontail, cottontail rabbit',
331: 'hare',
332: 'Angora, Angora rabbit',
333: 'hamster',
334: 'porcupine, hedgehog',
335: 'fox squirrel, eastern fox squirrel, Sciurus niger',
336: 'marmot',
337: 'beaver',
338: 'guinea pig, Cavia cobaya',
339: 'sorrel',
340: 'zebra',
341: 'hog, pig, grunter, squealer, Sus scrofa',
342: 'wild boar, boar, Sus scrofa',
343: 'warthog',
344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius',
345: 'ox',
346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
347: 'bison',
348: 'ram, tup',
349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
350: 'ibex, Capra ibex',
351: 'hartebeest',
352: 'impala, Aepyceros melampus',
353: 'gazelle',
354: 'Arabian camel, dromedary, Camelus dromedarius',
355: 'llama',
356: 'weasel',
357: 'mink',
358: 'polecat, fitch, foulmart, foumart, Mustela putorius',
359: 'black-footed ferret, ferret, Mustela nigripes',
360: 'otter',
361: 'skunk, polecat, wood pussy',
362: 'badger',
363: 'armadillo',
364: 'three-toed sloth, ai, Bradypus tridactylus',
365: 'orangutan, orang, orangutang, Pongo pygmaeus',
366: 'gorilla, Gorilla gorilla',
367: 'chimpanzee, chimp, Pan troglodytes',
368: 'gibbon, Hylobates lar',
369: 'siamang, Hylobates syndactylus, Symphalangus syndactylus',
370: 'guenon, guenon monkey',
371: 'patas, hussar monkey, Erythrocebus patas',
372: 'baboon',
373: 'macaque',
374: 'langur',
375: 'colobus, colobus monkey',
376: 'proboscis monkey, Nasalis larvatus',
377: 'marmoset',
378: 'capuchin, ringtail, Cebus capucinus',
379: 'howler monkey, howler',
380: 'titi, titi monkey',
381: 'spider monkey, Ateles geoffroyi',
382: 'squirrel monkey, Saimiri sciureus',
383: 'Madagascar cat, ring-tailed lemur, Lemur catta',
384: 'indri, indris, Indri indri, Indri brevicaudatus',
385: 'Indian elephant, Elephas maximus',
386: 'African elephant, Loxodonta africana',
387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
389: 'barracouta, snoek',
390: 'eel',
391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
392: 'rock beauty, Holocanthus tricolor',
393: 'anemone fish',
394: 'sturgeon',
395: 'gar, garfish, garpike, billfish, Lepisosteus osseus',
396: 'lionfish',
397: 'puffer, pufferfish, blowfish, globefish',
398: 'abacus',
399: 'abaya',
400: "academic gown, academic robe, judge's robe",
401: 'accordion, piano accordion, squeeze box',
402: 'acoustic guitar',
403: 'aircraft carrier, carrier, flattop, attack aircraft carrier',
404: 'airliner',
405: 'airship, dirigible',
406: 'altar',
407: 'ambulance',
408: 'amphibian, amphibious vehicle',
409: 'analog clock',
410: 'apiary, bee house',
411: 'apron',
412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
413: 'assault rifle, assault gun',
414: 'backpack, back pack, knapsack, packsack, rucksack, haversack',
415: 'bakery, bakeshop, bakehouse',
416: 'balance beam, beam',
417: 'balloon',
418: 'ballpoint, ballpoint pen, ballpen, Biro',
419: 'Band Aid',
420: 'banjo',
421: 'bannister, banister, balustrade, balusters, handrail',
422: 'barbell',
423: 'barber chair',
424: 'barbershop',
425: 'barn',
426: 'barometer',
427: 'barrel, cask',
428: 'barrow, garden cart, lawn cart, wheelbarrow',
429: 'baseball',
430: 'basketball',
431: 'bassinet',
432: 'bassoon',
433: 'bathing cap, swimming cap',
434: 'bath towel',
435: 'bathtub, bathing tub, bath, tub',
436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
437: 'beacon, lighthouse, beacon light, pharos',
438: 'beaker',
439: 'bearskin, busby, shako',
440: 'beer bottle',
441: 'beer glass',
442: 'bell cote, bell cot',
443: 'bib',
444: 'bicycle-built-for-two, tandem bicycle, tandem',
445: 'bikini, two-piece',
446: 'binder, ring-binder',
447: 'binoculars, field glasses, opera glasses',
448: 'birdhouse',
449: 'boathouse',
450: 'bobsled, bobsleigh, bob',
451: 'bolo tie, bolo, bola tie, bola',
452: 'bonnet, poke bonnet',
453: 'bookcase',
454: 'bookshop, bookstore, bookstall',
455: 'bottlecap',
456: 'bow',
457: 'bow tie, bow-tie, bowtie',
458: 'brass, memorial tablet, plaque',
459: 'brassiere, bra, bandeau',
460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
461: 'breastplate, aegis, egis',
462: 'broom',
463: 'bucket, pail',
464: 'buckle',
465: 'bulletproof vest',
466: 'bullet train, bullet',
467: 'butcher shop, meat market',
468: 'cab, hack, taxi, taxicab',
469: 'caldron, cauldron',
470: 'candle, taper, wax light',
471: 'cannon',
472: 'canoe',
473: 'can opener, tin opener',
474: 'cardigan',
475: 'car mirror',
476: 'carousel, carrousel, merry-go-round, roundabout, whirligig',
477: "carpenter's kit, tool kit",
478: 'carton',
479: 'car wheel',
480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
481: 'cassette',
482: 'cassette player',
483: 'castle',
484: 'catamaran',
485: 'CD player',
486: 'cello, violoncello',
487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone',
488: 'chain',
489: 'chainlink fence',
490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
491: 'chain saw, chainsaw',
492: 'chest',
493: 'chiffonier, commode',
494: 'chime, bell, gong',
495: 'china cabinet, china closet',
496: 'Christmas stocking',
497: 'church, church building',
498: 'cinema, movie theater, movie theatre, movie house, picture palace',
499: 'cleaver, meat cleaver, chopper',
500: 'cliff dwelling',
501: 'cloak',
502: 'clog, geta, patten, sabot',
503: 'cocktail shaker',
504: 'coffee mug',
505: 'coffeepot',
506: 'coil, spiral, volute, whorl, helix',
507: 'combination lock',
508: 'computer keyboard, keypad',
509: 'confectionery, confectionary, candy store',
510: 'container ship, containership, container vessel',
511: 'convertible',
512: 'corkscrew, bottle screw',
513: 'cornet, horn, trumpet, trump',
514: 'cowboy boot',
515: 'cowboy hat, ten-gallon hat',
516: 'cradle',
517: 'crane',
518: 'crash helmet',
519: 'crate',
520: 'crib, cot',
521: 'Crock Pot',
522: 'croquet ball',
523: 'crutch',
524: 'cuirass',
525: 'dam, dike, dyke',
526: 'desk',
527: 'desktop computer',
528: 'dial telephone, dial phone',
529: 'diaper, nappy, napkin',
530: 'digital clock',
531: 'digital watch',
532: 'dining table, board',
533: 'dishrag, dishcloth',
534: 'dishwasher, dish washer, dishwashing machine',
535: 'disk brake, disc brake',
536: 'dock, dockage, docking facility',
537: 'dogsled, dog sled, dog sleigh',
538: 'dome',
539: 'doormat, welcome mat',
540: 'drilling platform, offshore rig',
541: 'drum, membranophone, tympan',
542: 'drumstick',
543: 'dumbbell',
544: 'Dutch oven',
545: 'electric fan, blower',
546: 'electric guitar',
547: 'electric locomotive',
548: 'entertainment center',
549: 'envelope',
550: 'espresso maker',
551: 'face powder',
552: 'feather boa, boa',
553: 'file, file cabinet, filing cabinet',
554: 'fireboat',
555: 'fire engine, fire truck',
556: 'fire screen, fireguard',
557: 'flagpole, flagstaff',
558: 'flute, transverse flute',
559: 'folding chair',
560: 'football helmet',
561: 'forklift',
562: 'fountain',
563: 'fountain pen',
564: 'four-poster',
565: 'freight car',
566: 'French horn, horn',
567: 'frying pan, frypan, skillet',
568: 'fur coat',
569: 'garbage truck, dustcart',
570: 'gasmask, respirator, gas helmet',
571: 'gas pump, gasoline pump, petrol pump, island dispenser',
572: 'goblet',
573: 'go-kart',
574: 'golf ball',
575: 'golfcart, golf cart',
576: 'gondola',
577: 'gong, tam-tam',
578: 'gown',
579: 'grand piano, grand',
580: 'greenhouse, nursery, glasshouse',
581: 'grille, radiator grille',
582: 'grocery store, grocery, food market, market',
583: 'guillotine',
584: 'hair slide',
585: 'hair spray',
586: 'half track',
587: 'hammer',
588: 'hamper',
589: 'hand blower, blow dryer, blow drier, hair dryer, hair drier',
590: 'hand-held computer, hand-held microcomputer',
591: 'handkerchief, hankie, hanky, hankey',
592: 'hard disc, hard disk, fixed disk',
593: 'harmonica, mouth organ, harp, mouth harp',
594: 'harp',
595: 'harvester, reaper',
596: 'hatchet',
597: 'holster',
598: 'home theater, home theatre',
599: 'honeycomb',
600: 'hook, claw',
601: 'hoopskirt, crinoline',
602: 'horizontal bar, high bar',
603: 'horse cart, horse-cart',
604: 'hourglass',
605: 'iPod',
606: 'iron, smoothing iron',
607: "jack-o'-lantern",
608: 'jean, blue jean, denim',
609: 'jeep, landrover',
610: 'jersey, T-shirt, tee shirt',
611: 'jigsaw puzzle',
612: 'jinrikisha, ricksha, rickshaw',
613: 'joystick',
614: 'kimono',
615: 'knee pad',
616: 'knot',
617: 'lab coat, laboratory coat',
618: 'ladle',
619: 'lampshade, lamp shade',
620: 'laptop, laptop computer',
621: 'lawn mower, mower',
622: 'lens cap, lens cover',
623: 'letter opener, paper knife, paperknife',
624: 'library',
625: 'lifeboat',
626: 'lighter, light, igniter, ignitor',
627: 'limousine, limo',
628: 'liner, ocean liner',
629: 'lipstick, lip rouge',
630: 'Loafer',
631: 'lotion',
632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
633: "loupe, jeweler's loupe",
634: 'lumbermill, sawmill',
635: 'magnetic compass',
636: 'mailbag, postbag',
637: 'mailbox, letter box',
638: 'maillot',
639: 'maillot, tank suit',
640: 'manhole cover',
641: 'maraca',
642: 'marimba, xylophone',
643: 'mask',
644: 'matchstick',
645: 'maypole',
646: 'maze, labyrinth',
647: 'measuring cup',
648: 'medicine chest, medicine cabinet',
649: 'megalith, megalithic structure',
650: 'microphone, mike',
651: 'microwave, microwave oven',
652: 'military uniform',
653: 'milk can',
654: 'minibus',
655: 'miniskirt, mini',
656: 'minivan',
657: 'missile',
658: 'mitten',
659: 'mixing bowl',
660: 'mobile home, manufactured home',
661: 'Model T',
662: 'modem',
663: 'monastery',
664: 'monitor',
665: 'moped',
666: 'mortar',
667: 'mortarboard',
668: 'mosque',
669: 'mosquito net',
670: 'motor scooter, scooter',
671: 'mountain bike, all-terrain bike, off-roader',
672: 'mountain tent',
673: 'mouse, computer mouse',
674: 'mousetrap',
675: 'moving van',
676: 'muzzle',
677: 'nail',
678: 'neck brace',
679: 'necklace',
680: 'nipple',
681: 'notebook, notebook computer',
682: 'obelisk',
683: 'oboe, hautboy, hautbois',
684: 'ocarina, sweet potato',
685: 'odometer, hodometer, mileometer, milometer',
686: 'oil filter',
687: 'organ, pipe organ',
688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO',
689: 'overskirt',
690: 'oxcart',
691: 'oxygen mask',
692: 'packet',
693: 'paddle, boat paddle',
694: 'paddlewheel, paddle wheel',
695: 'padlock',
696: 'paintbrush',
697: "pajama, pyjama, pj's, jammies",
698: 'palace',
699: 'panpipe, pandean pipe, syrinx',
700: 'paper towel',
701: 'parachute, chute',
702: 'parallel bars, bars',
703: 'park bench',
704: 'parking meter',
705: 'passenger car, coach, carriage',
706: 'patio, terrace',
707: 'pay-phone, pay-station',
708: 'pedestal, plinth, footstall',
709: 'pencil box, pencil case',
710: 'pencil sharpener',
711: 'perfume, essence',
712: 'Petri dish',
713: 'photocopier',
714: 'pick, plectrum, plectron',
715: 'pickelhaube',
716: 'picket fence, paling',
717: 'pickup, pickup truck',
718: 'pier',
719: 'piggy bank, penny bank',
720: 'pill bottle',
721: 'pillow',
722: 'ping-pong ball',
723: 'pinwheel',
724: 'pirate, pirate ship',
725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane",
727: 'planetarium',
728: 'plastic bag',
729: 'plate rack',
730: 'plow, plough',
731: "plunger, plumber's helper",
732: 'Polaroid camera, Polaroid Land camera',
733: 'pole',
734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
735: 'poncho',
736: 'pool table, billiard table, snooker table',
737: 'pop bottle, soda bottle',
738: 'pot, flowerpot',
739: "potter's wheel",
740: 'power drill',
741: 'prayer rug, prayer mat',
742: 'printer',
743: 'prison, prison house',
744: 'projectile, missile',
745: 'projector',
746: 'puck, hockey puck',
747: 'punching bag, punch bag, punching ball, punchball',
748: 'purse',
749: 'quill, quill pen',
750: 'quilt, comforter, comfort, puff',
751: 'racer, race car, racing car',
752: 'racket, racquet',
753: 'radiator',
754: 'radio, wireless',
755: 'radio telescope, radio reflector',
756: 'rain barrel',
757: 'recreational vehicle, RV, R.V.',
758: 'reel',
759: 'reflex camera',
760: 'refrigerator, icebox',
761: 'remote control, remote',
762: 'restaurant, eating house, eating place, eatery',
763: 'revolver, six-gun, six-shooter',
764: 'rifle',
765: 'rocking chair, rocker',
766: 'rotisserie',
767: 'rubber eraser, rubber, pencil eraser',
768: 'rugby ball',
769: 'rule, ruler',
770: 'running shoe',
771: 'safe',
772: 'safety pin',
773: 'saltshaker, salt shaker',
774: 'sandal',
775: 'sarong',
776: 'sax, saxophone',
777: 'scabbard',
778: 'scale, weighing machine',
779: 'school bus',
780: 'schooner',
781: 'scoreboard',
782: 'screen, CRT screen',
783: 'screw',
784: 'screwdriver',
785: 'seat belt, seatbelt',
786: 'sewing machine',
787: 'shield, buckler',
788: 'shoe shop, shoe-shop, shoe store',
789: 'shoji',
790: 'shopping basket',
791: 'shopping cart',
792: 'shovel',
793: 'shower cap',
794: 'shower curtain',
795: 'ski',
796: 'ski mask',
797: 'sleeping bag',
798: 'slide rule, slipstick',
799: 'sliding door',
800: 'slot, one-armed bandit',
801: 'snorkel',
802: 'snowmobile',
803: 'snowplow, snowplough',
804: 'soap dispenser',
805: 'soccer ball',
806: 'sock',
807: 'solar dish, solar collector, solar furnace',
808: 'sombrero',
809: 'soup bowl',
810: 'space bar',
811: 'space heater',
812: 'space shuttle',
813: 'spatula',
814: 'speedboat',
815: "spider web, spider's web",
816: 'spindle',
817: 'sports car, sport car',
818: 'spotlight, spot',
819: 'stage',
820: 'steam locomotive',
821: 'steel arch bridge',
822: 'steel drum',
823: 'stethoscope',
824: 'stole',
825: 'stone wall',
826: 'stopwatch, stop watch',
827: 'stove',
828: 'strainer',
829: 'streetcar, tram, tramcar, trolley, trolley car',
830: 'stretcher',
831: 'studio couch, day bed',
832: 'stupa, tope',
833: 'submarine, pigboat, sub, U-boat',
834: 'suit, suit of clothes',
835: 'sundial',
836: 'sunglass',
837: 'sunglasses, dark glasses, shades',
838: 'sunscreen, sunblock, sun blocker',
839: 'suspension bridge',
840: 'swab, swob, mop',
841: 'sweatshirt',
842: 'swimming trunks, bathing trunks',
843: 'swing',
844: 'switch, electric switch, electrical switch',
845: 'syringe',
846: 'table lamp',
847: 'tank, army tank, armored combat vehicle, armoured combat vehicle',
848: 'tape player',
849: 'teapot',
850: 'teddy, teddy bear',
851: 'television, television system',
852: 'tennis ball',
853: 'thatch, thatched roof',
854: 'theater curtain, theatre curtain',
855: 'thimble',
856: 'thresher, thrasher, threshing machine',
857: 'throne',
858: 'tile roof',
859: 'toaster',
860: 'tobacco shop, tobacconist shop, tobacconist',
861: 'toilet seat',
862: 'torch',
863: 'totem pole',
864: 'tow truck, tow car, wrecker',
865: 'toyshop',
866: 'tractor',
867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
868: 'tray',
869: 'trench coat',
870: 'tricycle, trike, velocipede',
871: 'trimaran',
872: 'tripod',
873: 'triumphal arch',
874: 'trolleybus, trolley coach, trackless trolley',
875: 'trombone',
876: 'tub, vat',
877: 'turnstile',
878: 'typewriter keyboard',
879: 'umbrella',
880: 'unicycle, monocycle',
881: 'upright, upright piano',
882: 'vacuum, vacuum cleaner',
883: 'vase',
884: 'vault',
885: 'velvet',
886: 'vending machine',
887: 'vestment',
888: 'viaduct',
889: 'violin, fiddle',
890: 'volleyball',
891: 'waffle iron',
892: 'wall clock',
893: 'wallet, billfold, notecase, pocketbook',
894: 'wardrobe, closet, press',
895: 'warplane, military plane',
896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
897: 'washer, automatic washer, washing machine',
898: 'water bottle',
899: 'water jug',
900: 'water tower',
901: 'whiskey jug',
902: 'whistle',
903: 'wig',
904: 'window screen',
905: 'window shade',
906: 'Windsor tie',
907: 'wine bottle',
908: 'wing',
909: 'wok',
910: 'wooden spoon',
911: 'wool, woolen, woollen',
912: 'worm fence, snake fence, snake-rail fence, Virginia fence',
913: 'wreck',
914: 'yawl',
915: 'yurt',
916: 'web site, website, internet site, site',
917: 'comic book',
918: 'crossword puzzle, crossword',
919: 'street sign',
920: 'traffic light, traffic signal, stoplight',
921: 'book jacket, dust cover, dust jacket, dust wrapper',
922: 'menu',
923: 'plate',
924: 'guacamole',
925: 'consomme',
926: 'hot pot, hotpot',
927: 'trifle',
928: 'ice cream, icecream',
929: 'ice lolly, lolly, lollipop, popsicle',
930: 'French loaf',
931: 'bagel, beigel',
932: 'pretzel',
933: 'cheeseburger',
934: 'hotdog, hot dog, red hot',
935: 'mashed potato',
936: 'head cabbage',
937: 'broccoli',
938: 'cauliflower',
939: 'zucchini, courgette',
940: 'spaghetti squash',
941: 'acorn squash',
942: 'butternut squash',
943: 'cucumber, cuke',
944: 'artichoke, globe artichoke',
945: 'bell pepper',
946: 'cardoon',
947: 'mushroom',
948: 'Granny Smith',
949: 'strawberry',
950: 'orange',
951: 'lemon',
952: 'fig',
953: 'pineapple, ananas',
954: 'banana',
955: 'jackfruit, jak, jack',
956: 'custard apple',
957: 'pomegranate',
958: 'hay',
959: 'carbonara',
960: 'chocolate sauce, chocolate syrup',
961: 'dough',
962: 'meat loaf, meatloaf',
963: 'pizza, pizza pie',
964: 'potpie',
965: 'burrito',
966: 'red wine',
967: 'espresso',
968: 'cup',
969: 'eggnog',
970: 'alp',
971: 'bubble',
972: 'cliff, drop, drop-off',
973: 'coral reef',
974: 'geyser',
975: 'lakeside, lakeshore',
976: 'promontory, headland, head, foreland',
977: 'sandbar, sand bar',
978: 'seashore, coast, seacoast, sea-coast',
979: 'valley, vale',
980: 'volcano',
981: 'ballplayer, baseball player',
982: 'groom, bridegroom',
983: 'scuba diver',
984: 'rapeseed',
985: 'daisy',
986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
987: 'corn',
988: 'acorn',
989: 'hip, rose hip, rosehip',
990: 'buckeye, horse chestnut, conker',
991: 'coral fungus',
992: 'agaric',
993: 'gyromitra',
994: 'stinkhorn, carrion fungus',
995: 'earthstar',
996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
997: 'bolete',
998: 'ear, spike, capitulum',
999: 'toilet tissue, toilet paper, bathroom tissue'
================================================
FILE: stable_diffusion/data/imagenet_train_hr_indices.p.REMOVED.git-id
================================================
b8d6d4689d2ecf32147e9cc2f5e6c50e072df26f
================================================
FILE: stable_diffusion/data/index_synset.yaml
================================================
0: n01440764
1: n01443537
2: n01484850
3: n01491361
4: n01494475
5: n01496331
6: n01498041
7: n01514668
8: n07646067
9: n01518878
10: n01530575
11: n01531178
12: n01532829
13: n01534433
14: n01537544
15: n01558993
16: n01560419
17: n01580077
18: n01582220
19: n01592084
20: n01601694
21: n13382471
22: n01614925
23: n01616318
24: n01622779
25: n01629819
26: n01630670
27: n01631663
28: n01632458
29: n01632777
30: n01641577
31: n01644373
32: n01644900
33: n01664065
34: n01665541
35: n01667114
36: n01667778
37: n01669191
38: n01675722
39: n01677366
40: n01682714
41: n01685808
42: n01687978
43: n01688243
44: n01689811
45: n01692333
46: n01693334
47: n01694178
48: n01695060
49: n01697457
50: n01698640
51: n01704323
52: n01728572
53: n01728920
54: n01729322
55: n01729977
56: n01734418
57: n01735189
58: n01737021
59: n01739381
60: n01740131
61: n01742172
62: n01744401
63: n01748264
64: n01749939
65: n01751748
66: n01753488
67: n01755581
68: n01756291
69: n01768244
70: n01770081
71: n01770393
72: n01773157
73: n01773549
74: n01773797
75: n01774384
76: n01774750
77: n01775062
78: n04432308
79: n01784675
80: n01795545
81: n01796340
82: n01797886
83: n01798484
84: n01806143
85: n07647321
86: n07647496
87: n01817953
88: n01818515
89: n01819313
90: n01820546
91: n01824575
92: n01828970
93: n01829413
94: n01833805
95: n01843065
96: n01843383
97: n01847000
98: n01855032
99: n07646821
100: n01860187
101: n01871265
102: n01872772
103: n01873310
104: n01877812
105: n01882714
106: n01883070
107: n01910747
108: n01914609
109: n01917289
110: n01924916
111: n01930112
112: n01943899
113: n01944390
114: n13719102
115: n01950731
116: n01955084
117: n01968897
118: n01978287
119: n01978455
120: n01980166
121: n01981276
122: n01983481
123: n01984695
124: n01985128
125: n01986214
126: n01990800
127: n02002556
128: n02002724
129: n02006656
130: n02007558
131: n02009229
132: n02009912
133: n02011460
134: n03126707
135: n02013706
136: n02017213
137: n02018207
138: n02018795
139: n02025239
140: n02027492
141: n02028035
142: n02033041
143: n02037110
144: n02051845
145: n02056570
146: n02058221
147: n02066245
148: n02071294
149: n02074367
150: n02077923
151: n08742578
152: n02085782
153: n02085936
154: n02086079
155: n02086240
156: n02086646
157: n02086910
158: n02087046
159: n02087394
160: n02088094
161: n02088238
162: n02088364
163: n02088466
164: n02088632
165: n02089078
166: n02089867
167: n02089973
168: n02090379
169: n02090622
170: n02090721
171: n02091032
172: n02091134
173: n02091244
174: n02091467
175: n02091635
176: n02091831
177: n02092002
178: n02092339
179: n02093256
180: n02093428
181: n02093647
182: n02093754
183: n02093859
184: n02093991
185: n02094114
186: n02094258
187: n02094433
188: n02095314
189: n02095570
190: n02095889
191: n02096051
192: n02096177
193: n02096294
194: n02096437
195: n02096585
196: n02097047
197: n02097130
198: n02097209
199: n02097298
200: n02097474
201: n02097658
202: n02098105
203: n02098286
204: n02098413
205: n02099267
206: n02099429
207: n02099601
208: n02099712
209: n02099849
210: n02100236
211: n02100583
212: n02100735
213: n02100877
214: n02101006
215: n02101388
216: n02101556
217: n02102040
218: n02102177
219: n02102318
220: n02102480
221: n02102973
222: n02104029
223: n02104365
224: n02105056
225: n02105162
226: n02105251
227: n02105412
228: n02105505
229: n02105641
230: n02105855
231: n02106030
232: n02106166
233: n02106382
234: n02106550
235: n02106662
236: n02107142
237: n02107312
238: n02107574
239: n02107683
240: n02107908
241: n02108000
242: n02108089
243: n02108422
244: n02108551
245: n02108915
246: n02109047
247: n02109525
248: n02109961
249: n02110063
250: n02110185
251: n02110341
252: n02110627
253: n02110806
254: n02110958
255: n02111129
256: n02111277
257: n02111500
258: n02111889
259: n02112018
260: n02112137
261: n02112350
262: n02112706
263: n02113023
264: n02113186
265: n02113624
266: n02113712
267: n02113799
268: n02113978
269: n02114367
270: n02114548
271: n02114712
272: n02114855
273: n02115641
274: n02115913
275: n02116738
276: n02117135
277: n02119022
278: n02119789
279: n02120079
280: n02120505
281: n02123045
282: n02123159
283: n02123394
284: n02123597
285: n02124075
286: n02125311
287: n02127052
288: n02128385
289: n02128757
290: n02128925
291: n02129165
292: n02129604
293: n02130308
294: n02132136
295: n02133161
296: n02134084
297: n02134418
298: n02137549
299: n02138441
300: n02165105
301: n02165456
302: n02167151
303: n02168699
304: n02169497
305: n02172182
306: n02174001
307: n02177972
308: n03373237
309: n07975909
310: n02219486
311: n02226429
312: n02229544
313: n02231487
314: n02233338
315: n02236044
316: n02256656
317: n02259212
318: n02264363
319: n02268443
320: n02268853
321: n02276258
322: n02277742
323: n02279972
324: n02280649
325: n02281406
326: n02281787
327: n02317335
328: n02319095
329: n02321529
330: n02325366
331: n02326432
332: n02328150
333: n02342885
334: n02346627
335: n02356798
336: n02361337
337: n05262120
338: n02364673
339: n02389026
340: n02391049
341: n02395406
342: n02396427
343: n02397096
344: n02398521
345: n02403003
346: n02408429
347: n02410509
348: n02412080
349: n02415577
350: n02417914
351: n02422106
352: n02422699
353: n02423022
354: n02437312
355: n02437616
356: n10771990
357: n14765497
358: n02443114
359: n02443484
360: n14765785
361: n02445715
362: n02447366
363: n02454379
364: n02457408
365: n02480495
366: n02480855
367: n02481823
368: n02483362
369: n02483708
370: n02484975
371: n02486261
372: n02486410
373: n02487347
374: n02488291
375: n02488702
376: n02489166
377: n02490219
378: n02492035
379: n02492660
380: n02493509
381: n02493793
382: n02494079
383: n02497673
384: n02500267
385: n02504013
386: n02504458
387: n02509815
388: n02510455
389: n02514041
390: n07783967
391: n02536864
392: n02606052
393: n02607072
394: n02640242
395: n02641379
396: n02643566
397: n02655020
398: n02666347
399: n02667093
400: n02669723
401: n02672831
402: n02676566
403: n02687172
404: n02690373
405: n02692877
406: n02699494
407: n02701002
408: n02704792
409: n02708093
410: n02727426
411: n08496334
412: n02747177
413: n02749479
414: n02769748
415: n02776631
416: n02777292
417: n02782329
418: n02783161
419: n02786058
420: n02787622
421: n02788148
422: n02790996
423: n02791124
424: n02791270
425: n02793495
426: n02794156
427: n02795169
428: n02797295
429: n02799071
430: n02802426
431: n02804515
432: n02804610
433: n02807133
434: n02808304
435: n02808440
436: n02814533
437: n02814860
438: n02815834
439: n02817516
440: n02823428
441: n02823750
442: n02825657
443: n02834397
444: n02835271
445: n02837789
446: n02840245
447: n02841315
448: n02843684
449: n02859443
450: n02860847
451: n02865351
452: n02869837
453: n02870880
454: n02871525
455: n02877765
456: n02880308
457: n02883205
458: n02892201
459: n02892767
460: n02894605
461: n02895154
462: n12520864
463: n02909870
464: n02910353
465: n02916936
466: n02917067
467: n02927161
468: n02930766
469: n02939185
470: n02948072
471: n02950826
472: n02951358
473: n02951585
474: n02963159
475: n02965783
476: n02966193
477: n02966687
478: n02971356
479: n02974003
480: n02977058
481: n02978881
482: n02979186
483: n02980441
484: n02981792
485: n02988304
486: n02992211
487: n02992529
488: n13652994
489: n03000134
490: n03000247
491: n03000684
492: n03014705
493: n03016953
494: n03017168
495: n03018349
496: n03026506
497: n03028079
498: n03032252
499: n03041632
500: n03042490
501: n03045698
502: n03047690
503: n03062245
504: n03063599
505: n03063689
506: n03065424
507: n03075370
508: n03085013
509: n03089624
510: n03095699
511: n03100240
512: n03109150
513: n03110669
514: n03124043
515: n03124170
516: n15142452
517: n03126707
518: n03127747
519: n03127925
520: n03131574
521: n03133878
522: n03134739
523: n03141823
524: n03146219
525: n03160309
526: n03179701
527: n03180011
528: n03187595
529: n03188531
530: n03196217
531: n03197337
532: n03201208
533: n03207743
534: n03207941
535: n03208938
536: n03216828
537: n03218198
538: n13872072
539: n03223299
540: n03240683
541: n03249569
542: n07647870
543: n03255030
544: n03259401
545: n03271574
546: n03272010
547: n03272562
548: n03290653
549: n13869788
550: n03297495
551: n03314780
552: n03325584
553: n03337140
554: n03344393
555: n03345487
556: n03347037
557: n03355925
558: n03372029
559: n03376595
560: n03379051
561: n03384352
562: n03388043
563: n03388183
564: n03388549
565: n03393912
566: n03394916
567: n03400231
568: n03404251
569: n03417042
570: n03424325
571: n03425413
572: n03443371
573: n03444034
574: n03445777
575: n03445924
576: n03447447
577: n03447721
578: n08286342
579: n03452741
580: n03457902
581: n03459775
582: n03461385
583: n03467068
584: n03476684
585: n03476991
586: n03478589
587: n03482001
588: n03482405
589: n03483316
590: n03485407
591: n03485794
592: n03492542
593: n03494278
594: n03495570
595: n10161363
596: n03498962
597: n03527565
598: n03529860
599: n09218315
600: n03532672
601: n03534580
602: n03535780
603: n03538406
604: n03544143
605: n03584254
606: n03584829
607: n03590841
608: n03594734
609: n03594945
610: n03595614
611: n03598930
612: n03599486
613: n03602883
614: n03617480
615: n03623198
616: n15102712
617: n03630383
618: n03633091
619: n03637318
620: n03642806
621: n03649909
622: n03657121
623: n03658185
624: n07977870
625: n03662601
626: n03666591
627: n03670208
628: n03673027
629: n03676483
630: n03680355
631: n03690938
632: n03691459
633: n03692522
634: n03697007
635: n03706229
636: n03709823
637: n03710193
638: n03710637
639: n03710721
640: n03717622
641: n03720891
642: n03721384
643: n03725035
644: n03729826
645: n03733131
646: n03733281
647: n03733805
648: n03742115
649: n03743016
650: n03759954
651: n03761084
652: n03763968
653: n03764736
654: n03769881
655: n03770439
656: n03770679
657: n03773504
658: n03775071
659: n03775546
660: n03776460
661: n03777568
662: n03777754
663: n03781244
664: n03782006
665: n03785016
666: n14955889
667: n03787032
668: n03788195
669: n03788365
670: n03791053
671: n03792782
672: n03792972
673: n03793489
674: n03794056
675: n03796401
676: n03803284
677: n13652335
678: n03814639
679: n03814906
680: n03825788
681: n03832673
682: n03837869
683: n03838899
684: n03840681
685: n03841143
686: n03843555
687: n03854065
688: n03857828
689: n03866082
690: n03868242
691: n03868863
692: n07281099
693: n03873416
694: n03874293
695: n03874599
696: n03876231
697: n03877472
698: n08053121
699: n03884397
700: n03887697
701: n03888257
702: n03888605
703: n03891251
704: n03891332
705: n03895866
706: n03899768
707: n03902125
708: n03903868
709: n03908618
710: n03908714
711: n03916031
712: n03920288
713: n03924679
714: n03929660
715: n03929855
716: n03930313
717: n03930630
718: n03934042
719: n03935335
720: n03937543
721: n03938244
722: n03942813
723: n03944341
724: n03947888
725: n03950228
726: n03954731
727: n03956157
728: n03958227
729: n03961711
730: n03967562
731: n03970156
732: n03976467
733: n08620881
734: n03977966
735: n03980874
736: n03982430
737: n03983396
738: n03991062
739: n03992509
740: n03995372
741: n03998194
742: n04004767
743: n13937284
744: n04008634
745: n04009801
746: n04019541
747: n04023962
748: n13413294
749: n04033901
750: n04033995
751: n04037443
752: n04039381
753: n09403211
754: n04041544
755: n04044716
756: n04049303
757: n04065272
758: n07056680
759: n04069434
760: n04070727
761: n04074963
762: n04081281
763: n04086273
764: n04090263
765: n04099969
766: n04111531
767: n04116512
768: n04118538
769: n04118776
770: n04120489
771: n04125116
772: n04127249
773: n04131690
774: n04133789
775: n04136333
776: n04141076
777: n04141327
778: n04141975
779: n04146614
780: n04147291
781: n04149813
782: n04152593
783: n04154340
784: n07917272
785: n04162706
786: n04179913
787: n04192698
788: n04200800
789: n04201297
790: n04204238
791: n04204347
792: n04208427
793: n04209133
794: n04209239
795: n04228054
796: n04229816
797: n04235860
798: n04238763
799: n04239074
800: n04243546
801: n04251144
802: n04252077
803: n04252225
804: n04254120
805: n04254680
806: n04254777
807: n04258138
808: n04259630
809: n04263257
810: n04264628
811: n04265275
812: n04266014
813: n04270147
814: n04273569
815: n04275363
816: n05605498
817: n04285008
818: n04286575
819: n08646566
820: n04310018
821: n04311004
822: n04311174
823: n04317175
824: n04325704
825: n04326547
826: n04328186
827: n04330267
828: n04332243
829: n04335435
830: n04337157
831: n04344873
832: n04346328
833: n04347754
834: n04350905
835: n04355338
836: n04355933
837: n04356056
838: n04357314
839: n04366367
840: n04367480
841: n04370456
842: n04371430
843: n14009946
844: n04372370
845: n04376876
846: n04380533
847: n04389033
848: n04392985
849: n04398044
850: n04399382
851: n04404412
852: n04409515
853: n04417672
854: n04418357
855: n04423845
856: n04428191
857: n04429376
858: n04435653
859: n04442312
860: n04443257
861: n04447861
862: n04456115
863: n04458633
864: n04461696
865: n04462240
866: n04465666
867: n04467665
868: n04476259
869: n04479046
870: n04482393
871: n04483307
872: n04485082
873: n04486054
874: n04487081
875: n04487394
876: n04493381
877: n04501370
878: n04505470
879: n04507155
880: n04509417
881: n04515003
882: n04517823
883: n04522168
884: n04523525
885: n04525038
886: n04525305
887: n04532106
888: n04532670
889: n04536866
890: n04540053
891: n04542943
892: n04548280
893: n04548362
894: n04550184
895: n04552348
896: n04553703
897: n04554684
898: n04557648
899: n04560804
900: n04562935
901: n04579145
902: n04579667
903: n04584207
904: n04589890
905: n04590129
906: n04591157
907: n04591713
908: n10782135
909: n04596742
910: n04598010
911: n04599235
912: n04604644
913: n14423870
914: n04612504
915: n04613696
916: n06359193
917: n06596364
918: n06785654
919: n06794110
920: n06874185
921: n07248320
922: n07565083
923: n07657664
924: n07583066
925: n07584110
926: n07590611
927: n07613480
928: n07614500
929: n07615774
930: n07684084
931: n07693725
932: n07695742
933: n07697313
934: n07697537
935: n07711569
936: n07714571
937: n07714990
938: n07715103
939: n12159804
940: n12160303
941: n12160857
942: n07717556
943: n07718472
944: n07718747
945: n07720875
946: n07730033
947: n13001041
948: n07742313
949: n12630144
950: n14991210
951: n07749582
952: n07753113
953: n07753275
954: n07753592
955: n07754684
956: n07760859
957: n07768694
958: n07802026
959: n07831146
960: n07836838
961: n07860988
962: n07871810
963: n07873807
964: n07875152
965: n07880968
966: n07892512
967: n07920052
968: n13904665
969: n07932039
970: n09193705
971: n09229709
972: n09246464
973: n09256479
974: n09288635
975: n09332890
976: n09399592
977: n09421951
978: n09428293
979: n09468604
980: n09472597
981: n09835506
982: n10148035
983: n10565667
984: n11879895
985: n11939491
986: n12057211
987: n12144580
988: n12267677
989: n12620546
990: n12768682
991: n12985857
992: n12998815
993: n13037406
994: n13040303
995: n13044778
996: n13052670
997: n13054560
998: n13133613
999: n15075141
================================================
FILE: stable_diffusion/environment.yaml
================================================
name: ldm
channels:
- pytorch
- defaults
dependencies:
- python=3.8.5
- pip=20.3
- cudatoolkit=11.3
- pytorch=1.11.0
- torchvision=0.12.0
- numpy=1.19.2
- pip:
- albumentations==0.4.3
- diffusers
- opencv-python==4.1.2.30
- pudb==2019.2
- invisible-watermark
- imageio==2.9.0
- imageio-ffmpeg==0.4.2
- pytorch-lightning==1.4.2
- omegaconf==2.1.1
- test-tube>=0.7.5
- streamlit>=0.73.1
- einops==0.3.0
- torch-fidelity==0.3.0
- transformers==4.19.2
- torchmetrics==0.6.0
- kornia==0.6
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e .
================================================
FILE: stable_diffusion/ldm/data/__init__.py
================================================
================================================
FILE: stable_diffusion/ldm/data/base.py
================================================
from abc import abstractmethod
from torch.utils.data import Dataset, ConcatDataset, ChainDataset, IterableDataset
class Txt2ImgIterableBaseDataset(IterableDataset):
'''
Define an interface to make the IterableDatasets for text2img data chainable
'''
def __init__(self, num_records=0, valid_ids=None, size=256):
super().__init__()
self.num_records = num_records
self.valid_ids = valid_ids
self.sample_ids = valid_ids
self.size = size
print(f'{self.__class__.__name__} dataset contains {self.__len__()} examples.')
def __len__(self):
return self.num_records
@abstractmethod
def __iter__(self):
pass
================================================
FILE: stable_diffusion/ldm/data/imagenet.py
================================================
import os, yaml, pickle, shutil, tarfile, glob
import cv2
import albumentations
import PIL
import numpy as np
import torchvision.transforms.functional as TF
from omegaconf import OmegaConf
from functools import partial
from PIL import Image
from tqdm import tqdm
from torch.utils.data import Dataset, Subset
import taming.data.utils as tdu
from taming.data.imagenet import str_to_indices, give_synsets_from_indices, download, retrieve
from taming.data.imagenet import ImagePaths
from ldm.modules.image_degradation import degradation_fn_bsr, degradation_fn_bsr_light
def synset2idx(path_to_yaml="data/index_synset.yaml"):
with open(path_to_yaml) as f:
di2s = yaml.load(f)
return dict((v,k) for k,v in di2s.items())
class ImageNetBase(Dataset):
def __init__(self, config=None):
self.config = config or OmegaConf.create()
if not type(self.config)==dict:
self.config = OmegaConf.to_container(self.config)
self.keep_orig_class_label = self.config.get("keep_orig_class_label", False)
self.process_images = True # if False we skip loading & processing images and self.data contains filepaths
self._prepare()
self._prepare_synset_to_human()
self._prepare_idx_to_synset()
self._prepare_human_to_integer_label()
self._load()
def __len__(self):
return len(self.data)
def __getitem__(self, i):
return self.data[i]
def _prepare(self):
raise NotImplementedError()
def _filter_relpaths(self, relpaths):
ignore = set([
"n06596364_9591.JPEG",
])
relpaths = [rpath for rpath in relpaths if not rpath.split("/")[-1] in ignore]
if "sub_indices" in self.config:
indices = str_to_indices(self.config["sub_indices"])
synsets = give_synsets_from_indices(indices, path_to_yaml=self.idx2syn) # returns a list of strings
self.synset2idx = synset2idx(path_to_yaml=self.idx2syn)
files = []
for rpath in relpaths:
syn = rpath.split("/")[0]
if syn in synsets:
files.append(rpath)
return files
else:
return relpaths
def _prepare_synset_to_human(self):
SIZE = 2655750
URL = "https://heibox.uni-heidelberg.de/f/9f28e956cd304264bb82/?dl=1"
self.human_dict = os.path.join(self.root, "synset_human.txt")
if (not os.path.exists(self.human_dict) or
not os.path.getsize(self.human_dict)==SIZE):
download(URL, self.human_dict)
def _prepare_idx_to_synset(self):
URL = "https://heibox.uni-heidelberg.de/f/d835d5b6ceda4d3aa910/?dl=1"
self.idx2syn = os.path.join(self.root, "index_synset.yaml")
if (not os.path.exists(self.idx2syn)):
download(URL, self.idx2syn)
def _prepare_human_to_integer_label(self):
URL = "https://heibox.uni-heidelberg.de/f/2362b797d5be43b883f6/?dl=1"
self.human2integer = os.path.join(self.root, "imagenet1000_clsidx_to_labels.txt")
if (not os.path.exists(self.human2integer)):
download(URL, self.human2integer)
with open(self.human2integer, "r") as f:
lines = f.read().splitlines()
assert len(lines) == 1000
self.human2integer_dict = dict()
for line in lines:
value, key = line.split(":")
self.human2integer_dict[key] = int(value)
def _load(self):
with open(self.txt_filelist, "r") as f:
self.relpaths = f.read().splitlines()
l1 = len(self.relpaths)
self.relpaths = self._filter_relpaths(self.relpaths)
print("Removed {} files from filelist during filtering.".format(l1 - len(self.relpaths)))
self.synsets = [p.split("/")[0] for p in self.relpaths]
self.abspaths = [os.path.join(self.datadir, p) for p in self.relpaths]
unique_synsets = np.unique(self.synsets)
class_dict = dict((synset, i) for i, synset in enumerate(unique_synsets))
if not self.keep_orig_class_label:
self.class_labels = [class_dict[s] for s in self.synsets]
else:
self.class_labels = [self.synset2idx[s] for s in self.synsets]
with open(self.human_dict, "r") as f:
human_dict = f.read().splitlines()
human_dict = dict(line.split(maxsplit=1) for line in human_dict)
self.human_labels = [human_dict[s] for s in self.synsets]
labels = {
"relpath": np.array(self.relpaths),
"synsets": np.array(self.synsets),
"class_label": np.array(self.class_labels),
"human_label": np.array(self.human_labels),
}
if self.process_images:
self.size = retrieve(self.config, "size", default=256)
self.data = ImagePaths(self.abspaths,
labels=labels,
size=self.size,
random_crop=self.random_crop,
)
else:
self.data = self.abspaths
class ImageNetTrain(ImageNetBase):
NAME = "ILSVRC2012_train"
URL = "http://www.image-net.org/challenges/LSVRC/2012/"
AT_HASH = "a306397ccf9c2ead27155983c254227c0fd938e2"
FILES = [
"ILSVRC2012_img_train.tar",
]
SIZES = [
147897477120,
]
def __init__(self, process_images=True, data_root=None, **kwargs):
self.process_images = process_images
self.data_root = data_root
super().__init__(**kwargs)
def _prepare(self):
if self.data_root:
self.root = os.path.join(self.data_root, self.NAME)
else:
cachedir = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
self.root = os.path.join(cachedir, "autoencoders/data", self.NAME)
self.datadir = os.path.join(self.root, "data")
self.txt_filelist = os.path.join(self.root, "filelist.txt")
self.expected_length = 1281167
self.random_crop = retrieve(self.config, "ImageNetTrain/random_crop",
default=True)
if not tdu.is_prepared(self.root):
# prep
print("Preparing dataset {} in {}".format(self.NAME, self.root))
datadir = self.datadir
if not os.path.exists(datadir):
path = os.path.join(self.root, self.FILES[0])
if not os.path.exists(path) or not os.path.getsize(path)==self.SIZES[0]:
import academictorrents as at
atpath = at.get(self.AT_HASH, datastore=self.root)
as
gitextract_fktbw4g3/
├── LICENSE
├── README.md
├── app.py
├── config/
│ ├── generate.yaml
│ └── train.yaml
├── dataset_diffree.py
├── main.py
├── requirements.txt
└── stable_diffusion/
├── LICENSE
├── README.md
├── Stable_Diffusion_v1_Model_Card.md
├── assets/
│ ├── results.gif.REMOVED.git-id
│ ├── stable-samples/
│ │ ├── img2img/
│ │ │ ├── upscaling-in.png.REMOVED.git-id
│ │ │ └── upscaling-out.png.REMOVED.git-id
│ │ └── txt2img/
│ │ ├── merged-0005.png.REMOVED.git-id
│ │ ├── merged-0006.png.REMOVED.git-id
│ │ └── merged-0007.png.REMOVED.git-id
│ └── txt2img-preview.png.REMOVED.git-id
├── configs/
│ ├── autoencoder/
│ │ ├── autoencoder_kl_16x16x16.yaml
│ │ ├── autoencoder_kl_32x32x4.yaml
│ │ ├── autoencoder_kl_64x64x3.yaml
│ │ └── autoencoder_kl_8x8x64.yaml
│ ├── latent-diffusion/
│ │ ├── celebahq-ldm-vq-4.yaml
│ │ ├── cin-ldm-vq-f8.yaml
│ │ ├── cin256-v2.yaml
│ │ ├── ffhq-ldm-vq-4.yaml
│ │ ├── lsun_bedrooms-ldm-vq-4.yaml
│ │ ├── lsun_churches-ldm-kl-8.yaml
│ │ └── txt2img-1p4B-eval.yaml
│ ├── retrieval-augmented-diffusion/
│ │ └── 768x768.yaml
│ └── stable-diffusion/
│ └── v1-inference.yaml
├── data/
│ ├── example_conditioning/
│ │ └── text_conditional/
│ │ └── sample_0.txt
│ ├── imagenet_clsidx_to_label.txt
│ ├── imagenet_train_hr_indices.p.REMOVED.git-id
│ ├── imagenet_val_hr_indices.p
│ └── index_synset.yaml
├── environment.yaml
├── ldm/
│ ├── data/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── imagenet.py
│ │ └── lsun.py
│ ├── lr_scheduler.py
│ ├── models/
│ │ ├── autoencoder.py
│ │ └── diffusion/
│ │ ├── __init__.py
│ │ ├── classifier.py
│ │ ├── ddim.py
│ │ ├── ddpm.py
│ │ ├── ddpm_diffree.py
│ │ ├── ddpm_edit.py
│ │ ├── dpm_solver/
│ │ │ ├── __init__.py
│ │ │ ├── dpm_solver.py
│ │ │ └── sampler.py
│ │ └── plms.py
│ ├── modules/
│ │ ├── attention.py
│ │ ├── diffusionmodules/
│ │ │ ├── __init__.py
│ │ │ ├── model.py
│ │ │ ├── openaimodel.py
│ │ │ ├── openaimodel_diffree.py
│ │ │ └── util.py
│ │ ├── distributions/
│ │ │ ├── __init__.py
│ │ │ └── distributions.py
│ │ ├── ema.py
│ │ ├── encoders/
│ │ │ ├── __init__.py
│ │ │ └── modules.py
│ │ ├── image_degradation/
│ │ │ ├── __init__.py
│ │ │ ├── bsrgan.py
│ │ │ ├── bsrgan_light.py
│ │ │ └── utils_image.py
│ │ ├── losses/
│ │ │ ├── __init__.py
│ │ │ ├── contperceptual.py
│ │ │ └── vqperceptual.py
│ │ └── x_transformer.py
│ └── util.py
├── main.py
├── models/
│ ├── first_stage_models/
│ │ ├── kl-f16/
│ │ │ └── config.yaml
│ │ ├── kl-f32/
│ │ │ └── config.yaml
│ │ ├── kl-f4/
│ │ │ └── config.yaml
│ │ ├── kl-f8/
│ │ │ └── config.yaml
│ │ ├── vq-f16/
│ │ │ └── config.yaml
│ │ ├── vq-f4/
│ │ │ └── config.yaml
│ │ ├── vq-f4-noattn/
│ │ │ └── config.yaml
│ │ ├── vq-f8/
│ │ │ └── config.yaml
│ │ └── vq-f8-n256/
│ │ └── config.yaml
│ └── ldm/
│ ├── bsr_sr/
│ │ └── config.yaml
│ ├── celeba256/
│ │ └── config.yaml
│ ├── cin256/
│ │ └── config.yaml
│ ├── ffhq256/
│ │ └── config.yaml
│ ├── inpainting_big/
│ │ └── config.yaml
│ ├── layout2img-openimages256/
│ │ └── config.yaml
│ ├── lsun_beds256/
│ │ └── config.yaml
│ ├── lsun_churches256/
│ │ └── config.yaml
│ ├── semantic_synthesis256/
│ │ └── config.yaml
│ ├── semantic_synthesis512/
│ │ └── config.yaml
│ └── text2img256/
│ └── config.yaml
├── notebook_helpers.py
├── scripts/
│ ├── download_first_stages.sh
│ ├── download_models.sh
│ ├── img2img.py
│ ├── inpaint.py
│ ├── knn2img.py
│ ├── latent_imagenet_diffusion.ipynb.REMOVED.git-id
│ ├── sample_diffusion.py
│ ├── tests/
│ │ └── test_watermark.py
│ ├── train_searcher.py
│ └── txt2img.py
└── setup.py
SYMBOL INDEX (961 symbols across 40 files)
FILE: app.py
class CFGDenoiser (line 26) | class CFGDenoiser(nn.Module):
method __init__ (line 27) | def __init__(self, model):
method forward (line 31) | def forward(self, z_0, z_1, sigma, cond, uncond, text_cfg_scale, image...
function load_model_from_config (line 45) | def load_model_from_config(config, ckpt, vae_ckpt=None, verbose=False):
function append_dims (line 68) | def append_dims(x, target_dims):
class CompVisDenoiser (line 75) | class CompVisDenoiser(K.external.CompVisDenoiser):
method __init__ (line 76) | def __init__(self, model, quantize=False, device='cpu'):
method get_eps (line 79) | def get_eps(self, *args, **kwargs):
method forward (line 82) | def forward(self, input_0, input_1, sigma, **kwargs):
function to_d (line 89) | def to_d(x, sigma, denoised):
function default_noise_sampler (line 93) | def default_noise_sampler(x):
function get_ancestral_step (line 96) | def get_ancestral_step(sigma_from, sigma_to, eta=1.):
function decode_mask (line 105) | def decode_mask(mask, height = 256, width = 256):
function sample_euler_ancestral (line 114) | def sample_euler_ancestral(model, x_0, x_1, sigmas, height, width, extra...
function generate (line 157) | def generate(
function generate_list (line 283) | def generate_list(
function reset (line 413) | def reset():
function get_example (line 416) | def get_example():
FILE: dataset_diffree.py
class Dataset (line 17) | class Dataset(Dataset):
method __init__ (line 18) | def __init__(
method __len__ (line 63) | def __len__(self) -> int:
method __getitem__ (line 66) | def __getitem__(self, i: int) -> dict[str, Any]:
FILE: main.py
function get_parser (line 30) | def get_parser(**parser_kwargs):
function nondefault_trainer_args (line 130) | def nondefault_trainer_args(opt):
class WrappedDataset (line 137) | class WrappedDataset(Dataset):
method __init__ (line 140) | def __init__(self, dataset):
method __len__ (line 143) | def __len__(self):
method __getitem__ (line 146) | def __getitem__(self, idx):
function worker_init_fn (line 150) | def worker_init_fn(_):
class DataModuleFromConfig (line 166) | class DataModuleFromConfig(pl.LightningDataModule):
method __init__ (line 167) | def __init__(self, batch_size, train=None, validation=None, test=None,...
method prepare_data (line 189) | def prepare_data(self):
method setup (line 193) | def setup(self, stage=None):
method _train_dataloader (line 201) | def _train_dataloader(self):
method _val_dataloader (line 211) | def _val_dataloader(self, shuffle=False):
method _test_dataloader (line 222) | def _test_dataloader(self, shuffle=False):
method _predict_dataloader (line 235) | def _predict_dataloader(self, shuffle=False):
class SetupCallback (line 244) | class SetupCallback(Callback):
method __init__ (line 245) | def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, light...
method on_keyboard_interrupt (line 255) | def on_keyboard_interrupt(self, trainer, pl_module):
method on_pretrain_routine_start (line 261) | def on_pretrain_routine_start(self, trainer, pl_module):
function get_world_size (line 281) | def get_world_size():
function all_gather (line 288) | def all_gather(data):
class ImageLogger (line 351) | class ImageLogger(Callback):
method __init__ (line 352) | def __init__(self, batch_frequency, max_images, clamp=True, increase_l...
method _testtube (line 372) | def _testtube(self, pl_module, images, batch_idx, split):
method log_local (line 383) | def log_local(self, save_dir, split, images, prompts,
method log_img (line 414) | def log_img(self, pl_module, batch, batch_idx, split="train"):
method check_frequency (line 450) | def check_frequency(self, check_idx):
method on_train_batch_end (line 458) | def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch...
method on_validation_batch_end (line 462) | def on_validation_batch_end(self, trainer, pl_module, outputs, batch, ...
class CUDACallback (line 470) | class CUDACallback(Callback):
method on_train_epoch_start (line 472) | def on_train_epoch_start(self, trainer, pl_module):
method on_train_epoch_end (line 478) | def on_train_epoch_end(self, trainer, pl_module, outputs):
function melk (line 754) | def melk(*args, **kwargs):
function divein (line 762) | def divein(*args, **kwargs):
FILE: stable_diffusion/ldm/data/base.py
class Txt2ImgIterableBaseDataset (line 5) | class Txt2ImgIterableBaseDataset(IterableDataset):
method __init__ (line 9) | def __init__(self, num_records=0, valid_ids=None, size=256):
method __len__ (line 18) | def __len__(self):
method __iter__ (line 22) | def __iter__(self):
FILE: stable_diffusion/ldm/data/imagenet.py
function synset2idx (line 20) | def synset2idx(path_to_yaml="data/index_synset.yaml"):
class ImageNetBase (line 26) | class ImageNetBase(Dataset):
method __init__ (line 27) | def __init__(self, config=None):
method __len__ (line 39) | def __len__(self):
method __getitem__ (line 42) | def __getitem__(self, i):
method _prepare (line 45) | def _prepare(self):
method _filter_relpaths (line 48) | def _filter_relpaths(self, relpaths):
method _prepare_synset_to_human (line 66) | def _prepare_synset_to_human(self):
method _prepare_idx_to_synset (line 74) | def _prepare_idx_to_synset(self):
method _prepare_human_to_integer_label (line 80) | def _prepare_human_to_integer_label(self):
method _load (line 93) | def _load(self):
class ImageNetTrain (line 134) | class ImageNetTrain(ImageNetBase):
method __init__ (line 145) | def __init__(self, process_images=True, data_root=None, **kwargs):
method _prepare (line 150) | def _prepare(self):
class ImageNetValidation (line 197) | class ImageNetValidation(ImageNetBase):
method __init__ (line 211) | def __init__(self, process_images=True, data_root=None, **kwargs):
method _prepare (line 216) | def _prepare(self):
class ImageNetSR (line 272) | class ImageNetSR(Dataset):
method __init__ (line 273) | def __init__(self, size=None,
method __len__ (line 336) | def __len__(self):
method __getitem__ (line 339) | def __getitem__(self, i):
class ImageNetSRTrain (line 375) | class ImageNetSRTrain(ImageNetSR):
method __init__ (line 376) | def __init__(self, **kwargs):
method get_base (line 379) | def get_base(self):
class ImageNetSRValidation (line 386) | class ImageNetSRValidation(ImageNetSR):
method __init__ (line 387) | def __init__(self, **kwargs):
method get_base (line 390) | def get_base(self):
FILE: stable_diffusion/ldm/data/lsun.py
class LSUNBase (line 9) | class LSUNBase(Dataset):
method __init__ (line 10) | def __init__(self,
method __len__ (line 36) | def __len__(self):
method __getitem__ (line 39) | def __getitem__(self, i):
class LSUNChurchesTrain (line 62) | class LSUNChurchesTrain(LSUNBase):
method __init__ (line 63) | def __init__(self, **kwargs):
class LSUNChurchesValidation (line 67) | class LSUNChurchesValidation(LSUNBase):
method __init__ (line 68) | def __init__(self, flip_p=0., **kwargs):
class LSUNBedroomsTrain (line 73) | class LSUNBedroomsTrain(LSUNBase):
method __init__ (line 74) | def __init__(self, **kwargs):
class LSUNBedroomsValidation (line 78) | class LSUNBedroomsValidation(LSUNBase):
method __init__ (line 79) | def __init__(self, flip_p=0.0, **kwargs):
class LSUNCatsTrain (line 84) | class LSUNCatsTrain(LSUNBase):
method __init__ (line 85) | def __init__(self, **kwargs):
class LSUNCatsValidation (line 89) | class LSUNCatsValidation(LSUNBase):
method __init__ (line 90) | def __init__(self, flip_p=0., **kwargs):
FILE: stable_diffusion/ldm/lr_scheduler.py
class LambdaWarmUpCosineScheduler (line 4) | class LambdaWarmUpCosineScheduler:
method __init__ (line 8) | def __init__(self, warm_up_steps, lr_min, lr_max, lr_start, max_decay_...
method schedule (line 17) | def schedule(self, n, **kwargs):
method __call__ (line 32) | def __call__(self, n, **kwargs):
class LambdaWarmUpCosineScheduler2 (line 36) | class LambdaWarmUpCosineScheduler2:
method __init__ (line 41) | def __init__(self, warm_up_steps, f_min, f_max, f_start, cycle_lengths...
method find_in_interval (line 52) | def find_in_interval(self, n):
method schedule (line 59) | def schedule(self, n, **kwargs):
method __call__ (line 77) | def __call__(self, n, **kwargs):
class LambdaLinearScheduler (line 81) | class LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):
method schedule (line 83) | def schedule(self, n, **kwargs):
FILE: stable_diffusion/ldm/models/autoencoder.py
class VQModel (line 14) | class VQModel(pl.LightningModule):
method __init__ (line 15) | def __init__(self,
method ema_scope (line 64) | def ema_scope(self, context=None):
method init_from_ckpt (line 78) | def init_from_ckpt(self, path, ignore_keys=list()):
method on_train_batch_end (line 92) | def on_train_batch_end(self, *args, **kwargs):
method encode (line 96) | def encode(self, x):
method encode_to_prequant (line 102) | def encode_to_prequant(self, x):
method decode (line 107) | def decode(self, quant):
method decode_code (line 112) | def decode_code(self, code_b):
method forward (line 117) | def forward(self, input, return_pred_indices=False):
method get_input (line 124) | def get_input(self, batch, k):
method training_step (line 142) | def training_step(self, batch, batch_idx, optimizer_idx):
method validation_step (line 164) | def validation_step(self, batch, batch_idx):
method _validation_step (line 170) | def _validation_step(self, batch, batch_idx, suffix=""):
method configure_optimizers (line 197) | def configure_optimizers(self):
method get_last_layer (line 230) | def get_last_layer(self):
method log_images (line 233) | def log_images(self, batch, only_inputs=False, plot_ema=False, **kwargs):
method to_rgb (line 255) | def to_rgb(self, x):
class VQModelInterface (line 264) | class VQModelInterface(VQModel):
method __init__ (line 265) | def __init__(self, embed_dim, *args, **kwargs):
method encode (line 269) | def encode(self, x):
method decode (line 274) | def decode(self, h, force_not_quantize=False):
class AutoencoderKL (line 285) | class AutoencoderKL(pl.LightningModule):
method __init__ (line 286) | def __init__(self,
method init_from_ckpt (line 313) | def init_from_ckpt(self, path, ignore_keys=list()):
method encode (line 324) | def encode(self, x):
method decode (line 330) | def decode(self, z):
method forward (line 335) | def forward(self, input, sample_posterior=True):
method get_input (line 344) | def get_input(self, batch, k):
method training_step (line 351) | def training_step(self, batch, batch_idx, optimizer_idx):
method validation_step (line 372) | def validation_step(self, batch, batch_idx):
method configure_optimizers (line 386) | def configure_optimizers(self):
method get_last_layer (line 397) | def get_last_layer(self):
method log_images (line 401) | def log_images(self, batch, only_inputs=False, **kwargs):
method to_rgb (line 417) | def to_rgb(self, x):
class IdentityFirstStage (line 426) | class IdentityFirstStage(torch.nn.Module):
method __init__ (line 427) | def __init__(self, *args, vq_interface=False, **kwargs):
method encode (line 431) | def encode(self, x, *args, **kwargs):
method decode (line 434) | def decode(self, x, *args, **kwargs):
method quantize (line 437) | def quantize(self, x, *args, **kwargs):
method forward (line 442) | def forward(self, x, *args, **kwargs):
FILE: stable_diffusion/ldm/models/diffusion/classifier.py
function disabled_train (line 22) | def disabled_train(self, mode=True):
class NoisyLatentImageClassifier (line 28) | class NoisyLatentImageClassifier(pl.LightningModule):
method __init__ (line 30) | def __init__(self,
method init_from_ckpt (line 70) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
method load_diffusion (line 88) | def load_diffusion(self):
method load_classifier (line 95) | def load_classifier(self, ckpt_path, pool):
method get_x_noisy (line 110) | def get_x_noisy(self, x, t, noise=None):
method forward (line 120) | def forward(self, x_noisy, t, *args, **kwargs):
method get_input (line 124) | def get_input(self, batch, k):
method get_conditioning (line 133) | def get_conditioning(self, batch, k=None):
method compute_top_k (line 150) | def compute_top_k(self, logits, labels, k, reduction="mean"):
method on_train_epoch_start (line 157) | def on_train_epoch_start(self):
method write_logs (line 162) | def write_logs(self, loss, logits, targets):
method shared_step (line 179) | def shared_step(self, batch, t=None):
method training_step (line 198) | def training_step(self, batch, batch_idx):
method reset_noise_accs (line 202) | def reset_noise_accs(self):
method on_validation_start (line 206) | def on_validation_start(self):
method validation_step (line 210) | def validation_step(self, batch, batch_idx):
method configure_optimizers (line 220) | def configure_optimizers(self):
method log_images (line 238) | def log_images(self, batch, N=8, *args, **kwargs):
FILE: stable_diffusion/ldm/models/diffusion/ddim.py
class DDIMSampler (line 12) | class DDIMSampler(object):
method __init__ (line 13) | def __init__(self, model, schedule="linear", **kwargs):
method register_buffer (line 19) | def register_buffer(self, name, attr):
method make_schedule (line 25) | def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddi...
method sample (line 57) | def sample(self,
method ddim_sampling (line 114) | def ddim_sampling(self, cond, shape,
method p_sample_ddim (line 166) | def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_origin...
method stochastic_encode (line 207) | def stochastic_encode(self, x0, t, use_original_steps=False, noise=None):
method decode (line 223) | def decode(self, x_latent, cond, t_start, unconditional_guidance_scale...
FILE: stable_diffusion/ldm/models/diffusion/ddpm.py
function disabled_train (line 34) | def disabled_train(self, mode=True):
function uniform_on_device (line 40) | def uniform_on_device(r1, r2, shape, device):
class DDPM (line 44) | class DDPM(pl.LightningModule):
method __init__ (line 46) | def __init__(self,
method register_schedule (line 117) | def register_schedule(self, given_betas=None, beta_schedule="linear", ...
method ema_scope (line 172) | def ema_scope(self, context=None):
method init_from_ckpt (line 186) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
method q_mean_variance (line 204) | def q_mean_variance(self, x_start, t):
method predict_start_from_noise (line 216) | def predict_start_from_noise(self, x_t, t, noise):
method q_posterior (line 222) | def q_posterior(self, x_start, x_t, t):
method p_mean_variance (line 231) | def p_mean_variance(self, x, t, clip_denoised: bool):
method p_sample (line 244) | def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):
method p_sample_loop (line 253) | def p_sample_loop(self, shape, return_intermediates=False):
method sample (line 268) | def sample(self, batch_size=16, return_intermediates=False):
method q_sample (line 274) | def q_sample(self, x_start, t, noise=None):
method get_loss (line 279) | def get_loss(self, pred, target, mean=True):
method p_losses (line 294) | def p_losses(self, x_start, t, noise=None):
method forward (line 323) | def forward(self, x, *args, **kwargs):
method get_input (line 329) | def get_input(self, batch, k):
method shared_step (line 337) | def shared_step(self, batch):
method training_step (line 342) | def training_step(self, batch, batch_idx):
method validation_step (line 358) | def validation_step(self, batch, batch_idx):
method on_train_batch_end (line 366) | def on_train_batch_end(self, *args, **kwargs):
method _get_rows_from_list (line 370) | def _get_rows_from_list(self, samples):
method log_images (line 378) | def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=Non...
method configure_optimizers (line 415) | def configure_optimizers(self):
class LatentDiffusion (line 424) | class LatentDiffusion(DDPM):
method __init__ (line 426) | def __init__(self,
method make_cond_schedule (line 471) | def make_cond_schedule(self, ):
method on_train_batch_start (line 478) | def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
method register_schedule (line 493) | def register_schedule(self,
method instantiate_first_stage (line 502) | def instantiate_first_stage(self, config):
method instantiate_cond_stage (line 509) | def instantiate_cond_stage(self, config):
method _get_denoise_row_from_list (line 530) | def _get_denoise_row_from_list(self, samples, desc='', force_no_decode...
method get_first_stage_encoding (line 542) | def get_first_stage_encoding(self, encoder_posterior):
method get_learned_conditioning (line 551) | def get_learned_conditioning(self, c):
method meshgrid (line 564) | def meshgrid(self, h, w):
method delta_border (line 571) | def delta_border(self, h, w):
method get_weighting (line 585) | def get_weighting(self, h, w, Ly, Lx, device):
method get_fold_unfold (line 601) | def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1): # todo...
method get_input (line 654) | def get_input(self, batch, k, return_first_stage_outputs=False, force_...
method decode_first_stage (line 706) | def decode_first_stage(self, z, predict_cids=False, force_not_quantize...
method differentiable_decode_first_stage (line 766) | def differentiable_decode_first_stage(self, z, predict_cids=False, for...
method encode_first_stage (line 826) | def encode_first_stage(self, x):
method shared_step (line 865) | def shared_step(self, batch, **kwargs):
method forward (line 870) | def forward(self, x, c, *args, **kwargs):
method _rescale_annotations (line 881) | def _rescale_annotations(self, bboxes, crop_coordinates): # TODO: mov...
method apply_model (line 891) | def apply_model(self, x_noisy, t, cond, return_ids=False):
method _predict_eps_from_xstart (line 994) | def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
method _prior_bpd (line 998) | def _prior_bpd(self, x_start):
method p_losses (line 1012) | def p_losses(self, x_start, cond, t, noise=None):
method p_mean_variance (line 1047) | def p_mean_variance(self, x, c, t, clip_denoised: bool, return_codeboo...
method p_sample (line 1079) | def p_sample(self, x, c, t, clip_denoised=False, repeat_noise=False,
method progressive_denoising (line 1110) | def progressive_denoising(self, cond, shape, verbose=True, callback=No...
method p_sample_loop (line 1166) | def p_sample_loop(self, cond, shape, return_intermediates=False,
method sample (line 1217) | def sample(self, cond, batch_size=16, return_intermediates=False, x_T=...
method sample_log (line 1235) | def sample_log(self,cond,batch_size,ddim, ddim_steps,**kwargs):
method log_images (line 1251) | def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=200,...
method configure_optimizers (line 1361) | def configure_optimizers(self):
method to_rgb (line 1386) | def to_rgb(self, x):
class DiffusionWrapper (line 1395) | class DiffusionWrapper(pl.LightningModule):
method __init__ (line 1396) | def __init__(self, diff_model_config, conditioning_key):
method forward (line 1402) | def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):
class Layout2ImgDiffusion (line 1424) | class Layout2ImgDiffusion(LatentDiffusion):
method __init__ (line 1426) | def __init__(self, cond_stage_key, *args, **kwargs):
method log_images (line 1430) | def log_images(self, batch, N=8, *args, **kwargs):
FILE: stable_diffusion/ldm/models/diffusion/ddpm_diffree.py
function disabled_train (line 35) | def disabled_train(self, mode=True):
function uniform_on_device (line 41) | def uniform_on_device(r1, r2, shape, device):
class DDPM (line 45) | class DDPM(pl.LightningModule):
method __init__ (line 47) | def __init__(self,
method register_schedule (line 131) | def register_schedule(self, given_betas=None, beta_schedule="linear", ...
method ema_scope (line 186) | def ema_scope(self, context=None):
method init_from_ckpt (line 200) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
method q_mean_variance (line 279) | def q_mean_variance(self, x_start, t):
method predict_start_from_noise (line 291) | def predict_start_from_noise(self, x_t, t, noise):
method q_posterior (line 297) | def q_posterior(self, x_start, x_t, t):
method p_mean_variance (line 306) | def p_mean_variance(self, x, t, clip_denoised: bool):
method p_sample (line 319) | def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):
method p_sample_loop (line 328) | def p_sample_loop(self, shape, return_intermediates=False):
method sample (line 343) | def sample(self, batch_size=16, return_intermediates=False):
method q_sample (line 349) | def q_sample(self, x_start, t, noise=None):
method get_loss (line 354) | def get_loss(self, pred, target, mean=True):
method p_losses (line 369) | def p_losses(self, x_start, t, noise=None):
method forward (line 398) | def forward(self, x, *args, **kwargs):
method get_input (line 404) | def get_input(self, batch, k):
method shared_step (line 407) | def shared_step(self, batch):
method training_step (line 412) | def training_step(self, batch, batch_idx):
method validation_step (line 428) | def validation_step(self, batch, batch_idx):
method on_train_batch_end (line 436) | def on_train_batch_end(self, *args, **kwargs):
method _get_rows_from_list (line 440) | def _get_rows_from_list(self, samples):
method log_images (line 448) | def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=Non...
method configure_optimizers (line 485) | def configure_optimizers(self):
class LatentDiffusion (line 494) | class LatentDiffusion(DDPM):
method __init__ (line 496) | def __init__(self,
method make_cond_schedule (line 548) | def make_cond_schedule(self, ):
method on_train_batch_start (line 555) | def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
method register_schedule (line 570) | def register_schedule(self,
method instantiate_first_stage (line 579) | def instantiate_first_stage(self, config):
method instantiate_cond_stage (line 586) | def instantiate_cond_stage(self, config):
method _get_denoise_row_from_list (line 607) | def _get_denoise_row_from_list(self, samples, desc='', force_no_decode...
method get_first_stage_encoding (line 619) | def get_first_stage_encoding(self, encoder_posterior):
method get_learned_conditioning (line 628) | def get_learned_conditioning(self, c):
method get_vision_conditioning (line 641) | def get_vision_conditioning(self, c):
method meshgrid (line 645) | def meshgrid(self, h, w):
method delta_border (line 652) | def delta_border(self, h, w):
method get_weighting (line 666) | def get_weighting(self, h, w, Ly, Lx, device):
method get_fold_unfold (line 682) | def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1): # todo...
method get_input (line 735) | def get_input(self, batch, keys, return_first_stage_outputs=False, for...
method forward_mask_decoder (line 781) | def forward_mask_decoder(self, input_image, output_image, c, time_step):
method decode_first_stage (line 800) | def decode_first_stage(self, z, predict_cids=False, force_not_quantize...
method differentiable_decode_first_stage (line 860) | def differentiable_decode_first_stage(self, z, predict_cids=False, for...
method encode_first_stage (line 920) | def encode_first_stage(self, x):
method shared_step (line 959) | def shared_step(self, batch, **kwargs):
method forward (line 964) | def forward(self, x_0, x_1, c, *args, **kwargs):
method _rescale_annotations (line 975) | def _rescale_annotations(self, bboxes, crop_coordinates): # TODO: mov...
method apply_model (line 985) | def apply_model(self, x_noisy_0, t, cond, return_ids=False):
method _predict_eps_from_xstart (line 1091) | def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
method _prior_bpd (line 1095) | def _prior_bpd(self, x_start):
method p_losses (line 1109) | def p_losses(self, x_start_0, x_start_1, cond, t, noise=None):
method p_mean_variance (line 1155) | def p_mean_variance(self, x_0, x_1, c, t, clip_denoised: bool, return_...
method p_sample (line 1189) | def p_sample(self, x_0, x_1, c, t, clip_denoised=False, repeat_noise=F...
method progressive_denoising (line 1222) | def progressive_denoising(self, cond, shape, verbose=True, callback=No...
method p_sample_loop (line 1278) | def p_sample_loop(self, cond, shape, return_intermediates=False,
method sample (line 1332) | def sample(self, cond, batch_size=16, return_intermediates=False, x_T=...
method sample_log (line 1350) | def sample_log(self,cond,batch_size,ddim, ddim_steps,**kwargs):
method log_images (line 1366) | def log_images(self, batch, N=4, n_row=4, sample=True, ddim_steps=200,...
method configure_optimizers (line 1479) | def configure_optimizers(self):
method to_rgb (line 1505) | def to_rgb(self, x):
class DiffusionWrapper (line 1514) | class DiffusionWrapper(pl.LightningModule):
method __init__ (line 1515) | def __init__(self, diff_model_config, conditioning_key):
method forward (line 1521) | def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):
class OMPWrapper (line 1542) | class OMPWrapper(pl.LightningModule):
method __init__ (line 1543) | def __init__(self, omp_module_config, conditioning_key):
method forward (line 1549) | def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):
class Layout2ImgDiffusion (line 1570) | class Layout2ImgDiffusion(LatentDiffusion):
method __init__ (line 1572) | def __init__(self, cond_stage_key, *args, **kwargs):
method log_images (line 1576) | def log_images(self, batch, N=8, *args, **kwargs):
FILE: stable_diffusion/ldm/models/diffusion/ddpm_edit.py
function disabled_train (line 34) | def disabled_train(self, mode=True):
function uniform_on_device (line 40) | def uniform_on_device(r1, r2, shape, device):
class DDPM (line 44) | class DDPM(pl.LightningModule):
method __init__ (line 46) | def __init__(self,
method register_schedule (line 125) | def register_schedule(self, given_betas=None, beta_schedule="linear", ...
method ema_scope (line 180) | def ema_scope(self, context=None):
method init_from_ckpt (line 194) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
method q_mean_variance (line 233) | def q_mean_variance(self, x_start, t):
method predict_start_from_noise (line 245) | def predict_start_from_noise(self, x_t, t, noise):
method q_posterior (line 251) | def q_posterior(self, x_start, x_t, t):
method p_mean_variance (line 260) | def p_mean_variance(self, x, t, clip_denoised: bool):
method p_sample (line 273) | def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):
method p_sample_loop (line 282) | def p_sample_loop(self, shape, return_intermediates=False):
method sample (line 297) | def sample(self, batch_size=16, return_intermediates=False):
method q_sample (line 303) | def q_sample(self, x_start, t, noise=None):
method get_loss (line 308) | def get_loss(self, pred, target, mean=True):
method p_losses (line 323) | def p_losses(self, x_start, t, noise=None):
method forward (line 352) | def forward(self, x, *args, **kwargs):
method get_input (line 358) | def get_input(self, batch, k):
method shared_step (line 361) | def shared_step(self, batch):
method training_step (line 366) | def training_step(self, batch, batch_idx):
method validation_step (line 382) | def validation_step(self, batch, batch_idx):
method on_train_batch_end (line 390) | def on_train_batch_end(self, *args, **kwargs):
method _get_rows_from_list (line 394) | def _get_rows_from_list(self, samples):
method log_images (line 402) | def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=Non...
method configure_optimizers (line 439) | def configure_optimizers(self):
class LatentDiffusion (line 448) | class LatentDiffusion(DDPM):
method __init__ (line 450) | def __init__(self,
method make_cond_schedule (line 500) | def make_cond_schedule(self, ):
method on_train_batch_start (line 507) | def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
method register_schedule (line 522) | def register_schedule(self,
method instantiate_first_stage (line 531) | def instantiate_first_stage(self, config):
method instantiate_cond_stage (line 538) | def instantiate_cond_stage(self, config):
method _get_denoise_row_from_list (line 559) | def _get_denoise_row_from_list(self, samples, desc='', force_no_decode...
method get_first_stage_encoding (line 571) | def get_first_stage_encoding(self, encoder_posterior):
method get_learned_conditioning (line 580) | def get_learned_conditioning(self, c):
method meshgrid (line 593) | def meshgrid(self, h, w):
method delta_border (line 600) | def delta_border(self, h, w):
method get_weighting (line 614) | def get_weighting(self, h, w, Ly, Lx, device):
method get_fold_unfold (line 630) | def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1): # todo...
method get_input (line 683) | def get_input(self, batch, k, return_first_stage_outputs=False, force_...
method decode_first_stage (line 716) | def decode_first_stage(self, z, predict_cids=False, force_not_quantize...
method differentiable_decode_first_stage (line 776) | def differentiable_decode_first_stage(self, z, predict_cids=False, for...
method encode_first_stage (line 836) | def encode_first_stage(self, x):
method shared_step (line 875) | def shared_step(self, batch, **kwargs):
method forward (line 880) | def forward(self, x, c, *args, **kwargs):
method _rescale_annotations (line 891) | def _rescale_annotations(self, bboxes, crop_coordinates): # TODO: mov...
method apply_model (line 901) | def apply_model(self, x_noisy, t, cond, return_ids=False):
method _predict_eps_from_xstart (line 1004) | def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
method _prior_bpd (line 1008) | def _prior_bpd(self, x_start):
method p_losses (line 1022) | def p_losses(self, x_start, cond, t, noise=None):
method p_mean_variance (line 1060) | def p_mean_variance(self, x, c, t, clip_denoised: bool, return_codeboo...
method p_sample (line 1092) | def p_sample(self, x, c, t, clip_denoised=False, repeat_noise=False,
method progressive_denoising (line 1123) | def progressive_denoising(self, cond, shape, verbose=True, callback=No...
method p_sample_loop (line 1179) | def p_sample_loop(self, cond, shape, return_intermediates=False,
method sample (line 1230) | def sample(self, cond, batch_size=16, return_intermediates=False, x_T=...
method sample_log (line 1248) | def sample_log(self,cond,batch_size,ddim, ddim_steps,**kwargs):
method log_images (line 1264) | def log_images(self, batch, N=4, n_row=4, sample=True, ddim_steps=200,...
method configure_optimizers (line 1375) | def configure_optimizers(self):
method to_rgb (line 1400) | def to_rgb(self, x):
class DiffusionWrapper (line 1409) | class DiffusionWrapper(pl.LightningModule):
method __init__ (line 1410) | def __init__(self, diff_model_config, conditioning_key):
method forward (line 1416) | def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):
class Layout2ImgDiffusion (line 1438) | class Layout2ImgDiffusion(LatentDiffusion):
method __init__ (line 1440) | def __init__(self, cond_stage_key, *args, **kwargs):
method log_images (line 1444) | def log_images(self, batch, N=8, *args, **kwargs):
FILE: stable_diffusion/ldm/models/diffusion/dpm_solver/dpm_solver.py
class NoiseScheduleVP (line 6) | class NoiseScheduleVP:
method __init__ (line 7) | def __init__(
method marginal_log_mean_coeff (line 125) | def marginal_log_mean_coeff(self, t):
method marginal_alpha (line 138) | def marginal_alpha(self, t):
method marginal_std (line 144) | def marginal_std(self, t):
method marginal_lambda (line 150) | def marginal_lambda(self, t):
method inverse_lambda (line 158) | def inverse_lambda(self, lamb):
function model_wrapper (line 177) | def model_wrapper(
class DPM_Solver (line 351) | class DPM_Solver:
method __init__ (line 352) | def __init__(self, model_fn, noise_schedule, predict_x0=False, thresho...
method noise_prediction_fn (line 380) | def noise_prediction_fn(self, x, t):
method data_prediction_fn (line 386) | def data_prediction_fn(self, x, t):
method model_fn (line 401) | def model_fn(self, x, t):
method get_time_steps (line 410) | def get_time_steps(self, skip_type, t_T, t_0, N, device):
method get_orders_and_timesteps_for_singlestep_solver (line 439) | def get_orders_and_timesteps_for_singlestep_solver(self, steps, order,...
method denoise_to_zero_fn (line 498) | def denoise_to_zero_fn(self, x, s):
method dpm_solver_first_update (line 504) | def dpm_solver_first_update(self, x, s, t, model_s=None, return_interm...
method singlestep_dpm_solver_second_update (line 551) | def singlestep_dpm_solver_second_update(self, x, s, t, r1=0.5, model_s...
method singlestep_dpm_solver_third_update (line 633) | def singlestep_dpm_solver_third_update(self, x, s, t, r1=1./3., r2=2./...
method multistep_dpm_solver_second_update (line 755) | def multistep_dpm_solver_second_update(self, x, model_prev_list, t_pre...
method multistep_dpm_solver_third_update (line 812) | def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev...
method singlestep_dpm_solver_update (line 859) | def singlestep_dpm_solver_update(self, x, s, t, order, return_intermed...
method multistep_dpm_solver_update (line 885) | def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list,...
method dpm_solver_adaptive (line 909) | def dpm_solver_adaptive(self, x, order, t_T, t_0, h_init=0.05, atol=0....
method sample (line 965) | def sample(self, x, steps=20, t_start=None, t_end=None, order=3, skip_...
function interpolate_fn (line 1132) | def interpolate_fn(x, xp, yp):
function expand_dims (line 1174) | def expand_dims(v, dims):
FILE: stable_diffusion/ldm/models/diffusion/dpm_solver/sampler.py
class DPMSolverSampler (line 8) | class DPMSolverSampler(object):
method __init__ (line 9) | def __init__(self, model, **kwargs):
method register_buffer (line 15) | def register_buffer(self, name, attr):
method sample (line 22) | def sample(self,
FILE: stable_diffusion/ldm/models/diffusion/plms.py
class PLMSSampler (line 11) | class PLMSSampler(object):
method __init__ (line 12) | def __init__(self, model, schedule="linear", **kwargs):
method register_buffer (line 18) | def register_buffer(self, name, attr):
method make_schedule (line 24) | def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddi...
method sample (line 58) | def sample(self,
method plms_sampling (line 115) | def plms_sampling(self, cond, shape,
method p_sample_plms (line 173) | def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_origin...
FILE: stable_diffusion/ldm/modules/attention.py
function exists (line 11) | def exists(val):
function uniq (line 15) | def uniq(arr):
function default (line 19) | def default(val, d):
function max_neg_value (line 25) | def max_neg_value(t):
function init_ (line 29) | def init_(tensor):
class GEGLU (line 37) | class GEGLU(nn.Module):
method __init__ (line 38) | def __init__(self, dim_in, dim_out):
method forward (line 42) | def forward(self, x):
class FeedForward (line 47) | class FeedForward(nn.Module):
method __init__ (line 48) | def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):
method forward (line 63) | def forward(self, x):
function zero_module (line 67) | def zero_module(module):
function Normalize (line 76) | def Normalize(in_channels):
class LinearAttention (line 80) | class LinearAttention(nn.Module):
method __init__ (line 81) | def __init__(self, dim, heads=4, dim_head=32):
method forward (line 88) | def forward(self, x):
class SpatialSelfAttention (line 99) | class SpatialSelfAttention(nn.Module):
method __init__ (line 100) | def __init__(self, in_channels):
method forward (line 126) | def forward(self, x):
class CrossAttention (line 152) | class CrossAttention(nn.Module):
method __init__ (line 153) | def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, ...
method forward (line 172) | def forward(self, x, context=None, mask=None):
class BasicTransformerBlock (line 207) | class BasicTransformerBlock(nn.Module):
method __init__ (line 208) | def __init__(self, dim, n_heads, d_head, dropout=0., context_dim=None,...
method forward (line 219) | def forward(self, x, context=None):
method _forward (line 222) | def _forward(self, x, context=None):
class SpatialTransformer (line 229) | class SpatialTransformer(nn.Module):
method __init__ (line 237) | def __init__(self, in_channels, n_heads, d_head,
method forward (line 261) | def forward(self, x, context=None):
FILE: stable_diffusion/ldm/modules/diffusionmodules/model.py
function get_timestep_embedding (line 12) | def get_timestep_embedding(timesteps, embedding_dim):
function nonlinearity (line 33) | def nonlinearity(x):
function Normalize (line 38) | def Normalize(in_channels, num_groups=32):
class Upsample (line 42) | class Upsample(nn.Module):
method __init__ (line 43) | def __init__(self, in_channels, with_conv):
method forward (line 53) | def forward(self, x):
class Downsample (line 60) | class Downsample(nn.Module):
method __init__ (line 61) | def __init__(self, in_channels, with_conv):
method forward (line 72) | def forward(self, x):
class ResnetBlock (line 82) | class ResnetBlock(nn.Module):
method __init__ (line 83) | def __init__(self, *, in_channels, out_channels=None, conv_shortcut=Fa...
method forward (line 121) | def forward(self, x, temb):
class LinAttnBlock (line 144) | class LinAttnBlock(LinearAttention):
method __init__ (line 146) | def __init__(self, in_channels):
class AttnBlock (line 150) | class AttnBlock(nn.Module):
method __init__ (line 151) | def __init__(self, in_channels):
method forward (line 178) | def forward(self, x):
function make_attn (line 205) | def make_attn(in_channels, attn_type="vanilla"):
class Model (line 216) | class Model(nn.Module):
method __init__ (line 217) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
method forward (line 316) | def forward(self, x, t=None, context=None):
method get_last_layer (line 364) | def get_last_layer(self):
class Encoder (line 368) | class Encoder(nn.Module):
method __init__ (line 369) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
method forward (line 434) | def forward(self, x):
class Decoder (line 462) | class Decoder(nn.Module):
method __init__ (line 463) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
method forward (line 535) | def forward(self, z):
class SimpleDecoder (line 571) | class SimpleDecoder(nn.Module):
method __init__ (line 572) | def __init__(self, in_channels, out_channels, *args, **kwargs):
method forward (line 594) | def forward(self, x):
class UpsampleDecoder (line 607) | class UpsampleDecoder(nn.Module):
method __init__ (line 608) | def __init__(self, in_channels, out_channels, ch, num_res_blocks, reso...
method forward (line 641) | def forward(self, x):
class LatentRescaler (line 655) | class LatentRescaler(nn.Module):
method __init__ (line 656) | def __init__(self, factor, in_channels, mid_channels, out_channels, de...
method forward (line 680) | def forward(self, x):
class MergedRescaleEncoder (line 692) | class MergedRescaleEncoder(nn.Module):
method __init__ (line 693) | def __init__(self, in_channels, ch, resolution, out_ch, num_res_blocks,
method forward (line 705) | def forward(self, x):
class MergedRescaleDecoder (line 711) | class MergedRescaleDecoder(nn.Module):
method __init__ (line 712) | def __init__(self, z_channels, out_ch, resolution, num_res_blocks, att...
method forward (line 722) | def forward(self, x):
class Upsampler (line 728) | class Upsampler(nn.Module):
method __init__ (line 729) | def __init__(self, in_size, out_size, in_channels, out_channels, ch_mu...
method forward (line 741) | def forward(self, x):
class Resize (line 747) | class Resize(nn.Module):
method __init__ (line 748) | def __init__(self, in_channels=None, learned=False, mode="bilinear"):
method forward (line 763) | def forward(self, x, scale_factor=1.0):
class FirstStagePostProcessor (line 770) | class FirstStagePostProcessor(nn.Module):
method __init__ (line 772) | def __init__(self, ch_mult:list, in_channels,
method instantiate_pretrained (line 807) | def instantiate_pretrained(self, config):
method encode_with_pretrained (line 816) | def encode_with_pretrained(self,x):
method forward (line 822) | def forward(self,x):
FILE: stable_diffusion/ldm/modules/diffusionmodules/openaimodel.py
function convert_module_to_f16 (line 24) | def convert_module_to_f16(x):
function convert_module_to_f32 (line 27) | def convert_module_to_f32(x):
class AttentionPool2d (line 32) | class AttentionPool2d(nn.Module):
method __init__ (line 37) | def __init__(
method forward (line 51) | def forward(self, x):
class TimestepBlock (line 62) | class TimestepBlock(nn.Module):
method forward (line 68) | def forward(self, x, emb):
class TimestepEmbedSequential (line 74) | class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
method forward (line 80) | def forward(self, x, emb, context=None):
class Upsample (line 91) | class Upsample(nn.Module):
method __init__ (line 100) | def __init__(self, channels, use_conv, dims=2, out_channels=None, padd...
method forward (line 109) | def forward(self, x):
class TransposedUpsample (line 121) | class TransposedUpsample(nn.Module):
method __init__ (line 123) | def __init__(self, channels, out_channels=None, ks=5):
method forward (line 130) | def forward(self,x):
class Downsample (line 134) | class Downsample(nn.Module):
method __init__ (line 143) | def __init__(self, channels, use_conv, dims=2, out_channels=None,paddi...
method forward (line 158) | def forward(self, x):
class ResBlock (line 163) | class ResBlock(TimestepBlock):
method __init__ (line 179) | def __init__(
method forward (line 243) | def forward(self, x, emb):
method _forward (line 255) | def _forward(self, x, emb):
class AttentionBlock (line 278) | class AttentionBlock(nn.Module):
method __init__ (line 285) | def __init__(
method forward (line 314) | def forward(self, x):
method _forward (line 318) | def _forward(self, x):
function count_flops_attn (line 327) | def count_flops_attn(model, _x, y):
class QKVAttentionLegacy (line 347) | class QKVAttentionLegacy(nn.Module):
method __init__ (line 352) | def __init__(self, n_heads):
method forward (line 356) | def forward(self, qkv):
method count_flops (line 375) | def count_flops(model, _x, y):
class QKVAttention (line 379) | class QKVAttention(nn.Module):
method __init__ (line 384) | def __init__(self, n_heads):
method forward (line 388) | def forward(self, qkv):
method count_flops (line 409) | def count_flops(model, _x, y):
class UNetModel (line 413) | class UNetModel(nn.Module):
method __init__ (line 443) | def __init__(
method convert_to_fp16 (line 694) | def convert_to_fp16(self):
method convert_to_fp32 (line 702) | def convert_to_fp32(self):
method forward (line 710) | def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
class EncoderUNetModel (line 745) | class EncoderUNetModel(nn.Module):
method __init__ (line 751) | def __init__(
method convert_to_fp16 (line 924) | def convert_to_fp16(self):
method convert_to_fp32 (line 931) | def convert_to_fp32(self):
method forward (line 938) | def forward(self, x, timesteps):
FILE: stable_diffusion/ldm/modules/diffusionmodules/openaimodel_diffree.py
function convert_module_to_f16 (line 24) | def convert_module_to_f16(x):
function convert_module_to_f32 (line 27) | def convert_module_to_f32(x):
class AttentionPool2d (line 32) | class AttentionPool2d(nn.Module):
method __init__ (line 37) | def __init__(
method forward (line 51) | def forward(self, x):
class TimestepBlock (line 62) | class TimestepBlock(nn.Module):
method forward (line 68) | def forward(self, x, emb):
class TimestepEmbedSequential (line 74) | class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
method forward (line 80) | def forward(self, x, emb, context=None):
class Upsample (line 91) | class Upsample(nn.Module):
method __init__ (line 100) | def __init__(self, channels, use_conv, dims=2, out_channels=None, padd...
method forward (line 109) | def forward(self, x):
class TransposedUpsample (line 121) | class TransposedUpsample(nn.Module):
method __init__ (line 123) | def __init__(self, channels, out_channels=None, ks=5):
method forward (line 130) | def forward(self,x):
class Downsample (line 133) | class Downsample(nn.Module):
method __init__ (line 142) | def __init__(self, channels, use_conv, dims=2, out_channels=None,paddi...
method forward (line 157) | def forward(self, x):
class ResBlockWithoutEmb (line 161) | class ResBlockWithoutEmb(nn.Module):
method __init__ (line 176) | def __init__(
method forward (line 218) | def forward(self, x):
method _forward (line 230) | def _forward(self, x):
class ResBlock (line 235) | class ResBlock(TimestepBlock):
method __init__ (line 251) | def __init__(
method forward (line 315) | def forward(self, x, emb):
method _forward (line 327) | def _forward(self, x, emb):
class AttentionBlock (line 349) | class AttentionBlock(nn.Module):
method __init__ (line 356) | def __init__(
method forward (line 385) | def forward(self, x):
method _forward (line 389) | def _forward(self, x):
function count_flops_attn (line 398) | def count_flops_attn(model, _x, y):
class QKVAttentionLegacy (line 418) | class QKVAttentionLegacy(nn.Module):
method __init__ (line 423) | def __init__(self, n_heads):
method forward (line 427) | def forward(self, qkv):
method count_flops (line 446) | def count_flops(model, _x, y):
class QKVAttention (line 450) | class QKVAttention(nn.Module):
method __init__ (line 455) | def __init__(self, n_heads):
method forward (line 459) | def forward(self, qkv):
method count_flops (line 480) | def count_flops(model, _x, y):
class UNetModel (line 484) | class UNetModel(nn.Module):
method __init__ (line 514) | def __init__(
method convert_to_fp16 (line 767) | def convert_to_fp16(self):
method convert_to_fp32 (line 775) | def convert_to_fp32(self):
method forward (line 783) | def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
class OMPModule (line 819) | class OMPModule(nn.Module):
method __init__ (line 848) | def __init__(
method convert_to_fp16 (line 954) | def convert_to_fp16(self):
method convert_to_fp32 (line 961) | def convert_to_fp32(self):
method forward (line 968) | def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
class EncoderUNetModel (line 984) | class EncoderUNetModel(nn.Module):
method __init__ (line 990) | def __init__(
method convert_to_fp16 (line 1163) | def convert_to_fp16(self):
method convert_to_fp32 (line 1170) | def convert_to_fp32(self):
method forward (line 1177) | def forward(self, x, timesteps):
FILE: stable_diffusion/ldm/modules/diffusionmodules/util.py
function make_beta_schedule (line 21) | def make_beta_schedule(schedule, n_timestep, linear_start=1e-4, linear_e...
function make_ddim_timesteps (line 46) | def make_ddim_timesteps(ddim_discr_method, num_ddim_timesteps, num_ddpm_...
function make_ddim_sampling_parameters (line 63) | def make_ddim_sampling_parameters(alphacums, ddim_timesteps, eta, verbos...
function betas_for_alpha_bar (line 77) | def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.9...
function extract_into_tensor (line 96) | def extract_into_tensor(a, t, x_shape):
function checkpoint (line 102) | def checkpoint(func, inputs, params, flag):
class CheckpointFunction (line 119) | class CheckpointFunction(torch.autograd.Function):
method forward (line 121) | def forward(ctx, run_function, length, *args):
method backward (line 131) | def backward(ctx, *output_grads):
function timestep_embedding (line 151) | def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=Fal...
function zero_module (line 174) | def zero_module(module):
function scale_module (line 183) | def scale_module(module, scale):
function mean_flat (line 192) | def mean_flat(tensor):
function normalization (line 199) | def normalization(channels):
class SiLU (line 209) | class SiLU(nn.Module):
method forward (line 210) | def forward(self, x):
class GroupNorm32 (line 214) | class GroupNorm32(nn.GroupNorm):
method forward (line 215) | def forward(self, x):
function conv_nd (line 218) | def conv_nd(dims, *args, **kwargs):
function linear (line 231) | def linear(*args, **kwargs):
function avg_pool_nd (line 238) | def avg_pool_nd(dims, *args, **kwargs):
class HybridConditioner (line 251) | class HybridConditioner(nn.Module):
method __init__ (line 253) | def __init__(self, c_concat_config, c_crossattn_config):
method forward (line 258) | def forward(self, c_concat, c_crossattn):
function noise_like (line 264) | def noise_like(shape, device, repeat=False):
FILE: stable_diffusion/ldm/modules/distributions/distributions.py
class AbstractDistribution (line 5) | class AbstractDistribution:
method sample (line 6) | def sample(self):
method mode (line 9) | def mode(self):
class DiracDistribution (line 13) | class DiracDistribution(AbstractDistribution):
method __init__ (line 14) | def __init__(self, value):
method sample (line 17) | def sample(self):
method mode (line 20) | def mode(self):
class DiagonalGaussianDistribution (line 24) | class DiagonalGaussianDistribution(object):
method __init__ (line 25) | def __init__(self, parameters, deterministic=False):
method sample (line 35) | def sample(self):
method kl (line 39) | def kl(self, other=None):
method nll (line 53) | def nll(self, sample, dims=[1,2,3]):
method mode (line 61) | def mode(self):
function normal_kl (line 65) | def normal_kl(mean1, logvar1, mean2, logvar2):
FILE: stable_diffusion/ldm/modules/ema.py
class LitEma (line 5) | class LitEma(nn.Module):
method __init__ (line 6) | def __init__(self, model, decay=0.9999, use_num_upates=True):
method forward (line 25) | def forward(self,model):
method copy_to (line 46) | def copy_to(self, model):
method store (line 55) | def store(self, parameters):
method restore (line 64) | def restore(self, parameters):
FILE: stable_diffusion/ldm/modules/encoders/modules.py
class AbstractEncoder (line 12) | class AbstractEncoder(nn.Module):
method __init__ (line 13) | def __init__(self):
method encode (line 16) | def encode(self, *args, **kwargs):
class ClassEmbedder (line 21) | class ClassEmbedder(nn.Module):
method __init__ (line 22) | def __init__(self, embed_dim, n_classes=1000, key='class'):
method forward (line 27) | def forward(self, batch, key=None):
class TransformerEmbedder (line 36) | class TransformerEmbedder(AbstractEncoder):
method __init__ (line 38) | def __init__(self, n_embed, n_layer, vocab_size, max_seq_len=77, devic...
method forward (line 44) | def forward(self, tokens):
method encode (line 49) | def encode(self, x):
class BERTTokenizer (line 53) | class BERTTokenizer(AbstractEncoder):
method __init__ (line 55) | def __init__(self, device="cuda", vq_interface=True, max_length=77):
method forward (line 63) | def forward(self, text):
method encode (line 70) | def encode(self, text):
method decode (line 76) | def decode(self, text):
class BERTEmbedder (line 80) | class BERTEmbedder(AbstractEncoder):
method __init__ (line 82) | def __init__(self, n_embed, n_layer, vocab_size=30522, max_seq_len=77,
method forward (line 93) | def forward(self, text):
method encode (line 101) | def encode(self, text):
class SpatialRescaler (line 106) | class SpatialRescaler(nn.Module):
method __init__ (line 107) | def __init__(self,
method forward (line 125) | def forward(self,x):
method encode (line 134) | def encode(self, x):
class FrozenCLIPEmbedder (line 137) | class FrozenCLIPEmbedder(AbstractEncoder):
method __init__ (line 139) | def __init__(self, version="openai/clip-vit-large-patch14", device="cu...
method freeze (line 147) | def freeze(self):
method forward (line 152) | def forward(self, text):
method encode (line 161) | def encode(self, text):
class FrozenCLIPEmbedderBoth (line 164) | class FrozenCLIPEmbedderBoth(AbstractEncoder):
method __init__ (line 166) | def __init__(self, version="openai/clip-vit-large-patch14", device="cu...
method freeze (line 181) | def freeze(self):
method preprocess (line 190) | def preprocess(self, x):
method forward (line 199) | def forward(self, text):
method encode (line 208) | def encode(self, text):
method vision_forward (line 211) | def vision_forward(self, x):
class CLIPEmbedderWithLearnableTokens (line 218) | class CLIPEmbedderWithLearnableTokens(AbstractEncoder):
method __init__ (line 220) | def __init__(self, version="openai/clip-vit-large-patch14", device="cu...
method freeze (line 230) | def freeze(self):
method forward (line 235) | def forward(self, text):
method encode (line 245) | def encode(self, text):
method encode_image (line 248) | def encode_image(self, x):
class FrozenCLIPTextEmbedder (line 252) | class FrozenCLIPTextEmbedder(nn.Module):
method __init__ (line 256) | def __init__(self, version='ViT-L/14', device="cuda", max_length=77, n...
method freeze (line 264) | def freeze(self):
method forward (line 269) | def forward(self, text):
method encode (line 276) | def encode(self, text):
class FrozenClipImageEmbedder (line 284) | class FrozenClipImageEmbedder(nn.Module):
method __init__ (line 288) | def __init__(
method preprocess (line 303) | def preprocess(self, x):
method forward (line 313) | def forward(self, x):
FILE: stable_diffusion/ldm/modules/image_degradation/bsrgan.py
function modcrop_np (line 29) | def modcrop_np(img, sf):
function analytic_kernel (line 49) | def analytic_kernel(k):
function anisotropic_Gaussian (line 65) | def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
function gm_blur_kernel (line 86) | def gm_blur_kernel(mean, cov, size=15):
function shift_pixel (line 99) | def shift_pixel(x, sf, upper_left=True):
function blur (line 128) | def blur(x, k):
function gen_kernel (line 145) | def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]),...
function fspecial_gaussian (line 187) | def fspecial_gaussian(hsize, sigma):
function fspecial_laplacian (line 201) | def fspecial_laplacian(alpha):
function fspecial (line 210) | def fspecial(filter_type, *args, **kwargs):
function bicubic_degradation (line 228) | def bicubic_degradation(x, sf=3):
function srmd_degradation (line 240) | def srmd_degradation(x, k, sf=3):
function dpsr_degradation (line 262) | def dpsr_degradation(x, k, sf=3):
function classical_degradation (line 284) | def classical_degradation(x, k, sf=3):
function add_sharpening (line 299) | def add_sharpening(img, weight=0.5, radius=50, threshold=10):
function add_blur (line 325) | def add_blur(img, sf=4):
function add_resize (line 339) | def add_resize(img, sf=4):
function add_Gaussian_noise (line 369) | def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
function add_speckle_noise (line 386) | def add_speckle_noise(img, noise_level1=2, noise_level2=25):
function add_Poisson_noise (line 404) | def add_Poisson_noise(img):
function add_JPEG_noise (line 418) | def add_JPEG_noise(img):
function random_crop (line 427) | def random_crop(lq, hq, sf=4, lq_patchsize=64):
function degradation_bsrgan (line 438) | def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
function degradation_bsrgan_variant (line 530) | def degradation_bsrgan_variant(image, sf=4, isp_model=None):
function degradation_bsrgan_plus (line 617) | def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True,...
FILE: stable_diffusion/ldm/modules/image_degradation/bsrgan_light.py
function modcrop_np (line 29) | def modcrop_np(img, sf):
function analytic_kernel (line 49) | def analytic_kernel(k):
function anisotropic_Gaussian (line 65) | def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
function gm_blur_kernel (line 86) | def gm_blur_kernel(mean, cov, size=15):
function shift_pixel (line 99) | def shift_pixel(x, sf, upper_left=True):
function blur (line 128) | def blur(x, k):
function gen_kernel (line 145) | def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]),...
function fspecial_gaussian (line 187) | def fspecial_gaussian(hsize, sigma):
function fspecial_laplacian (line 201) | def fspecial_laplacian(alpha):
function fspecial (line 210) | def fspecial(filter_type, *args, **kwargs):
function bicubic_degradation (line 228) | def bicubic_degradation(x, sf=3):
function srmd_degradation (line 240) | def srmd_degradation(x, k, sf=3):
function dpsr_degradation (line 262) | def dpsr_degradation(x, k, sf=3):
function classical_degradation (line 284) | def classical_degradation(x, k, sf=3):
function add_sharpening (line 299) | def add_sharpening(img, weight=0.5, radius=50, threshold=10):
function add_blur (line 325) | def add_blur(img, sf=4):
function add_resize (line 343) | def add_resize(img, sf=4):
function add_Gaussian_noise (line 373) | def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
function add_speckle_noise (line 390) | def add_speckle_noise(img, noise_level1=2, noise_level2=25):
function add_Poisson_noise (line 408) | def add_Poisson_noise(img):
function add_JPEG_noise (line 422) | def add_JPEG_noise(img):
function random_crop (line 431) | def random_crop(lq, hq, sf=4, lq_patchsize=64):
function degradation_bsrgan (line 442) | def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
function degradation_bsrgan_variant (line 534) | def degradation_bsrgan_variant(image, sf=4, isp_model=None):
FILE: stable_diffusion/ldm/modules/image_degradation/utils_image.py
function is_image_file (line 29) | def is_image_file(filename):
function get_timestamp (line 33) | def get_timestamp():
function imshow (line 37) | def imshow(x, title=None, cbar=False, figsize=None):
function surf (line 47) | def surf(Z, cmap='rainbow', figsize=None):
function get_image_paths (line 67) | def get_image_paths(dataroot):
function _get_paths_from_images (line 74) | def _get_paths_from_images(path):
function patches_from_image (line 93) | def patches_from_image(img, p_size=512, p_overlap=64, p_max=800):
function imssave (line 112) | def imssave(imgs, img_path):
function split_imageset (line 125) | def split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_si...
function mkdir (line 153) | def mkdir(path):
function mkdirs (line 158) | def mkdirs(paths):
function mkdir_and_rename (line 166) | def mkdir_and_rename(path):
function imread_uint (line 185) | def imread_uint(path, n_channels=3):
function imsave (line 203) | def imsave(img, img_path):
function imwrite (line 209) | def imwrite(img, img_path):
function read_img (line 220) | def read_img(path):
function uint2single (line 249) | def uint2single(img):
function single2uint (line 254) | def single2uint(img):
function uint162single (line 259) | def uint162single(img):
function single2uint16 (line 264) | def single2uint16(img):
function uint2tensor4 (line 275) | def uint2tensor4(img):
function uint2tensor3 (line 282) | def uint2tensor3(img):
function tensor2uint (line 289) | def tensor2uint(img):
function single2tensor3 (line 302) | def single2tensor3(img):
function single2tensor4 (line 307) | def single2tensor4(img):
function tensor2single (line 312) | def tensor2single(img):
function tensor2single3 (line 320) | def tensor2single3(img):
function single2tensor5 (line 329) | def single2tensor5(img):
function single32tensor5 (line 333) | def single32tensor5(img):
function single42tensor4 (line 337) | def single42tensor4(img):
function tensor2img (line 342) | def tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):
function augment_img (line 380) | def augment_img(img, mode=0):
function augment_img_tensor4 (line 401) | def augment_img_tensor4(img, mode=0):
function augment_img_tensor (line 422) | def augment_img_tensor(img, mode=0):
function augment_img_np3 (line 441) | def augment_img_np3(img, mode=0):
function augment_imgs (line 469) | def augment_imgs(img_list, hflip=True, rot=True):
function modcrop (line 494) | def modcrop(img_in, scale):
function shave (line 510) | def shave(img_in, border=0):
function rgb2ycbcr (line 529) | def rgb2ycbcr(img, only_y=True):
function ycbcr2rgb (line 553) | def ycbcr2rgb(img):
function bgr2ycbcr (line 573) | def bgr2ycbcr(img, only_y=True):
function channel_convert (line 597) | def channel_convert(in_c, tar_type, img_list):
function calculate_psnr (line 621) | def calculate_psnr(img1, img2, border=0):
function calculate_ssim (line 642) | def calculate_ssim(img1, img2, border=0):
function ssim (line 669) | def ssim(img1, img2):
function cubic (line 700) | def cubic(x):
function calculate_weights_indices (line 708) | def calculate_weights_indices(in_length, out_length, scale, kernel, kern...
function imresize (line 766) | def imresize(img, scale, antialiasing=True):
function imresize_np (line 839) | def imresize_np(img, scale, antialiasing=True):
FILE: stable_diffusion/ldm/modules/losses/contperceptual.py
class LPIPSWithDiscriminator (line 7) | class LPIPSWithDiscriminator(nn.Module):
method __init__ (line 8) | def __init__(self, disc_start, logvar_init=0.0, kl_weight=1.0, pixello...
method calculate_adaptive_weight (line 32) | def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
method forward (line 45) | def forward(self, inputs, reconstructions, posteriors, optimizer_idx,
FILE: stable_diffusion/ldm/modules/losses/vqperceptual.py
function hinge_d_loss_with_exemplar_weights (line 11) | def hinge_d_loss_with_exemplar_weights(logits_real, logits_fake, weights):
function adopt_weight (line 20) | def adopt_weight(weight, global_step, threshold=0, value=0.):
function measure_perplexity (line 26) | def measure_perplexity(predicted_indices, n_embed):
function l1 (line 35) | def l1(x, y):
function l2 (line 39) | def l2(x, y):
class VQLPIPSWithDiscriminator (line 43) | class VQLPIPSWithDiscriminator(nn.Module):
method __init__ (line 44) | def __init__(self, disc_start, codebook_weight=1.0, pixelloss_weight=1.0,
method calculate_adaptive_weight (line 85) | def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
method forward (line 98) | def forward(self, codebook_loss, inputs, reconstructions, optimizer_idx,
FILE: stable_diffusion/ldm/modules/x_transformer.py
class AbsolutePositionalEmbedding (line 25) | class AbsolutePositionalEmbedding(nn.Module):
method __init__ (line 26) | def __init__(self, dim, max_seq_len):
method init_ (line 31) | def init_(self):
method forward (line 34) | def forward(self, x):
class FixedPositionalEmbedding (line 39) | class FixedPositionalEmbedding(nn.Module):
method __init__ (line 40) | def __init__(self, dim):
method forward (line 45) | def forward(self, x, seq_dim=1, offset=0):
function exists (line 54) | def exists(val):
function default (line 58) | def default(val, d):
function always (line 64) | def always(val):
function not_equals (line 70) | def not_equals(val):
function equals (line 76) | def equals(val):
function max_neg_value (line 82) | def max_neg_value(tensor):
function pick_and_pop (line 88) | def pick_and_pop(keys, d):
function group_dict_by_key (line 93) | def group_dict_by_key(cond, d):
function string_begins_with (line 102) | def string_begins_with(prefix, str):
function group_by_key_prefix (line 106) | def group_by_key_prefix(prefix, d):
function groupby_prefix_and_trim (line 110) | def groupby_prefix_and_trim(prefix, d):
class Scale (line 117) | class Scale(nn.Module):
method __init__ (line 118) | def __init__(self, value, fn):
method forward (line 123) | def forward(self, x, **kwargs):
class Rezero (line 128) | class Rezero(nn.Module):
method __init__ (line 129) | def __init__(self, fn):
method forward (line 134) | def forward(self, x, **kwargs):
class ScaleNorm (line 139) | class ScaleNorm(nn.Module):
method __init__ (line 140) | def __init__(self, dim, eps=1e-5):
method forward (line 146) | def forward(self, x):
class RMSNorm (line 151) | class RMSNorm(nn.Module):
method __init__ (line 152) | def __init__(self, dim, eps=1e-8):
method forward (line 158) | def forward(self, x):
class Residual (line 163) | class Residual(nn.Module):
method forward (line 164) | def forward(self, x, residual):
class GRUGating (line 168) | class GRUGating(nn.Module):
method __init__ (line 169) | def __init__(self, dim):
method forward (line 173) | def forward(self, x, residual):
class GEGLU (line 184) | class GEGLU(nn.Module):
method __init__ (line 185) | def __init__(self, dim_in, dim_out):
method forward (line 189) | def forward(self, x):
class FeedForward (line 194) | class FeedForward(nn.Module):
method __init__ (line 195) | def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):
method forward (line 210) | def forward(self, x):
class Attention (line 215) | class Attention(nn.Module):
method __init__ (line 216) | def __init__(
method forward (line 268) | def forward(
class AttentionLayers (line 370) | class AttentionLayers(nn.Module):
method __init__ (line 371) | def __init__(
method forward (line 481) | def forward(
class Encoder (line 541) | class Encoder(AttentionLayers):
method __init__ (line 542) | def __init__(self, **kwargs):
class TransformerWrapper (line 548) | class TransformerWrapper(nn.Module):
method __init__ (line 549) | def __init__(
method init_ (line 595) | def init_(self):
method forward (line 598) | def forward(
FILE: stable_diffusion/ldm/util.py
function log_txt_as_img (line 17) | def log_txt_as_img(wh, xc, size=10):
function ismap (line 41) | def ismap(x):
function isimage (line 47) | def isimage(x):
function exists (line 53) | def exists(x):
function default (line 57) | def default(val, d):
function mean_flat (line 63) | def mean_flat(tensor):
function count_params (line 71) | def count_params(model, verbose=False):
function instantiate_from_config (line 78) | def instantiate_from_config(config):
function get_obj_from_str (line 88) | def get_obj_from_str(string, reload=False):
function _do_parallel_data_prefetch (line 96) | def _do_parallel_data_prefetch(func, Q, data, idx, idx_to_fn=False):
function parallel_data_prefetch (line 108) | def parallel_data_prefetch(
FILE: stable_diffusion/main.py
function get_parser (line 24) | def get_parser(**parser_kwargs):
function nondefault_trainer_args (line 126) | def nondefault_trainer_args(opt):
class WrappedDataset (line 133) | class WrappedDataset(Dataset):
method __init__ (line 136) | def __init__(self, dataset):
method __len__ (line 139) | def __len__(self):
method __getitem__ (line 142) | def __getitem__(self, idx):
function worker_init_fn (line 146) | def worker_init_fn(_):
class DataModuleFromConfig (line 162) | class DataModuleFromConfig(pl.LightningDataModule):
method __init__ (line 163) | def __init__(self, batch_size, train=None, validation=None, test=None,...
method prepare_data (line 185) | def prepare_data(self):
method setup (line 189) | def setup(self, stage=None):
method _train_dataloader (line 197) | def _train_dataloader(self):
method _val_dataloader (line 207) | def _val_dataloader(self, shuffle=False):
method _test_dataloader (line 218) | def _test_dataloader(self, shuffle=False):
method _predict_dataloader (line 231) | def _predict_dataloader(self, shuffle=False):
class SetupCallback (line 240) | class SetupCallback(Callback):
method __init__ (line 241) | def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, light...
method on_keyboard_interrupt (line 251) | def on_keyboard_interrupt(self, trainer, pl_module):
method on_pretrain_routine_start (line 257) | def on_pretrain_routine_start(self, trainer, pl_module):
class ImageLogger (line 289) | class ImageLogger(Callback):
method __init__ (line 290) | def __init__(self, batch_frequency, max_images, clamp=True, increase_l...
method _testtube (line 310) | def _testtube(self, pl_module, images, batch_idx, split):
method log_local (line 321) | def log_local(self, save_dir, split, images,
method log_img (line 340) | def log_img(self, pl_module, batch, batch_idx, split="train"):
method check_frequency (line 372) | def check_frequency(self, check_idx):
method on_train_batch_end (line 383) | def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch...
method on_validation_batch_end (line 387) | def on_validation_batch_end(self, trainer, pl_module, outputs, batch, ...
class CUDACallback (line 395) | class CUDACallback(Callback):
method on_train_epoch_start (line 397) | def on_train_epoch_start(self, trainer, pl_module):
method on_train_epoch_end (line 403) | def on_train_epoch_end(self, trainer, pl_module, outputs):
function melk (line 697) | def melk(*args, **kwargs):
function divein (line 705) | def divein(*args, **kwargs):
FILE: stable_diffusion/notebook_helpers.py
function download_models (line 19) | def download_models(mode):
function load_model_from_config (line 40) | def load_model_from_config(config, ckpt):
function get_model (line 52) | def get_model(mode):
function get_custom_cond (line 59) | def get_custom_cond(mode):
function get_cond_options (line 85) | def get_cond_options(mode):
function select_cond_path (line 92) | def select_cond_path(mode):
function get_cond (line 107) | def get_cond(mode, selected_path):
function visualize_cond_img (line 127) | def visualize_cond_img(path):
function run (line 131) | def run(model, selected_path, task, custom_steps, resize_enabled=False, ...
function convsample_ddim (line 188) | def convsample_ddim(model, cond, steps, shape, eta=1.0, callback=None, n...
function make_convolutional_sample (line 208) | def make_convolutional_sample(batch, model, mode="vanilla", custom_steps...
FILE: stable_diffusion/scripts/img2img.py
function chunk (line 23) | def chunk(it, size):
function load_model_from_config (line 28) | def load_model_from_config(config, ckpt, verbose=False):
function load_img (line 48) | def load_img(path):
function main (line 60) | def main():
FILE: stable_diffusion/scripts/inpaint.py
function make_batch (line 11) | def make_batch(image, mask, device):
FILE: stable_diffusion/scripts/knn2img.py
function chunk (line 36) | def chunk(it, size):
function load_model_from_config (line 41) | def load_model_from_config(config, ckpt, verbose=False):
class Searcher (line 61) | class Searcher(object):
method __init__ (line 62) | def __init__(self, database, retriever_version='ViT-L/14'):
method train_searcher (line 75) | def train_searcher(self, k,
method load_single_file (line 91) | def load_single_file(self, saved_embeddings):
method load_multi_files (line 96) | def load_multi_files(self, data_archive):
method load_database (line 104) | def load_database(self):
method load_retriever (line 123) | def load_retriever(self, version='ViT-L/14', ):
method load_searcher (line 130) | def load_searcher(self):
method search (line 135) | def search(self, x, k):
method __call__ (line 163) | def __call__(self, x, n):
FILE: stable_diffusion/scripts/sample_diffusion.py
function custom_to_pil (line 15) | def custom_to_pil(x):
function custom_to_np (line 27) | def custom_to_np(x):
function logs2pil (line 36) | def logs2pil(logs, keys=["sample"]):
function convsample (line 54) | def convsample(model, shape, return_intermediates=True,
function convsample_ddim (line 69) | def convsample_ddim(model, steps, shape, eta=1.0
function make_convolutional_sample (line 79) | def make_convolutional_sample(model, batch_size, vanilla=False, custom_s...
function run (line 108) | def run(model, logdir, batch_size=50, vanilla=False, custom_steps=None, ...
function save_logs (line 143) | def save_logs(logs, path, n_saved=0, key="sample", np_path=None):
function get_parser (line 162) | def get_parser():
function load_model_from_config (line 220) | def load_model_from_config(config, sd):
function load_model (line 228) | def load_model(config, ckpt, gpu, eval_mode):
FILE: stable_diffusion/scripts/tests/test_watermark.py
function testit (line 6) | def testit(img_path):
FILE: stable_diffusion/scripts/train_searcher.py
function search_bruteforce (line 12) | def search_bruteforce(searcher):
function search_partioned_ah (line 16) | def search_partioned_ah(searcher, dims_per_block, aiq_threshold, reorder_k,
function search_ah (line 24) | def search_ah(searcher, dims_per_block, aiq_threshold, reorder_k):
function load_datapool (line 28) | def load_datapool(dpath):
function train_searcher (line 62) | def train_searcher(opt,
FILE: stable_diffusion/scripts/txt2img.py
function chunk (line 32) | def chunk(it, size):
function numpy_to_pil (line 37) | def numpy_to_pil(images):
function load_model_from_config (line 49) | def load_model_from_config(config, ckpt, verbose=False):
function put_watermark (line 69) | def put_watermark(img, wm_encoder=None):
function load_replacement (line 77) | def load_replacement(x):
function check_safety (line 88) | def check_safety(x_image):
function main (line 98) | def main():
Condensed preview — 106 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (966K chars).
[
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 6211,
"preview": "# Diffree\nOfficial PyTorch implement of paper \"Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model\"\n\n"
},
{
"path": "app.py",
"chars": 23764,
"preview": "from __future__ import annotations\n\nimport math\nimport random\nimport sys\nfrom argparse import ArgumentParser\n\nfrom tqdm."
},
{
"path": "config/generate.yaml",
"chars": 2928,
"preview": "model:\n base_learning_rate: 5.0e-05\n target: ldm.models.diffusion.ddpm_diffree.LatentDiffusion\n params:\n linear_st"
},
{
"path": "config/train.yaml",
"chars": 3281,
"preview": "model:\n base_learning_rate: 2.0e-05\n target: ldm.models.diffusion.ddpm_diffree.LatentDiffusion\n params:\n ckpt_path"
},
{
"path": "dataset_diffree.py",
"chars": 4010,
"preview": "from __future__ import annotations\n\nimport os\nimport json\nimport math\nfrom pathlib import Path\nfrom typing import Any\n\ni"
},
{
"path": "main.py",
"chars": 29869,
"preview": "import argparse, os, sys, datetime, glob\nimport numpy as np\nimport time\nimport torch\nimport torchvision\nimport pytorch_l"
},
{
"path": "requirements.txt",
"chars": 460,
"preview": "--extra-index-url https://download.pytorch.org/whl/cu117\n\nnumpy==1.24.4\ntorch==2.0.0\ntorchvision==0.15.1\ntorchmetrics==0"
},
{
"path": "stable_diffusion/LICENSE",
"chars": 14381,
"preview": "Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors\n\nCreativeML Open RAIL-M\ndated August 22, 2022\n\nSecti"
},
{
"path": "stable_diffusion/README.md",
"chars": 12439,
"preview": "# Stable Diffusion\n*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.a"
},
{
"path": "stable_diffusion/Stable_Diffusion_v1_Model_Card.md",
"chars": 9340,
"preview": "# Stable Diffusion v1 Model Card\nThis model card focuses on the model associated with the Stable Diffusion model, availa"
},
{
"path": "stable_diffusion/assets/results.gif.REMOVED.git-id",
"chars": 40,
"preview": "82b6590e670a32196093cc6333ea19e6547d07de"
},
{
"path": "stable_diffusion/assets/stable-samples/img2img/upscaling-in.png.REMOVED.git-id",
"chars": 40,
"preview": "501c31c21751664957e69ce52cad1818b6d2f4ce"
},
{
"path": "stable_diffusion/assets/stable-samples/img2img/upscaling-out.png.REMOVED.git-id",
"chars": 40,
"preview": "1c4bb25a779f34d86b2d90e584ac67af91bb1303"
},
{
"path": "stable_diffusion/assets/stable-samples/txt2img/merged-0005.png.REMOVED.git-id",
"chars": 40,
"preview": "ca0a1af206555f0f208a1ab879e95efedc1b1c5b"
},
{
"path": "stable_diffusion/assets/stable-samples/txt2img/merged-0006.png.REMOVED.git-id",
"chars": 40,
"preview": "999f3703230580e8c89e9081abd6a1f8f50896d4"
},
{
"path": "stable_diffusion/assets/stable-samples/txt2img/merged-0007.png.REMOVED.git-id",
"chars": 40,
"preview": "af390acaf601283782d6f479d4cade4d78e30b26"
},
{
"path": "stable_diffusion/assets/txt2img-preview.png.REMOVED.git-id",
"chars": 40,
"preview": "51ee1c235dfdc63d4c41de7d303d03730e43c33c"
},
{
"path": "stable_diffusion/configs/autoencoder/autoencoder_kl_16x16x16.yaml",
"chars": 1145,
"preview": "model:\n base_learning_rate: 4.5e-6\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: \"val/rec_loss\""
},
{
"path": "stable_diffusion/configs/autoencoder/autoencoder_kl_32x32x4.yaml",
"chars": 1140,
"preview": "model:\n base_learning_rate: 4.5e-6\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: \"val/rec_loss\""
},
{
"path": "stable_diffusion/configs/autoencoder/autoencoder_kl_64x64x3.yaml",
"chars": 1139,
"preview": "model:\n base_learning_rate: 4.5e-6\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: \"val/rec_loss\""
},
{
"path": "stable_diffusion/configs/autoencoder/autoencoder_kl_8x8x64.yaml",
"chars": 1148,
"preview": "model:\n base_learning_rate: 4.5e-6\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: \"val/rec_loss\""
},
{
"path": "stable_diffusion/configs/latent-diffusion/celebahq-ldm-vq-4.yaml",
"chars": 2028,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/configs/latent-diffusion/cin-ldm-vq-f8.yaml",
"chars": 2360,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/configs/latent-diffusion/cin256-v2.yaml",
"chars": 1553,
"preview": "model:\n base_learning_rate: 0.0001\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.00"
},
{
"path": "stable_diffusion/configs/latent-diffusion/ffhq-ldm-vq-4.yaml",
"chars": 2020,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml",
"chars": 2024,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml",
"chars": 2284,
"preview": "model:\n base_learning_rate: 5.0e-5 # set to target_lr by starting main.py with '--scale_lr False'\n target: ldm.model"
},
{
"path": "stable_diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml",
"chars": 1614,
"preview": "model:\n base_learning_rate: 5.0e-05\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/configs/retrieval-augmented-diffusion/768x768.yaml",
"chars": 1615,
"preview": "model:\n base_learning_rate: 0.0001\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.00"
},
{
"path": "stable_diffusion/configs/stable-diffusion/v1-inference.yaml",
"chars": 1873,
"preview": "model:\n base_learning_rate: 1.0e-04\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/data/example_conditioning/text_conditional/sample_0.txt",
"chars": 20,
"preview": "A basket of cerries\n"
},
{
"path": "stable_diffusion/data/imagenet_clsidx_to_label.txt",
"chars": 30563,
"preview": " 0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus',\n 2: 'great white shark, white shark, man-eater, man-eating "
},
{
"path": "stable_diffusion/data/imagenet_train_hr_indices.p.REMOVED.git-id",
"chars": 40,
"preview": "b8d6d4689d2ecf32147e9cc2f5e6c50e072df26f"
},
{
"path": "stable_diffusion/data/index_synset.yaml",
"chars": 14890,
"preview": "0: n01440764\n1: n01443537\n2: n01484850\n3: n01491361\n4: n01494475\n5: n01496331\n6: n01498041\n7: n01514668\n8: n07646067\n9: "
},
{
"path": "stable_diffusion/environment.yaml",
"chars": 734,
"preview": "name: ldm\nchannels:\n - pytorch\n - defaults\ndependencies:\n - python=3.8.5\n - pip=20.3\n - cudatoolkit=11.3\n - pytorc"
},
{
"path": "stable_diffusion/ldm/data/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "stable_diffusion/ldm/data/base.py",
"chars": 693,
"preview": "from abc import abstractmethod\nfrom torch.utils.data import Dataset, ConcatDataset, ChainDataset, IterableDataset\n\n\nclas"
},
{
"path": "stable_diffusion/ldm/data/imagenet.py",
"chars": 15497,
"preview": "import os, yaml, pickle, shutil, tarfile, glob\nimport cv2\nimport albumentations\nimport PIL\nimport numpy as np\nimport tor"
},
{
"path": "stable_diffusion/ldm/data/lsun.py",
"chars": 3274,
"preview": "import os\nimport numpy as np\nimport PIL\nfrom PIL import Image\nfrom torch.utils.data import Dataset\nfrom torchvision impo"
},
{
"path": "stable_diffusion/ldm/lr_scheduler.py",
"chars": 3882,
"preview": "import numpy as np\n\n\nclass LambdaWarmUpCosineScheduler:\n \"\"\"\n note: use with a base_lr of 1.0\n \"\"\"\n def __in"
},
{
"path": "stable_diffusion/ldm/models/autoencoder.py",
"chars": 17619,
"preview": "import torch\nimport pytorch_lightning as pl\nimport torch.nn.functional as F\nfrom contextlib import contextmanager\n\nfrom "
},
{
"path": "stable_diffusion/ldm/models/diffusion/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "stable_diffusion/ldm/models/diffusion/classifier.py",
"chars": 10276,
"preview": "import os\nimport torch\nimport pytorch_lightning as pl\nfrom omegaconf import OmegaConf\nfrom torch.nn import functional as"
},
{
"path": "stable_diffusion/ldm/models/diffusion/ddim.py",
"chars": 12797,
"preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ldm.modu"
},
{
"path": "stable_diffusion/ldm/models/diffusion/ddpm.py",
"chars": 67425,
"preview": "\"\"\"\nwild mixture of\nhttps://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e316"
},
{
"path": "stable_diffusion/ldm/models/diffusion/ddpm_diffree.py",
"chars": 74407,
"preview": "\"\"\"\nwild mixture of\nhttps://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e316"
},
{
"path": "stable_diffusion/ldm/models/diffusion/ddpm_edit.py",
"chars": 68315,
"preview": "\"\"\"\nwild mixture of\nhttps://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e316"
},
{
"path": "stable_diffusion/ldm/models/diffusion/dpm_solver/__init__.py",
"chars": 37,
"preview": "from .sampler import DPMSolverSampler"
},
{
"path": "stable_diffusion/ldm/models/diffusion/dpm_solver/dpm_solver.py",
"chars": 64057,
"preview": "import torch\nimport torch.nn.functional as F\nimport math\n\n\nclass NoiseScheduleVP:\n def __init__(\n self,\n "
},
{
"path": "stable_diffusion/ldm/models/diffusion/dpm_solver/sampler.py",
"chars": 2908,
"preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\n\nfrom .dpm_solver import NoiseScheduleVP, model_wrapper, DPM_Solver\n\n\nclass DPMSolver"
},
{
"path": "stable_diffusion/ldm/models/diffusion/plms.py",
"chars": 12450,
"preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ldm.modu"
},
{
"path": "stable_diffusion/ldm/modules/attention.py",
"chars": 9035,
"preview": "from inspect import isfunction\nimport math\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn, einsum\nfro"
},
{
"path": "stable_diffusion/ldm/modules/diffusionmodules/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "stable_diffusion/ldm/modules/diffusionmodules/model.py",
"chars": 33407,
"preview": "# pytorch_diffusion + derived encoder decoder\nimport math\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom ein"
},
{
"path": "stable_diffusion/ldm/modules/diffusionmodules/openaimodel.py",
"chars": 34953,
"preview": "from abc import abstractmethod\nfrom functools import partial\nimport math\nfrom typing import Iterable\n\nimport numpy as np"
},
{
"path": "stable_diffusion/ldm/modules/diffusionmodules/openaimodel_diffree.py",
"chars": 43930,
"preview": "from abc import abstractmethod\nfrom functools import partial\nimport math\nfrom typing import Iterable\n\nimport numpy as np"
},
{
"path": "stable_diffusion/ldm/modules/diffusionmodules/util.py",
"chars": 9558,
"preview": "# adopted from\n# https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# and\n#"
},
{
"path": "stable_diffusion/ldm/modules/distributions/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "stable_diffusion/ldm/modules/distributions/distributions.py",
"chars": 2970,
"preview": "import torch\nimport numpy as np\n\n\nclass AbstractDistribution:\n def sample(self):\n raise NotImplementedError()\n"
},
{
"path": "stable_diffusion/ldm/modules/ema.py",
"chars": 2982,
"preview": "import torch\nfrom torch import nn\n\n\nclass LitEma(nn.Module):\n def __init__(self, model, decay=0.9999, use_num_upates="
},
{
"path": "stable_diffusion/ldm/modules/encoders/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "stable_diffusion/ldm/modules/encoders/modules.py",
"chars": 11924,
"preview": "import torch\nimport torch.nn as nn\nfrom functools import partial\nimport clip\nfrom einops import rearrange, repeat\nfrom t"
},
{
"path": "stable_diffusion/ldm/modules/image_degradation/__init__.py",
"chars": 208,
"preview": "from ldm.modules.image_degradation.bsrgan import degradation_bsrgan_variant as degradation_fn_bsr\nfrom ldm.modules.image"
},
{
"path": "stable_diffusion/ldm/modules/image_degradation/bsrgan.py",
"chars": 25198,
"preview": "# -*- coding: utf-8 -*-\n\"\"\"\n# --------------------------------------------\n# Super-Resolution\n# ------------------------"
},
{
"path": "stable_diffusion/ldm/modules/image_degradation/bsrgan_light.py",
"chars": 22238,
"preview": "# -*- coding: utf-8 -*-\nimport numpy as np\nimport cv2\nimport torch\n\nfrom functools import partial\nimport random\nfrom sci"
},
{
"path": "stable_diffusion/ldm/modules/image_degradation/utils_image.py",
"chars": 29022,
"preview": "import os\nimport math\nimport random\nimport numpy as np\nimport torch\nimport cv2\nfrom torchvision.utils import make_grid\nf"
},
{
"path": "stable_diffusion/ldm/modules/losses/__init__.py",
"chars": 68,
"preview": "from ldm.modules.losses.contperceptual import LPIPSWithDiscriminator"
},
{
"path": "stable_diffusion/ldm/modules/losses/contperceptual.py",
"chars": 5581,
"preview": "import torch\nimport torch.nn as nn\n\nfrom taming.modules.losses.vqperceptual import * # TODO: taming dependency yes/no?\n"
},
{
"path": "stable_diffusion/ldm/modules/losses/vqperceptual.py",
"chars": 7941,
"preview": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nfrom einops import repeat\n\nfrom taming.modules.discrim"
},
{
"path": "stable_diffusion/ldm/modules/x_transformer.py",
"chars": 20168,
"preview": "\"\"\"shout-out to https://github.com/lucidrains/x-transformers/tree/main/x_transformers\"\"\"\nimport torch\nfrom torch import "
},
{
"path": "stable_diffusion/ldm/util.py",
"chars": 5857,
"preview": "import importlib\n\nimport torch\nimport numpy as np\nfrom collections import abc\nfrom einops import rearrange\nfrom functool"
},
{
"path": "stable_diffusion/main.py",
"chars": 28229,
"preview": "import argparse, os, sys, datetime, glob, importlib, csv\nimport numpy as np\nimport time\nimport torch\nimport torchvision\n"
},
{
"path": "stable_diffusion/models/first_stage_models/kl-f16/config.yaml",
"chars": 909,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: val/rec_loss\n"
},
{
"path": "stable_diffusion/models/first_stage_models/kl-f32/config.yaml",
"chars": 929,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: val/rec_loss\n"
},
{
"path": "stable_diffusion/models/first_stage_models/kl-f4/config.yaml",
"chars": 880,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: val/rec_loss\n"
},
{
"path": "stable_diffusion/models/first_stage_models/kl-f8/config.yaml",
"chars": 889,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.AutoencoderKL\n params:\n monitor: val/rec_loss\n"
},
{
"path": "stable_diffusion/models/first_stage_models/vq-f16/config.yaml",
"chars": 1026,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.VQModel\n params:\n embed_dim: 8\n n_embed: 16"
},
{
"path": "stable_diffusion/models/first_stage_models/vq-f4/config.yaml",
"chars": 955,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.VQModel\n params:\n embed_dim: 3\n n_embed: 81"
},
{
"path": "stable_diffusion/models/first_stage_models/vq-f4-noattn/config.yaml",
"chars": 978,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.VQModel\n params:\n embed_dim: 3\n n_embed: 81"
},
{
"path": "stable_diffusion/models/first_stage_models/vq-f8/config.yaml",
"chars": 1035,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.VQModel\n params:\n embed_dim: 4\n n_embed: 16"
},
{
"path": "stable_diffusion/models/first_stage_models/vq-f8-n256/config.yaml",
"chars": 1013,
"preview": "model:\n base_learning_rate: 4.5e-06\n target: ldm.models.autoencoder.VQModel\n params:\n embed_dim: 4\n n_embed: 25"
},
{
"path": "stable_diffusion/models/ldm/bsr_sr/config.yaml",
"chars": 1900,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/celeba256/config.yaml",
"chars": 1599,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/cin256/config.yaml",
"chars": 1862,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/ffhq256/config.yaml",
"chars": 1591,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/inpainting_big/config.yaml",
"chars": 1619,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/layout2img-openimages256/config.yaml",
"chars": 1924,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/lsun_beds256/config.yaml",
"chars": 1601,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/lsun_churches256/config.yaml",
"chars": 2018,
"preview": "model:\n base_learning_rate: 5.0e-05\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/semantic_synthesis256/config.yaml",
"chars": 1378,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/semantic_synthesis512/config.yaml",
"chars": 1820,
"preview": "model:\n base_learning_rate: 1.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/models/ldm/text2img256/config.yaml",
"chars": 1831,
"preview": "model:\n base_learning_rate: 2.0e-06\n target: ldm.models.diffusion.ddpm.LatentDiffusion\n params:\n linear_start: 0.0"
},
{
"path": "stable_diffusion/notebook_helpers.py",
"chars": 10099,
"preview": "from torchvision.datasets.utils import download_url\nfrom ldm.util import instantiate_from_config\nimport torch\nimport os\n"
},
{
"path": "stable_diffusion/scripts/download_first_stages.sh",
"chars": 1324,
"preview": "#!/bin/bash\nwget -O models/first_stage_models/kl-f4/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f4.zip\nwge"
},
{
"path": "stable_diffusion/scripts/download_models.sh",
"chars": 1681,
"preview": "#!/bin/bash\nwget -O models/ldm/celeba256/celeba-256.zip https://ommer-lab.com/files/latent-diffusion/celeba.zip\nwget -O "
},
{
"path": "stable_diffusion/scripts/img2img.py",
"chars": 9181,
"preview": "\"\"\"make variations of input image\"\"\"\n\nimport argparse, os, sys, glob\nimport PIL\nimport torch\nimport numpy as np\nfrom ome"
},
{
"path": "stable_diffusion/scripts/inpaint.py",
"chars": 3644,
"preview": "import argparse, os, sys, glob\nfrom omegaconf import OmegaConf\nfrom PIL import Image\nfrom tqdm import tqdm\nimport numpy "
},
{
"path": "stable_diffusion/scripts/knn2img.py",
"chars": 13707,
"preview": "import argparse, os, sys, glob\nimport clip\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom omegaconf import O"
},
{
"path": "stable_diffusion/scripts/latent_imagenet_diffusion.ipynb.REMOVED.git-id",
"chars": 40,
"preview": "607f94fc7d3ef6d8d1627017215476d9dfc7ddc4"
},
{
"path": "stable_diffusion/scripts/sample_diffusion.py",
"chars": 9606,
"preview": "import argparse, os, sys, glob, datetime, yaml\nimport torch\nimport time\nimport numpy as np\nfrom tqdm import trange\n\nfrom"
},
{
"path": "stable_diffusion/scripts/tests/test_watermark.py",
"chars": 357,
"preview": "import cv2\nimport fire\nfrom imwatermark import WatermarkDecoder\n\n\ndef testit(img_path):\n bgr = cv2.imread(img_path)\n "
},
{
"path": "stable_diffusion/scripts/train_searcher.py",
"chars": 5807,
"preview": "import os, sys\nimport numpy as np\nimport scann\nimport argparse\nimport glob\nfrom multiprocessing import cpu_count\nfrom tq"
},
{
"path": "stable_diffusion/scripts/txt2img.py",
"chars": 11666,
"preview": "import argparse, os, sys, glob\nimport cv2\nimport torch\nimport numpy as np\nfrom omegaconf import OmegaConf\nfrom PIL impor"
},
{
"path": "stable_diffusion/setup.py",
"chars": 233,
"preview": "from setuptools import setup, find_packages\n\nsetup(\n name='latent-diffusion',\n version='0.0.1',\n description=''"
}
]
// ... and 1 more files (download for full content)
About this extraction
This page contains the full source code of the OpenGVLab/Diffree GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 106 files (906.8 KB), approximately 238.6k tokens, and a symbol index with 961 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.