Repository: CompVis/stable-diffusion
Branch: main
Commit: 21f890f9da3c
Files: 88
Total size: 4.6 MB

Directory structure:
gitextract_a5rozwkp/

├── LICENSE
├── README.md
├── Stable_Diffusion_v1_Model_Card.md
├── configs/
│   ├── autoencoder/
│   │   ├── autoencoder_kl_16x16x16.yaml
│   │   ├── autoencoder_kl_32x32x4.yaml
│   │   ├── autoencoder_kl_64x64x3.yaml
│   │   └── autoencoder_kl_8x8x64.yaml
│   ├── latent-diffusion/
│   │   ├── celebahq-ldm-vq-4.yaml
│   │   ├── cin-ldm-vq-f8.yaml
│   │   ├── cin256-v2.yaml
│   │   ├── ffhq-ldm-vq-4.yaml
│   │   ├── lsun_bedrooms-ldm-vq-4.yaml
│   │   ├── lsun_churches-ldm-kl-8.yaml
│   │   └── txt2img-1p4B-eval.yaml
│   ├── retrieval-augmented-diffusion/
│   │   └── 768x768.yaml
│   └── stable-diffusion/
│       └── v1-inference.yaml
├── data/
│   ├── example_conditioning/
│   │   └── text_conditional/
│   │       └── sample_0.txt
│   ├── imagenet_clsidx_to_label.txt
│   ├── imagenet_train_hr_indices.p
│   ├── imagenet_val_hr_indices.p
│   └── index_synset.yaml
├── environment.yaml
├── ldm/
│   ├── data/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── imagenet.py
│   │   └── lsun.py
│   ├── lr_scheduler.py
│   ├── models/
│   │   ├── autoencoder.py
│   │   └── diffusion/
│   │       ├── __init__.py
│   │       ├── classifier.py
│   │       ├── ddim.py
│   │       ├── ddpm.py
│   │       ├── dpm_solver/
│   │       │   ├── __init__.py
│   │       │   ├── dpm_solver.py
│   │       │   └── sampler.py
│   │       └── plms.py
│   ├── modules/
│   │   ├── attention.py
│   │   ├── diffusionmodules/
│   │   │   ├── __init__.py
│   │   │   ├── model.py
│   │   │   ├── openaimodel.py
│   │   │   └── util.py
│   │   ├── distributions/
│   │   │   ├── __init__.py
│   │   │   └── distributions.py
│   │   ├── ema.py
│   │   ├── encoders/
│   │   │   ├── __init__.py
│   │   │   └── modules.py
│   │   ├── image_degradation/
│   │   │   ├── __init__.py
│   │   │   ├── bsrgan.py
│   │   │   ├── bsrgan_light.py
│   │   │   └── utils_image.py
│   │   ├── losses/
│   │   │   ├── __init__.py
│   │   │   ├── contperceptual.py
│   │   │   └── vqperceptual.py
│   │   └── x_transformer.py
│   └── util.py
├── main.py
├── models/
│   ├── first_stage_models/
│   │   ├── kl-f16/
│   │   │   └── config.yaml
│   │   ├── kl-f32/
│   │   │   └── config.yaml
│   │   ├── kl-f4/
│   │   │   └── config.yaml
│   │   ├── kl-f8/
│   │   │   └── config.yaml
│   │   ├── vq-f16/
│   │   │   └── config.yaml
│   │   ├── vq-f4/
│   │   │   └── config.yaml
│   │   ├── vq-f4-noattn/
│   │   │   └── config.yaml
│   │   ├── vq-f8/
│   │   │   └── config.yaml
│   │   └── vq-f8-n256/
│   │       └── config.yaml
│   └── ldm/
│       ├── bsr_sr/
│       │   └── config.yaml
│       ├── celeba256/
│       │   └── config.yaml
│       ├── cin256/
│       │   └── config.yaml
│       ├── ffhq256/
│       │   └── config.yaml
│       ├── inpainting_big/
│       │   └── config.yaml
│       ├── layout2img-openimages256/
│       │   └── config.yaml
│       ├── lsun_beds256/
│       │   └── config.yaml
│       ├── lsun_churches256/
│       │   └── config.yaml
│       ├── semantic_synthesis256/
│       │   └── config.yaml
│       ├── semantic_synthesis512/
│       │   └── config.yaml
│       └── text2img256/
│           └── config.yaml
├── notebook_helpers.py
├── scripts/
│   ├── download_first_stages.sh
│   ├── download_models.sh
│   ├── img2img.py
│   ├── inpaint.py
│   ├── knn2img.py
│   ├── latent_imagenet_diffusion.ipynb
│   ├── sample_diffusion.py
│   ├── tests/
│   │   └── test_watermark.py
│   ├── train_searcher.py
│   └── txt2img.py
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors

CreativeML Open RAIL-M
dated August 22, 2022

Section I: PREAMBLE

Multimodal generative models are being widely adopted and used, and have the potential to transform the way artists, among other individuals, conceive and benefit from AI or ML technologies as a tool for content creation.

Notwithstanding the current and potential benefits that these artifacts can bring to society at large, there are also concerns about potential misuses of them, either due to their technical limitations or ethical considerations.

In short, this license strives for both the open and responsible downstream use of the accompanying model. When it comes to the open character, we took inspiration from open source permissive licenses regarding the grant of IP rights. Referring to the downstream responsible use, we added use-based restrictions not permitting the use of the Model in very specific scenarios, in order for the licensor to be able to enforce the license in case potential misuses of the Model may occur. At the same time, we strive to promote open and responsible research on generative models for art and content generation.

Even though downstream derivative versions of the model could be released under different licensing terms, the latter will always have to include - at minimum - the same use-based restrictions as the ones in the original license (this license). We believe in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI.

This License governs the use of the model (and its derivatives) and is informed by the model card associated with the model.

NOW THEREFORE, You and Licensor agree as follows:

1. Definitions

- "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
- "Data" means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
- "Output" means the results of operating a Model as embodied in informational content resulting therefrom.
- "Model" means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
- "Derivatives of the Model" means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
- "Complementary Material" means the accompanying source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, etc, if any.
- "Distribution" means any transmission, reproduction, publication or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means - e.g. API-based or web access.
- "Licensor" means the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
- "You" (or "Your") means an individual or Legal Entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application - e.g. chatbot, translator, image generator.
- "Third Parties" means individuals or legal entities that are not under common control with Licensor or You.
- "Contribution" means any work of authorship, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
- "Contributor" means Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.

Section II: INTELLECTUAL PROPERTY RIGHTS

Both copyright and patent grants apply to the Model, Derivatives of the Model and Complementary Material. The Model and Derivatives of the Model are subject to additional terms as described in Section III.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.

Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION

4. Distribution and Redistribution. You may host for Third Party remote access purposes (e.g. software-as-a-service), reproduce and distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the following conditions:
a. Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply to the use of Complementary Material.
b. You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;
c. You must cause any modified files to carry prominent notices stating that You changed the files;
d. You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model, Derivatives of the Model.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions - respecting paragraph 4.a. - for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Therefore You cannot use the Model and the Derivatives of the Model for the specified restricted uses. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph (paragraph 5).
6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are accountable for the Output you generate and its subsequent uses. No use of the output can contravene any provision as stated in the License.

Section IV: OTHER PROVISIONS

7. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model through electronic means, or modify the Output of the Model based on updates. You shall undertake reasonable efforts to use the latest version of the Model.
8. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
9. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model and the Complementary Material (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model, Derivatives of the Model, and the Complementary Material and assume any risks associated with Your exercise of permissions under this License.
10. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model and the Complementary Material (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
11. Accepting Warranty or Additional Liability. While redistributing the Model, Derivatives of the Model and the Complementary Material thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
12. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.

END OF TERMS AND CONDITIONS




Attachment A

Use Restrictions

You agree not to use the Model or Derivatives of the Model:
- In any way that violates any applicable national, federal, state, local or international law or regulation;
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
- To generate or disseminate verifiably false information and/or content with the purpose of harming others;
- To generate or disseminate personal identifiable information that can be used to harm an individual;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
- To provide medical advice and medical results interpretation;
- To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).


================================================
FILE: README.md
================================================
# Stable Diffusion
*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*

[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp),
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
_[CVPR '22 Oral](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html) |
[GitHub](https://github.com/CompVis/latent-diffusion) | [arXiv](https://arxiv.org/abs/2112.10752) | [Project page](https://ommer-lab.com/research/latent-diffusion-models/)_

![txt2img-stable2](assets/stable-samples/txt2img/merged-0006.png)
[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. 
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), 
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).

  
## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```

You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running

```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
``` 


## Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and 
then finetuned on 512x512 images.
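
To make the downsampling factor concrete, the following minimal sketch computes the shape of the latent tensor that the diffusion model operates on for a given pixel resolution. The function name `latent_shape` is hypothetical, and the defaults of 4 latent channels and a factor of 8 are assumptions chosen to mirror the `--C` and `--f` options of the reference sampling script.

```python
def latent_shape(height, width, channels=4, factor=8):
    """Return (C, H // f, W // f), the latent shape for one image.

    channels and factor are assumed defaults, not values read from a config.
    """
    return (channels, height // factor, width // factor)


# A 512x512 image maps to a 64x64 latent grid with 4 channels.
print(latent_shape(512, 512))  # (4, 64, 64)
```

Under these assumptions, each latent position summarizes an 8x8 pixel patch, so the UNet processes 64 times fewer spatial positions than a pixel-space model would at the same resolution.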

*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data. 
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](Stable_Diffusion_v1_Model_Card.md).*

The weights are available via [the CompVis organization at Hugging Face](https://huggingface.co/CompVis) under [a license which contains specific use-based restrictions to prevent misuse and harm as informed by the model card, but otherwise remains permissive](LICENSE). While commercial use is permitted under the terms of the license, **we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations**, since there are [known limitations and biases](Stable_Diffusion_v1_Model_Card.md#limitations-and-bias) of the weights, and research on safe and ethical deployment of general text-to-image models is an ongoing effort. **The weights are research artifacts and should be treated as such.**

[The CreativeML OpenRAIL M license](LICENSE) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.

### Weights

We currently provide the following checkpoints:

- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
  194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
  515k steps at resolution `512x512` on [laion-aesthetics v2 5+](https://laion.ai/blog/laion-aesthetics/) (a subset of laion2B-en with estimated aesthetics score `> 5.0`, and additionally
filtered to images with an original size `>= 512x512`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the [LAION-5B](https://laion.ai/blog/laion-5b/) metadata, the aesthetics score is estimated using the [LAION-Aesthetics Predictor V2](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- `sd-v1-4.ckpt`: Resumed from `sd-v1-2.ckpt`. 225k steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
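
The 10% dropping of the text-conditioning mentioned for the last two checkpoints can be pictured with the sketch below. It is only an illustration: the helper `drop_conditioning` is hypothetical, and replacing a dropped prompt with the empty string is an assumption, since the training code itself is not shown here.

```python
import random


def drop_conditioning(prompts, drop_prob=0.1, rng=None):
    """Replace each prompt with "" with probability drop_prob.

    Assumption: the empty string stands in for "no conditioning".
    """
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else p for p in prompts]


rng = random.Random(0)  # fixed seed so the run is repeatable
batch = ["a photograph of an astronaut riding a horse"] * 10000
dropped = drop_conditioning(batch, drop_prob=0.1, rng=rng)

# Fraction of prompts that lost their conditioning; should be near 0.1.
frac = sum(p == "" for p in dropped) / len(dropped)
print(frac)
```

Training on a mix of conditioned and unconditioned examples is what lets a single network provide both noise predictions needed for classifier-free guidance at sampling time.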

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:
![sd evaluation results](assets/v1-variants-scores.jpg)



### Text-to-Image with Stable Diffusion
![txt2img-stable2](assets/stable-samples/txt2img/merged-0005.png)
![txt2img-stable2](assets/stable-samples/txt2img/merged-0007.png)

Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
We provide a [reference script for sampling](#reference-sampling-script), but
there also exists a [diffusers integration](#diffusers-integration), where we
expect to see more active community development.

#### Reference Sampling Script

We provide a reference sampling script, which incorporates

- a [Safety Checker Module](https://github.com/CompVis/stable-diffusion/pull/36),
  to reduce the probability of explicit outputs,
- an [invisible watermarking](https://github.com/ShieldMnt/invisible-watermark)
  of the outputs, to help viewers [identify the images as machine-generated](scripts/tests/test_watermark.py).

After [obtaining the `stable-diffusion-v1-*-original` weights](#weights), link them
```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 
```
and sample with
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 
```

By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler, 
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).


```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
                  [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
                  [--seed SEED] [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
```
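
The formula in the `--scale` help text can be tried out on toy numbers. The sketch below applies it elementwise to plain Python lists; the function name `guided_eps` and the toy values are illustrative assumptions, and in the real sampler the two inputs would be the model's noise predictions for the empty prompt and for the text prompt.

```python
def guided_eps(eps_uncond, eps_cond, scale=7.5):
    """eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_empty = [0.0, 1.0, -2.0]  # toy stand-in for eps(x, empty)
eps_text = [1.0, 1.0, 0.0]    # toy stand-in for eps(x, cond)

print(guided_eps(eps_empty, eps_text, scale=0.0))  # [0.0, 1.0, -2.0]
print(guided_eps(eps_empty, eps_text, scale=1.0))  # [1.0, 1.0, 0.0]
print(guided_eps(eps_empty, eps_text, scale=7.5))  # [7.5, 1.0, 13.0]
```

A scale of 0 returns the unconditional prediction, a scale of 1 returns the conditional one, and larger values extrapolate further along the difference between the two.
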
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints. 
For this reason, `use_ema=False` is set in the configuration; otherwise the code would try to switch from
non-EMA to EMA weights. If you want to examine the effect of EMA vs. no EMA, we provide "full" checkpoints
that contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
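
For readers unfamiliar with the term, EMA weights are an exponential moving average of the training weights. The sketch below shows the generic update rule on a single toy parameter; it is not taken from this repository's `ldm/modules/ema.py`, and the function name `ema_update` and the decay values are assumptions.

```python
def ema_update(shadow, params, decay=0.9999):
    """Return decay * shadow + (1 - decay) * params, elementwise."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy run: the averaged weight moves gradually toward the raw weight 1.0.
shadow = [0.0]
for _ in range(3):
    shadow = ema_update(shadow, [1.0], decay=0.5)
print(shadow)  # [0.875]
```

Because the averaged copy changes more slowly than the raw weights, a "full" checkpoint has to store two sets of parameters, whereas an EMA-only checkpoint stores just the averaged set.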


#### Diffusers Integration

A simple way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers):
```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```


### Image Modification with Stable Diffusion

By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different 
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script, 
we provide a script to perform image modification with Stable Diffusion.  

The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
```
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
```
Here, strength is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. 
Values approaching 1.0 allow for a lot of variation but will also produce images that are not semantically consistent with the input. See the following example.

**Input**

![sketch-in](assets/stable-samples/img2img/sketch-mountains-input.jpg)

**Outputs**

![out3](assets/stable-samples/img2img/mountains-3.png)
![out2](assets/stable-samples/img2img/mountains-2.png)

This procedure can, for example, also be used to upscale samples from the base model.
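
One way to picture the effect of `--strength` is as the fraction of the sampling schedule that is applied to the input image as noise before denoising begins. The sketch below assumes that the strength is turned into a step count by truncating `strength * num_steps`; the function name `encode_steps` and the 50-step default are illustrative and not quoted from the script.

```python
def encode_steps(strength, num_steps=50):
    """Number of noising steps applied to the init image (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0.0, 1.0]")
    return int(strength * num_steps)


# Higher strength means more of the schedule is spent on noise,
# so less of the input image survives.
for s in (0.0, 0.5, 0.8, 1.0):
    print(s, encode_steps(s))
```

Under this assumption, a strength of 0.8 with 50 steps noises the input for 40 steps and then denoises it again, while a strength of 1.0 discards almost all information from the input.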


## Comments 

- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). 
Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories). 


## BibTeX

```
@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```




================================================
FILE: Stable_Diffusion_v1_Model_Card.md
================================================
# Stable Diffusion v1 Model Card
This model card describes the Stable Diffusion v1 model, available [here](https://github.com/CompVis/stable-diffusion).

## Model Details
- **Developed by:** Robin Rombach, Patrick Esser
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [CreativeML Open RAIL-M](LICENSE)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).
- **Resources for more information:** [GitHub Repository](https://github.com/CompVis/stable-diffusion), [Paper](https://arxiv.org/abs/2112.10752).
- **Cite as:**

      @InProceedings{Rombach_2022_CVPR,
          author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
          title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
          booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
          month     = {June},
          year      = {2022},
          pages     = {10684-10695}
      }

# Uses

## Direct Use 
The model is intended for research purposes only. Possible research areas and
tasks include

- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.

Excluded uses are described below.

### Misuse, Malicious Use, and Out-of-Scope Use
_Note: This section is taken from the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini), but applies in the same way to Stable Diffusion v1_.

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

#### Out-of-Scope Use
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

#### Misuse and Malicious Use
Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation.
- Representations of egregious violence and gore.
- Sharing of copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.

## Limitations and Bias

### Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.
- The model was trained mainly with English captions and will not work as well in other languages.
- The autoencoding part of the model is lossy.
- The model was trained on a large-scale dataset
  [LAION-5B](https://laion.ai/blog/laion-5b/) which contains adult material
  and is not fit for product use without additional safety mechanisms and
  considerations.
- No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data.
  The training data can be searched at [https://rom1504.github.io/clip-retrieval/](https://rom1504.github.io/clip-retrieval/) to possibly assist in the detection of memorized images.

### Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. 
Stable Diffusion v1 was primarily trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/), 
which consists of images that are limited to English descriptions. 
Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. 
This affects the overall output of the model, as white and western cultures are often set as the default. Further, the 
ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
Stable Diffusion v1 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.


## Training

**Training Data**
The model developers used the following dataset for training the model:

- LAION-5B and subsets thereof (see next section)

**Training Procedure**
Stable Diffusion v1 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training:

- Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- Text prompts are encoded through a CLIP ViT-L/14 text encoder.
- The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
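
The shape bookkeeping in the first step can be sketched in a few lines of Python. This is an illustrative sketch, not code from the repository: `downsampling_factor` and `latent_shape` are hypothetical helper names, and the relation f = 2^(len(ch_mult) - 1) assumes that each downsampling stage halves the spatial resolution, as suggested by the `num_down = len(ch_mult)-1` comment in the autoencoder configs.

```python
from typing import Sequence, Tuple


def downsampling_factor(ch_mult: Sequence[int]) -> int:
    """Assumed relation: each of the len(ch_mult) - 1 down stages halves H and W."""
    return 2 ** (len(ch_mult) - 1)


def latent_shape(height: int, width: int, f: int = 8, z_channels: int = 4) -> Tuple[int, int, int]:
    """Map an H x W x 3 image size to the H/f x W/f x z_channels latent size."""
    if height % f != 0 or width % f != 0:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return (height // f, width // f, z_channels)


# ch_mult as listed in the first_stage_config of v1-inference.yaml
f = downsampling_factor([1, 2, 4, 4])
print(f)                           # 8
print(latent_shape(512, 512, f))   # (64, 64, 4)
```

Under these assumptions a `512x512` training image corresponds to a `64x64x4` latent, which is 48 times fewer values than the `512x512x3` pixel array.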

We currently provide the following checkpoints:

- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
  194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
  515k steps at resolution `512x512` on [laion-aesthetics v2 5+](https://laion.ai/blog/laion-aesthetics/) (a subset of laion2B-en with estimated aesthetics score `> 5.0`, additionally filtered to images with an original size `>= 512x512` and an estimated watermark probability `< 0.5`).
  The watermark estimate is from the [LAION-5B](https://laion.ai/blog/laion-5b/) metadata; the aesthetics score is estimated using the [LAION-Aesthetics Predictor V2](https://github.com/christophschuhmann/improved-aesthetic-predictor).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- `sd-v1-4.ckpt`: Resumed from `sd-v1-2.ckpt`. 225k steps at resolution `512x512` on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
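
The 10% dropping of the text-conditioning mentioned for `sd-v1-3.ckpt` and `sd-v1-4.ckpt` can be pictured with the following minimal sketch. It is not the training code: `drop_text_conditioning` is a hypothetical helper, and replacing a dropped caption with the empty string is an assumption, since the card only states the drop rate.

```python
import random
from typing import List, Optional


def drop_text_conditioning(
    captions: List[str],
    p_drop: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Independently replace each caption with "" with probability p_drop.

    Assumption: an empty prompt stands in for "no conditioning".
    """
    if rng is None:
        rng = random.Random()
    return ["" if rng.random() < p_drop else caption for caption in captions]


batch = ["a photograph of an astronaut riding a horse"] * 8
print(drop_text_conditioning(batch, p_drop=0.1, rng=random.Random(0)))
```

Training on a mix of conditioned and unconditioned samples lets one network provide both predictions that classifier-free guidance combines at sampling time.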

- **Hardware:** 32 x 8 x A100 GPUs
- **Optimizer:** AdamW
- **Gradient Accumulations:** 2
- **Batch:** 32 x 8 x 2 x 4 = 2048
- **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant
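
The batch-size arithmetic and the learning-rate schedule above can be checked with a short sketch. Both helpers are hypothetical names. Reading `32 x 8 x 2 x 4` as nodes x GPUs per node x gradient accumulations x per-GPU batch is our interpretation of the list, and the linear ramp starting at a factor of `1e-6` is an assumption borrowed from the `scheduler_config` in `v1-inference.yaml`, since the card only says "warmup".

```python
def effective_batch_size(nodes: int, gpus_per_node: int, grad_accum: int, per_gpu_batch: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return nodes * gpus_per_node * grad_accum * per_gpu_batch


def warmup_then_constant_lr(
    step: int,
    base_lr: float = 1.0e-4,
    warmup_steps: int = 10_000,
    f_start: float = 1.0e-6,
) -> float:
    """Assumed linear warmup from base_lr * f_start to base_lr, then constant."""
    if step < warmup_steps:
        factor = f_start + (1.0 - f_start) * step / warmup_steps
    else:
        factor = 1.0
    return base_lr * factor


print(effective_batch_size(32, 8, 2, 4))  # 2048
for step in (0, 5_000, 10_000, 100_000):
    print(step, warmup_then_constant_lr(step))
```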

## Evaluation Results 
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:

![pareto](assets/v1-variants-scores.jpg) 

Evaluated using 50 PLMS steps and 10,000 random prompts from the COCO2017 validation set at 512x512 resolution. Not optimized for FID scores.
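
For reference, a guidance scale enters sampling by extrapolating from the unconditional noise prediction toward the text-conditioned one. The sketch below assumes the convention in which a scale of 1.0 reproduces the plain conditional prediction. `apply_guidance` is a hypothetical name, and plain Python lists stand in for the tensors a real sampler would use.

```python
from typing import List


def apply_guidance(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
print(apply_guidance(eps_uncond, eps_cond, 1.0))  # [1.0, 3.0]
print(apply_guidance(eps_uncond, eps_cond, 7.0))  # [7.0, 15.0]
```

Larger scales push the prediction further along the direction that separates the conditional from the unconditional output, which is why the evaluation sweeps over several values.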

## Environmental Impact

**Stable Diffusion v1** **Estimated Emissions**
Based on the hardware, runtime, cloud provider, and compute region listed below, we estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100 PCIe 40GB
- **Hours used:** 150000
- **Cloud Provider:** AWS
- **Compute Region:** US-east
- **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq.
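
The two reported figures imply a combined emission factor per hour of hardware use, which can be checked with simple arithmetic. The sketch below uses only the numbers listed above. The function name is hypothetical, and it assumes that "Hours used" counts GPU-hours. The resulting factor folds together power draw and grid carbon intensity, which the card does not report separately.

```python
def implied_kg_co2_per_hour(carbon_emitted_kg: float, hours_used: float) -> float:
    """Emission factor implied by total emissions and total hours used."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return carbon_emitted_kg / hours_used


factor = implied_kg_co2_per_hour(carbon_emitted_kg=11_250, hours_used=150_000)
print(f"{factor:.3f} kg CO2 eq. per hour")  # 0.075 kg CO2 eq. per hour
```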

## Citation
    @InProceedings{Rombach_2022_CVPR,
        author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
        title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2022},
        pages     = {10684-10695}
    }

*This model card was written by: Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*


================================================
FILE: configs/autoencoder/autoencoder_kl_16x16x16.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 16
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 16
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,1,2,2,4]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [16]
      dropout: 0.0


data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2


================================================
FILE: configs/autoencoder/autoencoder_kl_32x32x4.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 4
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2


================================================
FILE: configs/autoencoder/autoencoder_kl_64x64x3.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 3
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0


data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2


================================================
FILE: configs/autoencoder/autoencoder_kl_8x8x64.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 64
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 64
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,1,2,2,4,4]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [16,8]
      dropout: 0.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    wrap: True
    train:
      target: ldm.data.imagenet.ImageNetSRTrain
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.imagenet.ImageNetSRValidation
      params:
        size: 256
        degradation: pil_nearest

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    benchmark: True
    accumulate_grad_batches: 2


================================================
FILE: configs/latent-diffusion/celebahq-ldm-vq-4.yaml
================================================
model:
  base_learning_rate: 2.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    image_size: 64
    channels: 3
    monitor: val/loss_simple_ema

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 64
        in_channels: 3
        out_channels: 3
        model_channels: 224
        attention_resolutions:
        # note: this isn't actually the resolution but
        # the downsampling factor, i.e. this corresponds to
        # attention on spatial resolution 8,16,32, as the
        # spatial resolution of the latents is 64 for f4
        - 8
        - 4
        - 2
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 3
        - 4
        num_head_channels: 32
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        embed_dim: 3
        n_embed: 8192
        ckpt_path: models/first_stage_models/vq-f4/model.ckpt
        ddconfig:
          double_z: false
          z_channels: 3
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config: __is_unconditional__
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 48
    num_workers: 5
    wrap: false
    train:
      target: taming.data.faceshq.CelebAHQTrain
      params:
        size: 256
    validation:
      target: taming.data.faceshq.CelebAHQValidation
      params:
        size: 256


lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True

================================================
FILE: configs/latent-diffusion/cin-ldm-vq-f8.yaml
================================================
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: class_label
    image_size: 32
    channels: 4
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 256
        attention_resolutions:
        # note: this isn't actually the resolution but
        # the downsampling factor, i.e. this corresponds to
        # attention on spatial resolution 8,16,32, as the
        # spatial resolution of the latents is 32 for f8
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        num_head_channels: 32
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 512
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        embed_dim: 4
        n_embed: 16384
        ckpt_path: configs/first_stage_models/vq-f8/model.yaml
        ddconfig:
          double_z: false
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - 32
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.ClassEmbedder
      params:
        embed_dim: 512
        key: class_label
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 64
    num_workers: 12
    wrap: false
    train:
      target: ldm.data.imagenet.ImageNetTrain
      params:
        config:
          size: 256
    validation:
      target: ldm.data.imagenet.ImageNetValidation
      params:
        config:
          size: 256


lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True

================================================
FILE: configs/latent-diffusion/cin256-v2.yaml
================================================
model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: class_label
    image_size: 64
    channels: 3
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss
    use_ema: False
    
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 64
        in_channels: 3
        out_channels: 3
        model_channels: 192
        attention_resolutions:
        - 8
        - 4
        - 2
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 3
        - 5
        num_heads: 1
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 512
    
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        embed_dim: 3
        n_embed: 8192
        ddconfig:
          double_z: false
          z_channels: 3
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    
    cond_stage_config:
      target: ldm.modules.encoders.modules.ClassEmbedder
      params:
        n_classes: 1001
        embed_dim: 512
        key: class_label


================================================
FILE: configs/latent-diffusion/ffhq-ldm-vq-4.yaml
================================================
model:
  base_learning_rate: 2.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    image_size: 64
    channels: 3
    monitor: val/loss_simple_ema
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 64
        in_channels: 3
        out_channels: 3
        model_channels: 224
        attention_resolutions:
        # note: this isn't actually the resolution but
        # the downsampling factor, i.e. this corresponds to
        # attention on spatial resolution 8,16,32, as the
        # spatial resolution of the latents is 64 for f4
        - 8
        - 4
        - 2
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 3
        - 4
        num_head_channels: 32
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        embed_dim: 3
        n_embed: 8192
        ckpt_path: configs/first_stage_models/vq-f4/model.yaml
        ddconfig:
          double_z: false
          z_channels: 3
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config: __is_unconditional__
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 42
    num_workers: 5
    wrap: false
    train:
      target: taming.data.faceshq.FFHQTrain
      params:
        size: 256
    validation:
      target: taming.data.faceshq.FFHQValidation
      params:
        size: 256


lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True

================================================
FILE: configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml
================================================
model:
  base_learning_rate: 2.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    image_size: 64
    channels: 3
    monitor: val/loss_simple_ema
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 64
        in_channels: 3
        out_channels: 3
        model_channels: 224
        attention_resolutions:
        # note: this isn't actually the resolution but
        # the downsampling factor, i.e. this corresponds to
        # attention on spatial resolution 8,16,32, as the
        # spatial resolution of the latents is 64 for f4
        - 8
        - 4
        - 2
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 3
        - 4
        num_head_channels: 32
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        ckpt_path: configs/first_stage_models/vq-f4/model.yaml
        embed_dim: 3
        n_embed: 8192
        ddconfig:
          double_z: false
          z_channels: 3
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config: __is_unconditional__
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 48
    num_workers: 5
    wrap: false
    train:
      target: ldm.data.lsun.LSUNBedroomsTrain
      params:
        size: 256
    validation:
      target: ldm.data.lsun.LSUNBedroomsValidation
      params:
        size: 256


lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True

================================================
FILE: configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml
================================================
model:
  base_learning_rate: 5.0e-5   # set to target_lr by starting main.py with '--scale_lr False'
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0155
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    loss_type: l1
    first_stage_key: "image"
    cond_stage_key: "image"
    image_size: 32
    channels: 4
    cond_stage_trainable: False
    concat_mode: False
    scale_by_std: True
    monitor: 'val/loss_simple_ema'

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [10000]
        cycle_lengths: [10000000000000]
        f_start: [1.e-6]
        f_max: [1.]
        f_min: [ 1.]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 192
        attention_resolutions: [ 1, 2, 4, 8 ]   # 32, 16, 8, 4
        num_res_blocks: 2
        channel_mult: [ 1,2,2,4,4 ]  # 32, 16, 8, 4, 2
        num_heads: 8
        use_scale_shift_norm: True
        resblock_updown: True

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: "val/rec_loss"
        ckpt_path: "models/first_stage_models/kl-f8/model.ckpt"
        ddconfig:
          double_z: True
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [ 1,2,4,4 ]  # num_down = len(ch_mult)-1
          num_res_blocks: 2
          attn_resolutions: [ ]
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config: "__is_unconditional__"

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 96
    num_workers: 5
    wrap: False
    train:
      target: ldm.data.lsun.LSUNChurchesTrain
      params:
        size: 256
    validation:
      target: ldm.data.lsun.LSUNChurchesValidation
      params:
        size: 256

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000
        max_images: 8
        increase_log_steps: False


  trainer:
    benchmark: True

================================================
FILE: configs/latent-diffusion/txt2img-1p4B-eval.yaml
================================================
model:
  base_learning_rate: 5.0e-05
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 32
    channels: 4
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_heads: 8
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 1280
        use_checkpoint: true
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.BERTEmbedder
      params:
        n_embed: 1280
        n_layer: 32


================================================
FILE: configs/retrieval-augmented-diffusion/768x768.yaml
================================================
model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.015
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: jpg
    cond_stage_key: nix
    image_size: 48
    channels: 16
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_by_std: false
    scale_factor: 0.22765929
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 48
        in_channels: 16
        out_channels: 16
        model_channels: 448
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 3
        - 4
        use_scale_shift_norm: false
        resblock_updown: false
        num_head_channels: 32
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: true
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        monitor: val/rec_loss
        embed_dim: 16
        ddconfig:
          double_z: true
          z_channels: 16
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 1
          - 2
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - 16
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: torch.nn.Identity

================================================
FILE: configs/stable-diffusion/v1-inference.yaml
================================================
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false   # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 10000 ]
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1. ]
        f_min: [ 1. ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder


================================================
FILE: data/example_conditioning/text_conditional/sample_0.txt
================================================
A basket of cherries


================================================
FILE: data/imagenet_clsidx_to_label.txt
================================================
 0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
 3: 'tiger shark, Galeocerdo cuvieri',
 4: 'hammerhead, hammerhead shark',
 5: 'electric ray, crampfish, numbfish, torpedo',
 6: 'stingray',
 7: 'cock',
 8: 'hen',
 9: 'ostrich, Struthio camelus',
 10: 'brambling, Fringilla montifringilla',
 11: 'goldfinch, Carduelis carduelis',
 12: 'house finch, linnet, Carpodacus mexicanus',
 13: 'junco, snowbird',
 14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
 15: 'robin, American robin, Turdus migratorius',
 16: 'bulbul',
 17: 'jay',
 18: 'magpie',
 19: 'chickadee',
 20: 'water ouzel, dipper',
 21: 'kite',
 22: 'bald eagle, American eagle, Haliaeetus leucocephalus',
 23: 'vulture',
 24: 'great grey owl, great gray owl, Strix nebulosa',
 25: 'European fire salamander, Salamandra salamandra',
 26: 'common newt, Triturus vulgaris',
 27: 'eft',
 28: 'spotted salamander, Ambystoma maculatum',
 29: 'axolotl, mud puppy, Ambystoma mexicanum',
 30: 'bullfrog, Rana catesbeiana',
 31: 'tree frog, tree-frog',
 32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
 33: 'loggerhead, loggerhead turtle, Caretta caretta',
 34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
 35: 'mud turtle',
 36: 'terrapin',
 37: 'box turtle, box tortoise',
 38: 'banded gecko',
 39: 'common iguana, iguana, Iguana iguana',
 40: 'American chameleon, anole, Anolis carolinensis',
 41: 'whiptail, whiptail lizard',
 42: 'agama',
 43: 'frilled lizard, Chlamydosaurus kingi',
 44: 'alligator lizard',
 45: 'Gila monster, Heloderma suspectum',
 46: 'green lizard, Lacerta viridis',
 47: 'African chameleon, Chamaeleo chamaeleon',
 48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
 49: 'African crocodile, Nile crocodile, Crocodylus niloticus',
 50: 'American alligator, Alligator mississipiensis',
 51: 'triceratops',
 52: 'thunder snake, worm snake, Carphophis amoenus',
 53: 'ringneck snake, ring-necked snake, ring snake',
 54: 'hognose snake, puff adder, sand viper',
 55: 'green snake, grass snake',
 56: 'king snake, kingsnake',
 57: 'garter snake, grass snake',
 58: 'water snake',
 59: 'vine snake',
 60: 'night snake, Hypsiglena torquata',
 61: 'boa constrictor, Constrictor constrictor',
 62: 'rock python, rock snake, Python sebae',
 63: 'Indian cobra, Naja naja',
 64: 'green mamba',
 65: 'sea snake',
 66: 'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
 67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus',
 68: 'sidewinder, horned rattlesnake, Crotalus cerastes',
 69: 'trilobite',
 70: 'harvestman, daddy longlegs, Phalangium opilio',
 71: 'scorpion',
 72: 'black and gold garden spider, Argiope aurantia',
 73: 'barn spider, Araneus cavaticus',
 74: 'garden spider, Aranea diademata',
 75: 'black widow, Latrodectus mactans',
 76: 'tarantula',
 77: 'wolf spider, hunting spider',
 78: 'tick',
 79: 'centipede',
 80: 'black grouse',
 81: 'ptarmigan',
 82: 'ruffed grouse, partridge, Bonasa umbellus',
 83: 'prairie chicken, prairie grouse, prairie fowl',
 84: 'peacock',
 85: 'quail',
 86: 'partridge',
 87: 'African grey, African gray, Psittacus erithacus',
 88: 'macaw',
 89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
 90: 'lorikeet',
 91: 'coucal',
 92: 'bee eater',
 93: 'hornbill',
 94: 'hummingbird',
 95: 'jacamar',
 96: 'toucan',
 97: 'drake',
 98: 'red-breasted merganser, Mergus serrator',
 99: 'goose',
 100: 'black swan, Cygnus atratus',
 101: 'tusker',
 102: 'echidna, spiny anteater, anteater',
 103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
 104: 'wallaby, brush kangaroo',
 105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
 106: 'wombat',
 107: 'jellyfish',
 108: 'sea anemone, anemone',
 109: 'brain coral',
 110: 'flatworm, platyhelminth',
 111: 'nematode, nematode worm, roundworm',
 112: 'conch',
 113: 'snail',
 114: 'slug',
 115: 'sea slug, nudibranch',
 116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore',
 117: 'chambered nautilus, pearly nautilus, nautilus',
 118: 'Dungeness crab, Cancer magister',
 119: 'rock crab, Cancer irroratus',
 120: 'fiddler crab',
 121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
 122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus',
 123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
 124: 'crayfish, crawfish, crawdad, crawdaddy',
 125: 'hermit crab',
 126: 'isopod',
 127: 'white stork, Ciconia ciconia',
 128: 'black stork, Ciconia nigra',
 129: 'spoonbill',
 130: 'flamingo',
 131: 'little blue heron, Egretta caerulea',
 132: 'American egret, great white heron, Egretta albus',
 133: 'bittern',
 134: 'crane',
 135: 'limpkin, Aramus pictus',
 136: 'European gallinule, Porphyrio porphyrio',
 137: 'American coot, marsh hen, mud hen, water hen, Fulica americana',
 138: 'bustard',
 139: 'ruddy turnstone, Arenaria interpres',
 140: 'red-backed sandpiper, dunlin, Erolia alpina',
 141: 'redshank, Tringa totanus',
 142: 'dowitcher',
 143: 'oystercatcher, oyster catcher',
 144: 'pelican',
 145: 'king penguin, Aptenodytes patagonica',
 146: 'albatross, mollymawk',
 147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
 148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
 149: 'dugong, Dugong dugon',
 150: 'sea lion',
 151: 'Chihuahua',
 152: 'Japanese spaniel',
 153: 'Maltese dog, Maltese terrier, Maltese',
 154: 'Pekinese, Pekingese, Peke',
 155: 'Shih-Tzu',
 156: 'Blenheim spaniel',
 157: 'papillon',
 158: 'toy terrier',
 159: 'Rhodesian ridgeback',
 160: 'Afghan hound, Afghan',
 161: 'basset, basset hound',
 162: 'beagle',
 163: 'bloodhound, sleuthhound',
 164: 'bluetick',
 165: 'black-and-tan coonhound',
 166: 'Walker hound, Walker foxhound',
 167: 'English foxhound',
 168: 'redbone',
 169: 'borzoi, Russian wolfhound',
 170: 'Irish wolfhound',
 171: 'Italian greyhound',
 172: 'whippet',
 173: 'Ibizan hound, Ibizan Podenco',
 174: 'Norwegian elkhound, elkhound',
 175: 'otterhound, otter hound',
 176: 'Saluki, gazelle hound',
 177: 'Scottish deerhound, deerhound',
 178: 'Weimaraner',
 179: 'Staffordshire bullterrier, Staffordshire bull terrier',
 180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
 181: 'Bedlington terrier',
 182: 'Border terrier',
 183: 'Kerry blue terrier',
 184: 'Irish terrier',
 185: 'Norfolk terrier',
 186: 'Norwich terrier',
 187: 'Yorkshire terrier',
 188: 'wire-haired fox terrier',
 189: 'Lakeland terrier',
 190: 'Sealyham terrier, Sealyham',
 191: 'Airedale, Airedale terrier',
 192: 'cairn, cairn terrier',
 193: 'Australian terrier',
 194: 'Dandie Dinmont, Dandie Dinmont terrier',
 195: 'Boston bull, Boston terrier',
 196: 'miniature schnauzer',
 197: 'giant schnauzer',
 198: 'standard schnauzer',
 199: 'Scotch terrier, Scottish terrier, Scottie',
 200: 'Tibetan terrier, chrysanthemum dog',
 201: 'silky terrier, Sydney silky',
 202: 'soft-coated wheaten terrier',
 203: 'West Highland white terrier',
 204: 'Lhasa, Lhasa apso',
 205: 'flat-coated retriever',
 206: 'curly-coated retriever',
 207: 'golden retriever',
 208: 'Labrador retriever',
 209: 'Chesapeake Bay retriever',
 210: 'German short-haired pointer',
 211: 'vizsla, Hungarian pointer',
 212: 'English setter',
 213: 'Irish setter, red setter',
 214: 'Gordon setter',
 215: 'Brittany spaniel',
 216: 'clumber, clumber spaniel',
 217: 'English springer, English springer spaniel',
 218: 'Welsh springer spaniel',
 219: 'cocker spaniel, English cocker spaniel, cocker',
 220: 'Sussex spaniel',
 221: 'Irish water spaniel',
 222: 'kuvasz',
 223: 'schipperke',
 224: 'groenendael',
 225: 'malinois',
 226: 'briard',
 227: 'kelpie',
 228: 'komondor',
 229: 'Old English sheepdog, bobtail',
 230: 'Shetland sheepdog, Shetland sheep dog, Shetland',
 231: 'collie',
 232: 'Border collie',
 233: 'Bouvier des Flandres, Bouviers des Flandres',
 234: 'Rottweiler',
 235: 'German shepherd, German shepherd dog, German police dog, alsatian',
 236: 'Doberman, Doberman pinscher',
 237: 'miniature pinscher',
 238: 'Greater Swiss Mountain dog',
 239: 'Bernese mountain dog',
 240: 'Appenzeller',
 241: 'EntleBucher',
 242: 'boxer',
 243: 'bull mastiff',
 244: 'Tibetan mastiff',
 245: 'French bulldog',
 246: 'Great Dane',
 247: 'Saint Bernard, St Bernard',
 248: 'Eskimo dog, husky',
 249: 'malamute, malemute, Alaskan malamute',
 250: 'Siberian husky',
 251: 'dalmatian, coach dog, carriage dog',
 252: 'affenpinscher, monkey pinscher, monkey dog',
 253: 'basenji',
 254: 'pug, pug-dog',
 255: 'Leonberg',
 256: 'Newfoundland, Newfoundland dog',
 257: 'Great Pyrenees',
 258: 'Samoyed, Samoyede',
 259: 'Pomeranian',
 260: 'chow, chow chow',
 261: 'keeshond',
 262: 'Brabancon griffon',
 263: 'Pembroke, Pembroke Welsh corgi',
 264: 'Cardigan, Cardigan Welsh corgi',
 265: 'toy poodle',
 266: 'miniature poodle',
 267: 'standard poodle',
 268: 'Mexican hairless',
 269: 'timber wolf, grey wolf, gray wolf, Canis lupus',
 270: 'white wolf, Arctic wolf, Canis lupus tundrarum',
 271: 'red wolf, maned wolf, Canis rufus, Canis niger',
 272: 'coyote, prairie wolf, brush wolf, Canis latrans',
 273: 'dingo, warrigal, warragal, Canis dingo',
 274: 'dhole, Cuon alpinus',
 275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
 276: 'hyena, hyaena',
 277: 'red fox, Vulpes vulpes',
 278: 'kit fox, Vulpes macrotis',
 279: 'Arctic fox, white fox, Alopex lagopus',
 280: 'grey fox, gray fox, Urocyon cinereoargenteus',
 281: 'tabby, tabby cat',
 282: 'tiger cat',
 283: 'Persian cat',
 284: 'Siamese cat, Siamese',
 285: 'Egyptian cat',
 286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
 287: 'lynx, catamount',
 288: 'leopard, Panthera pardus',
 289: 'snow leopard, ounce, Panthera uncia',
 290: 'jaguar, panther, Panthera onca, Felis onca',
 291: 'lion, king of beasts, Panthera leo',
 292: 'tiger, Panthera tigris',
 293: 'cheetah, chetah, Acinonyx jubatus',
 294: 'brown bear, bruin, Ursus arctos',
 295: 'American black bear, black bear, Ursus americanus, Euarctos americanus',
 296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
 297: 'sloth bear, Melursus ursinus, Ursus ursinus',
 298: 'mongoose',
 299: 'meerkat, mierkat',
 300: 'tiger beetle',
 301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
 302: 'ground beetle, carabid beetle',
 303: 'long-horned beetle, longicorn, longicorn beetle',
 304: 'leaf beetle, chrysomelid',
 305: 'dung beetle',
 306: 'rhinoceros beetle',
 307: 'weevil',
 308: 'fly',
 309: 'bee',
 310: 'ant, emmet, pismire',
 311: 'grasshopper, hopper',
 312: 'cricket',
 313: 'walking stick, walkingstick, stick insect',
 314: 'cockroach, roach',
 315: 'mantis, mantid',
 316: 'cicada, cicala',
 317: 'leafhopper',
 318: 'lacewing, lacewing fly',
 319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
 320: 'damselfly',
 321: 'admiral',
 322: 'ringlet, ringlet butterfly',
 323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
 324: 'cabbage butterfly',
 325: 'sulphur butterfly, sulfur butterfly',
 326: 'lycaenid, lycaenid butterfly',
 327: 'starfish, sea star',
 328: 'sea urchin',
 329: 'sea cucumber, holothurian',
 330: 'wood rabbit, cottontail, cottontail rabbit',
 331: 'hare',
 332: 'Angora, Angora rabbit',
 333: 'hamster',
 334: 'porcupine, hedgehog',
 335: 'fox squirrel, eastern fox squirrel, Sciurus niger',
 336: 'marmot',
 337: 'beaver',
 338: 'guinea pig, Cavia cobaya',
 339: 'sorrel',
 340: 'zebra',
 341: 'hog, pig, grunter, squealer, Sus scrofa',
 342: 'wild boar, boar, Sus scrofa',
 343: 'warthog',
 344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius',
 345: 'ox',
 346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
 347: 'bison',
 348: 'ram, tup',
 349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
 350: 'ibex, Capra ibex',
 351: 'hartebeest',
 352: 'impala, Aepyceros melampus',
 353: 'gazelle',
 354: 'Arabian camel, dromedary, Camelus dromedarius',
 355: 'llama',
 356: 'weasel',
 357: 'mink',
 358: 'polecat, fitch, foulmart, foumart, Mustela putorius',
 359: 'black-footed ferret, ferret, Mustela nigripes',
 360: 'otter',
 361: 'skunk, polecat, wood pussy',
 362: 'badger',
 363: 'armadillo',
 364: 'three-toed sloth, ai, Bradypus tridactylus',
 365: 'orangutan, orang, orangutang, Pongo pygmaeus',
 366: 'gorilla, Gorilla gorilla',
 367: 'chimpanzee, chimp, Pan troglodytes',
 368: 'gibbon, Hylobates lar',
 369: 'siamang, Hylobates syndactylus, Symphalangus syndactylus',
 370: 'guenon, guenon monkey',
 371: 'patas, hussar monkey, Erythrocebus patas',
 372: 'baboon',
 373: 'macaque',
 374: 'langur',
 375: 'colobus, colobus monkey',
 376: 'proboscis monkey, Nasalis larvatus',
 377: 'marmoset',
 378: 'capuchin, ringtail, Cebus capucinus',
 379: 'howler monkey, howler',
 380: 'titi, titi monkey',
 381: 'spider monkey, Ateles geoffroyi',
 382: 'squirrel monkey, Saimiri sciureus',
 383: 'Madagascar cat, ring-tailed lemur, Lemur catta',
 384: 'indri, indris, Indri indri, Indri brevicaudatus',
 385: 'Indian elephant, Elephas maximus',
 386: 'African elephant, Loxodonta africana',
 387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
 388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
 389: 'barracouta, snoek',
 390: 'eel',
 391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
 392: 'rock beauty, Holocanthus tricolor',
 393: 'anemone fish',
 394: 'sturgeon',
 395: 'gar, garfish, garpike, billfish, Lepisosteus osseus',
 396: 'lionfish',
 397: 'puffer, pufferfish, blowfish, globefish',
 398: 'abacus',
 399: 'abaya',
 400: "academic gown, academic robe, judge's robe",
 401: 'accordion, piano accordion, squeeze box',
 402: 'acoustic guitar',
 403: 'aircraft carrier, carrier, flattop, attack aircraft carrier',
 404: 'airliner',
 405: 'airship, dirigible',
 406: 'altar',
 407: 'ambulance',
 408: 'amphibian, amphibious vehicle',
 409: 'analog clock',
 410: 'apiary, bee house',
 411: 'apron',
 412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
 413: 'assault rifle, assault gun',
 414: 'backpack, back pack, knapsack, packsack, rucksack, haversack',
 415: 'bakery, bakeshop, bakehouse',
 416: 'balance beam, beam',
 417: 'balloon',
 418: 'ballpoint, ballpoint pen, ballpen, Biro',
 419: 'Band Aid',
 420: 'banjo',
 421: 'bannister, banister, balustrade, balusters, handrail',
 422: 'barbell',
 423: 'barber chair',
 424: 'barbershop',
 425: 'barn',
 426: 'barometer',
 427: 'barrel, cask',
 428: 'barrow, garden cart, lawn cart, wheelbarrow',
 429: 'baseball',
 430: 'basketball',
 431: 'bassinet',
 432: 'bassoon',
 433: 'bathing cap, swimming cap',
 434: 'bath towel',
 435: 'bathtub, bathing tub, bath, tub',
 436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
 437: 'beacon, lighthouse, beacon light, pharos',
 438: 'beaker',
 439: 'bearskin, busby, shako',
 440: 'beer bottle',
 441: 'beer glass',
 442: 'bell cote, bell cot',
 443: 'bib',
 444: 'bicycle-built-for-two, tandem bicycle, tandem',
 445: 'bikini, two-piece',
 446: 'binder, ring-binder',
 447: 'binoculars, field glasses, opera glasses',
 448: 'birdhouse',
 449: 'boathouse',
 450: 'bobsled, bobsleigh, bob',
 451: 'bolo tie, bolo, bola tie, bola',
 452: 'bonnet, poke bonnet',
 453: 'bookcase',
 454: 'bookshop, bookstore, bookstall',
 455: 'bottlecap',
 456: 'bow',
 457: 'bow tie, bow-tie, bowtie',
 458: 'brass, memorial tablet, plaque',
 459: 'brassiere, bra, bandeau',
 460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
 461: 'breastplate, aegis, egis',
 462: 'broom',
 463: 'bucket, pail',
 464: 'buckle',
 465: 'bulletproof vest',
 466: 'bullet train, bullet',
 467: 'butcher shop, meat market',
 468: 'cab, hack, taxi, taxicab',
 469: 'caldron, cauldron',
 470: 'candle, taper, wax light',
 471: 'cannon',
 472: 'canoe',
 473: 'can opener, tin opener',
 474: 'cardigan',
 475: 'car mirror',
 476: 'carousel, carrousel, merry-go-round, roundabout, whirligig',
 477: "carpenter's kit, tool kit",
 478: 'carton',
 479: 'car wheel',
 480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
 481: 'cassette',
 482: 'cassette player',
 483: 'castle',
 484: 'catamaran',
 485: 'CD player',
 486: 'cello, violoncello',
 487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone',
 488: 'chain',
 489: 'chainlink fence',
 490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
 491: 'chain saw, chainsaw',
 492: 'chest',
 493: 'chiffonier, commode',
 494: 'chime, bell, gong',
 495: 'china cabinet, china closet',
 496: 'Christmas stocking',
 497: 'church, church building',
 498: 'cinema, movie theater, movie theatre, movie house, picture palace',
 499: 'cleaver, meat cleaver, chopper',
 500: 'cliff dwelling',
 501: 'cloak',
 502: 'clog, geta, patten, sabot',
 503: 'cocktail shaker',
 504: 'coffee mug',
 505: 'coffeepot',
 506: 'coil, spiral, volute, whorl, helix',
 507: 'combination lock',
 508: 'computer keyboard, keypad',
 509: 'confectionery, confectionary, candy store',
 510: 'container ship, containership, container vessel',
 511: 'convertible',
 512: 'corkscrew, bottle screw',
 513: 'cornet, horn, trumpet, trump',
 514: 'cowboy boot',
 515: 'cowboy hat, ten-gallon hat',
 516: 'cradle',
 517: 'crane',
 518: 'crash helmet',
 519: 'crate',
 520: 'crib, cot',
 521: 'Crock Pot',
 522: 'croquet ball',
 523: 'crutch',
 524: 'cuirass',
 525: 'dam, dike, dyke',
 526: 'desk',
 527: 'desktop computer',
 528: 'dial telephone, dial phone',
 529: 'diaper, nappy, napkin',
 530: 'digital clock',
 531: 'digital watch',
 532: 'dining table, board',
 533: 'dishrag, dishcloth',
 534: 'dishwasher, dish washer, dishwashing machine',
 535: 'disk brake, disc brake',
 536: 'dock, dockage, docking facility',
 537: 'dogsled, dog sled, dog sleigh',
 538: 'dome',
 539: 'doormat, welcome mat',
 540: 'drilling platform, offshore rig',
 541: 'drum, membranophone, tympan',
 542: 'drumstick',
 543: 'dumbbell',
 544: 'Dutch oven',
 545: 'electric fan, blower',
 546: 'electric guitar',
 547: 'electric locomotive',
 548: 'entertainment center',
 549: 'envelope',
 550: 'espresso maker',
 551: 'face powder',
 552: 'feather boa, boa',
 553: 'file, file cabinet, filing cabinet',
 554: 'fireboat',
 555: 'fire engine, fire truck',
 556: 'fire screen, fireguard',
 557: 'flagpole, flagstaff',
 558: 'flute, transverse flute',
 559: 'folding chair',
 560: 'football helmet',
 561: 'forklift',
 562: 'fountain',
 563: 'fountain pen',
 564: 'four-poster',
 565: 'freight car',
 566: 'French horn, horn',
 567: 'frying pan, frypan, skillet',
 568: 'fur coat',
 569: 'garbage truck, dustcart',
 570: 'gasmask, respirator, gas helmet',
 571: 'gas pump, gasoline pump, petrol pump, island dispenser',
 572: 'goblet',
 573: 'go-kart',
 574: 'golf ball',
 575: 'golfcart, golf cart',
 576: 'gondola',
 577: 'gong, tam-tam',
 578: 'gown',
 579: 'grand piano, grand',
 580: 'greenhouse, nursery, glasshouse',
 581: 'grille, radiator grille',
 582: 'grocery store, grocery, food market, market',
 583: 'guillotine',
 584: 'hair slide',
 585: 'hair spray',
 586: 'half track',
 587: 'hammer',
 588: 'hamper',
 589: 'hand blower, blow dryer, blow drier, hair dryer, hair drier',
 590: 'hand-held computer, hand-held microcomputer',
 591: 'handkerchief, hankie, hanky, hankey',
 592: 'hard disc, hard disk, fixed disk',
 593: 'harmonica, mouth organ, harp, mouth harp',
 594: 'harp',
 595: 'harvester, reaper',
 596: 'hatchet',
 597: 'holster',
 598: 'home theater, home theatre',
 599: 'honeycomb',
 600: 'hook, claw',
 601: 'hoopskirt, crinoline',
 602: 'horizontal bar, high bar',
 603: 'horse cart, horse-cart',
 604: 'hourglass',
 605: 'iPod',
 606: 'iron, smoothing iron',
 607: "jack-o'-lantern",
 608: 'jean, blue jean, denim',
 609: 'jeep, landrover',
 610: 'jersey, T-shirt, tee shirt',
 611: 'jigsaw puzzle',
 612: 'jinrikisha, ricksha, rickshaw',
 613: 'joystick',
 614: 'kimono',
 615: 'knee pad',
 616: 'knot',
 617: 'lab coat, laboratory coat',
 618: 'ladle',
 619: 'lampshade, lamp shade',
 620: 'laptop, laptop computer',
 621: 'lawn mower, mower',
 622: 'lens cap, lens cover',
 623: 'letter opener, paper knife, paperknife',
 624: 'library',
 625: 'lifeboat',
 626: 'lighter, light, igniter, ignitor',
 627: 'limousine, limo',
 628: 'liner, ocean liner',
 629: 'lipstick, lip rouge',
 630: 'Loafer',
 631: 'lotion',
 632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
 633: "loupe, jeweler's loupe",
 634: 'lumbermill, sawmill',
 635: 'magnetic compass',
 636: 'mailbag, postbag',
 637: 'mailbox, letter box',
 638: 'maillot',
 639: 'maillot, tank suit',
 640: 'manhole cover',
 641: 'maraca',
 642: 'marimba, xylophone',
 643: 'mask',
 644: 'matchstick',
 645: 'maypole',
 646: 'maze, labyrinth',
 647: 'measuring cup',
 648: 'medicine chest, medicine cabinet',
 649: 'megalith, megalithic structure',
 650: 'microphone, mike',
 651: 'microwave, microwave oven',
 652: 'military uniform',
 653: 'milk can',
 654: 'minibus',
 655: 'miniskirt, mini',
 656: 'minivan',
 657: 'missile',
 658: 'mitten',
 659: 'mixing bowl',
 660: 'mobile home, manufactured home',
 661: 'Model T',
 662: 'modem',
 663: 'monastery',
 664: 'monitor',
 665: 'moped',
 666: 'mortar',
 667: 'mortarboard',
 668: 'mosque',
 669: 'mosquito net',
 670: 'motor scooter, scooter',
 671: 'mountain bike, all-terrain bike, off-roader',
 672: 'mountain tent',
 673: 'mouse, computer mouse',
 674: 'mousetrap',
 675: 'moving van',
 676: 'muzzle',
 677: 'nail',
 678: 'neck brace',
 679: 'necklace',
 680: 'nipple',
 681: 'notebook, notebook computer',
 682: 'obelisk',
 683: 'oboe, hautboy, hautbois',
 684: 'ocarina, sweet potato',
 685: 'odometer, hodometer, mileometer, milometer',
 686: 'oil filter',
 687: 'organ, pipe organ',
 688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO',
 689: 'overskirt',
 690: 'oxcart',
 691: 'oxygen mask',
 692: 'packet',
 693: 'paddle, boat paddle',
 694: 'paddlewheel, paddle wheel',
 695: 'padlock',
 696: 'paintbrush',
 697: "pajama, pyjama, pj's, jammies",
 698: 'palace',
 699: 'panpipe, pandean pipe, syrinx',
 700: 'paper towel',
 701: 'parachute, chute',
 702: 'parallel bars, bars',
 703: 'park bench',
 704: 'parking meter',
 705: 'passenger car, coach, carriage',
 706: 'patio, terrace',
 707: 'pay-phone, pay-station',
 708: 'pedestal, plinth, footstall',
 709: 'pencil box, pencil case',
 710: 'pencil sharpener',
 711: 'perfume, essence',
 712: 'Petri dish',
 713: 'photocopier',
 714: 'pick, plectrum, plectron',
 715: 'pickelhaube',
 716: 'picket fence, paling',
 717: 'pickup, pickup truck',
 718: 'pier',
 719: 'piggy bank, penny bank',
 720: 'pill bottle',
 721: 'pillow',
 722: 'ping-pong ball',
 723: 'pinwheel',
 724: 'pirate, pirate ship',
 725: 'pitcher, ewer',
 726: "plane, carpenter's plane, woodworking plane",
 727: 'planetarium',
 728: 'plastic bag',
 729: 'plate rack',
 730: 'plow, plough',
 731: "plunger, plumber's helper",
 732: 'Polaroid camera, Polaroid Land camera',
 733: 'pole',
 734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
 735: 'poncho',
 736: 'pool table, billiard table, snooker table',
 737: 'pop bottle, soda bottle',
 738: 'pot, flowerpot',
 739: "potter's wheel",
 740: 'power drill',
 741: 'prayer rug, prayer mat',
 742: 'printer',
 743: 'prison, prison house',
 744: 'projectile, missile',
 745: 'projector',
 746: 'puck, hockey puck',
 747: 'punching bag, punch bag, punching ball, punchball',
 748: 'purse',
 749: 'quill, quill pen',
 750: 'quilt, comforter, comfort, puff',
 751: 'racer, race car, racing car',
 752: 'racket, racquet',
 753: 'radiator',
 754: 'radio, wireless',
 755: 'radio telescope, radio reflector',
 756: 'rain barrel',
 757: 'recreational vehicle, RV, R.V.',
 758: 'reel',
 759: 'reflex camera',
 760: 'refrigerator, icebox',
 761: 'remote control, remote',
 762: 'restaurant, eating house, eating place, eatery',
 763: 'revolver, six-gun, six-shooter',
 764: 'rifle',
 765: 'rocking chair, rocker',
 766: 'rotisserie',
 767: 'rubber eraser, rubber, pencil eraser',
 768: 'rugby ball',
 769: 'rule, ruler',
 770: 'running shoe',
 771: 'safe',
 772: 'safety pin',
 773: 'saltshaker, salt shaker',
 774: 'sandal',
 775: 'sarong',
 776: 'sax, saxophone',
 777: 'scabbard',
 778: 'scale, weighing machine',
 779: 'school bus',
 780: 'schooner',
 781: 'scoreboard',
 782: 'screen, CRT screen',
 783: 'screw',
 784: 'screwdriver',
 785: 'seat belt, seatbelt',
 786: 'sewing machine',
 787: 'shield, buckler',
 788: 'shoe shop, shoe-shop, shoe store',
 789: 'shoji',
 790: 'shopping basket',
 791: 'shopping cart',
 792: 'shovel',
 793: 'shower cap',
 794: 'shower curtain',
 795: 'ski',
 796: 'ski mask',
 797: 'sleeping bag',
 798: 'slide rule, slipstick',
 799: 'sliding door',
 800: 'slot, one-armed bandit',
 801: 'snorkel',
 802: 'snowmobile',
 803: 'snowplow, snowplough',
 804: 'soap dispenser',
 805: 'soccer ball',
 806: 'sock',
 807: 'solar dish, solar collector, solar furnace',
 808: 'sombrero',
 809: 'soup bowl',
 810: 'space bar',
 811: 'space heater',
 812: 'space shuttle',
 813: 'spatula',
 814: 'speedboat',
 815: "spider web, spider's web",
 816: 'spindle',
 817: 'sports car, sport car',
 818: 'spotlight, spot',
 819: 'stage',
 820: 'steam locomotive',
 821: 'steel arch bridge',
 822: 'steel drum',
 823: 'stethoscope',
 824: 'stole',
 825: 'stone wall',
 826: 'stopwatch, stop watch',
 827: 'stove',
 828: 'strainer',
 829: 'streetcar, tram, tramcar, trolley, trolley car',
 830: 'stretcher',
 831: 'studio couch, day bed',
 832: 'stupa, tope',
 833: 'submarine, pigboat, sub, U-boat',
 834: 'suit, suit of clothes',
 835: 'sundial',
 836: 'sunglass',
 837: 'sunglasses, dark glasses, shades',
 838: 'sunscreen, sunblock, sun blocker',
 839: 'suspension bridge',
 840: 'swab, swob, mop',
 841: 'sweatshirt',
 842: 'swimming trunks, bathing trunks',
 843: 'swing',
 844: 'switch, electric switch, electrical switch',
 845: 'syringe',
 846: 'table lamp',
 847: 'tank, army tank, armored combat vehicle, armoured combat vehicle',
 848: 'tape player',
 849: 'teapot',
 850: 'teddy, teddy bear',
 851: 'television, television system',
 852: 'tennis ball',
 853: 'thatch, thatched roof',
 854: 'theater curtain, theatre curtain',
 855: 'thimble',
 856: 'thresher, thrasher, threshing machine',
 857: 'throne',
 858: 'tile roof',
 859: 'toaster',
 860: 'tobacco shop, tobacconist shop, tobacconist',
 861: 'toilet seat',
 862: 'torch',
 863: 'totem pole',
 864: 'tow truck, tow car, wrecker',
 865: 'toyshop',
 866: 'tractor',
 867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
 868: 'tray',
 869: 'trench coat',
 870: 'tricycle, trike, velocipede',
 871: 'trimaran',
 872: 'tripod',
 873: 'triumphal arch',
 874: 'trolleybus, trolley coach, trackless trolley',
 875: 'trombone',
 876: 'tub, vat',
 877: 'turnstile',
 878: 'typewriter keyboard',
 879: 'umbrella',
 880: 'unicycle, monocycle',
 881: 'upright, upright piano',
 882: 'vacuum, vacuum cleaner',
 883: 'vase',
 884: 'vault',
 885: 'velvet',
 886: 'vending machine',
 887: 'vestment',
 888: 'viaduct',
 889: 'violin, fiddle',
 890: 'volleyball',
 891: 'waffle iron',
 892: 'wall clock',
 893: 'wallet, billfold, notecase, pocketbook',
 894: 'wardrobe, closet, press',
 895: 'warplane, military plane',
 896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
 897: 'washer, automatic washer, washing machine',
 898: 'water bottle',
 899: 'water jug',
 900: 'water tower',
 901: 'whiskey jug',
 902: 'whistle',
 903: 'wig',
 904: 'window screen',
 905: 'window shade',
 906: 'Windsor tie',
 907: 'wine bottle',
 908: 'wing',
 909: 'wok',
 910: 'wooden spoon',
 911: 'wool, woolen, woollen',
 912: 'worm fence, snake fence, snake-rail fence, Virginia fence',
 913: 'wreck',
 914: 'yawl',
 915: 'yurt',
 916: 'web site, website, internet site, site',
 917: 'comic book',
 918: 'crossword puzzle, crossword',
 919: 'street sign',
 920: 'traffic light, traffic signal, stoplight',
 921: 'book jacket, dust cover, dust jacket, dust wrapper',
 922: 'menu',
 923: 'plate',
 924: 'guacamole',
 925: 'consomme',
 926: 'hot pot, hotpot',
 927: 'trifle',
 928: 'ice cream, icecream',
 929: 'ice lolly, lolly, lollipop, popsicle',
 930: 'French loaf',
 931: 'bagel, beigel',
 932: 'pretzel',
 933: 'cheeseburger',
 934: 'hotdog, hot dog, red hot',
 935: 'mashed potato',
 936: 'head cabbage',
 937: 'broccoli',
 938: 'cauliflower',
 939: 'zucchini, courgette',
 940: 'spaghetti squash',
 941: 'acorn squash',
 942: 'butternut squash',
 943: 'cucumber, cuke',
 944: 'artichoke, globe artichoke',
 945: 'bell pepper',
 946: 'cardoon',
 947: 'mushroom',
 948: 'Granny Smith',
 949: 'strawberry',
 950: 'orange',
 951: 'lemon',
 952: 'fig',
 953: 'pineapple, ananas',
 954: 'banana',
 955: 'jackfruit, jak, jack',
 956: 'custard apple',
 957: 'pomegranate',
 958: 'hay',
 959: 'carbonara',
 960: 'chocolate sauce, chocolate syrup',
 961: 'dough',
 962: 'meat loaf, meatloaf',
 963: 'pizza, pizza pie',
 964: 'potpie',
 965: 'burrito',
 966: 'red wine',
 967: 'espresso',
 968: 'cup',
 969: 'eggnog',
 970: 'alp',
 971: 'bubble',
 972: 'cliff, drop, drop-off',
 973: 'coral reef',
 974: 'geyser',
 975: 'lakeside, lakeshore',
 976: 'promontory, headland, head, foreland',
 977: 'sandbar, sand bar',
 978: 'seashore, coast, seacoast, sea-coast',
 979: 'valley, vale',
 980: 'volcano',
 981: 'ballplayer, baseball player',
 982: 'groom, bridegroom',
 983: 'scuba diver',
 984: 'rapeseed',
 985: 'daisy',
 986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
 987: 'corn',
 988: 'acorn',
 989: 'hip, rose hip, rosehip',
 990: 'buckeye, horse chestnut, conker',
 991: 'coral fungus',
 992: 'agaric',
 993: 'gyromitra',
 994: 'stinkhorn, carrion fungus',
 995: 'earthstar',
 996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
 997: 'bolete',
 998: 'ear, spike, capitulum',
 999: 'toilet tissue, toilet paper, bathroom tissue'

================================================
FILE: data/index_synset.yaml
================================================
0: n01440764
1: n01443537
2: n01484850
3: n01491361
4: n01494475
5: n01496331
6: n01498041
7: n01514668
8: n07646067
9: n01518878
10: n01530575
11: n01531178
12: n01532829
13: n01534433
14: n01537544
15: n01558993
16: n01560419
17: n01580077
18: n01582220
19: n01592084
20: n01601694
21: n13382471
22: n01614925
23: n01616318
24: n01622779
25: n01629819
26: n01630670
27: n01631663
28: n01632458
29: n01632777
30: n01641577
31: n01644373
32: n01644900
33: n01664065
34: n01665541
35: n01667114
36: n01667778
37: n01669191
38: n01675722
39: n01677366
40: n01682714
41: n01685808
42: n01687978
43: n01688243
44: n01689811
45: n01692333
46: n01693334
47: n01694178
48: n01695060
49: n01697457
50: n01698640
51: n01704323
52: n01728572
53: n01728920
54: n01729322
55: n01729977
56: n01734418
57: n01735189
58: n01737021
59: n01739381
60: n01740131
61: n01742172
62: n01744401
63: n01748264
64: n01749939
65: n01751748
66: n01753488
67: n01755581
68: n01756291
69: n01768244
70: n01770081
71: n01770393
72: n01773157
73: n01773549
74: n01773797
75: n01774384
76: n01774750
77: n01775062
78: n04432308
79: n01784675
80: n01795545
81: n01796340
82: n01797886
83: n01798484
84: n01806143
85: n07647321
86: n07647496
87: n01817953
88: n01818515
89: n01819313
90: n01820546
91: n01824575
92: n01828970
93: n01829413
94: n01833805
95: n01843065
96: n01843383
97: n01847000
98: n01855032
99: n07646821
100: n01860187
101: n01871265
102: n01872772
103: n01873310
104: n01877812
105: n01882714
106: n01883070
107: n01910747
108: n01914609
109: n01917289
110: n01924916
111: n01930112
112: n01943899
113: n01944390
114: n13719102
115: n01950731
116: n01955084
117: n01968897
118: n01978287
119: n01978455
120: n01980166
121: n01981276
122: n01983481
123: n01984695
124: n01985128
125: n01986214
126: n01990800
127: n02002556
128: n02002724
129: n02006656
130: n02007558
131: n02009229
132: n02009912
133: n02011460
134: n03126707
135: n02013706
136: n02017213
137: n02018207
138: n02018795
139: n02025239
140: n02027492
141: n02028035
142: n02033041
143: n02037110
144: n02051845
145: n02056570
146: n02058221
147: n02066245
148: n02071294
149: n02074367
150: n02077923
151: n08742578
152: n02085782
153: n02085936
154: n02086079
155: n02086240
156: n02086646
157: n02086910
158: n02087046
159: n02087394
160: n02088094
161: n02088238
162: n02088364
163: n02088466
164: n02088632
165: n02089078
166: n02089867
167: n02089973
168: n02090379
169: n02090622
170: n02090721
171: n02091032
172: n02091134
173: n02091244
174: n02091467
175: n02091635
176: n02091831
177: n02092002
178: n02092339
179: n02093256
180: n02093428
181: n02093647
182: n02093754
183: n02093859
184: n02093991
185: n02094114
186: n02094258
187: n02094433
188: n02095314
189: n02095570
190: n02095889
191: n02096051
192: n02096177
193: n02096294
194: n02096437
195: n02096585
196: n02097047
197: n02097130
198: n02097209
199: n02097298
200: n02097474
201: n02097658
202: n02098105
203: n02098286
204: n02098413
205: n02099267
206: n02099429
207: n02099601
208: n02099712
209: n02099849
210: n02100236
211: n02100583
212: n02100735
213: n02100877
214: n02101006
215: n02101388
216: n02101556
217: n02102040
218: n02102177
219: n02102318
220: n02102480
221: n02102973
222: n02104029
223: n02104365
224: n02105056
225: n02105162
226: n02105251
227: n02105412
228: n02105505
229: n02105641
230: n02105855
231: n02106030
232: n02106166
233: n02106382
234: n02106550
235: n02106662
236: n02107142
237: n02107312
238: n02107574
239: n02107683
240: n02107908
241: n02108000
242: n02108089
243: n02108422
244: n02108551
245: n02108915
246: n02109047
247: n02109525
248: n02109961
249: n02110063
250: n02110185
251: n02110341
252: n02110627
253: n02110806
254: n02110958
255: n02111129
256: n02111277
257: n02111500
258: n02111889
259: n02112018
260: n02112137
261: n02112350
262: n02112706
263: n02113023
264: n02113186
265: n02113624
266: n02113712
267: n02113799
268: n02113978
269: n02114367
270: n02114548
271: n02114712
272: n02114855
273: n02115641
274: n02115913
275: n02116738
276: n02117135
277: n02119022
278: n02119789
279: n02120079
280: n02120505
281: n02123045
282: n02123159
283: n02123394
284: n02123597
285: n02124075
286: n02125311
287: n02127052
288: n02128385
289: n02128757
290: n02128925
291: n02129165
292: n02129604
293: n02130308
294: n02132136
295: n02133161
296: n02134084
297: n02134418
298: n02137549
299: n02138441
300: n02165105
301: n02165456
302: n02167151
303: n02168699
304: n02169497
305: n02172182
306: n02174001
307: n02177972
308: n03373237
309: n07975909
310: n02219486
311: n02226429
312: n02229544
313: n02231487
314: n02233338
315: n02236044
316: n02256656
317: n02259212
318: n02264363
319: n02268443
320: n02268853
321: n02276258
322: n02277742
323: n02279972
324: n02280649
325: n02281406
326: n02281787
327: n02317335
328: n02319095
329: n02321529
330: n02325366
331: n02326432
332: n02328150
333: n02342885
334: n02346627
335: n02356798
336: n02361337
337: n05262120
338: n02364673
339: n02389026
340: n02391049
341: n02395406
342: n02396427
343: n02397096
344: n02398521
345: n02403003
346: n02408429
347: n02410509
348: n02412080
349: n02415577
350: n02417914
351: n02422106
352: n02422699
353: n02423022
354: n02437312
355: n02437616
356: n10771990
357: n14765497
358: n02443114
359: n02443484
360: n14765785
361: n02445715
362: n02447366
363: n02454379
364: n02457408
365: n02480495
366: n02480855
367: n02481823
368: n02483362
369: n02483708
370: n02484975
371: n02486261
372: n02486410
373: n02487347
374: n02488291
375: n02488702
376: n02489166
377: n02490219
378: n02492035
379: n02492660
380: n02493509
381: n02493793
382: n02494079
383: n02497673
384: n02500267
385: n02504013
386: n02504458
387: n02509815
388: n02510455
389: n02514041
390: n07783967
391: n02536864
392: n02606052
393: n02607072
394: n02640242
395: n02641379
396: n02643566
397: n02655020
398: n02666347
399: n02667093
400: n02669723
401: n02672831
402: n02676566
403: n02687172
404: n02690373
405: n02692877
406: n02699494
407: n02701002
408: n02704792
409: n02708093
410: n02727426
411: n08496334
412: n02747177
413: n02749479
414: n02769748
415: n02776631
416: n02777292
417: n02782329
418: n02783161
419: n02786058
420: n02787622
421: n02788148
422: n02790996
423: n02791124
424: n02791270
425: n02793495
426: n02794156
427: n02795169
428: n02797295
429: n02799071
430: n02802426
431: n02804515
432: n02804610
433: n02807133
434: n02808304
435: n02808440
436: n02814533
437: n02814860
438: n02815834
439: n02817516
440: n02823428
441: n02823750
442: n02825657
443: n02834397
444: n02835271
445: n02837789
446: n02840245
447: n02841315
448: n02843684
449: n02859443
450: n02860847
451: n02865351
452: n02869837
453: n02870880
454: n02871525
455: n02877765
456: n02880308
457: n02883205
458: n02892201
459: n02892767
460: n02894605
461: n02895154
462: n12520864
463: n02909870
464: n02910353
465: n02916936
466: n02917067
467: n02927161
468: n02930766
469: n02939185
470: n02948072
471: n02950826
472: n02951358
473: n02951585
474: n02963159
475: n02965783
476: n02966193
477: n02966687
478: n02971356
479: n02974003
480: n02977058
481: n02978881
482: n02979186
483: n02980441
484: n02981792
485: n02988304
486: n02992211
487: n02992529
488: n13652994
489: n03000134
490: n03000247
491: n03000684
492: n03014705
493: n03016953
494: n03017168
495: n03018349
496: n03026506
497: n03028079
498: n03032252
499: n03041632
500: n03042490
501: n03045698
502: n03047690
503: n03062245
504: n03063599
505: n03063689
506: n03065424
507: n03075370
508: n03085013
509: n03089624
510: n03095699
511: n03100240
512: n03109150
513: n03110669
514: n03124043
515: n03124170
516: n15142452
517: n03126707
518: n03127747
519: n03127925
520: n03131574
521: n03133878
522: n03134739
523: n03141823
524: n03146219
525: n03160309
526: n03179701
527: n03180011
528: n03187595
529: n03188531
530: n03196217
531: n03197337
532: n03201208
533: n03207743
534: n03207941
535: n03208938
536: n03216828
537: n03218198
538: n13872072
539: n03223299
540: n03240683
541: n03249569
542: n07647870
543: n03255030
544: n03259401
545: n03271574
546: n03272010
547: n03272562
548: n03290653
549: n13869788
550: n03297495
551: n03314780
552: n03325584
553: n03337140
554: n03344393
555: n03345487
556: n03347037
557: n03355925
558: n03372029
559: n03376595
560: n03379051
561: n03384352
562: n03388043
563: n03388183
564: n03388549
565: n03393912
566: n03394916
567: n03400231
568: n03404251
569: n03417042
570: n03424325
571: n03425413
572: n03443371
573: n03444034
574: n03445777
575: n03445924
576: n03447447
577: n03447721
578: n08286342
579: n03452741
580: n03457902
581: n03459775
582: n03461385
583: n03467068
584: n03476684
585: n03476991
586: n03478589
587: n03482001
588: n03482405
589: n03483316
590: n03485407
591: n03485794
592: n03492542
593: n03494278
594: n03495570
595: n10161363
596: n03498962
597: n03527565
598: n03529860
599: n09218315
600: n03532672
601: n03534580
602: n03535780
603: n03538406
604: n03544143
605: n03584254
606: n03584829
607: n03590841
608: n03594734
609: n03594945
610: n03595614
611: n03598930
612: n03599486
613: n03602883
614: n03617480
615: n03623198
616: n15102712
617: n03630383
618: n03633091
619: n03637318
620: n03642806
621: n03649909
622: n03657121
623: n03658185
624: n07977870
625: n03662601
626: n03666591
627: n03670208
628: n03673027
629: n03676483
630: n03680355
631: n03690938
632: n03691459
633: n03692522
634: n03697007
635: n03706229
636: n03709823
637: n03710193
638: n03710637
639: n03710721
640: n03717622
641: n03720891
642: n03721384
643: n03725035
644: n03729826
645: n03733131
646: n03733281
647: n03733805
648: n03742115
649: n03743016
650: n03759954
651: n03761084
652: n03763968
653: n03764736
654: n03769881
655: n03770439
656: n03770679
657: n03773504
658: n03775071
659: n03775546
660: n03776460
661: n03777568
662: n03777754
663: n03781244
664: n03782006
665: n03785016
666: n14955889
667: n03787032
668: n03788195
669: n03788365
670: n03791053
671: n03792782
672: n03792972
673: n03793489
674: n03794056
675: n03796401
676: n03803284
677: n13652335
678: n03814639
679: n03814906
680: n03825788
681: n03832673
682: n03837869
683: n03838899
684: n03840681
685: n03841143
686: n03843555
687: n03854065
688: n03857828
689: n03866082
690: n03868242
691: n03868863
692: n07281099
693: n03873416
694: n03874293
695: n03874599
696: n03876231
697: n03877472
698: n08053121
699: n03884397
700: n03887697
701: n03888257
702: n03888605
703: n03891251
704: n03891332
705: n03895866
706: n03899768
707: n03902125
708: n03903868
709: n03908618
710: n03908714
711: n03916031
712: n03920288
713: n03924679
714: n03929660
715: n03929855
716: n03930313
717: n03930630
718: n03934042
719: n03935335
720: n03937543
721: n03938244
722: n03942813
723: n03944341
724: n03947888
725: n03950228
726: n03954731
727: n03956157
728: n03958227
729: n03961711
730: n03967562
731: n03970156
732: n03976467
733: n08620881
734: n03977966
735: n03980874
736: n03982430
737: n03983396
738: n03991062
739: n03992509
740: n03995372
741: n03998194
742: n04004767
743: n13937284
744: n04008634
745: n04009801
746: n04019541
747: n04023962
748: n13413294
749: n04033901
750: n04033995
751: n04037443
752: n04039381
753: n09403211
754: n04041544
755: n04044716
756: n04049303
757: n04065272
758: n07056680
759: n04069434
760: n04070727
761: n04074963
762: n04081281
763: n04086273
764: n04090263
765: n04099969
766: n04111531
767: n04116512
768: n04118538
769: n04118776
770: n04120489
771: n04125116
772: n04127249
773: n04131690
774: n04133789
775: n04136333
776: n04141076
777: n04141327
778: n04141975
779: n04146614
780: n04147291
781: n04149813
782: n04152593
783: n04154340
784: n07917272
785: n04162706
786: n04179913
787: n04192698
788: n04200800
789: n04201297
790: n04204238
791: n04204347
792: n04208427
793: n04209133
794: n04209239
795: n04228054
796: n04229816
797: n04235860
798: n04238763
799: n04239074
800: n04243546
801: n04251144
802: n04252077
803: n04252225
804: n04254120
805: n04254680
806: n04254777
807: n04258138
808: n04259630
809: n04263257
810: n04264628
811: n04265275
812: n04266014
813: n04270147
814: n04273569
815: n04275363
816: n05605498
817: n04285008
818: n04286575
819: n08646566
820: n04310018
821: n04311004
822: n04311174
823: n04317175
824: n04325704
825: n04326547
826: n04328186
827: n04330267
828: n04332243
829: n04335435
830: n04337157
831: n04344873
832: n04346328
833: n04347754
834: n04350905
835: n04355338
836: n04355933
837: n04356056
838: n04357314
839: n04366367
840: n04367480
841: n04370456
842: n04371430
843: n14009946
844: n04372370
845: n04376876
846: n04380533
847: n04389033
848: n04392985
849: n04398044
850: n04399382
851: n04404412
852: n04409515
853: n04417672
854: n04418357
855: n04423845
856: n04428191
857: n04429376
858: n04435653
859: n04442312
860: n04443257
861: n04447861
862: n04456115
863: n04458633
864: n04461696
865: n04462240
866: n04465666
867: n04467665
868: n04476259
869: n04479046
870: n04482393
871: n04483307
872: n04485082
873: n04486054
874: n04487081
875: n04487394
876: n04493381
877: n04501370
878: n04505470
879: n04507155
880: n04509417
881: n04515003
882: n04517823
883: n04522168
884: n04523525
885: n04525038
886: n04525305
887: n04532106
888: n04532670
889: n04536866
890: n04540053
891: n04542943
892: n04548280
893: n04548362
894: n04550184
895: n04552348
896: n04553703
897: n04554684
898: n04557648
899: n04560804
900: n04562935
901: n04579145
902: n04579667
903: n04584207
904: n04589890
905: n04590129
906: n04591157
907: n04591713
908: n10782135
909: n04596742
910: n04598010
911: n04599235
912: n04604644
913: n14423870
914: n04612504
915: n04613696
916: n06359193
917: n06596364
918: n06785654
919: n06794110
920: n06874185
921: n07248320
922: n07565083
923: n07657664
924: n07583066
925: n07584110
926: n07590611
927: n07613480
928: n07614500
929: n07615774
930: n07684084
931: n07693725
932: n07695742
933: n07697313
934: n07697537
935: n07711569
936: n07714571
937: n07714990
938: n07715103
939: n12159804
940: n12160303
941: n12160857
942: n07717556
943: n07718472
944: n07718747
945: n07720875
946: n07730033
947: n13001041
948: n07742313
949: n12630144
950: n14991210
951: n07749582
952: n07753113
953: n07753275
954: n07753592
955: n07754684
956: n07760859
957: n07768694
958: n07802026
959: n07831146
960: n07836838
961: n07860988
962: n07871810
963: n07873807
964: n07875152
965: n07880968
966: n07892512
967: n07920052
968: n13904665
969: n07932039
970: n09193705
971: n09229709
972: n09246464
973: n09256479
974: n09288635
975: n09332890
976: n09399592
977: n09421951
978: n09428293
979: n09468604
980: n09472597
981: n09835506
982: n10148035
983: n10565667
984: n11879895
985: n11939491
986: n12057211
987: n12144580
988: n12267677
989: n12620546
990: n12768682
991: n12985857
992: n12998815
993: n13037406
994: n13040303
995: n13044778
996: n13052670
997: n13054560
998: n13133613
999: n15075141


================================================
FILE: environment.yaml
================================================
name: ldm
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8.5
  - pip=20.3
  - cudatoolkit=11.3
  - pytorch=1.11.0
  - torchvision=0.12.0
  - numpy=1.19.2
  - pip:
    - albumentations==0.4.3
    - diffusers
    - opencv-python==4.1.2.30
    - pudb==2019.2
    - invisible-watermark
    - imageio==2.9.0
    - imageio-ffmpeg==0.4.2
    - pytorch-lightning==1.4.2
    - omegaconf==2.1.1
    - test-tube>=0.7.5
    - streamlit>=0.73.1
    - einops==0.3.0
    - torch-fidelity==0.3.0
    - transformers==4.19.2
    - torchmetrics==0.6.0
    - kornia==0.6
    - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
    - -e git+https://github.com/openai/CLIP.git@main#egg=clip
    - -e .


================================================
FILE: ldm/data/__init__.py
================================================


================================================
FILE: ldm/data/base.py
================================================
from abc import abstractmethod
from torch.utils.data import Dataset, ConcatDataset, ChainDataset, IterableDataset


class Txt2ImgIterableBaseDataset(IterableDataset):
    '''
    Define an interface to make the IterableDatasets for text2img data chainable
    '''
    def __init__(self, num_records=0, valid_ids=None, size=256):
        super().__init__()
        self.num_records = num_records
        self.valid_ids = valid_ids
        self.sample_ids = valid_ids
        self.size = size

        print(f'{self.__class__.__name__} dataset contains {self.__len__()} examples.')

    def __len__(self):
        return self.num_records

    @abstractmethod
    def __iter__(self):
        pass

================================================
FILE: ldm/data/imagenet.py
================================================
import os, yaml, pickle, shutil, tarfile, glob
import cv2
import albumentations
import PIL
import numpy as np
import torchvision.transforms.functional as TF
from omegaconf import OmegaConf
from functools import partial
from PIL import Image
from tqdm import tqdm
from torch.utils.data import Dataset, Subset

import taming.data.utils as tdu
from taming.data.imagenet import str_to_indices, give_synsets_from_indices, download, retrieve
from taming.data.imagenet import ImagePaths

from ldm.modules.image_degradation import degradation_fn_bsr, degradation_fn_bsr_light


def synset2idx(path_to_yaml="data/index_synset.yaml"):
    with open(path_to_yaml) as f:
        # safe_load: yaml.load() without an explicit Loader fails on PyYAML >= 6
        di2s = yaml.safe_load(f)
    return dict((v,k) for k,v in di2s.items())
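

# ---------------------------------------------------------------------------
# Illustrative sketch, not part of the original module: how the key/value
# inversion in synset2idx() behaves on a tiny mapping. It assumes the
# "<index>: <synset>" line layout of data/index_synset.yaml and parses it by
# hand so that it runs without PyYAML. The helper name
# `_sketch_invert_index_synset` is hypothetical.
# Note that a dict inversion keeps only the last index if a synset ID repeats.
# ---------------------------------------------------------------------------
def _sketch_invert_index_synset(text):
    """Return {synset: index} for lines of the form '<index>: <synset>'."""
    idx2syn = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        idx, syn = line.split(":", 1)
        idx2syn[int(idx)] = syn.strip()
    # Same inversion as synset2idx(): swap keys and values.
    return {syn: idx for idx, syn in idx2syn.items()}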


class ImageNetBase(Dataset):
    def __init__(self, config=None):
        self.config = config or OmegaConf.create()
        if not isinstance(self.config, dict):
            self.config = OmegaConf.to_container(self.config)
        self.keep_orig_class_label = self.config.get("keep_orig_class_label", False)
        self.process_images = True  # if False we skip loading & processing images and self.data contains filepaths
        self._prepare()
        self._prepare_synset_to_human()
        self._prepare_idx_to_synset()
        self._prepare_human_to_integer_label()
        self._load()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

    def _prepare(self):
        raise NotImplementedError()

    def _filter_relpaths(self, relpaths):
        ignore = set([
            "n06596364_9591.JPEG",
        ])
        relpaths = [rpath for rpath in relpaths if rpath.split("/")[-1] not in ignore]
        if "sub_indices" in self.config:
            indices = str_to_indices(self.config["sub_indices"])
            synsets = give_synsets_from_indices(indices, path_to_yaml=self.idx2syn)  # returns a list of strings
            self.synset2idx = synset2idx(path_to_yaml=self.idx2syn)
            files = []
            for rpath in relpaths:
                syn = rpath.split("/")[0]
                if syn in synsets:
                    files.append(rpath)
            return files
        else:
            return relpaths

    def _prepare_synset_to_human(self):
        SIZE = 2655750
        URL = "https://heibox.uni-heidelberg.de/f/9f28e956cd304264bb82/?dl=1"
        self.human_dict = os.path.join(self.root, "synset_human.txt")
        if (not os.path.exists(self.human_dict) or
                not os.path.getsize(self.human_dict)==SIZE):
            download(URL, self.human_dict)

    def _prepare_idx_to_synset(self):
        URL = "https://heibox.uni-heidelberg.de/f/d835d5b6ceda4d3aa910/?dl=1"
        self.idx2syn = os.path.join(self.root, "index_synset.yaml")
        if (not os.path.exists(self.idx2syn)):
            download(URL, self.idx2syn)

    def _prepare_human_to_integer_label(self):
        URL = "https://heibox.uni-heidelberg.de/f/2362b797d5be43b883f6/?dl=1"
        self.human2integer = os.path.join(self.root, "imagenet1000_clsidx_to_labels.txt")
        if (not os.path.exists(self.human2integer)):
            download(URL, self.human2integer)
        with open(self.human2integer, "r") as f:
            lines = f.read().splitlines()
            assert len(lines) == 1000
            self.human2integer_dict = dict()
            for line in lines:
                value, key = line.split(":")
                self.human2integer_dict[key] = int(value)

    def _load(self):
        with open(self.txt_filelist, "r") as f:
            self.relpaths = f.read().splitlines()
            l1 = len(self.relpaths)
            self.relpaths = self._filter_relpaths(self.relpaths)
            print("Removed {} files from filelist during filtering.".format(l1 - len(self.relpaths)))

        self.synsets = [p.split("/")[0] for p in self.relpaths]
        self.abspaths = [os.path.join(self.datadir, p) for p in self.relpaths]

        unique_synsets = np.unique(self.synsets)
        class_dict = dict((synset, i) for i, synset in enumerate(unique_synsets))
        if not self.keep_orig_class_label:
            self.class_labels = [class_dict[s] for s in self.synsets]
        else:
            self.class_labels = [self.synset2idx[s] for s in self.synsets]

        with open(self.human_dict, "r") as f:
            human_dict = f.read().splitlines()
            human_dict = dict(line.split(maxsplit=1) for line in human_dict)

        self.human_labels = [human_dict[s] for s in self.synsets]

        labels = {
            "relpath": np.array(self.relpaths),
            "synsets": np.array(self.synsets),
            "class_label": np.array(self.class_labels),
            "human_label": np.array(self.human_labels),
        }

        if self.process_images:
            self.size = retrieve(self.config, "size", default=256)
            self.data = ImagePaths(self.abspaths,
                                   labels=labels,
                                   size=self.size,
                                   random_crop=self.random_crop,
                                   )
        else:
            self.data = self.abspaths


class ImageNetTrain(ImageNetBase):
    NAME = "ILSVRC2012_train"
    URL = "http://www.image-net.org/challenges/LSVRC/2012/"
    AT_HASH = "a306397ccf9c2ead27155983c254227c0fd938e2"
    FILES = [
        "ILSVRC2012_img_train.tar",
    ]
    SIZES = [
        147897477120,
    ]

    def __init__(self, process_images=True, data_root=None, **kwargs):
        self.process_images = process_images
        self.data_root = data_root
        super().__init__(**kwargs)

    def _prepare(self):
        if self.data_root:
            self.root = os.path.join(self.data_root, self.NAME)
        else:
            cachedir = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
            self.root = os.path.join(cachedir, "autoencoders/data", self.NAME)

        self.datadir = os.path.join(self.root, "data")
        self.txt_filelist = os.path.join(self.root, "filelist.txt")
        self.expected_length = 1281167
        self.random_crop = retrieve(self.config, "ImageNetTrain/random_crop",
                                    default=True)
        if not tdu.is_prepared(self.root):
            # prep
            print("Preparing dataset {} in {}".format(self.NAME, self.root))

            datadir = self.datadir
            if not os.path.exists(datadir):
                path = os.path.join(self.root, self.FILES[0])
                if not os.path.exists(path) or not os.path.getsize(path)==self.SIZES[0]:
                    import academictorrents as at
                    atpath = at.get(self.AT_HASH, datastore=self.root)
                    assert atpath == path

                print("Extracting {} to {}".format(path, datadir))
                os.makedirs(datadir, exist_ok=True)
                with tarfile.open(path, "r:") as tar:
                    tar.extractall(path=datadir)

                print("Extracting sub-tars.")
                subpaths = sorted(glob.glob(os.path.join(datadir, "*.tar")))
                for subpath in tqdm(subpaths):
                    subdir = subpath[:-len(".tar")]
                    os.makedirs(subdir, exist_ok=True)
                    with tarfile.open(subpath, "r:") as tar:
                        tar.extractall(path=subdir)

            filelist = glob.glob(os.path.join(datadir, "**", "*.JPEG"))
            filelist = [os.path.relpath(p, start=datadir) for p in filelist]
            filelist = sorted(filelist)
            filelist = "\n".join(filelist)+"\n"
            with open(self.txt_filelist, "w") as f:
                f.write(filelist)

            tdu.mark_prepared(self.root)


class ImageNetValidation(ImageNetBase):
    NAME = "ILSVRC2012_validation"
    URL = "http://www.image-net.org/challenges/LSVRC/2012/"
    AT_HASH = "5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5"
    VS_URL = "https://heibox.uni-heidelberg.de/f/3e0f6e9c624e45f2bd73/?dl=1"
    FILES = [
        "ILSVRC2012_img_val.tar",
        "validation_synset.txt",
    ]
    SIZES = [
        6744924160,
        1950000,
    ]

    def __init__(self, process_images=True, data_root=None, **kwargs):
        self.data_root = data_root
        self.process_images = process_images
        super().__init__(**kwargs)

    def _prepare(self):
        if self.data_root:
            self.root = os.path.join(self.data_root, self.NAME)
        else:
            cachedir = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
            self.root = os.path.join(cachedir, "autoencoders/data", self.NAME)
        self.datadir = os.path.join(self.root, "data")
        self.txt_filelist = os.path.join(self.root, "filelist.txt")
        self.expected_length = 50000
        self.random_crop = retrieve(self.config, "ImageNetValidation/random_crop",
                                    default=False)
        if not tdu.is_prepared(self.root):
            # prep
            print("Preparing dataset {} in {}".format(self.NAME, self.root))

            datadir = self.datadir
            if not os.path.exists(datadir):
                path = os.path.join(self.root, self.FILES[0])
                if not os.path.exists(path) or not os.path.getsize(path)==self.SIZES[0]:
                    import academictorrents as at
                    atpath = at.get(self.AT_HASH, datastore=self.root)
                    assert atpath == path

                print("Extracting {} to {}".format(path, datadir))
                os.makedirs(datadir, exist_ok=True)
                with tarfile.open(path, "r:") as tar:
                    tar.extractall(path=datadir)

                vspath = os.path.join(self.root, self.FILES[1])
                if not os.path.exists(vspath) or not os.path.getsize(vspath)==self.SIZES[1]:
                    download(self.VS_URL, vspath)

                with open(vspath, "r") as f:
                    synset_dict = f.read().splitlines()
                    synset_dict = dict(line.split() for line in synset_dict)

                print("Reorganizing into synset folders")
                synsets = np.unique(list(synset_dict.values()))
                for s in synsets:
                    os.makedirs(os.path.join(datadir, s), exist_ok=True)
                for k, v in synset_dict.items():
                    src = os.path.join(datadir, k)
                    dst = os.path.join(datadir, v)
                    shutil.move(src, dst)

            filelist = glob.glob(os.path.join(datadir, "**", "*.JPEG"))
            filelist = [os.path.relpath(p, start=datadir) for p in filelist]
            filelist = sorted(filelist)
            filelist = "\n".join(filelist)+"\n"
            with open(self.txt_filelist, "w") as f:
                f.write(filelist)

            tdu.mark_prepared(self.root)



class ImageNetSR(Dataset):
    def __init__(self, size=None,
                 degradation=None, downscale_f=4, min_crop_f=0.5, max_crop_f=1.,
                 random_crop=True):
        """
        ImageNet super-resolution dataset.
        Performs the following ops in order:
        1.  takes a square crop of side length s from the image, either random or centered
        2.  resizes the crop to `size` with cv2.INTER_AREA interpolation
        3.  degrades the resized crop with the degradation function

        :param size: target size of the crop after resizing
        :param degradation: name of the degradation, e.g. cv_bicubic or bsrgan_light
        :param downscale_f: downsampling factor for the low-resolution image
        :param min_crop_f: lower bound of the crop factor c; the crop side length is
          s = c * min_img_side_len with c sampled uniformly from [min_crop_f, max_crop_f)
        :param max_crop_f: upper bound of the crop factor c (must be <= 1)
        :param random_crop: if True use a random crop, otherwise a center crop
        """
        self.base = self.get_base()
        assert size
        assert (size / downscale_f).is_integer()
        self.size = size
        self.LR_size = int(size / downscale_f)
        self.min_crop_f = min_crop_f
        self.max_crop_f = max_crop_f
        assert(max_crop_f <= 1.)
        self.center_crop = not random_crop

        self.image_rescaler = albumentations.SmallestMaxSize(max_size=size, interpolation=cv2.INTER_AREA)

        self.pil_interpolation = False  # set to True below if the interpolation op comes from Pillow

        if degradation == "bsrgan":
            self.degradation_process = partial(degradation_fn_bsr, sf=downscale_f)

        elif degradation == "bsrgan_light":
            self.degradation_process = partial(degradation_fn_bsr_light, sf=downscale_f)

        else:
            interpolation_fn = {
            "cv_nearest": cv2.INTER_NEAREST,
            "cv_bilinear": cv2.INTER_LINEAR,
            "cv_bicubic": cv2.INTER_CUBIC,
            "cv_area": cv2.INTER_AREA,
            "cv_lanczos": cv2.INTER_LANCZOS4,
            "pil_nearest": PIL.Image.NEAREST,
            "pil_bilinear": PIL.Image.BILINEAR,
            "pil_bicubic": PIL.Image.BICUBIC,
            "pil_box": PIL.Image.BOX,
            "pil_hamming": PIL.Image.HAMMING,
            "pil_lanczos": PIL.Image.LANCZOS,
            }[degradation]

            self.pil_interpolation = degradation.startswith("pil_")

            if self.pil_interpolation:
                self.degradation_process = partial(TF.resize, size=self.LR_size, interpolation=interpolation_fn)

            else:
                self.degradation_process = albumentations.SmallestMaxSize(max_size=self.LR_size,
                                                                          interpolation=interpolation_fn)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        example = self.base[i]
        image = Image.open(example["file_path_"])

        if not image.mode == "RGB":
            image = image.convert("RGB")

        image = np.array(image).astype(np.uint8)

        min_side_len = min(image.shape[:2])
        crop_side_len = min_side_len * np.random.uniform(self.min_crop_f, self.max_crop_f, size=None)
        crop_side_len = int(crop_side_len)

        if self.center_crop:
            self.cropper = albumentations.CenterCrop(height=crop_side_len, width=crop_side_len)

        else:
            self.cropper = albumentations.RandomCrop(height=crop_side_len, width=crop_side_len)

        image = self.cropper(image=image)["image"]
        image = self.image_rescaler(image=image)["image"]

        if self.pil_interpolation:
            image_pil = PIL.Image.fromarray(image)
            LR_image = self.degradation_process(image_pil)
            LR_image = np.array(LR_image).astype(np.uint8)

        else:
            LR_image = self.degradation_process(image=image)["image"]

        example["image"] = (image/127.5 - 1.0).astype(np.float32)
        example["LR_image"] = (LR_image/127.5 - 1.0).astype(np.float32)

        return example


class ImageNetSRTrain(ImageNetSR):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def get_base(self):
        with open("data/imagenet_train_hr_indices.p", "rb") as f:
            indices = pickle.load(f)
        dset = ImageNetTrain(process_images=False,)
        return Subset(dset, indices)


class ImageNetSRValidation(ImageNetSR):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def get_base(self):
        with open("data/imagenet_val_hr_indices.p", "rb") as f:
            indices = pickle.load(f)
        dset = ImageNetValidation(process_images=False,)
        return Subset(dset, indices)
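

# ---------------------------------------------------------------------------
# Illustrative sketch, not part of the original module: the size arithmetic
# and value scaling used by ImageNetSR, written with the standard library
# only. The function names below are hypothetical; the real class operates on
# numpy arrays and uses albumentations/cv2 for cropping and resizing.
# ---------------------------------------------------------------------------
import random


def _sketch_sr_sizes(height, width, size, downscale_f=4,
                     min_crop_f=0.5, max_crop_f=1.0, rng=random):
    """Return (crop_side_len, hr_size, lr_size) for one image."""
    # The low-resolution size must be a whole number of pixels.
    assert (size / downscale_f).is_integer()
    # Crop side: a random fraction of the shorter image side.
    crop_side_len = int(min(height, width) * rng.uniform(min_crop_f, max_crop_f))
    return crop_side_len, size, int(size / downscale_f)


def _sketch_to_unit_range(pixel):
    """Map a uint8 pixel value in [0, 255] to [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0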


================================================
FILE: ldm/data/lsun.py
================================================
import os
import numpy as np
import PIL
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class LSUNBase(Dataset):
    def __init__(self,
                 txt_file,
                 data_root,
                 size=None,
                 interpolation="bicubic",
                 flip_p=0.5
                 ):
        self.data_paths = txt_file
        self.data_root = data_root
        with open(self.data_paths, "r") as f:
            self.image_paths = f.read().splitlines()
        self._length = len(self.image_paths)
        self.labels = {
            "relative_file_path_": [l for l in self.image_paths],
            "file_path_": [os.path.join(self.data_root, l)
                           for l in self.image_paths],
        }

        self.size = size
        # PIL.Image.LINEAR was only an alias of BILINEAR; newer Pillow releases no longer provide it
        self.interpolation = {"linear": PIL.Image.BILINEAR,
                              "bilinear": PIL.Image.BILINEAR,
                              "bicubic": PIL.Image.BICUBIC,
                              "lanczos": PIL.Image.LANCZOS,
                              }[interpolation]
        self.flip = transforms.RandomHorizontalFlip(p=flip_p)

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        example = dict((k, self.labels[k][i]) for k in self.labels)
        image = Image.open(example["file_path_"])
        if not image.mode == "RGB":
            image = image.convert("RGB")

        # default to score-sde preprocessing
        img = np.array(image).astype(np.uint8)
        crop = min(img.shape[0], img.shape[1])
        h, w = img.shape[0], img.shape[1]
        img = img[(h - crop) // 2:(h + crop) // 2,
              (w - crop) // 2:(w + crop) // 2]

        image = Image.fromarray(img)
        if self.size is not None:
            image = image.resize((self.size, self.size), resample=self.interpolation)

        image = self.flip(image)
        image = np.array(image).astype(np.uint8)
        example["image"] = (image / 127.5 - 1.0).astype(np.float32)
        return example


class LSUNChurchesTrain(LSUNBase):
    def __init__(self, **kwargs):
        super().__init__(txt_file="data/lsun/church_outdoor_train.txt", data_root="data/lsun/churches", **kwargs)


class LSUNChurchesValidation(LSUNBase):
    def __init__(self, flip_p=0., **kwargs):
        super().__init__(txt_file="data/lsun/church_outdoor_val.txt", data_root="data/lsun/churches",
                         flip_p=flip_p, **kwargs)


class LSUNBedroomsTrain(LSUNBase):
    def __init__(self, **kwargs):
        super().__init__(txt_file="data/lsun/bedrooms_train.txt", data_root="data/lsun/bedrooms", **kwargs)


class LSUNBedroomsValidation(LSUNBase):
    def __init__(self, flip_p=0.0, **kwargs):
        super().__init__(txt_file="data/lsun/bedrooms_val.txt", data_root="data/lsun/bedrooms",
                         flip_p=flip_p, **kwargs)


class LSUNCatsTrain(LSUNBase):
    def __init__(self, **kwargs):
        super().__init__(txt_file="data/lsun/cat_train.txt", data_root="data/lsun/cats", **kwargs)


class LSUNCatsValidation(LSUNBase):
    def __init__(self, flip_p=0., **kwargs):
        super().__init__(txt_file="data/lsun/cat_val.txt", data_root="data/lsun/cats",
                         flip_p=flip_p, **kwargs)


================================================
FILE: ldm/lr_scheduler.py
================================================
import numpy as np


class LambdaWarmUpCosineScheduler:
    """
    note: use with a base_lr of 1.0
    """
    def __init__(self, warm_up_steps, lr_min, lr_max, lr_start, max_decay_steps, verbosity_interval=0):
        self.lr_warm_up_steps = warm_up_steps
        self.lr_start = lr_start
        self.lr_min = lr_min
        self.lr_max = lr_max
        self.lr_max_decay_steps = max_decay_steps
        self.last_lr = 0.
        self.verbosity_interval = verbosity_interval

    def schedule(self, n, **kwargs):
        if self.verbosity_interval > 0 and n % self.verbosity_interval == 0:
            print(f"current step: {n}, recent lr-multiplier: {self.last_lr}")
        if n < self.lr_warm_up_steps:
            lr = (self.lr_max - self.lr_start) / self.lr_warm_up_steps * n + self.lr_start
            self.last_lr = lr
            return lr
        else:
            t = (n - self.lr_warm_up_steps) / (self.lr_max_decay_steps - self.lr_warm_up_steps)
            t = min(t, 1.0)
            lr = self.lr_min + 0.5 * (self.lr_max - self.lr_min) * (
                    1 + np.cos(t * np.pi))
            self.last_lr = lr
            return lr

    def __call__(self, n, **kwargs):
        return self.schedule(n,**kwargs)


class LambdaWarmUpCosineScheduler2:
    """
    supports repeated iterations, configurable via lists
    note: use with a base_lr of 1.0.
    """
    def __init__(self, warm_up_steps, f_min, f_max, f_start, cycle_lengths, verbosity_interval=0):
        assert len(warm_up_steps) == len(f_min) == len(f_max) == len(f_start) == len(cycle_lengths)
        self.lr_warm_up_steps = warm_up_steps
        self.f_start = f_start
        self.f_min = f_min
        self.f_max = f_max
        self.cycle_lengths = cycle_lengths
        self.cum_cycles = np.cumsum([0] + list(self.cycle_lengths))
        self.last_f = 0.
        self.verbosity_interval = verbosity_interval

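    def find_in_interval(self, n):
        """Return the index of the cycle that contains global step n.

        Implicitly returns None when n lies beyond the last cycle boundary.
        """
        interval = 0
        for cl in self.cum_cycles[1:]:
            if n <= cl:
                return interval
            interval += 1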

    def schedule(self, n, **kwargs):
        cycle = self.find_in_interval(n)
        n = n - self.cum_cycles[cycle]
        if self.verbosity_interval > 0 and n % self.verbosity_interval == 0:
            print(f"current step: {n}, recent lr-multiplier: {self.last_f}, "
                  f"current cycle {cycle}")
        if n < self.lr_warm_up_steps[cycle]:
            f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
            self.last_f = f
            return f
        else:
            t = (n - self.lr_warm_up_steps[cycle]) / (self.cycle_lengths[cycle] - self.lr_warm_up_steps[cycle])
            t = min(t, 1.0)
            f = self.f_min[cycle] + 0.5 * (self.f_max[cycle] - self.f_min[cycle]) * (
                    1 + np.cos(t * np.pi))
            self.last_f = f
            return f

    def __call__(self, n, **kwargs):
        return self.schedule(n, **kwargs)


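class LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):
    """
    linear warm-up to f_max, then a linear ramp towards f_min over the cycle length
    (same list-based configuration as LambdaWarmUpCosineScheduler2)
    """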

    def schedule(self, n, **kwargs):
        cycle = self.find_in_interval(n)
        n = n - self.cum_cycles[cycle]
        if self.verbosity_interval > 0 and n % self.verbosity_interval == 0:
            print(f"current step: {n}, recent lr-multiplier: {self.last_f}, "
                  f"current cycle {cycle}")

        if n < self.lr_warm_up_steps[cycle]:
            f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
            self.last_f = f
            return f
        else:
            f = self.f_min[cycle] + (self.f_max[cycle] - self.f_min[cycle]) * (self.cycle_lengths[cycle] - n) / (self.cycle_lengths[cycle])
            self.last_f = f
            return f



================================================
FILE: ldm/models/autoencoder.py
================================================
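import numpy as np  # used by VQModel.get_input for per-batch resizing
import torch
import pytorch_lightning as pl
import torch.nn.functional as F
from contextlib import contextmanager
from packaging import version  # used for the pytorch_lightning version check
from torch.optim.lr_scheduler import LambdaLR  # used in VQModel.configure_optimizers

from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer

from ldm.modules.diffusionmodules.model import Encoder, Decoder
from ldm.modules.distributions.distributions import DiagonalGaussianDistribution
from ldm.modules.ema import LitEma  # used when use_ema=True

from ldm.util import instantiate_from_config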


class VQModel(pl.LightningModule):
    def __init__(self,
                 ddconfig,
                 lossconfig,
                 n_embed,
                 embed_dim,
                 ckpt_path=None,
                 ignore_keys=[],
                 image_key="image",
                 colorize_nlabels=None,
                 monitor=None,
                 batch_resize_range=None,
                 scheduler_config=None,
                 lr_g_factor=1.0,
                 remap=None,
                 sane_index_shape=False, # tell vector quantizer to return indices as bhw
                 use_ema=False
                 ):
        super().__init__()
        self.embed_dim = embed_dim
        self.n_embed = n_embed
        self.image_key = image_key
        self.encoder = Encoder(**ddconfig)
        self.decoder = Decoder(**ddconfig)
        self.loss = instantiate_from_config(lossconfig)
        self.quantize = VectorQuantizer(n_embed, embed_dim, beta=0.25,
                                        remap=remap,
                                        sane_index_shape=sane_index_shape)
        self.quant_conv = torch.nn.Conv2d(ddconfig["z_channels"], embed_dim, 1)
        self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig["z_channels"], 1)
        if colorize_nlabels is not None:
            assert type(colorize_nlabels)==int
            self.register_buffer("colorize", torch.randn(3, colorize_nlabels, 1, 1))
        if monitor is not None:
            self.monitor = monitor
        self.batch_resize_range = batch_resize_range
        if self.batch_resize_range is not None:
            print(f"{self.__class__.__name__}: Using per-batch resizing in range {batch_resize_range}.")

        self.use_ema = use_ema
        if self.use_ema:
            self.model_ema = LitEma(self)
            print(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")

        if ckpt_path is not None:
            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)
        self.scheduler_config = scheduler_config
        self.lr_g_factor = lr_g_factor

    @contextmanager
    def ema_scope(self, context=None):
        if self.use_ema:
            self.model_ema.store(self.parameters())
            self.model_ema.copy_to(self)
            if context is not None:
                print(f"{context}: Switched to EMA weights")
        try:
            yield None
        finally:
            if self.use_ema:
                self.model_ema.restore(self.parameters())
                if context is not None:
                    print(f"{context}: Restored training weights")

    def init_from_ckpt(self, path, ignore_keys=list()):
        sd = torch.load(path, map_location="cpu")["state_dict"]
        keys = list(sd.keys())
        for k in keys:
            for ik in ignore_keys:
                if k.startswith(ik):
                    print("Deleting key {} from state_dict.".format(k))
                    del sd[k]
        missing, unexpected = self.load_state_dict(sd, strict=False)
        print(f"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
        if len(missing) > 0:
            print(f"Missing Keys: {missing}")
        if len(unexpected) > 0:
            print(f"Unexpected Keys: {unexpected}")

    def on_train_batch_end(self, *args, **kwargs):
        if self.use_ema:
            self.model_ema(self)

    def encode(self, x):
        h = self.encoder(x)
        h = self.quant_conv(h)
        quant, emb_loss, info = self.quantize(h)
        return quant, emb_loss, info

    def encode_to_prequant(self, x):
        h = self.encoder(x)
        h = self.quant_conv(h)
        return h

    def decode(self, quant):
        quant = self.post_quant_conv(quant)
        dec = self.decoder(quant)
        return dec

    def decode_code(self, code_b):
        quant_b = self.quantize.embed_code(code_b)
        dec = self.decode(quant_b)
        return dec

    def forward(self, input, return_pred_indices=False):
        quant, diff, (_,_,ind) = self.encode(input)
        dec = self.decode(quant)
        if return_pred_indices:
            return dec, diff, ind
        return dec, diff

    def get_input(self, batch, k):
        x = batch[k]
        if len(x.shape) == 3:
            x = x[..., None]
        x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format).float()
        if self.batch_resize_range is not None:
            lower_size = self.batch_resize_range[0]
            upper_size = self.batch_resize_range[1]
            if self.global_step <= 4:
                # do the first few batches with max size to avoid later oom
                new_resize = upper_size
            else:
                new_resize = np.random.choice(np.arange(lower_size, upper_size+16, 16))
            if new_resize != x.shape[2]:
                x = F.interpolate(x, size=new_resize, mode="bicubic")
            x = x.detach()
        return x

    def training_step(self, batch, batch_idx, optimizer_idx):
        # https://github.com/pytorch/pytorch/issues/37142
        # try not to fool the heuristics
        x = self.get_input(batch, self.image_key)
        xrec, qloss, ind = self(x, return_pred_indices=True)

        if optimizer_idx == 0:
            # autoencode
            aeloss, log_dict_ae = self.loss(qloss, x, xrec, optimizer_idx, self.global_step,
                                            last_layer=self.get_last_layer(), split="train",
                                            predicted_indices=ind)

            self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=True)
            return aeloss

        if optimizer_idx == 1:
            # discriminator
            discloss, log_dict_disc = self.loss(qloss, x, xrec, optimizer_idx, self.global_step,
                                            last_layer=self.get_last_layer(), split="train")
            self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=True)
            return discloss

    def validation_step(self, batch, batch_idx):
        log_dict = self._validation_step(batch, batch_idx)
        with self.ema_scope():
            log_dict_ema = self._validation_step(batch, batch_idx, suffix="_ema")
        return log_dict

    def _validation_step(self, batch, batch_idx, suffix=""):
        x = self.get_input(batch, self.image_key)
        xrec, qloss, ind = self(x, return_pred_indices=True)
        aeloss, log_dict_ae = self.loss(qloss, x, xrec, 0,
                                        self.global_step,
                                        last_layer=self.get_last_layer(),
                                        split="val"+suffix,
                                        predicted_indices=ind
                                        )

        discloss, log_dict_disc = self.loss(qloss, x, xrec, 1,
                                            self.global_step,
                                            last_layer=self.get_last_layer(),
                                            split="val"+suffix,
                                            predicted_indices=ind
                                            )
        rec_loss = log_dict_ae[f"val{suffix}/rec_loss"]
        self.log(f"val{suffix}/rec_loss", rec_loss,
                   prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)
        self.log(f"val{suffix}/aeloss", aeloss,
                   prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)
        if version.parse(pl.__version__) >= version.parse('1.4.0'):
            del log_dict_ae[f"val{suffix}/rec_loss"]
        self.log_dict(log_dict_ae)
        self.log_dict(log_dict_disc)
        # return the logged values rather than the bound self.log_dict method
        return log_dict_ae

    def configure_optimizers(self):
        lr_d = self.learning_rate
        lr_g = self.lr_g_factor*self.learning_rate
        print("lr_d", lr_d)
        print("lr_g", lr_g)
        opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
                                  list(self.decoder.parameters())+
                                  list(self.quantize.parameters())+
                                  list(self.quant_conv.parameters())+
                                  list(self.post_quant_conv.parameters()),
                                  lr=lr_g, betas=(0.5, 0.9))
        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                    lr=lr_d, betas=(0.5, 0.9))

        if self.scheduler_config is not None:
            scheduler = instantiate_from_config(self.scheduler_config)

            print("Setting up LambdaLR scheduler...")
            scheduler = [
                {
                    'scheduler': LambdaLR(opt_ae, lr_lambda=scheduler.schedule),
                    'interval': 'step',
                    'frequency': 1
                },
                {
                    'scheduler': LambdaLR(opt_disc, lr_lambda=scheduler.schedule),
                    'interval': 'step',
                    'frequency': 1
                },
            ]
            return [opt_ae, opt_disc], scheduler
        return [opt_ae, opt_disc], []

    def get_last_layer(self):
        return self.decoder.conv_out.weight

    def log_images(self, batch, only_inputs=False, plot_ema=False, **kwargs):
        log = dict()
        x = self.get_input(batch, self.image_key)
        x = x.to(self.device)
        if only_inputs:
            log["inputs"] = x
            return log
        xrec, _ = self(x)
        if x.shape[1] > 3:
            # colorize with random projection
            assert xrec.shape[1] > 3
            x = self.to_rgb(x)
            xrec = self.to_rgb(xrec)
        log["inputs"] = x
        log["reconstructions"] = xrec
        if plot_ema:
            with self.ema_scope():
                xrec_ema, _ = self(x)
                if x.shape[1] > 3: xrec_ema = self.to_rgb(xrec_ema)
                log["reconstructions_ema"] = xrec_ema
        return log

    def to_rgb(self, x):
        assert self.image_key == "segmentation"
        if not hasattr(self, "colorize"):
            self.register_buffer("colorize", torch.randn(3, x.shape[1], 1, 1).to(x))
        x = F.conv2d(x, weight=self.colorize)
        x = 2.*(x-x.min())/(x.max()-x.min()) - 1.
        return x


class VQModelInterface(VQModel):
    def __init__(self, embed_dim, *args, **kwargs):
        super().__init__(embed_dim=embed_dim, *args, **kwargs)
        self.embed_dim = embed_dim

    def encode(self, x):
        h = self.encoder(x)
        h = self.quant_conv(h)
        return h

    def decode(self, h, force_not_quantize=False):
        # also go through quantization layer
        if not force_not_quantize:
            quant, emb_loss, info = self.quantize(h)
        else:
            quant = h
        quant = self.post_quant_conv(quant)
        dec = self.decoder(quant)
        return dec


class AutoencoderKL(pl.LightningModule):
    def __init__(self,
                 ddconfig,
                 lossconfig,
                 embed_dim,
                 ckpt_path=None,
                 ignore_keys=[],
                 image_key="image",
                 colorize_nlabels=None,
                 monitor=None,
                 ):
        super().__init__()
        self.image_key = image_key
        self.encoder = Encoder(**ddconfig)
        self.decoder = Decoder(**ddconfig)
        self.loss = instantiate_from_config(lossconfig)
        assert ddconfig["double_z"]
        self.quant_conv = torch.nn.Conv2d(2*ddconfig["z_channels"], 2*embed_dim, 1)
        self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig["z_channels"], 1)
        self.embed_dim = embed_dim
        if colorize_nlabels is not None:
            assert type(colorize_nlabels)==int
            self.register_buffer("colorize", torch.randn(3, colorize_nlabels, 1, 1))
        if monitor is not None:
            self.monitor = monitor
        if ckpt_path is not None:
            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)

    def init_from_ckpt(self, path, ignore_keys=list()):
        sd = torch.load(path, map_location="cpu")["state_dict"]
        keys = list(sd.keys())
        for k in keys:
            for ik in ignore_keys:
                if k.startswith(ik):
                    print("Deleting key {} from state_dict.".format(k))
                    del sd[k]
        self.load_state_dict(sd, strict=False)
        print(f"Restored from {path}")

    def encode(self, x):
        h = self.encoder(x)
        moments = self.quant_conv(h)
        posterior = DiagonalGaussianDistribution(moments)
        return posterior

    def decode(self, z):
        z = self.post_quant_conv(z)
        dec = self.decoder(z)
        return dec

    def forward(self, input, sample_posterior=True):
        posterior = self.encode(input)
        if sample_posterior:
            z = posterior.sample()
        else:
            z = posterior.mode()
        dec = self.decode(z)
        return dec, posterior

    def get_input(self, batch, k):
        x = batch[k]
        if len(x.shape) == 3:
            x = x[..., None]
        x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format).float()
        return x

    def training_step(self, batch, batch_idx, optimizer_idx):
        inputs = self.get_input(batch, self.image_key)
        reconstructions, posterior = self(inputs)

        if optimizer_idx == 0:
            # train encoder+decoder+logvar
            aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
                                            last_layer=self.get_last_layer(), split="train")
            self.log("aeloss", aeloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=False)
            return aeloss

        if optimizer_idx == 1:
            # train the discriminator
            discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
                                                last_layer=self.get_last_layer(), split="train")

            self.log("discloss", discloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=False)
            return discloss

    def validation_step(self, batch, batch_idx):
        inputs = self.get_input(batch, self.image_key)
        reconstructions, posterior = self(inputs)
        aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, 0, self.global_step,
                                        last_layer=self.get_last_layer(), split="val")

        discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, 1, self.global_step,
                                            last_layer=self.get_last_layer(), split="val")

        self.log("val/rec_loss", log_dict_ae["val/rec_loss"])
        self.log_dict(log_dict_ae)
        self.log_dict(log_dict_disc)
        # return the logged values rather than the bound self.log_dict method
        return log_dict_ae

    def configure_optimizers(self):
        lr = self.learning_rate
        opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
                                  list(self.decoder.parameters())+
                                  list(self.quant_conv.parameters())+
                                  list(self.post_quant_conv.parameters()),
                                  lr=lr, betas=(0.5, 0.9))
        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                    lr=lr, betas=(0.5, 0.9))
        return [opt_ae, opt_disc], []

    def get_last_layer(self):
        return self.decoder.conv_out.weight

    @torch.no_grad()
    def log_images(self, batch, only_inputs=False, **kwargs):
        log = dict()
        x = self.get_input(batch, self.image_key)
        x = x.to(self.device)
        if not only_inputs:
            xrec, posterior = self(x)
            if x.shape[1] > 3:
                # colorize with random projection
                assert xrec.shape[1] > 3
                x = self.to_rgb(x)
                xrec = self.to_rgb(xrec)
            log["samples"] = self.decode(torch.randn_like(posterior.sample()))
            log["reconstructions"] = xrec
        log["inputs"] = x
        return log

    def to_rgb(self, x):
        assert self.image_key == "segmentation"
        if not hasattr(self, "colorize"):
            self.register_buffer("colorize", torch.randn(3, x.shape[1], 1, 1).to(x))
        x = F.conv2d(x, weight=self.colorize)
        x = 2.*(x-x.min())/(x.max()-x.min()) - 1.
        return x


class IdentityFirstStage(torch.nn.Module):
    def __init__(self, *args, vq_interface=False, **kwargs):
        # initialise nn.Module before setting any attributes on it
        super().__init__()
        self.vq_interface = vq_interface  # TODO: Should be true by default but check to not break older stuff

    def encode(self, x, *args, **kwargs):
        return x

    def decode(self, x, *args, **kwargs):
        return x

    def quantize(self, x, *args, **kwargs):
        if self.vq_interface:
            return x, None, [None, None, None]
        return x

    def forward(self, x, *args, **kwargs):
        return x


================================================
FILE: ldm/models/diffusion/__init__.py
================================================


================================================
FILE: ldm/models/diffusion/classifier.py
================================================
import os
import torch
import pytorch_lightning as pl
from omegaconf import OmegaConf
from torch.nn import functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from copy import deepcopy
from einops import rearrange
from glob import glob
from natsort import natsorted

from ldm.modules.diffusionmodules.openaimodel import EncoderUNetModel, UNetModel
from ldm.util import log_txt_as_img, default, ismap, instantiate_from_config

__models__ = {
    'class_label': EncoderUNetModel,
    'segmentation': UNetModel
}


def disabled_train(self, mode=True):
    """Overwrite model.train with this function to make sure train/eval mode
    does not change anymore."""
    return self


class NoisyLatentImageClassifier(pl.LightningModule):

    def __init__(self,
                 diffusion_path,
                 num_classes,
                 ckpt_path=None,
                 pool='attention',
                 label_key=None,
                 diffusion_ckpt_path=None,
                 scheduler_config=None,
                 weight_decay=1.e-2,
                 log_steps=10,
                 monitor='val/loss',
                 *args,
                 **kwargs):
        super().__init__(*args, **kwargs)
        self.num_classes = num_classes
        # get latest config of diffusion model
        diffusion_config = natsorted(glob(os.path.join(diffusion_path, 'configs', '*-project.yaml')))[-1]
        self.diffusion_config = OmegaConf.load(diffusion_config).model
        self.diffusion_config.params.ckpt_path = diffusion_ckpt_path
        self.load_diffusion()

        self.monitor = monitor
        self.numd = self.diffusion_model.first_stage_model.encoder.num_resolutions - 1
        self.log_time_interval = self.diffusion_model.num_timesteps // log_steps
        self.log_steps = log_steps

        self.label_key = label_key if not hasattr(self.diffusion_model, 'cond_stage_key') \
            else self.diffusion_model.cond_stage_key

        assert self.label_key is not None, 'label_key neither in diffusion model nor in model.params'

        if self.label_key not in __models__:
            raise NotImplementedError()

        self.load_classifier(ckpt_path, pool)

        self.scheduler_config = scheduler_config
        self.use_scheduler = self.scheduler_config is not None
        self.weight_decay = weight_decay

    def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
        sd = torch.load(path, map_location="cpu")
        if "state_dict" in list(sd.keys()):
            sd = sd["state_dict"]
        keys = list(sd.keys())
        for k in keys:
            for ik in ignore_keys:
                if k.startswith(ik):
                    print("Deleting key {} from state_dict.".format(k))
                    del sd[k]
        missing, unexpected = self.load_state_dict(sd, strict=False) if not only_model else self.model.load_state_dict(
            sd, strict=False)
        print(f"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
        if len(missing) > 0:
            print(f"Missing Keys: {missing}")
        if len(unexpected) > 0:
            print(f"Unexpected Keys: {unexpected}")

    def load_diffusion(self):
        model = instantiate_from_config(self.diffusion_config)
        self.diffusion_model = model.eval()
        self.diffusion_model.train = disabled_train
        for param in self.diffusion_model.parameters():
            param.requires_grad = False

    def load_classifier(self, ckpt_path, pool):
        model_config = deepcopy(self.diffusion_config.params.unet_config.params)
        model_config.in_channels = self.diffusion_config.params.unet_config.params.out_channels
        model_config.out_channels = self.num_classes
        if self.label_key == 'class_label':
            model_config.pool = pool

        self.model = __models__[self.label_key](**model_config)
        if ckpt_path is not None:
            print('#####################################################################')
            print(f'load from ckpt "{ckpt_path}"')
            print('#####################################################################')
            self.init_from_ckpt(ckpt_path)

    @torch.no_grad()
    def get_x_noisy(self, x, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x))
        continuous_sqrt_alpha_cumprod = None
        if self.diffusion_model.use_continuous_noise:
            continuous_sqrt_alpha_cumprod = self.diffusion_model.sample_continuous_noise_level(x.shape[0], t + 1)
            # todo: make sure t+1 is correct here

        return self.diffusion_model.q_sample(x_start=x, t=t, noise=noise,
                                             continuous_sqrt_alpha_cumprod=continuous_sqrt_alpha_cumprod)

    def forward(self, x_noisy, t, *args, **kwargs):
        return self.model(x_noisy, t)

    @torch.no_grad()
    def get_input(self, batch, k):
        x = batch[k]
        if len(x.shape) == 3:
            x = x[..., None]
        x = rearrange(x, 'b h w c -> b c h w')
        x = x.to(memory_format=torch.contiguous_format).float()
        return x

    @torch.no_grad()
    def get_conditioning(self, batch, k=None):
        if k is None:
            k = self.label_key
        assert k is not None, 'Needs to provide label key'

        targets = batch[k].to(self.device)

        if self.label_key == 'segmentation':
            targets = rearrange(targets, 'b h w c -> b c h w')
            for down in range(self.numd):
                h, w = targets.shape[-2:]
                targets = F.interpolate(targets, size=(h // 2, w // 2), mode='nearest')

            # targets = rearrange(targets,'b c h w -> b h w c')

        return targets

    def compute_top_k(self, logits, labels, k, reduction="mean"):
        _, top_ks = torch.topk(logits, k, dim=1)
        if reduction == "mean":
            return (top_ks == labels[:, None]).float().sum(dim=-1).mean().item()
        elif reduction == "none":
            return (top_ks == labels[:, None]).float().sum(dim=-1)

    def on_train_epoch_start(self):
        # save some memory
        self.diffusion_model.model.to('cpu')

    @torch.no_grad()
    def write_logs(self, loss, logits, targets):
        log_prefix = 'train' if self.training else 'val'
        log = {}
        log[f"{log_prefix}/loss"] = loss.mean()
        log[f"{log_prefix}/acc@1"] = self.compute_top_k(
            logits, targets, k=1, reduction="mean"
        )
        log[f"{log_prefix}/acc@5"] = self.compute_top_k(
            logits, targets, k=5, reduction="mean"
        )

        self.log_dict(log, prog_bar=False, logger=True, on_step=self.training, on_epoch=True)
        self.log('loss', log[f"{log_prefix}/loss"], prog_bar=True, logger=False)
        self.log('global_step', self.global_step, logger=False, on_epoch=False, prog_bar=True)
        lr = self.optimizers().param_groups[0]['lr']
        self.log('lr_abs', lr, on_step=True, logger=True, on_epoch=False, prog_bar=True)

    def shared_step(self, batch, t=None):
        x, *_ = self.diffusion_model.get_input(batch, k=self.diffusion_model.first_stage_key)
        targets = self.get_conditioning(batch)
        if targets.dim() == 4:
            targets = targets.argmax(dim=1)
        if t is None:
            t = torch.randint(0, self.diffusion_model.num_timesteps, (x.shape[0],), device=self.device).long()
        else:
            t = torch.full(size=(x.shape[0],), fill_value=t, device=self.device).long()
        x_noisy = self.get_x_noisy(x, t)
        logits = self(x_noisy, t)

        loss = F.cross_entropy(logits, targets, reduction='none')

        self.write_logs(loss.detach(), logits.detach(), targets.detach())

        loss = loss.mean()
        return loss, logits, x_noisy, targets

    def training_step(self, batch, batch_idx):
        loss, *_ = self.shared_step(batch)
        return loss

    def reset_noise_accs(self):
        self.noisy_acc = {t: {'acc@1': [], 'acc@5': []} for t in
                          range(0, self.diffusion_model.num_timesteps, self.diffusion_model.log_every_t)}

    def on_validation_start(self):
        self.reset_noise_accs()

    @torch.no_grad()
    def validation_step(self, batch, batch_idx):
        loss, *_ = self.shared_step(batch)

        for t in self.noisy_acc:
            _, logits, _, targets = self.shared_step(batch, t)
            self.noisy_acc[t]['acc@1'].append(self.compute_top_k(logits, targets, k=1, reduction='mean'))
            self.noisy_acc[t]['acc@5'].append(self.compute_top_k(logits, targets, k=5, reduction='mean'))

        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay)

        if self.use_scheduler:
            scheduler = instantiate_from_config(self.scheduler_config)

            print("Setting up LambdaLR scheduler...")
            scheduler = [
                {
                    'scheduler': LambdaLR(optimizer, lr_lambda=scheduler.schedule),
                    'interval': 'step',
                    'frequency': 1
                }]
            return [optimizer], scheduler

        return optimizer

    @torch.no_grad()
    def log_images(self, batch, N=8, *args, **kwargs):
        log = dict()
        x = self.get_input(batch, self.diffusion_model.first_stage_key)
        log['inputs'] = x

        y = self.get_conditioning(batch)

        if self.label_key == 'class_label':
            y = log_txt_as_img((x.shape[2], x.shape[3]), batch["human_label"])
            log['labels'] = y

        if ismap(y):
            log['labels'] = self.diffusion_model.to_rgb(y)

            for step in range(self.log_steps):
                current_time = step * self.log_time_interval

                _, logits, x_noisy, _ = self.shared_step(batch, t=current_time)

                log[f'inputs@t{current_time}'] = x_noisy

                pred = F.one_hot(logits.argmax(dim=1), num_classes=self.num_classes)
                pred = rearrange(pred, 'b h w c -> b c h w')

                log[f'pred@t{current_time}'] = self.diffusion_model.to_rgb(pred)

        for key in log:
            log[key] = log[key][:N]

        return log
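

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module).
# The compute_top_k method above measures top-k accuracy with torch.topk.
# The helper below restates the same idea in plain Python so it can be run
# without torch. `topk_accuracy_sketch` is a hypothetical name, and the inputs
# are assumed to be a list of per-sample score lists plus a list of integer
# labels; it only covers the flat classification case, not segmentation maps.
# ---------------------------------------------------------------------------
def topk_accuracy_sketch(logits, labels, k):
    """Return the fraction of samples whose label is among the k highest scores."""
    hits = 0
    for scores, label in zip(logits, labels):
        # indices of the k largest scores, best first
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += int(label in top)
    return hits / len(labels)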


================================================
FILE: ldm/models/diffusion/ddim.py
================================================
"""SAMPLING ONLY."""

import torch
import numpy as np
from tqdm import tqdm
from functools import partial

from ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like, \
    extract_into_tensor


class DDIMSampler(object):
    def __init__(self, model, schedule="linear", **kwargs):
        super().__init__()
        self.model = model
        self.ddpm_num_timesteps = model.num_timesteps
        self.schedule = schedule

    def register_buffer(self, name, attr):
        # keep tensors on the model's device instead of assuming CUDA is available
        if isinstance(attr, torch.Tensor):
            device = self.model.device
            if attr.device != device:
                attr = attr.to(device)
        setattr(self, name, attr)

    def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
        self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
                                                  num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
        alphas_cumprod = self.model.alphas_cumprod
        assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
        to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)

        self.register_buffer('betas', to_torch(self.model.betas))
        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
        self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))

        # calculations for diffusion q(x_t | x_{t-1}) and others
        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))
        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))
        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))
        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))
        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))

        # ddim sampling parameters
        ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),
                                                                                   ddim_timesteps=self.ddim_timesteps,
                                                                                   eta=ddim_eta,verbose=verbose)
        self.register_buffer('ddim_sigmas', ddim_sigmas)
        self.register_buffer('ddim_alphas', ddim_alphas)
        self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)
        self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. - ddim_alphas))
        sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(
            (1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (
                        1 - self.alphas_cumprod / self.alphas_cumprod_prev))
        self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)

    @torch.no_grad()
    def sample(self,
               S,
               batch_size,
               shape,
               conditioning=None,
               callback=None,
               normals_sequence=None,
               img_callback=None,
               quantize_x0=False,
               eta=0.,
               mask=None,
               x0=None,
               temperature=1.,
               noise_dropout=0.,
               score_corrector=None,
               corrector_kwargs=None,
               verbose=True,
               x_T=None,
               log_every_t=100,
               unconditional_guidance_scale=1.,
               unconditional_conditioning=None,
               # this has to come in the same format as the conditioning, e.g. as encoded tokens, ...
               **kwargs
               ):
        if conditioning is not None:
            if isinstance(conditioning, dict):
                cbs = conditioning[list(conditioning.keys())[0]].shape[0]
                if cbs != batch_size:
                    print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
            else:
                if conditioning.shape[0] != batch_size:
                    print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")

        self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)
        # sampling
        C, H, W = shape
        size = (batch_size, C, H, W)
        print(f'Data shape for DDIM sampling is {size}, eta {eta}')

        samples, intermediates = self.ddim_sampling(conditioning, size,
                                                    callback=callback,
                                                    img_callback=img_callback,
                                                    quantize_denoised=quantize_x0,
                                                    mask=mask, x0=x0,
                                                    ddim_use_original_steps=False,
                                                    noise_dropout=noise_dropout,
                                                    temperature=temperature,
                                                    score_corrector=score_corrector,
                                                    corrector_kwargs=corrector_kwargs,
                                                    x_T=x_T,
                                                    log_every_t=log_every_t,
                                                    unconditional_guidance_scale=unconditional_guidance_scale,
                                                    unconditional_conditioning=unconditional_conditioning,
                                                    )
        return samples, intermediates

    @torch.no_grad()
    def ddim_sampling(self, cond, shape,
                      x_T=None, ddim_use_original_steps=False,
                      callback=None, timesteps=None, quantize_denoised=False,
                      mask=None, x0=None, img_callback=None, log_every_t=100,
                      temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
                      unconditional_guidance_scale=1., unconditional_conditioning=None,):
        device = self.model.betas.device
        b = shape[0]
        if x_T is None:
            img = torch.randn(shape, device=device)
        else:
            img = x_T

        if timesteps is None:
            timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps
        elif timesteps is not None and not ddim_use_original_steps:
            subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1
            timesteps = self.ddim_timesteps[:subset_end]

        intermediates = {'x_inter': [img], 'pred_x0': [img]}
        time_range = reversed(range(0,timesteps)) if ddim_use_original_steps else np.flip(timesteps)
        total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]
        print(f"Running DDIM Sampling with {total_steps} timesteps")

        iterator = tqdm(time_range, desc='DDIM Sampler', total=total_steps)

        for i, step in enumerate(iterator):
            index = total_steps - i - 1
            ts = torch.full((b,), step, device=device, dtype=torch.long)

            if mask is not None:
                assert x0 is not None
                img_orig = self.model.q_sample(x0, ts)  # TODO: deterministic forward pass?
                img = img_orig * mask + (1. - mask) * img

            outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
                                      quantize_denoised=quantize_denoised, temperature=temperature,
                                      noise_dropout=noise_dropout, score_corrector=score_corrector,
                                      corrector_kwargs=corrector_kwargs,
                                      unconditional_guidance_scale=unconditional_guidance_scale,
                                      unconditional_conditioning=unconditional_conditioning)
            img, pred_x0 = outs
            if callback: callback(i)
            if img_callback: img_callback(pred_x0, i)

            if index % log_every_t == 0 or index == total_steps - 1:
                intermediates['x_inter'].append(img)
                intermediates['pred_x0'].append(pred_x0)

        return img, intermediates

    @torch.no_grad()
    def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
                      temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
                      unconditional_guidance_scale=1., unconditional_conditioning=None):
        b, *_, device = *x.shape, x.device

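        if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
            # no guidance: a single conditional forward pass
            e_t = self.model.apply_model(x, t, c)
        else:
            # classifier-free guidance: run the unconditional and conditional
            # branches in one batched forward pass, then extrapolate from the
            # unconditional prediction towards the conditional one
            x_in = torch.cat([x] * 2)
            t_in = torch.cat([t] * 2)
            c_in = torch.cat([unconditional_conditioning, c])
            e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
            e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)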

        if score_corrector is not None:
            assert self.model.parameterization == "eps"
            e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)

        alphas = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas
        alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev
        sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
        # ddim_sigmas_for_original_num_steps is registered on the sampler in make_schedule, not on the model
        sigmas = self.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
        # select parameters corresponding to the currently considered timestep
        a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)
        a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)
        sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)
        sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)

        # current prediction for x_0
        pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
        if quantize_denoised:
            pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)
        # direction pointing to x_t
        dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t
        noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
        if noise_dropout > 0.:
            noise = torch.nn.functional.dropout(noise, p=noise_dropout)
        x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise
        return x_prev, pred_x0

    @torch.no_grad()
    def stochastic_encode(self, x0, t, use_original_steps=False, noise=None):
        # fast, but does not allow for exact reconstruction
        # t serves as an index to gather the correct alphas
        if use_original_steps:
            sqrt_alphas_cumprod = self.sqrt_alphas_cumprod
            sqrt_one_minus_alphas_cumprod = self.sqrt_one_minus_alphas_cumprod
        else:
            sqrt_alphas_cumprod = torch.sqrt(self.ddim_alphas)
            sqrt_one_minus_alphas_cumprod = self.ddim_sqrt_one_minus_alphas

        if noise is None:
            noise = torch.randn_like(x0)
        return (extract_into_tensor(sqrt_alphas_cumprod, t, x0.shape) * x0 +
                extract_into_tensor(sqrt_one_minus_alphas_cumprod, t, x0.shape) * noise)

    @torch.no_grad()
    def decode(self, x_latent, cond, t_start, unconditional_guidance_scale=1.0, unconditional_conditioning=None,
               use_original_steps=False):

        timesteps = np.arange(self.ddpm_num_timesteps) if use_original_steps else self.ddim_timesteps
        timesteps = timesteps[:t_start]

        time_range = np.flip(timesteps)
        total_steps = timesteps.shape[0]
        print(f"Running DDIM Sampling with {total_steps} timesteps")

        iterator = tqdm(time_range, desc='Decoding image', total=total_steps)
        x_dec = x_latent
        for i, step in enumerate(iterator):
            index = total_steps - i - 1
            ts = torch.full((x_latent.shape[0],), step, device=x_latent.device, dtype=torch.long)
            x_dec, _ = self.p_sample_ddim(x_dec, cond, ts, index=index, use_original_steps=use_original_steps,
                                          unconditional_guidance_scale=unconditional_guidance_scale,
                                          unconditional_conditioning=unconditional_conditioning)
        return x_dec
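

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module).
# The two formulas used in p_sample_ddim above, written for plain Python
# floats so they can be checked by hand: the classifier-free guidance mix of
# the two noise predictions, and a single DDIM update. The function names are
# hypothetical; a_t and a_prev stand for the cumulative alpha products at the
# current and previous sampling step, and `noise` is assumed to be a standard
# normal draw supplied by the caller.
# ---------------------------------------------------------------------------
import math


def guided_eps_sketch(e_uncond, e_cond, scale):
    """Extrapolate from the unconditional towards the conditional prediction."""
    return e_uncond + scale * (e_cond - e_uncond)


def ddim_step_sketch(x_t, e_t, a_t, a_prev, sigma_t=0.0, noise=0.0):
    """One DDIM step from x_t to x_prev; sigma_t = 0 is the deterministic case."""
    # current estimate of the clean sample x_0
    pred_x0 = (x_t - math.sqrt(1.0 - a_t) * e_t) / math.sqrt(a_t)
    # direction pointing to x_t
    dir_xt = math.sqrt(1.0 - a_prev - sigma_t ** 2) * e_t
    x_prev = math.sqrt(a_prev) * pred_x0 + dir_xt + sigma_t * noise
    return x_prev, pred_x0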

================================================
FILE: ldm/models/diffusion/ddpm.py
================================================
"""
wild mixture of
https://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e3164b3/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py
https://github.com/openai/improved-diffusion/blob/e94489283bb876ac1477d5dd7709bbbd2d9902ce/improved_diffusion/gaussian_diffusion.py
https://github.com/CompVis/taming-transformers
-- merci
"""

import torch
import torch.nn as nn
import numpy as np
import pytorch_lightning as pl
from torch.optim.lr_scheduler import LambdaLR
from einops import rearrange, repeat
from contextlib import contextmanager
from functools import partial
from tqdm import tqdm
from torchvision.utils import make_grid
from pytorch_lightning.utilities.distributed import rank_zero_only

from ldm.util import log_txt_as_img, exists, default, ismap, isimage, mean_flat, count_params, instantiate_from_config
from ldm.modules.ema import LitEma
from ldm.modules.distributions.distributions import normal_kl, DiagonalGaussianDistribution
from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
from ldm.modules.diffusionmodules.util import make_beta_schedule, extract_into_tensor, noise_like
from ldm.models.diffusion.ddim import DDIMSampler


__conditioning_keys__ = {'concat': 'c_concat',
                         'crossattn': 'c_crossattn',
                         'adm': 'y'}


def disabled_train(self, mode=True):
    """Overwrite model.train with this function to make sure train/eval mode
    does not change anymore."""
    return self


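def uniform_on_device(r1, r2, shape, device):
    """Sample uniformly between r2 and r1 (for r1 larger than r2: r2 inclusive,
    r1 exclusive), directly on the given device."""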
    return (r1 - r2) * torch.rand(*shape, device=device) + r2
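

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module).
# DDPM.register_schedule below derives all of its buffers from the beta
# sequence. The helper here reproduces the core quantities in plain Python
# for a short schedule. Assumptions: `schedule_sketch` is a hypothetical
# name, the betas are a plain linear interpolation between linear_start and
# linear_end (make_beta_schedule may space them differently), timesteps is at
# least 2, and v_posterior is 0.
# ---------------------------------------------------------------------------
def schedule_sketch(timesteps=10, linear_start=1e-4, linear_end=2e-2):
    """Return (betas, alphas_cumprod, posterior_variance) as Python lists."""
    step = (linear_end - linear_start) / (timesteps - 1)
    betas = [linear_start + i * step for i in range(timesteps)]

    # cumulative product of alpha_t = 1 - beta_t
    alphas_cumprod, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    alphas_cumprod_prev = [1.0] + alphas_cumprod[:-1]

    # variance of the posterior q(x_{t-1} | x_t, x_0); it is 0 at t = 0,
    # which is why the log variance is clipped in register_schedule
    posterior_variance = [
        b * (1.0 - ap) / (1.0 - a)
        for b, ap, a in zip(betas, alphas_cumprod_prev, alphas_cumprod)
    ]
    return betas, alphas_cumprod, posterior_variance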


class DDPM(pl.LightningModule):
    # classic DDPM with Gaussian diffusion, in image space
    def __init__(self,
                 unet_config,
                 timesteps=1000,
                 beta_schedule="linear",
                 loss_type="l2",
                 ckpt_path=None,
                 ignore_keys=[],
                 load_only_unet=False,
                 monitor="val/loss",
                 use_ema=True,
                 first_stage_key="image",
                 image_size=256,
                 channels=3,
                 log_every_t=100,
                 clip_denoised=True,
                 linear_start=1e-4,
                 linear_end=2e-2,
                 cosine_s=8e-3,
                 given_betas=None,
                 original_elbo_weight=0.,
                 v_posterior=0.,  # weight for choosing posterior variance as sigma = (1-v) * beta_tilde + v * beta
                 l_simple_weight=1.,
                 conditioning_key=None,
                 parameterization="eps",  # all assuming fixed variance schedules
                 scheduler_config=None,
                 use_positional_encodings=False,
                 learn_logvar=False,
                 logvar_init=0.,
                 ):
        super().__init__()
        assert parameterization in ["eps", "x0"], 'currently only supporting "eps" and "x0"'
        self.parameterization = parameterization
        print(f"{self.__class__.__name__}: Running in {self.parameterization}-prediction mode")
        self.cond_stage_model = None
        self.clip_denoised = clip_denoised
        self.log_every_t = log_every_t
        self.first_stage_key = first_stage_key
        self.image_size = image_size  # try conv?
        self.channels = channels
        self.use_positional_encodings = use_positional_encodings
        self.model = DiffusionWrapper(unet_config, conditioning_key)
        count_params(self.model, verbose=True)
        self.use_ema = use_ema
        if self.use_ema:
            self.model_ema = LitEma(self.model)
            print(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")

        self.use_scheduler = scheduler_config is not None
        if self.use_scheduler:
            self.scheduler_config = scheduler_config

        self.v_posterior = v_posterior
        self.original_elbo_weight = original_elbo_weight
        self.l_simple_weight = l_simple_weight

        if monitor is not None:
            self.monitor = monitor
        if ckpt_path is not None:
            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys, only_model=load_only_unet)

        self.register_schedule(given_betas=given_betas, beta_schedule=beta_schedule, timesteps=timesteps,
                               linear_start=linear_start, linear_end=linear_end, cosine_s=cosine_s)

        self.loss_type = loss_type

        self.learn_logvar = learn_logvar
        self.logvar = torch.full(fill_value=logvar_init, size=(self.num_timesteps,))
        if self.learn_logvar:
            self.logvar = nn.Parameter(self.logvar, requires_grad=True)


    def register_schedule(self, given_betas=None, beta_schedule="linear", timesteps=1000,
                          linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
        if exists(given_betas):
            betas = given_betas
        else:
            betas = make_beta_schedule(beta_schedule, timesteps, linear_start=linear_start, linear_end=linear_end,
                                       cosine_s=cosine_s)
        alphas = 1. - betas
        alphas_cumprod = np.cumprod(alphas, axis=0)
        alphas_cumprod_prev = np.append(1., alphas_cumprod[:-1])

        timesteps, = betas.shape
        self.num_timesteps = int(timesteps)
        self.linear_start = linear_start
        self.linear_end = linear_end
        assert alphas_cumprod.shape[0] == self.num_timesteps, 'alphas have to be defined for each timestep'

        to_torch = partial(torch.tensor, dtype=torch.float32)

        self.register_buffer('betas', to_torch(betas))
        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
        self.register_buffer('alphas_cumprod_prev', to_torch(alphas_cumprod_prev))

        # calculations for diffusion q(x_t | x_{t-1}) and others
        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod)))
        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod)))
        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod)))
        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod)))
        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod - 1)))

        # calculations for posterior q(x_{t-1} | x_t, x_0)
        posterior_variance = (1 - self.v_posterior) * betas * (1. - alphas_cumprod_prev) / (
                    1. - alphas_cumprod) + self.v_posterior * betas
        # above: equal to 1. / (1. / (1. - alpha_cumprod_tm1) + alpha_t / beta_t)
        self.register_buffer('posterior_variance', to_torch(posterior_variance))
        # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain
        self.register_buffer('posterior_log_variance_clipped', to_torch(np.log(np.maximum(posterior_variance, 1e-20))))
        self.register_buffer('posterior_mean_coef1', to_torch(
            betas * np.sqrt(alphas_cumprod_prev) / (1. - alphas_cumprod)))
        self.register_buffer('posterior_mean_coef2', to_torch(
            (1. - alphas_cumprod_prev) * np.sqrt(alphas) / (1. - alphas_cumprod)))

        if self.parameterization == "eps":
            lvlb_weights = self.betas ** 2 / (
                        2 * self.posterior_variance * to_torch(alphas) * (1 - self.alphas_cumprod))
        elif self.parameterization == "x0":
            lvlb_weights = 0.5 * np.sqrt(torch.Tensor(alphas_cumprod)) / (2. * 1 - torch.Tensor(alphas_cumprod))
        else:
            raise NotImplementedError("mu not supported")
        # TODO how to choose this term
        lvlb_weights[0] = lvlb_weights[1]
        self.register_buffer('lvlb_weights', lvlb_weights, persistent=False)
        # fail if any weight is NaN (the previous .all() only fired when every weight was NaN)
        assert not torch.isnan(self.lvlb_weights).any()

    @contextmanager
    def ema_scope(self, context=None):
        if self.use_ema:
            self.model_ema.store(self.model.parameters())
            self.model_ema.copy_to(self.model)
            if context is not None:
                print(f"{context}: Switched to EMA weights")
        try:
            yield None
        finally:
            if self.use_ema:
                self.model_ema.restore(self.model.parameters())
                if context is not None:
                    print(f"{context}: Restored training weights")

    def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
        sd = torch.load(path, map_location="cpu")
        if "state_dict" in list(sd.keys()):
            sd = sd["state_dict"]
        keys = list(sd.keys())
        for k in keys:
            for ik in ignore_keys:
                if k.startswith(ik):
                    print("Deleting key {} from state_dict.".format(k))
                    del sd[k]
        missing, unexpected = self.load_state_dict(sd, strict=False) if not only_model else self.model.load_state_dict(
            sd, strict=False)
        print(f"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
        if len(missing) > 0:
            print(f"Missing Keys: {missing}")
        if len(unexpected) > 0:
            print(f"Unexpected Keys: {unexpected}")

    def q_mean_variance(self, x_start, t):
        """
        Get the distribution q(x_t | x_0).
        :param x_start: the [N x C x ...] tensor of noiseless inputs.
        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.
        :return: A tuple (mean, variance, log_variance), all of x_start's shape.
        """
        mean = (extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start)
        variance = extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)
        log_variance = extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)
        return mean, variance, log_variance

    def predict_start_from_noise(self, x_t, t, noise):
        return (
                extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t -
                extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * noise
        )

    def q_posterior(self, x_start, x_t, t):
        posterior_mean = (
                extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start +
                extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )
        posterior_variance = extract_into_tensor(self.posterior_variance, t, x_t.shape)
        posterior_log_variance_clipped = extract_into_tensor(self.posterior_log_variance_clipped, t, x_t.shape)
        return posterior_mean, posterior_variance, posterior_log_variance_clipped

    def p_mean_variance(self, x, t, clip_denoised: bool):
        model_out = self.model(x, t)
        if self.parameterization == "eps":
            x_recon = self.predict_start_from_noise(x, t=t, noise=model_out)
        elif self.parameterization == "x0":
            x_recon = model_out
        if clip_denoised:
            x_recon.clamp_(-1., 1.)

        model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start=x_recon, x_t=x, t=t)
        return model_mean, posterior_variance, posterior_log_variance

    @torch.no_grad()
    def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):
        b, *_, device = *x.shape, x.device
        model_mean, _, model_log_variance = self.p_mean_variance(x=x, t=t, clip_denoised=clip_denoised)
        noise = noise_like(x.shape, device, repeat_noise)
        # no noise when t == 0
        nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))
        return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise

    @torch.no_grad()
    def p_sample_loop(self, shape, return_intermediates=False):
        device = self.betas.device
        b = shape[0]
        img = torch.randn(shape, device=device)
        intermediates = [img]
        for i in tqdm(reversed(range(0, self.num_timesteps)), desc='Sampling t', total=self.num_timesteps):
            img = self.p_sample(img, torch.full((b,), i, device=device, dtype=torch.long),
                                clip_denoised=self.clip_denoised)
            if i % self.log_every_t == 0 or i == self.num_timesteps - 1:
                intermediates.append(img)
        if return_intermediates:
            return img, intermediates
        return img

    @torch.no_grad()
    def sample(self, batch_size=16, return_intermediates=False):
        image_size = self.image_size
        channels = self.channels
        return self.p_sample_loop((batch_size, channels, image_size, image_size),
                                  return_intermediates=return_intermediates)

    def q_sample(self, x_start, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x_start))
        return (extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start +
                extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise)

    def get_loss(self, pred, target, mean=True):
        if self.loss_type == 'l1':
            loss = (target - pred).abs()
            if mean:
                loss = loss.mean()
        elif self.loss_type == 'l2':
            if mean:
                loss = torch.nn.functional.mse_loss(target, pred)
            else:
                loss = torch.nn.functional.mse_loss(target, pred, reduction='none')
        else:
            raise NotImplementedError(f"unknown loss type '{self.loss_type}'")

        return loss

    def p_losses(self, x_start, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x_start))
        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
        model_out = self.model(x_noisy, t)

        loss_dict = {}
        if self.parameterization == "eps":
            target = noise
        elif self.parameterization == "x0":
            target = x_start
        else:
            raise NotImplementedError(f"Parameterization {self.parameterization} not yet supported")

        loss = self.get_loss(model_out, target, mean=False).mean(dim=[1, 2, 3])

        log_prefix = 'train' if self.training else 'val'

        loss_dict.update({f'{log_prefix}/loss_simple': loss.mean()})
        loss_simple = loss.mean() * self.l_simple_weight

        loss_vlb = (self.lvlb_weights[t] * loss).mean()
        loss_dict.update({f'{log_prefix}/loss_vlb': loss_vlb})

        loss = loss_simple + self.original_elbo_weight * loss_vlb

        loss_dict.update({f'{log_prefix}/loss': loss})

        return loss, loss_dict

    def forward(self, x, *args, **kwargs):
        # b, c, h, w, device, img_size, = *x.shape, x.device, self.image_size
        # assert h == img_size and w == img_size, f'height and width of image must be {img_size}'
        t = torch.randint(0, self.num_timesteps, (x.shape[0],), device=self.device).long()
        return self.p_losses(x, t, *args, **kwargs)

    def get_input(self, batch, k):
        x = batch[k]
        if len(x.shape) == 3:
            x = x[..., None]
        x = rearrange(x, 'b h w c -> b c h w')
        x = x.to(memory_format=torch.contiguous_format).float()
        return x

    def shared_step(self, batch):
        x = self.get_input(batch, self.first_stage_key)
        loss, loss_dict = self(x)
        return loss, loss_dict

    def training_step(self, batch, batch_idx):
        loss, loss_dict = self.shared_step(batch)

        self.log_dict(loss_dict, prog_bar=True,
                      logger=True, on_step=True, on_epoch=True)

        self.log("global_step", self.global_step,
                 prog_bar=True, logger=True, on_step=True, on_epoch=False)

        if self.use_scheduler:
            lr = self.optimizers().param_groups[0]['lr']
            self.log('lr_abs', lr, prog_bar=True, logger=True, on_step=True, on_epoch=False)

        return loss

    @torch.no_grad()
    def validation_step(self, batch, batch_idx):
        _, loss_dict_no_ema = self.shared_step(batch)
        with self.ema_scope():
            _, loss_dict_ema = self.shared_step(batch)
            loss_dict_ema = {key + '_ema': loss_dict_ema[key] for key in loss_dict_ema}
        self.log_dict(loss_dict_no_ema, prog_bar=False, logger=True, on_step=False, on_epoch=True)
        self.log_dict(loss_dict_ema, prog_bar=False, logger=True, on_step=False, on_epoch=True)

    def on_train_batch_end(self, *args, **kwargs):
        if self.use_ema:
            self.model_ema(self.model)

    def _get_rows_from_list(self, samples):
        n_imgs_per_row = len(samples)
        denoise_grid = rearrange(samples, 'n b c h w -> b n c h w')
        denoise_grid = rearrange(denoise_grid, 'b n c h w -> (b n) c h w')
        denoise_grid = make_grid(denoise_grid, nrow=n_imgs_per_row)
        return denoise_grid

    @torch.no_grad()
    def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=None, **kwargs):
        log = dict()
        x = self.get_input(batch, self.first_stage_key)
        N = min(x.shape[0], N)
        n_row = min(x.shape[0], n_row)
        x = x.to(self.device)[:N]
        log["inputs"] = x

        # get diffusion row
        diffusion_row = list()
        x_start = x[:n_row]

        for t in range(self.num_timesteps):
            if t % self.log_every_t == 0 or t == self.num_timesteps - 1:
                t = repeat(torch.tensor([t]), '1 -> b', b=n_row)
                t = t.to(self.device).long()
                noise = torch.randn_like(x_start)
                x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
                diffusion_row.append(x_noisy)

        log["diffusion_row"] = self._get_rows_from_list(diffusion_row)

        if sample:
            # get denoise row
            with self.ema_scope("Plotting"):
                samples, denoise_row = self.sample(batch_size=N, return_intermediates=True)

            log["samples"] = samples
            log["denoise_row"] = self._get_rows_from_list(denoise_row)

        if return_keys:
            if np.intersect1d(list(log.keys()), return_keys).shape[0] == 0:
                return log
            else:
                return {key: log[key] for key in return_keys}
        return log

    def configure_optimizers(self):
        lr = self.learning_rate
        params = list(self.model.parameters())
        if self.learn_logvar:
            params = params + [self.logvar]
        opt = torch.optim.AdamW(params, lr=lr)
        return opt

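# Illustrative sketch (not part of the original module): how DDPM.p_losses above
# combines its two loss terms, restated in plain Python on per-sample losses.
# The function name and the default weights are assumptions made for this
# example; in the class the weights come from self.l_simple_weight,
# self.original_elbo_weight and self.lvlb_weights.
def combine_losses_sketch(per_sample_loss, t, lvlb_weights,
                          l_simple_weight=1.0, original_elbo_weight=0.0):
    """Return (total, loss_simple, loss_vlb) for a batch of per-sample losses."""
    n = len(per_sample_loss)
    # loss_simple: unweighted batch mean, scaled by l_simple_weight
    loss_simple = sum(per_sample_loss) / n * l_simple_weight
    # loss_vlb: each sample's loss weighted by the VLB weight of its timestep
    loss_vlb = sum(lvlb_weights[ti] * li for ti, li in zip(t, per_sample_loss)) / n
    # total: simple term plus the (optionally down-weighted) VLB term
    total = loss_simple + original_elbo_weight * loss_vlb
    return total, loss_simple, loss_vlb
# With the assumed defaults the VLB term is computed but does not affect the total:
# combine_losses_sketch([1.0, 3.0], [0, 2], [0.5, 0.0, 2.0]) returns (2.0, 2.0, 3.25)
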

class LatentDiffusion(DDPM):
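    """Latent diffusion model: runs DDPM in the latent space of a frozen first-stage model, with an optional conditioning stage."""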
    def __init__(self,
                 first_stage_config,
                 cond_stage_config,
                 num_timesteps_cond=None,
                 cond_stage_key="image",
                 cond_stage_trainable=False,
                 concat_mode=True,
                 cond_stage_forward=None,
                 conditioning_key=None,
                 scale_factor=1.0,
                 scale_by_std=False,
                 *args, **kwargs):
        self.num_timesteps_cond = default(num_timesteps_cond, 1)
        self.scale_by_std = scale_by_std
        assert self.num_timesteps_cond <= kwargs['timesteps']
        # for backwards compatibility after implementation of DiffusionWrapper
        if conditioning_key is None:
            conditioning_key = 'concat' if concat_mode else 'crossattn'
        if cond_stage_config == '__is_unconditional__':
            conditioning_key = None
        ckpt_path = kwargs.pop("ckpt_path", None)
        ignore_keys = kwargs.pop("ignore_keys", [])
        super().__init__(conditioning_key=conditioning_key, *args, **kwargs)
        self.concat_mode = concat_mode
        self.cond_stage_trainable = cond_stage_trainable
        self.cond_stage_key = cond_stage_key
        try:
            self.num_downs = len(first_stage_config.params.ddconfig.ch_mult) - 1
        except Exception:
            self.num_downs = 0
        if not scale_by_std:
            self.scale_factor = scale_factor
        else:
            self.register_buffer('scale_factor', torch.tensor(scale_factor))
        self.instantiate_first_stage(first_stage_config)
        self.instantiate_cond_stage(cond_stage_config)
        self.cond_stage_forward = cond_stage_forward
        self.clip_denoised = False
        self.bbox_tokenizer = None

        self.restarted_from_ckpt = False
        if ckpt_path is not None:
            self.init_from_ckpt(ckpt_path, ignore_keys)
            self.restarted_from_ckpt = True

    def make_cond_schedule(self, ):
        self.cond_ids = torch.full(size=(self.num_timesteps,), fill_value=self.num_timesteps - 1, dtype=torch.long)
        ids = torch.round(torch.linspace(0, self.num_timesteps - 1, self.num_timesteps_cond)).long()
        self.cond_ids[:self.num_timesteps_cond] = ids

    @rank_zero_only
    @torch.no_grad()
    def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
        # only for very first batch
        if self.scale_by_std and self.current_epoch == 0 and self.global_step == 0 and batch_idx == 0 and not self.restarted_from_ckpt:
            assert self.scale_factor == 1., 'rather not use custom rescaling and std-rescaling simultaneously'
            # set rescale weight to 1./std of encodings
            print("### USING STD-RESCALING ###")
            x = super().get_input(batch, self.first_stage_key)
            x = x.to(self.device)
            encoder_posterior = self.encode_first_stage(x)
            z = self.get_first_stage_encoding(encoder_posterior).detach()
            del self.scale_factor
            self.register_buffer('scale_factor', 1. / z.flatten().std())
            print(f"setting self.scale_factor to {self.scale_factor}")
            print("### USING STD-RESCALING ###")

    def register_schedule(self,
                          given_betas=None, beta_schedule="linear", timesteps=1000,
                          linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
        super().register_schedule(given_betas, beta_schedule, timesteps, linear_start, linear_end, cosine_s)

        self.shorten_cond_schedule = self.num_timesteps_cond > 1
        if self.shorten_cond_schedule:
            self.make_cond_schedule()

    def instantiate_first_stage(self, config):
        model = instantiate_from_config(config)
        self.first_stage_model = model.eval()
        self.first_stage_model.train = disabled_train
        for param in self.first_stage_model.parameters():
            param.requires_grad = False

    def instantiate_cond_stage(self, config):
        if not self.cond_stage_trainable:
            if config == "__is_first_stage__":
                print("Using first stage also as cond stage.")
                self.cond_stage_model = self.first_stage_model
            elif config == "__is_unconditional__":
                print(f"Training {self.__class__.__name__} as an unconditional model.")
                self.cond_stage_model = None
                # self.be_unconditional = True
            else:
                model = instantiate_from_config(config)
                self.cond_stage_model = model.eval()
                self.cond_stage_model.train = disabled_train
                for param in self.cond_stage_model.parameters():
                    param.requires_grad = False
        else:
            assert config != '__is_first_stage__'
            assert config != '__is_unconditional__'
            model = instantiate_from_config(config)
            self.cond_stage_model = model

    def _get_denoise_row_from_list(self, samples, desc='', force_no_decoder_quantization=False):
        denoise_row = []
        for zd in tqdm(samples, desc=desc):
            denoise_row.append(self.decode_first_stage(zd.to(self.device),
                                                            force_not_quantize=force_no_decoder_quantization))
        n_imgs_per_row = len(denoise_row)
        denoise_row = torch.stack(denoise_row)  # n_log_step, n_row, C, H, W
        denoise_grid = rearrange(denoise_row, 'n b c h w -> b n c h w')
        denoise_grid = rearrange(denoise_grid, 'b n c h w -> (b n) c h w')
        denoise_grid = make_grid(denoise_grid, nrow=n_imgs_per_row)
        return denoise_grid

    def get_first_stage_encoding(self, encoder_posterior):
        if isinstance(encoder_posterior, DiagonalGaussianDistribution):
            z = encoder_posterior.sample()
        elif isinstance(encoder_posterior, torch.Tensor):
            z = encoder_posterior
        else:
            raise NotImplementedError(f"encoder_posterior of type '{type(encoder_posterior)}' not yet implemented")
        return self.scale_factor * z

    def get_learned_conditioning(self, c):
        if self.cond_stage_forward is None:
            if hasattr(self.cond_stage_model, 'encode') and callable(self.cond_stage_model.encode):
                c = self.cond_stage_model.encode(c)
                if isinstance(c, DiagonalGaussianDistribution):
                    c = c.mode()
            else:
                c = self.cond_stage_model(c)
        else:
            assert hasattr(self.cond_stage_model, self.cond_stage_forward)
            c = getattr(self.cond_stage_model, self.cond_stage_forward)(c)
        return c

    def meshgrid(self, h, w):
        y = torch.arange(0, h).view(h, 1, 1).repeat(1, w, 1)
        x = torch.arange(0, w).view(1, w, 1).repeat(h, 1, 1)

        arr = torch.cat([y, x], dim=-1)
        return arr

    def delta_border(self, h, w):
        """
        :param h: height
        :param w: width
        :return: normalized distance to image border,
         with min distance = 0 at border and max dist = 0.5 at image center
        """
        lower_right_corner = torch.tensor([h - 1, w - 1]).view(1, 1, 2)
        arr = self.meshgrid(h, w) / lower_right_corner
        dist_left_up = torch.min(arr, dim=-1, keepdims=True)[0]
        dist_right_down = torch.min(1 - arr, dim=-1, keepdims=True)[0]
        edge_dist = torch.min(torch.cat([dist_left_up, dist_right_down], dim=-1), dim=-1)[0]
        return edge_dist

    def get_weighting(self, h, w, Ly, Lx, device):
        weighting = self.delta_border(h, w)
        weighting = torch.clip(weighting, self.split_input_params["clip_min_weight"],
                               self.split_input_params["clip_max_weight"], )
        weighting = weighting.view(1, h * w, 1).repeat(1, 1, Ly * Lx).to(device)

        if self.split_input_params["tie_braker"]:
            L_weighting = self.delta_border(Ly, Lx)
            L_weighting = torch.clip(L_weighting,
                                     self.split_input_params["clip_min_tie_weight"],
                                     self.split_input_params["clip_max_tie_weight"])

            L_weighting = L_weighting.view(1, 1, Ly * Lx).to(device)
            weighting = weighting * L_weighting
        return weighting

    def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1):  # todo load once not every time, shorten code
        """
        :param x: img of size (bs, c, h, w)
        :return: n img crops of size (n, bs, c, kernel_size[0], kernel_size[1])
        """
        bs, nc, h, w = x.shape

        # number of crops in image
        Ly = (h - kernel_size[0]) // stride[0] + 1
        Lx = (w - kernel_size[1]) // stride[1] + 1

        if uf == 1 and df == 1:
            fold_params = dict(kernel_size=kernel




SYMBOL INDEX (701 symbols across 34 files)

FILE: ldm/data/base.py
  class Txt2ImgIterableBaseDataset (line 5) | class Txt2ImgIterableBaseDataset(IterableDataset):
    method __init__ (line 9) | def __init__(self, num_records=0, valid_ids=None, size=256):
    method __len__ (line 18) | def __len__(self):
    method __iter__ (line 22) | def __iter__(self):

FILE: ldm/data/imagenet.py
  function synset2idx (line 20) | def synset2idx(path_to_yaml="data/index_synset.yaml"):
  class ImageNetBase (line 26) | class ImageNetBase(Dataset):
    method __init__ (line 27) | def __init__(self, config=None):
    method __len__ (line 39) | def __len__(self):
    method __getitem__ (line 42) | def __getitem__(self, i):
    method _prepare (line 45) | def _prepare(self):
    method _filter_relpaths (line 48) | def _filter_relpaths(self, relpaths):
    method _prepare_synset_to_human (line 66) | def _prepare_synset_to_human(self):
    method _prepare_idx_to_synset (line 74) | def _prepare_idx_to_synset(self):
    method _prepare_human_to_integer_label (line 80) | def _prepare_human_to_integer_label(self):
    method _load (line 93) | def _load(self):
  class ImageNetTrain (line 134) | class ImageNetTrain(ImageNetBase):
    method __init__ (line 145) | def __init__(self, process_images=True, data_root=None, **kwargs):
    method _prepare (line 150) | def _prepare(self):
  class ImageNetValidation (line 197) | class ImageNetValidation(ImageNetBase):
    method __init__ (line 211) | def __init__(self, process_images=True, data_root=None, **kwargs):
    method _prepare (line 216) | def _prepare(self):
  class ImageNetSR (line 272) | class ImageNetSR(Dataset):
    method __init__ (line 273) | def __init__(self, size=None,
    method __len__ (line 336) | def __len__(self):
    method __getitem__ (line 339) | def __getitem__(self, i):
  class ImageNetSRTrain (line 375) | class ImageNetSRTrain(ImageNetSR):
    method __init__ (line 376) | def __init__(self, **kwargs):
    method get_base (line 379) | def get_base(self):
  class ImageNetSRValidation (line 386) | class ImageNetSRValidation(ImageNetSR):
    method __init__ (line 387) | def __init__(self, **kwargs):
    method get_base (line 390) | def get_base(self):

FILE: ldm/data/lsun.py
  class LSUNBase (line 9) | class LSUNBase(Dataset):
    method __init__ (line 10) | def __init__(self,
    method __len__ (line 36) | def __len__(self):
    method __getitem__ (line 39) | def __getitem__(self, i):
  class LSUNChurchesTrain (line 62) | class LSUNChurchesTrain(LSUNBase):
    method __init__ (line 63) | def __init__(self, **kwargs):
  class LSUNChurchesValidation (line 67) | class LSUNChurchesValidation(LSUNBase):
    method __init__ (line 68) | def __init__(self, flip_p=0., **kwargs):
  class LSUNBedroomsTrain (line 73) | class LSUNBedroomsTrain(LSUNBase):
    method __init__ (line 74) | def __init__(self, **kwargs):
  class LSUNBedroomsValidation (line 78) | class LSUNBedroomsValidation(LSUNBase):
    method __init__ (line 79) | def __init__(self, flip_p=0.0, **kwargs):
  class LSUNCatsTrain (line 84) | class LSUNCatsTrain(LSUNBase):
    method __init__ (line 85) | def __init__(self, **kwargs):
  class LSUNCatsValidation (line 89) | class LSUNCatsValidation(LSUNBase):
    method __init__ (line 90) | def __init__(self, flip_p=0., **kwargs):

FILE: ldm/lr_scheduler.py
  class LambdaWarmUpCosineScheduler (line 4) | class LambdaWarmUpCosineScheduler:
    method __init__ (line 8) | def __init__(self, warm_up_steps, lr_min, lr_max, lr_start, max_decay_...
    method schedule (line 17) | def schedule(self, n, **kwargs):
    method __call__ (line 32) | def __call__(self, n, **kwargs):
  class LambdaWarmUpCosineScheduler2 (line 36) | class LambdaWarmUpCosineScheduler2:
    method __init__ (line 41) | def __init__(self, warm_up_steps, f_min, f_max, f_start, cycle_lengths...
    method find_in_interval (line 52) | def find_in_interval(self, n):
    method schedule (line 59) | def schedule(self, n, **kwargs):
    method __call__ (line 77) | def __call__(self, n, **kwargs):
  class LambdaLinearScheduler (line 81) | class LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):
    method schedule (line 83) | def schedule(self, n, **kwargs):

FILE: ldm/models/autoencoder.py
  class VQModel (line 14) | class VQModel(pl.LightningModule):
    method __init__ (line 15) | def __init__(self,
    method ema_scope (line 64) | def ema_scope(self, context=None):
    method init_from_ckpt (line 78) | def init_from_ckpt(self, path, ignore_keys=list()):
    method on_train_batch_end (line 92) | def on_train_batch_end(self, *args, **kwargs):
    method encode (line 96) | def encode(self, x):
    method encode_to_prequant (line 102) | def encode_to_prequant(self, x):
    method decode (line 107) | def decode(self, quant):
    method decode_code (line 112) | def decode_code(self, code_b):
    method forward (line 117) | def forward(self, input, return_pred_indices=False):
    method get_input (line 124) | def get_input(self, batch, k):
    method training_step (line 142) | def training_step(self, batch, batch_idx, optimizer_idx):
    method validation_step (line 164) | def validation_step(self, batch, batch_idx):
    method _validation_step (line 170) | def _validation_step(self, batch, batch_idx, suffix=""):
    method configure_optimizers (line 197) | def configure_optimizers(self):
    method get_last_layer (line 230) | def get_last_layer(self):
    method log_images (line 233) | def log_images(self, batch, only_inputs=False, plot_ema=False, **kwargs):
    method to_rgb (line 255) | def to_rgb(self, x):
  class VQModelInterface (line 264) | class VQModelInterface(VQModel):
    method __init__ (line 265) | def __init__(self, embed_dim, *args, **kwargs):
    method encode (line 269) | def encode(self, x):
    method decode (line 274) | def decode(self, h, force_not_quantize=False):
  class AutoencoderKL (line 285) | class AutoencoderKL(pl.LightningModule):
    method __init__ (line 286) | def __init__(self,
    method init_from_ckpt (line 313) | def init_from_ckpt(self, path, ignore_keys=list()):
    method encode (line 324) | def encode(self, x):
    method decode (line 330) | def decode(self, z):
    method forward (line 335) | def forward(self, input, sample_posterior=True):
    method get_input (line 344) | def get_input(self, batch, k):
    method training_step (line 351) | def training_step(self, batch, batch_idx, optimizer_idx):
    method validation_step (line 372) | def validation_step(self, batch, batch_idx):
    method configure_optimizers (line 386) | def configure_optimizers(self):
    method get_last_layer (line 397) | def get_last_layer(self):
    method log_images (line 401) | def log_images(self, batch, only_inputs=False, **kwargs):
    method to_rgb (line 417) | def to_rgb(self, x):
  class IdentityFirstStage (line 426) | class IdentityFirstStage(torch.nn.Module):
    method __init__ (line 427) | def __init__(self, *args, vq_interface=False, **kwargs):
    method encode (line 431) | def encode(self, x, *args, **kwargs):
    method decode (line 434) | def decode(self, x, *args, **kwargs):
    method quantize (line 437) | def quantize(self, x, *args, **kwargs):
    method forward (line 442) | def forward(self, x, *args, **kwargs):

FILE: ldm/models/diffusion/classifier.py
  function disabled_train (line 22) | def disabled_train(self, mode=True):
  class NoisyLatentImageClassifier (line 28) | class NoisyLatentImageClassifier(pl.LightningModule):
    method __init__ (line 30) | def __init__(self,
    method init_from_ckpt (line 70) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
    method load_diffusion (line 88) | def load_diffusion(self):
    method load_classifier (line 95) | def load_classifier(self, ckpt_path, pool):
    method get_x_noisy (line 110) | def get_x_noisy(self, x, t, noise=None):
    method forward (line 120) | def forward(self, x_noisy, t, *args, **kwargs):
    method get_input (line 124) | def get_input(self, batch, k):
    method get_conditioning (line 133) | def get_conditioning(self, batch, k=None):
    method compute_top_k (line 150) | def compute_top_k(self, logits, labels, k, reduction="mean"):
    method on_train_epoch_start (line 157) | def on_train_epoch_start(self):
    method write_logs (line 162) | def write_logs(self, loss, logits, targets):
    method shared_step (line 179) | def shared_step(self, batch, t=None):
    method training_step (line 198) | def training_step(self, batch, batch_idx):
    method reset_noise_accs (line 202) | def reset_noise_accs(self):
    method on_validation_start (line 206) | def on_validation_start(self):
    method validation_step (line 210) | def validation_step(self, batch, batch_idx):
    method configure_optimizers (line 220) | def configure_optimizers(self):
    method log_images (line 238) | def log_images(self, batch, N=8, *args, **kwargs):

FILE: ldm/models/diffusion/ddim.py
  class DDIMSampler (line 12) | class DDIMSampler(object):
    method __init__ (line 13) | def __init__(self, model, schedule="linear", **kwargs):
    method register_buffer (line 19) | def register_buffer(self, name, attr):
    method make_schedule (line 25) | def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddi...
    method sample (line 57) | def sample(self,
    method ddim_sampling (line 114) | def ddim_sampling(self, cond, shape,
    method p_sample_ddim (line 166) | def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_origin...
    method stochastic_encode (line 207) | def stochastic_encode(self, x0, t, use_original_steps=False, noise=None):
    method decode (line 223) | def decode(self, x_latent, cond, t_start, unconditional_guidance_scale...

FILE: ldm/models/diffusion/ddpm.py
  function disabled_train (line 34) | def disabled_train(self, mode=True):
  function uniform_on_device (line 40) | def uniform_on_device(r1, r2, shape, device):
  class DDPM (line 44) | class DDPM(pl.LightningModule):
    method __init__ (line 46) | def __init__(self,
    method register_schedule (line 117) | def register_schedule(self, given_betas=None, beta_schedule="linear", ...
    method ema_scope (line 172) | def ema_scope(self, context=None):
    method init_from_ckpt (line 186) | def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
    method q_mean_variance (line 204) | def q_mean_variance(self, x_start, t):
    method predict_start_from_noise (line 216) | def predict_start_from_noise(self, x_t, t, noise):
    method q_posterior (line 222) | def q_posterior(self, x_start, x_t, t):
    method p_mean_variance (line 231) | def p_mean_variance(self, x, t, clip_denoised: bool):
    method p_sample (line 244) | def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):
    method p_sample_loop (line 253) | def p_sample_loop(self, shape, return_intermediates=False):
    method sample (line 268) | def sample(self, batch_size=16, return_intermediates=False):
    method q_sample (line 274) | def q_sample(self, x_start, t, noise=None):
    method get_loss (line 279) | def get_loss(self, pred, target, mean=True):
    method p_losses (line 294) | def p_losses(self, x_start, t, noise=None):
    method forward (line 323) | def forward(self, x, *args, **kwargs):
    method get_input (line 329) | def get_input(self, batch, k):
    method shared_step (line 337) | def shared_step(self, batch):
    method training_step (line 342) | def training_step(self, batch, batch_idx):
    method validation_step (line 358) | def validation_step(self, batch, batch_idx):
    method on_train_batch_end (line 366) | def on_train_batch_end(self, *args, **kwargs):
    method _get_rows_from_list (line 370) | def _get_rows_from_list(self, samples):
    method log_images (line 378) | def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=Non...
    method configure_optimizers (line 415) | def configure_optimizers(self):
  class LatentDiffusion (line 424) | class LatentDiffusion(DDPM):
    method __init__ (line 426) | def __init__(self,
    method make_cond_schedule (line 471) | def make_cond_schedule(self, ):
    method on_train_batch_start (line 478) | def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
    method register_schedule (line 493) | def register_schedule(self,
    method instantiate_first_stage (line 502) | def instantiate_first_stage(self, config):
    method instantiate_cond_stage (line 509) | def instantiate_cond_stage(self, config):
    method _get_denoise_row_from_list (line 530) | def _get_denoise_row_from_list(self, samples, desc='', force_no_decode...
    method get_first_stage_encoding (line 542) | def get_first_stage_encoding(self, encoder_posterior):
    method get_learned_conditioning (line 551) | def get_learned_conditioning(self, c):
    method meshgrid (line 564) | def meshgrid(self, h, w):
    method delta_border (line 571) | def delta_border(self, h, w):
    method get_weighting (line 585) | def get_weighting(self, h, w, Ly, Lx, device):
    method get_fold_unfold (line 601) | def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1):  # todo...
    method get_input (line 654) | def get_input(self, batch, k, return_first_stage_outputs=False, force_...
    method decode_first_stage (line 706) | def decode_first_stage(self, z, predict_cids=False, force_not_quantize...
    method differentiable_decode_first_stage (line 766) | def differentiable_decode_first_stage(self, z, predict_cids=False, for...
    method encode_first_stage (line 826) | def encode_first_stage(self, x):
    method shared_step (line 865) | def shared_step(self, batch, **kwargs):
    method forward (line 870) | def forward(self, x, c, *args, **kwargs):
    method _rescale_annotations (line 881) | def _rescale_annotations(self, bboxes, crop_coordinates):  # TODO: mov...
    method apply_model (line 891) | def apply_model(self, x_noisy, t, cond, return_ids=False):
    method _predict_eps_from_xstart (line 994) | def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
    method _prior_bpd (line 998) | def _prior_bpd(self, x_start):
    method p_losses (line 1012) | def p_losses(self, x_start, cond, t, noise=None):
    method p_mean_variance (line 1047) | def p_mean_variance(self, x, c, t, clip_denoised: bool, return_codeboo...
    method p_sample (line 1079) | def p_sample(self, x, c, t, clip_denoised=False, repeat_noise=False,
    method progressive_denoising (line 1110) | def progressive_denoising(self, cond, shape, verbose=True, callback=No...
    method p_sample_loop (line 1166) | def p_sample_loop(self, cond, shape, return_intermediates=False,
    method sample (line 1217) | def sample(self, cond, batch_size=16, return_intermediates=False, x_T=...
    method sample_log (line 1235) | def sample_log(self,cond,batch_size,ddim, ddim_steps,**kwargs):
    method log_images (line 1251) | def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=200,...
    method configure_optimizers (line 1361) | def configure_optimizers(self):
    method to_rgb (line 1386) | def to_rgb(self, x):
  class DiffusionWrapper (line 1395) | class DiffusionWrapper(pl.LightningModule):
    method __init__ (line 1396) | def __init__(self, diff_model_config, conditioning_key):
    method forward (line 1402) | def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):
  class Layout2ImgDiffusion (line 1424) | class Layout2ImgDiffusion(LatentDiffusion):
    method __init__ (line 1426) | def __init__(self, cond_stage_key, *args, **kwargs):
    method log_images (line 1430) | def log_images(self, batch, N=8, *args, **kwargs):

FILE: ldm/models/diffusion/dpm_solver/dpm_solver.py
  class NoiseScheduleVP (line 6) | class NoiseScheduleVP:
    method __init__ (line 7) | def __init__(
    method marginal_log_mean_coeff (line 125) | def marginal_log_mean_coeff(self, t):
    method marginal_alpha (line 138) | def marginal_alpha(self, t):
    method marginal_std (line 144) | def marginal_std(self, t):
    method marginal_lambda (line 150) | def marginal_lambda(self, t):
    method inverse_lambda (line 158) | def inverse_lambda(self, lamb):
  function model_wrapper (line 177) | def model_wrapper(
  class DPM_Solver (line 351) | class DPM_Solver:
    method __init__ (line 352) | def __init__(self, model_fn, noise_schedule, predict_x0=False, thresho...
    method noise_prediction_fn (line 380) | def noise_prediction_fn(self, x, t):
    method data_prediction_fn (line 386) | def data_prediction_fn(self, x, t):
    method model_fn (line 401) | def model_fn(self, x, t):
    method get_time_steps (line 410) | def get_time_steps(self, skip_type, t_T, t_0, N, device):
    method get_orders_and_timesteps_for_singlestep_solver (line 439) | def get_orders_and_timesteps_for_singlestep_solver(self, steps, order,...
    method denoise_to_zero_fn (line 498) | def denoise_to_zero_fn(self, x, s):
    method dpm_solver_first_update (line 504) | def dpm_solver_first_update(self, x, s, t, model_s=None, return_interm...
    method singlestep_dpm_solver_second_update (line 551) | def singlestep_dpm_solver_second_update(self, x, s, t, r1=0.5, model_s...
    method singlestep_dpm_solver_third_update (line 633) | def singlestep_dpm_solver_third_update(self, x, s, t, r1=1./3., r2=2./...
    method multistep_dpm_solver_second_update (line 755) | def multistep_dpm_solver_second_update(self, x, model_prev_list, t_pre...
    method multistep_dpm_solver_third_update (line 812) | def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev...
    method singlestep_dpm_solver_update (line 859) | def singlestep_dpm_solver_update(self, x, s, t, order, return_intermed...
    method multistep_dpm_solver_update (line 885) | def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list,...
    method dpm_solver_adaptive (line 909) | def dpm_solver_adaptive(self, x, order, t_T, t_0, h_init=0.05, atol=0....
    method sample (line 965) | def sample(self, x, steps=20, t_start=None, t_end=None, order=3, skip_...
  function interpolate_fn (line 1132) | def interpolate_fn(x, xp, yp):
  function expand_dims (line 1174) | def expand_dims(v, dims):

FILE: ldm/models/diffusion/dpm_solver/sampler.py
  class DPMSolverSampler (line 8) | class DPMSolverSampler(object):
    method __init__ (line 9) | def __init__(self, model, **kwargs):
    method register_buffer (line 15) | def register_buffer(self, name, attr):
    method sample (line 22) | def sample(self,

FILE: ldm/models/diffusion/plms.py
  class PLMSSampler (line 11) | class PLMSSampler(object):
    method __init__ (line 12) | def __init__(self, model, schedule="linear", **kwargs):
    method register_buffer (line 18) | def register_buffer(self, name, attr):
    method make_schedule (line 24) | def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddi...
    method sample (line 58) | def sample(self,
    method plms_sampling (line 115) | def plms_sampling(self, cond, shape,
    method p_sample_plms (line 173) | def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_origin...

FILE: ldm/modules/attention.py
  function exists (line 11) | def exists(val):
  function uniq (line 15) | def uniq(arr):
  function default (line 19) | def default(val, d):
  function max_neg_value (line 25) | def max_neg_value(t):
  function init_ (line 29) | def init_(tensor):
  class GEGLU (line 37) | class GEGLU(nn.Module):
    method __init__ (line 38) | def __init__(self, dim_in, dim_out):
    method forward (line 42) | def forward(self, x):
  class FeedForward (line 47) | class FeedForward(nn.Module):
    method __init__ (line 48) | def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):
    method forward (line 63) | def forward(self, x):
  function zero_module (line 67) | def zero_module(module):
  function Normalize (line 76) | def Normalize(in_channels):
  class LinearAttention (line 80) | class LinearAttention(nn.Module):
    method __init__ (line 81) | def __init__(self, dim, heads=4, dim_head=32):
    method forward (line 88) | def forward(self, x):
  class SpatialSelfAttention (line 99) | class SpatialSelfAttention(nn.Module):
    method __init__ (line 100) | def __init__(self, in_channels):
    method forward (line 126) | def forward(self, x):
  class CrossAttention (line 152) | class CrossAttention(nn.Module):
    method __init__ (line 153) | def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, ...
    method forward (line 170) | def forward(self, x, context=None, mask=None):
  class BasicTransformerBlock (line 196) | class BasicTransformerBlock(nn.Module):
    method __init__ (line 197) | def __init__(self, dim, n_heads, d_head, dropout=0., context_dim=None,...
    method forward (line 208) | def forward(self, x, context=None):
    method _forward (line 211) | def _forward(self, x, context=None):
  class SpatialTransformer (line 218) | class SpatialTransformer(nn.Module):
    method __init__ (line 226) | def __init__(self, in_channels, n_heads, d_head,
    method forward (line 250) | def forward(self, x, context=None):

FILE: ldm/modules/diffusionmodules/model.py
  function get_timestep_embedding (line 12) | def get_timestep_embedding(timesteps, embedding_dim):
  function nonlinearity (line 33) | def nonlinearity(x):
  function Normalize (line 38) | def Normalize(in_channels, num_groups=32):
  class Upsample (line 42) | class Upsample(nn.Module):
    method __init__ (line 43) | def __init__(self, in_channels, with_conv):
    method forward (line 53) | def forward(self, x):
  class Downsample (line 60) | class Downsample(nn.Module):
    method __init__ (line 61) | def __init__(self, in_channels, with_conv):
    method forward (line 72) | def forward(self, x):
  class ResnetBlock (line 82) | class ResnetBlock(nn.Module):
    method __init__ (line 83) | def __init__(self, *, in_channels, out_channels=None, conv_shortcut=Fa...
    method forward (line 121) | def forward(self, x, temb):
  class LinAttnBlock (line 144) | class LinAttnBlock(LinearAttention):
    method __init__ (line 146) | def __init__(self, in_channels):
  class AttnBlock (line 150) | class AttnBlock(nn.Module):
    method __init__ (line 151) | def __init__(self, in_channels):
    method forward (line 178) | def forward(self, x):
  function make_attn (line 205) | def make_attn(in_channels, attn_type="vanilla"):
  class Model (line 216) | class Model(nn.Module):
    method __init__ (line 217) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
    method forward (line 316) | def forward(self, x, t=None, context=None):
    method get_last_layer (line 364) | def get_last_layer(self):
  class Encoder (line 368) | class Encoder(nn.Module):
    method __init__ (line 369) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
    method forward (line 434) | def forward(self, x):
  class Decoder (line 462) | class Decoder(nn.Module):
    method __init__ (line 463) | def __init__(self, *, ch, out_ch, ch_mult=(1,2,4,8), num_res_blocks,
    method forward (line 535) | def forward(self, z):
  class SimpleDecoder (line 571) | class SimpleDecoder(nn.Module):
    method __init__ (line 572) | def __init__(self, in_channels, out_channels, *args, **kwargs):
    method forward (line 594) | def forward(self, x):
  class UpsampleDecoder (line 607) | class UpsampleDecoder(nn.Module):
    method __init__ (line 608) | def __init__(self, in_channels, out_channels, ch, num_res_blocks, reso...
    method forward (line 641) | def forward(self, x):
  class LatentRescaler (line 655) | class LatentRescaler(nn.Module):
    method __init__ (line 656) | def __init__(self, factor, in_channels, mid_channels, out_channels, de...
    method forward (line 680) | def forward(self, x):
  class MergedRescaleEncoder (line 692) | class MergedRescaleEncoder(nn.Module):
    method __init__ (line 693) | def __init__(self, in_channels, ch, resolution, out_ch, num_res_blocks,
    method forward (line 705) | def forward(self, x):
  class MergedRescaleDecoder (line 711) | class MergedRescaleDecoder(nn.Module):
    method __init__ (line 712) | def __init__(self, z_channels, out_ch, resolution, num_res_blocks, att...
    method forward (line 722) | def forward(self, x):
  class Upsampler (line 728) | class Upsampler(nn.Module):
    method __init__ (line 729) | def __init__(self, in_size, out_size, in_channels, out_channels, ch_mu...
    method forward (line 741) | def forward(self, x):
  class Resize (line 747) | class Resize(nn.Module):
    method __init__ (line 748) | def __init__(self, in_channels=None, learned=False, mode="bilinear"):
    method forward (line 763) | def forward(self, x, scale_factor=1.0):
  class FirstStagePostProcessor (line 770) | class FirstStagePostProcessor(nn.Module):
    method __init__ (line 772) | def __init__(self, ch_mult:list, in_channels,
    method instantiate_pretrained (line 807) | def instantiate_pretrained(self, config):
    method encode_with_pretrained (line 816) | def encode_with_pretrained(self,x):
    method forward (line 822) | def forward(self,x):

FILE: ldm/modules/diffusionmodules/openaimodel.py
  function convert_module_to_f16 (line 24) | def convert_module_to_f16(x):
  function convert_module_to_f32 (line 27) | def convert_module_to_f32(x):
  class AttentionPool2d (line 32) | class AttentionPool2d(nn.Module):
    method __init__ (line 37) | def __init__(
    method forward (line 51) | def forward(self, x):
  class TimestepBlock (line 62) | class TimestepBlock(nn.Module):
    method forward (line 68) | def forward(self, x, emb):
  class TimestepEmbedSequential (line 74) | class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
    method forward (line 80) | def forward(self, x, emb, context=None):
  class Upsample (line 91) | class Upsample(nn.Module):
    method __init__ (line 100) | def __init__(self, channels, use_conv, dims=2, out_channels=None, padd...
    method forward (line 109) | def forward(self, x):
  class TransposedUpsample (line 121) | class TransposedUpsample(nn.Module):
    method __init__ (line 123) | def __init__(self, channels, out_channels=None, ks=5):
    method forward (line 130) | def forward(self,x):
  class Downsample (line 134) | class Downsample(nn.Module):
    method __init__ (line 143) | def __init__(self, channels, use_conv, dims=2, out_channels=None,paddi...
    method forward (line 158) | def forward(self, x):
  class ResBlock (line 163) | class ResBlock(TimestepBlock):
    method __init__ (line 179) | def __init__(
    method forward (line 243) | def forward(self, x, emb):
    method _forward (line 255) | def _forward(self, x, emb):
  class AttentionBlock (line 278) | class AttentionBlock(nn.Module):
    method __init__ (line 285) | def __init__(
    method forward (line 314) | def forward(self, x):
    method _forward (line 318) | def _forward(self, x):
  function count_flops_attn (line 327) | def count_flops_attn(model, _x, y):
  class QKVAttentionLegacy (line 347) | class QKVAttentionLegacy(nn.Module):
    method __init__ (line 352) | def __init__(self, n_heads):
    method forward (line 356) | def forward(self, qkv):
    method count_flops (line 375) | def count_flops(model, _x, y):
  class QKVAttention (line 379) | class QKVAttention(nn.Module):
    method __init__ (line 384) | def __init__(self, n_heads):
    method forward (line 388) | def forward(self, qkv):
    method count_flops (line 409) | def count_flops(model, _x, y):
  class UNetModel (line 413) | class UNetModel(nn.Module):
    method __init__ (line 443) | def __init__(
    method convert_to_fp16 (line 694) | def convert_to_fp16(self):
    method convert_to_fp32 (line 702) | def convert_to_fp32(self):
    method forward (line 710) | def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
  class EncoderUNetModel (line 745) | class EncoderUNetModel(nn.Module):
    method __init__ (line 751) | def __init__(
    method convert_to_fp16 (line 924) | def convert_to_fp16(self):
    method convert_to_fp32 (line 931) | def convert_to_fp32(self):
    method forward (line 938) | def forward(self, x, timesteps):

FILE: ldm/modules/diffusionmodules/util.py
  function make_beta_schedule (line 21) | def make_beta_schedule(schedule, n_timestep, linear_start=1e-4, linear_e...
  function make_ddim_timesteps (line 46) | def make_ddim_timesteps(ddim_discr_method, num_ddim_timesteps, num_ddpm_...
  function make_ddim_sampling_parameters (line 63) | def make_ddim_sampling_parameters(alphacums, ddim_timesteps, eta, verbos...
  function betas_for_alpha_bar (line 77) | def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.9...
  function extract_into_tensor (line 96) | def extract_into_tensor(a, t, x_shape):
  function checkpoint (line 102) | def checkpoint(func, inputs, params, flag):
  class CheckpointFunction (line 119) | class CheckpointFunction(torch.autograd.Function):
    method forward (line 121) | def forward(ctx, run_function, length, *args):
    method backward (line 131) | def backward(ctx, *output_grads):
  function timestep_embedding (line 151) | def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=Fal...
  function zero_module (line 174) | def zero_module(module):
  function scale_module (line 183) | def scale_module(module, scale):
  function mean_flat (line 192) | def mean_flat(tensor):
  function normalization (line 199) | def normalization(channels):
  class SiLU (line 209) | class SiLU(nn.Module):
    method forward (line 210) | def forward(self, x):
  class GroupNorm32 (line 214) | class GroupNorm32(nn.GroupNorm):
    method forward (line 215) | def forward(self, x):
  function conv_nd (line 218) | def conv_nd(dims, *args, **kwargs):
  function linear (line 231) | def linear(*args, **kwargs):
  function avg_pool_nd (line 238) | def avg_pool_nd(dims, *args, **kwargs):
  class HybridConditioner (line 251) | class HybridConditioner(nn.Module):
    method __init__ (line 253) | def __init__(self, c_concat_config, c_crossattn_config):
    method forward (line 258) | def forward(self, c_concat, c_crossattn):
  function noise_like (line 264) | def noise_like(shape, device, repeat=False):

FILE: ldm/modules/distributions/distributions.py
  class AbstractDistribution (line 5) | class AbstractDistribution:
    method sample (line 6) | def sample(self):
    method mode (line 9) | def mode(self):
  class DiracDistribution (line 13) | class DiracDistribution(AbstractDistribution):
    method __init__ (line 14) | def __init__(self, value):
    method sample (line 17) | def sample(self):
    method mode (line 20) | def mode(self):
  class DiagonalGaussianDistribution (line 24) | class DiagonalGaussianDistribution(object):
    method __init__ (line 25) | def __init__(self, parameters, deterministic=False):
    method sample (line 35) | def sample(self):
    method kl (line 39) | def kl(self, other=None):
    method nll (line 53) | def nll(self, sample, dims=[1,2,3]):
    method mode (line 61) | def mode(self):
  function normal_kl (line 65) | def normal_kl(mean1, logvar1, mean2, logvar2):

FILE: ldm/modules/ema.py
  class LitEma (line 5) | class LitEma(nn.Module):
    method __init__ (line 6) | def __init__(self, model, decay=0.9999, use_num_upates=True):
    method forward (line 25) | def forward(self,model):
    method copy_to (line 46) | def copy_to(self, model):
    method store (line 55) | def store(self, parameters):
    method restore (line 64) | def restore(self, parameters):

FILE: ldm/modules/encoders/modules.py
  class AbstractEncoder (line 12) | class AbstractEncoder(nn.Module):
    method __init__ (line 13) | def __init__(self):
    method encode (line 16) | def encode(self, *args, **kwargs):
  class ClassEmbedder (line 21) | class ClassEmbedder(nn.Module):
    method __init__ (line 22) | def __init__(self, embed_dim, n_classes=1000, key='class'):
    method forward (line 27) | def forward(self, batch, key=None):
  class TransformerEmbedder (line 36) | class TransformerEmbedder(AbstractEncoder):
    method __init__ (line 38) | def __init__(self, n_embed, n_layer, vocab_size, max_seq_len=77, devic...
    method forward (line 44) | def forward(self, tokens):
    method encode (line 49) | def encode(self, x):
  class BERTTokenizer (line 53) | class BERTTokenizer(AbstractEncoder):
    method __init__ (line 55) | def __init__(self, device="cuda", vq_interface=True, max_length=77):
    method forward (line 63) | def forward(self, text):
    method encode (line 70) | def encode(self, text):
    method decode (line 76) | def decode(self, text):
  class BERTEmbedder (line 80) | class BERTEmbedder(AbstractEncoder):
    method __init__ (line 82) | def __init__(self, n_embed, n_layer, vocab_size=30522, max_seq_len=77,
    method forward (line 93) | def forward(self, text):
    method encode (line 101) | def encode(self, text):
  class SpatialRescaler (line 106) | class SpatialRescaler(nn.Module):
    method __init__ (line 107) | def __init__(self,
    method forward (line 125) | def forward(self,x):
    method encode (line 134) | def encode(self, x):
  class FrozenCLIPEmbedder (line 137) | class FrozenCLIPEmbedder(AbstractEncoder):
    method __init__ (line 139) | def __init__(self, version="openai/clip-vit-large-patch14", device="cu...
    method freeze (line 147) | def freeze(self):
    method forward (line 152) | def forward(self, text):
    method encode (line 161) | def encode(self, text):
  class FrozenCLIPTextEmbedder (line 165) | class FrozenCLIPTextEmbedder(nn.Module):
    method __init__ (line 169) | def __init__(self, version='ViT-L/14', device="cuda", max_length=77, n...
    method freeze (line 177) | def freeze(self):
    method forward (line 182) | def forward(self, text):
    method encode (line 189) | def encode(self, text):
  class FrozenClipImageEmbedder (line 197) | class FrozenClipImageEmbedder(nn.Module):
    method __init__ (line 201) | def __init__(
    method preprocess (line 216) | def preprocess(self, x):
    method forward (line 226) | def forward(self, x):

FILE: ldm/modules/image_degradation/bsrgan.py
  function modcrop_np (line 29) | def modcrop_np(img, sf):
  function analytic_kernel (line 49) | def analytic_kernel(k):
  function anisotropic_Gaussian (line 65) | def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
  function gm_blur_kernel (line 86) | def gm_blur_kernel(mean, cov, size=15):
  function shift_pixel (line 99) | def shift_pixel(x, sf, upper_left=True):
  function blur (line 128) | def blur(x, k):
  function gen_kernel (line 145) | def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]),...
  function fspecial_gaussian (line 187) | def fspecial_gaussian(hsize, sigma):
  function fspecial_laplacian (line 201) | def fspecial_laplacian(alpha):
  function fspecial (line 210) | def fspecial(filter_type, *args, **kwargs):
  function bicubic_degradation (line 228) | def bicubic_degradation(x, sf=3):
  function srmd_degradation (line 240) | def srmd_degradation(x, k, sf=3):
  function dpsr_degradation (line 262) | def dpsr_degradation(x, k, sf=3):
  function classical_degradation (line 284) | def classical_degradation(x, k, sf=3):
  function add_sharpening (line 299) | def add_sharpening(img, weight=0.5, radius=50, threshold=10):
  function add_blur (line 325) | def add_blur(img, sf=4):
  function add_resize (line 339) | def add_resize(img, sf=4):
  function add_Gaussian_noise (line 369) | def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
  function add_speckle_noise (line 386) | def add_speckle_noise(img, noise_level1=2, noise_level2=25):
  function add_Poisson_noise (line 404) | def add_Poisson_noise(img):
  function add_JPEG_noise (line 418) | def add_JPEG_noise(img):
  function random_crop (line 427) | def random_crop(lq, hq, sf=4, lq_patchsize=64):
  function degradation_bsrgan (line 438) | def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
  function degradation_bsrgan_variant (line 530) | def degradation_bsrgan_variant(image, sf=4, isp_model=None):
  function degradation_bsrgan_plus (line 617) | def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True,...

FILE: ldm/modules/image_degradation/bsrgan_light.py
  function modcrop_np (line 29) | def modcrop_np(img, sf):
  function analytic_kernel (line 49) | def analytic_kernel(k):
  function anisotropic_Gaussian (line 65) | def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
  function gm_blur_kernel (line 86) | def gm_blur_kernel(mean, cov, size=15):
  function shift_pixel (line 99) | def shift_pixel(x, sf, upper_left=True):
  function blur (line 128) | def blur(x, k):
  function gen_kernel (line 145) | def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]),...
  function fspecial_gaussian (line 187) | def fspecial_gaussian(hsize, sigma):
  function fspecial_laplacian (line 201) | def fspecial_laplacian(alpha):
  function fspecial (line 210) | def fspecial(filter_type, *args, **kwargs):
  function bicubic_degradation (line 228) | def bicubic_degradation(x, sf=3):
  function srmd_degradation (line 240) | def srmd_degradation(x, k, sf=3):
  function dpsr_degradation (line 262) | def dpsr_degradation(x, k, sf=3):
  function classical_degradation (line 284) | def classical_degradation(x, k, sf=3):
  function add_sharpening (line 299) | def add_sharpening(img, weight=0.5, radius=50, threshold=10):
  function add_blur (line 325) | def add_blur(img, sf=4):
  function add_resize (line 343) | def add_resize(img, sf=4):
  function add_Gaussian_noise (line 373) | def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
  function add_speckle_noise (line 390) | def add_speckle_noise(img, noise_level1=2, noise_level2=25):
  function add_Poisson_noise (line 408) | def add_Poisson_noise(img):
  function add_JPEG_noise (line 422) | def add_JPEG_noise(img):
  function random_crop (line 431) | def random_crop(lq, hq, sf=4, lq_patchsize=64):
  function degradation_bsrgan (line 442) | def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
  function degradation_bsrgan_variant (line 534) | def degradation_bsrgan_variant(image, sf=4, isp_model=None):

FILE: ldm/modules/image_degradation/utils_image.py
  function is_image_file (line 29) | def is_image_file(filename):
  function get_timestamp (line 33) | def get_timestamp():
  function imshow (line 37) | def imshow(x, title=None, cbar=False, figsize=None):
  function surf (line 47) | def surf(Z, cmap='rainbow', figsize=None):
  function get_image_paths (line 67) | def get_image_paths(dataroot):
  function _get_paths_from_images (line 74) | def _get_paths_from_images(path):
  function patches_from_image (line 93) | def patches_from_image(img, p_size=512, p_overlap=64, p_max=800):
  function imssave (line 112) | def imssave(imgs, img_path):
  function split_imageset (line 125) | def split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_si...
  function mkdir (line 153) | def mkdir(path):
  function mkdirs (line 158) | def mkdirs(paths):
  function mkdir_and_rename (line 166) | def mkdir_and_rename(path):
  function imread_uint (line 185) | def imread_uint(path, n_channels=3):
  function imsave (line 203) | def imsave(img, img_path):
  function imwrite (line 209) | def imwrite(img, img_path):
  function read_img (line 220) | def read_img(path):
  function uint2single (line 249) | def uint2single(img):
  function single2uint (line 254) | def single2uint(img):
  function uint162single (line 259) | def uint162single(img):
  function single2uint16 (line 264) | def single2uint16(img):
  function uint2tensor4 (line 275) | def uint2tensor4(img):
  function uint2tensor3 (line 282) | def uint2tensor3(img):
  function tensor2uint (line 289) | def tensor2uint(img):
  function single2tensor3 (line 302) | def single2tensor3(img):
  function single2tensor4 (line 307) | def single2tensor4(img):
  function tensor2single (line 312) | def tensor2single(img):
  function tensor2single3 (line 320) | def tensor2single3(img):
  function single2tensor5 (line 329) | def single2tensor5(img):
  function single32tensor5 (line 333) | def single32tensor5(img):
  function single42tensor4 (line 337) | def single42tensor4(img):
  function tensor2img (line 342) | def tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):
  function augment_img (line 380) | def augment_img(img, mode=0):
  function augment_img_tensor4 (line 401) | def augment_img_tensor4(img, mode=0):
  function augment_img_tensor (line 422) | def augment_img_tensor(img, mode=0):
  function augment_img_np3 (line 441) | def augment_img_np3(img, mode=0):
  function augment_imgs (line 469) | def augment_imgs(img_list, hflip=True, rot=True):
  function modcrop (line 494) | def modcrop(img_in, scale):
  function shave (line 510) | def shave(img_in, border=0):
  function rgb2ycbcr (line 529) | def rgb2ycbcr(img, only_y=True):
  function ycbcr2rgb (line 553) | def ycbcr2rgb(img):
  function bgr2ycbcr (line 573) | def bgr2ycbcr(img, only_y=True):
  function channel_convert (line 597) | def channel_convert(in_c, tar_type, img_list):
  function calculate_psnr (line 621) | def calculate_psnr(img1, img2, border=0):
  function calculate_ssim (line 642) | def calculate_ssim(img1, img2, border=0):
  function ssim (line 669) | def ssim(img1, img2):
  function cubic (line 700) | def cubic(x):
  function calculate_weights_indices (line 708) | def calculate_weights_indices(in_length, out_length, scale, kernel, kern...
  function imresize (line 766) | def imresize(img, scale, antialiasing=True):
  function imresize_np (line 839) | def imresize_np(img, scale, antialiasing=True):

FILE: ldm/modules/losses/contperceptual.py
  class LPIPSWithDiscriminator (line 7) | class LPIPSWithDiscriminator(nn.Module):
    method __init__ (line 8) | def __init__(self, disc_start, logvar_init=0.0, kl_weight=1.0, pixello...
    method calculate_adaptive_weight (line 32) | def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
    method forward (line 45) | def forward(self, inputs, reconstructions, posteriors, optimizer_idx,

FILE: ldm/modules/losses/vqperceptual.py
  function hinge_d_loss_with_exemplar_weights (line 11) | def hinge_d_loss_with_exemplar_weights(logits_real, logits_fake, weights):
  function adopt_weight (line 20) | def adopt_weight(weight, global_step, threshold=0, value=0.):
  function measure_perplexity (line 26) | def measure_perplexity(predicted_indices, n_embed):
  function l1 (line 35) | def l1(x, y):
  function l2 (line 39) | def l2(x, y):
  class VQLPIPSWithDiscriminator (line 43) | class VQLPIPSWithDiscriminator(nn.Module):
    method __init__ (line 44) | def __init__(self, disc_start, codebook_weight=1.0, pixelloss_weight=1.0,
    method calculate_adaptive_weight (line 85) | def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
    method forward (line 98) | def forward(self, codebook_loss, inputs, reconstructions, optimizer_idx,

FILE: ldm/modules/x_transformer.py
  class AbsolutePositionalEmbedding (line 25) | class AbsolutePositionalEmbedding(nn.Module):
    method __init__ (line 26) | def __init__(self, dim, max_seq_len):
    method init_ (line 31) | def init_(self):
    method forward (line 34) | def forward(self, x):
  class FixedPositionalEmbedding (line 39) | class FixedPositionalEmbedding(nn.Module):
    method __init__ (line 40) | def __init__(self, dim):
    method forward (line 45) | def forward(self, x, seq_dim=1, offset=0):
  function exists (line 54) | def exists(val):
  function default (line 58) | def default(val, d):
  function always (line 64) | def always(val):
  function not_equals (line 70) | def not_equals(val):
  function equals (line 76) | def equals(val):
  function max_neg_value (line 82) | def max_neg_value(tensor):
  function pick_and_pop (line 88) | def pick_and_pop(keys, d):
  function group_dict_by_key (line 93) | def group_dict_by_key(cond, d):
  function string_begins_with (line 102) | def string_begins_with(prefix, str):
  function group_by_key_prefix (line 106) | def group_by_key_prefix(prefix, d):
  function groupby_prefix_and_trim (line 110) | def groupby_prefix_and_trim(prefix, d):
  class Scale (line 117) | class Scale(nn.Module):
    method __init__ (line 118) | def __init__(self, value, fn):
    method forward (line 123) | def forward(self, x, **kwargs):
  class Rezero (line 128) | class Rezero(nn.Module):
    method __init__ (line 129) | def __init__(self, fn):
    method forward (line 134) | def forward(self, x, **kwargs):
  class ScaleNorm (line 139) | class ScaleNorm(nn.Module):
    method __init__ (line 140) | def __init__(self, dim, eps=1e-5):
    method forward (line 146) | def forward(self, x):
  class RMSNorm (line 151) | class RMSNorm(nn.Module):
    method __init__ (line 152) | def __init__(self, dim, eps=1e-8):
    method forward (line 158) | def forward(self, x):
  class Residual (line 163) | class Residual(nn.Module):
    method forward (line 164) | def forward(self, x, residual):
  class GRUGating (line 168) | class GRUGating(nn.Module):
    method __init__ (line 169) | def __init__(self, dim):
    method forward (line 173) | def forward(self, x, residual):
  class GEGLU (line 184) | class GEGLU(nn.Module):
    method __init__ (line 185) | def __init__(self, dim_in, dim_out):
    method forward (line 189) | def forward(self, x):
  class FeedForward (line 194) | class FeedForward(nn.Module):
    method __init__ (line 195) | def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):
    method forward (line 210) | def forward(self, x):
  class Attention (line 215) | class Attention(nn.Module):
    method __init__ (line 216) | def __init__(
    method forward (line 268) | def forward(
  class AttentionLayers (line 370) | class AttentionLayers(nn.Module):
    method __init__ (line 371) | def __init__(
    method forward (line 481) | def forward(
  class Encoder (line 541) | class Encoder(AttentionLayers):
    method __init__ (line 542) | def __init__(self, **kwargs):
  class TransformerWrapper (line 548) | class TransformerWrapper(nn.Module):
    method __init__ (line 549) | def __init__(
    method init_ (line 595) | def init_(self):
    method forward (line 598) | def forward(

FILE: ldm/util.py
  function log_txt_as_img (line 17) | def log_txt_as_img(wh, xc, size=10):
  function ismap (line 41) | def ismap(x):
  function isimage (line 47) | def isimage(x):
  function exists (line 53) | def exists(x):
  function default (line 57) | def default(val, d):
  function mean_flat (line 63) | def mean_flat(tensor):
  function count_params (line 71) | def count_params(model, verbose=False):
  function instantiate_from_config (line 78) | def instantiate_from_config(config):
  function get_obj_from_str (line 88) | def get_obj_from_str(string, reload=False):
  function _do_parallel_data_prefetch (line 96) | def _do_parallel_data_prefetch(func, Q, data, idx, idx_to_fn=False):
  function parallel_data_prefetch (line 108) | def parallel_data_prefetch(

FILE: main.py
  function get_parser (line 24) | def get_parser(**parser_kwargs):
  function nondefault_trainer_args (line 126) | def nondefault_trainer_args(opt):
  class WrappedDataset (line 133) | class WrappedDataset(Dataset):
    method __init__ (line 136) | def __init__(self, dataset):
    method __len__ (line 139) | def __len__(self):
    method __getitem__ (line 142) | def __getitem__(self, idx):
  function worker_init_fn (line 146) | def worker_init_fn(_):
  class DataModuleFromConfig (line 162) | class DataModuleFromConfig(pl.LightningDataModule):
    method __init__ (line 163) | def __init__(self, batch_size, train=None, validation=None, test=None,...
    method prepare_data (line 185) | def prepare_data(self):
    method setup (line 189) | def setup(self, stage=None):
    method _train_dataloader (line 197) | def _train_dataloader(self):
    method _val_dataloader (line 207) | def _val_dataloader(self, shuffle=False):
    method _test_dataloader (line 218) | def _test_dataloader(self, shuffle=False):
    method _predict_dataloader (line 231) | def _predict_dataloader(self, shuffle=False):
  class SetupCallback (line 240) | class SetupCallback(Callback):
    method __init__ (line 241) | def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, light...
    method on_keyboard_interrupt (line 251) | def on_keyboard_interrupt(self, trainer, pl_module):
    method on_pretrain_routine_start (line 257) | def on_pretrain_routine_start(self, trainer, pl_module):
  class ImageLogger (line 289) | class ImageLogger(Callback):
    method __init__ (line 290) | def __init__(self, batch_frequency, max_images, clamp=True, increase_l...
    method _testtube (line 310) | def _testtube(self, pl_module, images, batch_idx, split):
    method log_local (line 321) | def log_local(self, save_dir, split, images,
    method log_img (line 340) | def log_img(self, pl_module, batch, batch_idx, split="train"):
    method check_frequency (line 372) | def check_frequency(self, check_idx):
    method on_train_batch_end (line 383) | def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch...
    method on_validation_batch_end (line 387) | def on_validation_batch_end(self, trainer, pl_module, outputs, batch, ...
  class CUDACallback (line 395) | class CUDACallback(Callback):
    method on_train_epoch_start (line 397) | def on_train_epoch_start(self, trainer, pl_module):
    method on_train_epoch_end (line 403) | def on_train_epoch_end(self, trainer, pl_module, outputs):
  function melk (line 697) | def melk(*args, **kwargs):
  function divein (line 705) | def divein(*args, **kwargs):

FILE: notebook_helpers.py
  function download_models (line 19) | def download_models(mode):
  function load_model_from_config (line 40) | def load_model_from_config(config, ckpt):
  function get_model (line 52) | def get_model(mode):
  function get_custom_cond (line 59) | def get_custom_cond(mode):
  function get_cond_options (line 85) | def get_cond_options(mode):
  function select_cond_path (line 92) | def select_cond_path(mode):
  function get_cond (line 107) | def get_cond(mode, selected_path):
  function visualize_cond_img (line 127) | def visualize_cond_img(path):
  function run (line 131) | def run(model, selected_path, task, custom_steps, resize_enabled=False, ...
  function convsample_ddim (line 188) | def convsample_ddim(model, cond, steps, shape, eta=1.0, callback=None, n...
  function make_convolutional_sample (line 208) | def make_convolutional_sample(batch, model, mode="vanilla", custom_steps...

FILE: scripts/img2img.py
  function chunk (line 23) | def chunk(it, size):
  function load_model_from_config (line 28) | def load_model_from_config(config, ckpt, verbose=False):
  function load_img (line 48) | def load_img(path):
  function main (line 60) | def main():

FILE: scripts/inpaint.py
  function make_batch (line 11) | def make_batch(image, mask, device):

FILE: scripts/knn2img.py
  function chunk (line 36) | def chunk(it, size):
  function load_model_from_config (line 41) | def load_model_from_config(config, ckpt, verbose=False):
  class Searcher (line 61) | class Searcher(object):
    method __init__ (line 62) | def __init__(self, database, retriever_version='ViT-L/14'):
    method train_searcher (line 75) | def train_searcher(self, k,
    method load_single_file (line 91) | def load_single_file(self, saved_embeddings):
    method load_multi_files (line 96) | def load_multi_files(self, data_archive):
    method load_database (line 104) | def load_database(self):
    method load_retriever (line 123) | def load_retriever(self, version='ViT-L/14', ):
    method load_searcher (line 130) | def load_searcher(self):
    method search (line 135) | def search(self, x, k):
    method __call__ (line 163) | def __call__(self, x, n):

FILE: scripts/sample_diffusion.py
  function custom_to_pil (line 15) | def custom_to_pil(x):
  function custom_to_np (line 27) | def custom_to_np(x):
  function logs2pil (line 36) | def logs2pil(logs, keys=["sample"]):
  function convsample (line 54) | def convsample(model, shape, return_intermediates=True,
  function convsample_ddim (line 69) | def convsample_ddim(model, steps, shape, eta=1.0
  function make_convolutional_sample (line 79) | def make_convolutional_sample(model, batch_size, vanilla=False, custom_s...
  function run (line 108) | def run(model, logdir, batch_size=50, vanilla=False, custom_steps=None, ...
  function save_logs (line 143) | def save_logs(logs, path, n_saved=0, key="sample", np_path=None):
  function get_parser (line 162) | def get_parser():
  function load_model_from_config (line 220) | def load_model_from_config(config, sd):
  function load_model (line 228) | def load_model(config, ckpt, gpu, eval_mode):

FILE: scripts/tests/test_watermark.py
  function testit (line 6) | def testit(img_path):

FILE: scripts/train_searcher.py
  function search_bruteforce (line 12) | def search_bruteforce(searcher):
  function search_partioned_ah (line 16) | def search_partioned_ah(searcher, dims_per_block, aiq_threshold, reorder_k,
  function search_ah (line 24) | def search_ah(searcher, dims_per_block, aiq_threshold, reorder_k):
  function load_datapool (line 28) | def load_datapool(dpath):
  function train_searcher (line 62) | def train_searcher(opt,

FILE: scripts/txt2img.py
  function chunk (line 32) | def chunk(it, size):
  function numpy_to_pil (line 37) | def numpy_to_pil(images):
  function load_model_from_config (line 49) | def load_model_from_config(config, ckpt, verbose=False):
  function put_watermark (line 69) | def put_watermark(img, wm_encoder=None):
  function load_replacement (line 77) | def load_replacement(x):
  function check_safety (line 88) | def check_safety(x_image):
  function main (line 98) | def main():


Condensed preview — 88 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (4,856K chars).

[
  {
    "path": "LICENSE",
    "chars": 14381,
    "preview": "Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors\n\nCreativeML Open RAIL-M\ndated August 22, 2022\n\nSecti"
  },
  {
    "path": "README.md",
    "chars": 12439,
    "preview": "# Stable Diffusion\n*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.a"
  },
  {
    "path": "Stable_Diffusion_v1_Model_Card.md",
    "chars": 9340,
    "preview": "# Stable Diffusion v1 Model Card\nThis model card focuses on the model associated with the Stable Diffusion model, availa"
  },
  {
    "path": "configs/autoencoder/autoencoder_kl_16x16x16.yaml",
    "chars": 1145,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: \"val/rec_loss\""
  },
  {
    "path": "configs/autoencoder/autoencoder_kl_32x32x4.yaml",
    "chars": 1140,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: \"val/rec_loss\""
  },
  {
    "path": "configs/autoencoder/autoencoder_kl_64x64x3.yaml",
    "chars": 1139,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: \"val/rec_loss\""
  },
  {
    "path": "configs/autoencoder/autoencoder_kl_8x8x64.yaml",
    "chars": 1148,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: \"val/rec_loss\""
  },
  {
    "path": "configs/latent-diffusion/celebahq-ldm-vq-4.yaml",
    "chars": 2028,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "configs/latent-diffusion/cin-ldm-vq-f8.yaml",
    "chars": 2360,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "configs/latent-diffusion/cin256-v2.yaml",
    "chars": 1553,
    "preview": "model:\n  base_learning_rate: 0.0001\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.00"
  },
  {
    "path": "configs/latent-diffusion/ffhq-ldm-vq-4.yaml",
    "chars": 2020,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml",
    "chars": 2024,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml",
    "chars": 2284,
    "preview": "model:\n  base_learning_rate: 5.0e-5   # set to target_lr by starting main.py with '--scale_lr False'\n  target: ldm.model"
  },
  {
    "path": "configs/latent-diffusion/txt2img-1p4B-eval.yaml",
    "chars": 1614,
    "preview": "model:\n  base_learning_rate: 5.0e-05\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "configs/retrieval-augmented-diffusion/768x768.yaml",
    "chars": 1615,
    "preview": "model:\n  base_learning_rate: 0.0001\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.00"
  },
  {
    "path": "configs/stable-diffusion/v1-inference.yaml",
    "chars": 1873,
    "preview": "model:\n  base_learning_rate: 1.0e-04\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "data/example_conditioning/text_conditional/sample_0.txt",
    "chars": 20,
    "preview": "A basket of cerries\n"
  },
  {
    "path": "data/imagenet_clsidx_to_label.txt",
    "chars": 30563,
    "preview": " 0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus',\n 2: 'great white shark, white shark, man-eater, man-eating "
  },
  {
    "path": "data/index_synset.yaml",
    "chars": 14890,
    "preview": "0: n01440764\n1: n01443537\n2: n01484850\n3: n01491361\n4: n01494475\n5: n01496331\n6: n01498041\n7: n01514668\n8: n07646067\n9: "
  },
  {
    "path": "environment.yaml",
    "chars": 734,
    "preview": "name: ldm\nchannels:\n  - pytorch\n  - defaults\ndependencies:\n  - python=3.8.5\n  - pip=20.3\n  - cudatoolkit=11.3\n  - pytorc"
  },
  {
    "path": "ldm/data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ldm/data/base.py",
    "chars": 693,
    "preview": "from abc import abstractmethod\nfrom torch.utils.data import Dataset, ConcatDataset, ChainDataset, IterableDataset\n\n\nclas"
  },
  {
    "path": "ldm/data/imagenet.py",
    "chars": 15497,
    "preview": "import os, yaml, pickle, shutil, tarfile, glob\nimport cv2\nimport albumentations\nimport PIL\nimport numpy as np\nimport tor"
  },
  {
    "path": "ldm/data/lsun.py",
    "chars": 3274,
    "preview": "import os\nimport numpy as np\nimport PIL\nfrom PIL import Image\nfrom torch.utils.data import Dataset\nfrom torchvision impo"
  },
  {
    "path": "ldm/lr_scheduler.py",
    "chars": 3882,
    "preview": "import numpy as np\n\n\nclass LambdaWarmUpCosineScheduler:\n    \"\"\"\n    note: use with a base_lr of 1.0\n    \"\"\"\n    def __in"
  },
  {
    "path": "ldm/models/autoencoder.py",
    "chars": 17619,
    "preview": "import torch\nimport pytorch_lightning as pl\nimport torch.nn.functional as F\nfrom contextlib import contextmanager\n\nfrom "
  },
  {
    "path": "ldm/models/diffusion/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ldm/models/diffusion/classifier.py",
    "chars": 10276,
    "preview": "import os\nimport torch\nimport pytorch_lightning as pl\nfrom omegaconf import OmegaConf\nfrom torch.nn import functional as"
  },
  {
    "path": "ldm/models/diffusion/ddim.py",
    "chars": 12797,
    "preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ldm.modu"
  },
  {
    "path": "ldm/models/diffusion/ddpm.py",
    "chars": 67425,
    "preview": "\"\"\"\nwild mixture of\nhttps://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e316"
  },
  {
    "path": "ldm/models/diffusion/dpm_solver/__init__.py",
    "chars": 37,
    "preview": "from .sampler import DPMSolverSampler"
  },
  {
    "path": "ldm/models/diffusion/dpm_solver/dpm_solver.py",
    "chars": 64057,
    "preview": "import torch\nimport torch.nn.functional as F\nimport math\n\n\nclass NoiseScheduleVP:\n    def __init__(\n            self,\n  "
  },
  {
    "path": "ldm/models/diffusion/dpm_solver/sampler.py",
    "chars": 2908,
    "preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\n\nfrom .dpm_solver import NoiseScheduleVP, model_wrapper, DPM_Solver\n\n\nclass DPMSolver"
  },
  {
    "path": "ldm/models/diffusion/plms.py",
    "chars": 12450,
    "preview": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ldm.modu"
  },
  {
    "path": "ldm/modules/attention.py",
    "chars": 8531,
    "preview": "from inspect import isfunction\nimport math\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn, einsum\nfro"
  },
  {
    "path": "ldm/modules/diffusionmodules/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ldm/modules/diffusionmodules/model.py",
    "chars": 33409,
    "preview": "# pytorch_diffusion + derived encoder decoder\nimport math\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom ein"
  },
  {
    "path": "ldm/modules/diffusionmodules/openaimodel.py",
    "chars": 34953,
    "preview": "from abc import abstractmethod\nfrom functools import partial\nimport math\nfrom typing import Iterable\n\nimport numpy as np"
  },
  {
    "path": "ldm/modules/diffusionmodules/util.py",
    "chars": 9561,
    "preview": "# adopted from\n# https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# and\n#"
  },
  {
    "path": "ldm/modules/distributions/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ldm/modules/distributions/distributions.py",
    "chars": 2970,
    "preview": "import torch\nimport numpy as np\n\n\nclass AbstractDistribution:\n    def sample(self):\n        raise NotImplementedError()\n"
  },
  {
    "path": "ldm/modules/ema.py",
    "chars": 2982,
    "preview": "import torch\nfrom torch import nn\n\n\nclass LitEma(nn.Module):\n    def __init__(self, model, decay=0.9999, use_num_upates="
  },
  {
    "path": "ldm/modules/encoders/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ldm/modules/encoders/modules.py",
    "chars": 8154,
    "preview": "import torch\nimport torch.nn as nn\nfrom functools import partial\nimport clip\nfrom einops import rearrange, repeat\nfrom t"
  },
  {
    "path": "ldm/modules/image_degradation/__init__.py",
    "chars": 208,
    "preview": "from ldm.modules.image_degradation.bsrgan import degradation_bsrgan_variant as degradation_fn_bsr\nfrom ldm.modules.image"
  },
  {
    "path": "ldm/modules/image_degradation/bsrgan.py",
    "chars": 25198,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\n# --------------------------------------------\n# Super-Resolution\n# ------------------------"
  },
  {
    "path": "ldm/modules/image_degradation/bsrgan_light.py",
    "chars": 22238,
    "preview": "# -*- coding: utf-8 -*-\nimport numpy as np\nimport cv2\nimport torch\n\nfrom functools import partial\nimport random\nfrom sci"
  },
  {
    "path": "ldm/modules/image_degradation/utils_image.py",
    "chars": 29022,
    "preview": "import os\nimport math\nimport random\nimport numpy as np\nimport torch\nimport cv2\nfrom torchvision.utils import make_grid\nf"
  },
  {
    "path": "ldm/modules/losses/__init__.py",
    "chars": 68,
    "preview": "from ldm.modules.losses.contperceptual import LPIPSWithDiscriminator"
  },
  {
    "path": "ldm/modules/losses/contperceptual.py",
    "chars": 5581,
    "preview": "import torch\nimport torch.nn as nn\n\nfrom taming.modules.losses.vqperceptual import *  # TODO: taming dependency yes/no?\n"
  },
  {
    "path": "ldm/modules/losses/vqperceptual.py",
    "chars": 7941,
    "preview": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nfrom einops import repeat\n\nfrom taming.modules.discrim"
  },
  {
    "path": "ldm/modules/x_transformer.py",
    "chars": 20168,
    "preview": "\"\"\"shout-out to https://github.com/lucidrains/x-transformers/tree/main/x_transformers\"\"\"\nimport torch\nfrom torch import "
  },
  {
    "path": "ldm/util.py",
    "chars": 5857,
    "preview": "import importlib\n\nimport torch\nimport numpy as np\nfrom collections import abc\nfrom einops import rearrange\nfrom functool"
  },
  {
    "path": "main.py",
    "chars": 28175,
    "preview": "import argparse, os, sys, datetime, glob, importlib, csv\nimport numpy as np\nimport time\nimport torch\nimport torchvision\n"
  },
  {
    "path": "models/first_stage_models/kl-f16/config.yaml",
    "chars": 909,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: val/rec_loss\n"
  },
  {
    "path": "models/first_stage_models/kl-f32/config.yaml",
    "chars": 929,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: val/rec_loss\n"
  },
  {
    "path": "models/first_stage_models/kl-f4/config.yaml",
    "chars": 880,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: val/rec_loss\n"
  },
  {
    "path": "models/first_stage_models/kl-f8/config.yaml",
    "chars": 889,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.AutoencoderKL\n  params:\n    monitor: val/rec_loss\n"
  },
  {
    "path": "models/first_stage_models/vq-f16/config.yaml",
    "chars": 1026,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.VQModel\n  params:\n    embed_dim: 8\n    n_embed: 16"
  },
  {
    "path": "models/first_stage_models/vq-f4/config.yaml",
    "chars": 955,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.VQModel\n  params:\n    embed_dim: 3\n    n_embed: 81"
  },
  {
    "path": "models/first_stage_models/vq-f4-noattn/config.yaml",
    "chars": 978,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.VQModel\n  params:\n    embed_dim: 3\n    n_embed: 81"
  },
  {
    "path": "models/first_stage_models/vq-f8/config.yaml",
    "chars": 1035,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.VQModel\n  params:\n    embed_dim: 4\n    n_embed: 16"
  },
  {
    "path": "models/first_stage_models/vq-f8-n256/config.yaml",
    "chars": 1013,
    "preview": "model:\n  base_learning_rate: 4.5e-06\n  target: ldm.models.autoencoder.VQModel\n  params:\n    embed_dim: 4\n    n_embed: 25"
  },
  {
    "path": "models/ldm/bsr_sr/config.yaml",
    "chars": 1900,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/celeba256/config.yaml",
    "chars": 1599,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/cin256/config.yaml",
    "chars": 1862,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/ffhq256/config.yaml",
    "chars": 1591,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/inpainting_big/config.yaml",
    "chars": 1619,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/layout2img-openimages256/config.yaml",
    "chars": 1924,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/lsun_beds256/config.yaml",
    "chars": 1601,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/lsun_churches256/config.yaml",
    "chars": 2018,
    "preview": "model:\n  base_learning_rate: 5.0e-05\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/semantic_synthesis256/config.yaml",
    "chars": 1378,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/semantic_synthesis512/config.yaml",
    "chars": 1820,
    "preview": "model:\n  base_learning_rate: 1.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "models/ldm/text2img256/config.yaml",
    "chars": 1831,
    "preview": "model:\n  base_learning_rate: 2.0e-06\n  target: ldm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0"
  },
  {
    "path": "notebook_helpers.py",
    "chars": 10099,
    "preview": "from torchvision.datasets.utils import download_url\nfrom ldm.util import instantiate_from_config\nimport torch\nimport os\n"
  },
  {
    "path": "scripts/download_first_stages.sh",
    "chars": 1324,
    "preview": "#!/bin/bash\nwget -O models/first_stage_models/kl-f4/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f4.zip\nwge"
  },
  {
    "path": "scripts/download_models.sh",
    "chars": 1681,
    "preview": "#!/bin/bash\nwget -O models/ldm/celeba256/celeba-256.zip https://ommer-lab.com/files/latent-diffusion/celeba.zip\nwget -O "
  },
  {
    "path": "scripts/img2img.py",
    "chars": 9181,
    "preview": "\"\"\"make variations of input image\"\"\"\n\nimport argparse, os, sys, glob\nimport PIL\nimport torch\nimport numpy as np\nfrom ome"
  },
  {
    "path": "scripts/inpaint.py",
    "chars": 3644,
    "preview": "import argparse, os, sys, glob\nfrom omegaconf import OmegaConf\nfrom PIL import Image\nfrom tqdm import tqdm\nimport numpy "
  },
  {
    "path": "scripts/knn2img.py",
    "chars": 13707,
    "preview": "import argparse, os, sys, glob\nimport clip\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom omegaconf import O"
  },
  {
    "path": "scripts/latent_imagenet_diffusion.ipynb",
    "chars": 4172302,
    "preview": "{\n \"nbformat\": 4,\n \"nbformat_minor\": 0,\n \"metadata\": {\n  \"colab\": {\n   \"name\": \"latent-imagenet-diffusion.ipynb\",\n   \"pr"
  },
  {
    "path": "scripts/sample_diffusion.py",
    "chars": 9606,
    "preview": "import argparse, os, sys, glob, datetime, yaml\nimport torch\nimport time\nimport numpy as np\nfrom tqdm import trange\n\nfrom"
  },
  {
    "path": "scripts/tests/test_watermark.py",
    "chars": 357,
    "preview": "import cv2\nimport fire\nfrom imwatermark import WatermarkDecoder\n\n\ndef testit(img_path):\n    bgr = cv2.imread(img_path)\n "
  },
  {
    "path": "scripts/train_searcher.py",
    "chars": 5807,
    "preview": "import os, sys\nimport numpy as np\nimport scann\nimport argparse\nimport glob\nfrom multiprocessing import cpu_count\nfrom tq"
  },
  {
    "path": "scripts/txt2img.py",
    "chars": 11666,
    "preview": "import argparse, os, sys, glob\nimport cv2\nimport torch\nimport numpy as np\nfrom omegaconf import OmegaConf\nfrom PIL impor"
  },
  {
    "path": "setup.py",
    "chars": 233,
    "preview": "from setuptools import setup, find_packages\n\nsetup(\n    name='latent-diffusion',\n    version='0.0.1',\n    description=''"
  }
]

// ... and 2 more files (download for full content)
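
The preview records above share one shape: `path`, `chars`, and `preview`. As a rough sketch of how a downloaded copy might be consumed, the snippet below parses a few records and totals `chars` per top-level directory. The three records are copied from the listing above (chosen because their previews are not truncated); the helper name `chars_by_top_level` and the inline `preview_json` string are illustrative assumptions, not part of the repository or of the extraction tool.

```python
import json
from collections import defaultdict

# Three records copied from the condensed preview; a real download would
# contain one record per file. `preview_json` is a stand-in for that file.
preview_json = """
[
  {"path": "ldm/data/__init__.py", "chars": 0, "preview": ""},
  {"path": "ldm/models/diffusion/dpm_solver/__init__.py", "chars": 37,
   "preview": "from .sampler import DPMSolverSampler"},
  {"path": "ldm/modules/losses/__init__.py", "chars": 68,
   "preview": "from ldm.modules.losses.contperceptual import LPIPSWithDiscriminator"}
]
"""


def chars_by_top_level(records):
    """Sum the `chars` field per top-level path component.

    Hypothetical helper: root-level files (e.g. "setup.py") are keyed
    by their own file name.
    """
    totals = defaultdict(int)
    for rec in records:
        top = rec["path"].split("/", 1)[0]
        totals[top] += rec["chars"]
    return dict(totals)


records = json.loads(preview_json)
totals = chars_by_top_level(records)
print(totals)
```

Because `preview` is cut off for most files, `chars` rather than `len(preview)` is the field to rely on for size estimates.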

About this extraction

This page contains the full source code of the CompVis/stable-diffusion GitHub repository, extracted and formatted as plain text. The extraction includes 88 files (4.6 MB), approximately 1.2M tokens, and a symbol index with 701 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
