Repository: Stability-AI/generative-models
Branch: main
Commit: e8cd657656fa
Files: 120
Total size: 778.4 KB

Directory structure:
gitextract_3mk8n7n3/

├── .github/
│   └── workflows/
│       ├── black.yml
│       ├── test-build.yaml
│       └── test-inference.yml
├── .gitignore
├── CODEOWNERS
├── LICENSE-CODE
├── README.md
├── configs/
│   ├── example_training/
│   │   ├── autoencoder/
│   │   │   └── kl-f4/
│   │   │       ├── imagenet-attnfree-logvar.yaml
│   │   │       └── imagenet-kl_f8_8chn.yaml
│   │   ├── imagenet-f8_cond.yaml
│   │   ├── toy/
│   │   │   ├── cifar10_cond.yaml
│   │   │   ├── mnist.yaml
│   │   │   ├── mnist_cond.yaml
│   │   │   ├── mnist_cond_discrete_eps.yaml
│   │   │   ├── mnist_cond_l1_loss.yaml
│   │   │   └── mnist_cond_with_ema.yaml
│   │   ├── txt2img-clipl-legacy-ucg-training.yaml
│   │   └── txt2img-clipl.yaml
│   └── inference/
│       ├── sd_xl_base.yaml
│       ├── sd_xl_refiner.yaml
│       ├── sv3d_p.yaml
│       ├── sv3d_u.yaml
│       ├── svd.yaml
│       └── svd_image_decoder.yaml
├── main.py
├── model_licenses/
│   ├── LICENSE-SDXL-Turbo
│   ├── LICENSE-SDXL0.9
│   ├── LICENSE-SDXL1.0
│   ├── LICENSE-SV3D
│   └── LICENSE-SVD
├── pyproject.toml
├── pytest.ini
├── requirements/
│   └── pt2.txt
├── scripts/
│   ├── __init__.py
│   ├── demo/
│   │   ├── __init__.py
│   │   ├── detect.py
│   │   ├── discretization.py
│   │   ├── gradio_app.py
│   │   ├── gradio_app_sv4d.py
│   │   ├── sampling.py
│   │   ├── streamlit_helpers.py
│   │   ├── sv3d_helpers.py
│   │   ├── sv4d_helpers.py
│   │   ├── turbo.py
│   │   └── video_sampling.py
│   ├── sampling/
│   │   ├── configs/
│   │   │   ├── sv3d_p.yaml
│   │   │   ├── sv3d_u.yaml
│   │   │   ├── sv4d.yaml
│   │   │   ├── sv4d2.yaml
│   │   │   ├── sv4d2_8views.yaml
│   │   │   ├── svd.yaml
│   │   │   ├── svd_image_decoder.yaml
│   │   │   ├── svd_xt.yaml
│   │   │   ├── svd_xt_1_1.yaml
│   │   │   └── svd_xt_image_decoder.yaml
│   │   ├── simple_video_sample.py
│   │   ├── simple_video_sample_4d.py
│   │   └── simple_video_sample_4d2.py
│   ├── tests/
│   │   └── attention.py
│   └── util/
│       ├── __init__.py
│       └── detection/
│           ├── __init__.py
│           ├── nsfw_and_watermark_dectection.py
│           ├── p_head_v1.npz
│           └── w_head_v1.npz
├── sgm/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── cifar10.py
│   │   ├── dataset.py
│   │   └── mnist.py
│   ├── inference/
│   │   ├── api.py
│   │   └── helpers.py
│   ├── lr_scheduler.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── autoencoder.py
│   │   └── diffusion.py
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── attention.py
│   │   ├── autoencoding/
│   │   │   ├── __init__.py
│   │   │   ├── losses/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── discriminator_loss.py
│   │   │   │   └── lpips.py
│   │   │   ├── lpips/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── loss/
│   │   │   │   │   ├── .gitignore
│   │   │   │   │   ├── LICENSE
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── lpips.py
│   │   │   │   ├── model/
│   │   │   │   │   ├── LICENSE
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── model.py
│   │   │   │   ├── util.py
│   │   │   │   └── vqperceptual.py
│   │   │   ├── regularizers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   └── quantize.py
│   │   │   └── temporal_ae.py
│   │   ├── diffusionmodules/
│   │   │   ├── __init__.py
│   │   │   ├── denoiser.py
│   │   │   ├── denoiser_scaling.py
│   │   │   ├── denoiser_weighting.py
│   │   │   ├── discretizer.py
│   │   │   ├── guiders.py
│   │   │   ├── loss.py
│   │   │   ├── loss_weighting.py
│   │   │   ├── model.py
│   │   │   ├── openaimodel.py
│   │   │   ├── sampling.py
│   │   │   ├── sampling_utils.py
│   │   │   ├── sigma_sampling.py
│   │   │   ├── util.py
│   │   │   ├── video_model.py
│   │   │   └── wrappers.py
│   │   ├── distributions/
│   │   │   ├── __init__.py
│   │   │   └── distributions.py
│   │   ├── ema.py
│   │   ├── encoders/
│   │   │   ├── __init__.py
│   │   │   └── modules.py
│   │   ├── spacetime_attention.py
│   │   └── video_attention.py
│   └── util.py
└── tests/
    └── inference/
        └── test_inference.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/black.yml
================================================
name: Run black
on: [pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install venv
        run: |
          sudo apt-get -y install python3.10-venv
      - uses: psf/black@stable
        with:
          options: "--check --verbose -l88"
          src: "./sgm ./scripts ./main.py"


================================================
FILE: .github/workflows/test-build.yaml
================================================
name: Build package

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8", "3.10"]
        requirements-file: ["pt2", "pt13"]
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements/${{ matrix.requirements-file }}.txt
          pip install .

================================================
FILE: .github/workflows/test-inference.yml
================================================
name: Test inference

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  test:
    name: "Test inference"
    # This action is designed only to run on the Stability research cluster at this time, so many assumptions are made about the environment
    if: github.repository == 'stability-ai/generative-models'
    runs-on: [self-hosted, slurm, g40]
    steps:
      - uses: actions/checkout@v3
      - name: "Symlink checkpoints"
        run: ln -s ${{vars.SGM_CHECKPOINTS_PATH}} checkpoints
      - name: "Setup python"
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: "Install Hatch"
        run: pip install hatch
      - name: "Run inference tests"
        run: hatch run ci:test-inference --junit-xml test-results.xml
      - name: Surface failing tests
        if: always()
        uses: pmeier/pytest-results-action@main
        with:
          path: test-results.xml
          summary: true
          display-options: fEX
          fail-on-empty: true


================================================
FILE: .gitignore
================================================
# extensions
*.egg-info
*.py[cod]

# envs
.pt13
.pt2

# directories
/checkpoints
/dist
/outputs
/build
/src
/.vscode
**/__pycache__/


================================================
FILE: CODEOWNERS
================================================
.github @Stability-AI/infrastructure

================================================
FILE: LICENSE-CODE
================================================
MIT License

Copyright (c) 2023 Stability AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================
# Generative Models by Stability AI

![sample1](assets/000.jpg)

## News


**May 20, 2025**
- We are releasing **[Stable Video 4D 2.0 (SV4D 2.0)](https://huggingface.co/stabilityai/sv4d2.0)**, an enhanced video-to-4D diffusion model for high-fidelity novel-view video synthesis and 4D asset generation. For research purposes:
    - **SV4D 2.0** was trained to generate 48 frames (12 video frames x 4 camera views) at 576x576 resolution, given a 12-frame input video of the same size, ideally consisting of white-background images of a moving object.
    - Compared to our previous 4D model [SV4D](https://huggingface.co/stabilityai/sv4d), **SV4D 2.0** can generate videos with higher fidelity, sharper details during motion, and better spatio-temporal consistency. It also generalizes much better to real-world videos. Moreover, it does not rely on reference multi-views of the first frame generated by SV3D, making it more robust to self-occlusions.
    - To generate longer novel-view videos, we autoregressively generate 12 frames at a time and use the previous generation as conditioning views for the remaining frames.
    - Please check our [project page](https://sv4d20.github.io), [arxiv paper](https://arxiv.org/pdf/2503.16396) and [video summary](https://www.youtube.com/watch?v=dtqj-s50ynU) for more details.

**QUICKSTART** :
- `python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs` (after downloading [sv4d2.safetensors](https://huggingface.co/stabilityai/sv4d2.0) from HuggingFace into `checkpoints/`)

To run **SV4D 2.0** on a single input video of 21 frames:
- Download SV4D 2.0 model (`sv4d2.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d2.0) to `checkpoints/`: `huggingface-cli download stabilityai/sv4d2.0 sv4d2.safetensors --local-dir checkpoints`
- Run inference: `python scripts/sampling/simple_video_sample_4d2.py --input_path <path/to/video>`
    - `input_path` : The input video `<path/to/video>` can be
      - a single video file in `gif` or `mp4` format, such as `assets/sv4d_videos/camel.gif`, or
      - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
      - a file name pattern matching images of video frames.
    - `num_steps` : default is 50; decrease it to shorten sampling time.
    - `elevations_deg` : specified elevations (relative to input view), default is 0.0 (same as input view).
    - **Background removal** : For input videos with plain background, (optionally) use [rembg](https://github.com/danielgatis/rembg) to remove background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using [Clipdrop](https://clipdrop.co/) or [SAM2](https://github.com/facebookresearch/segment-anything-2) before running SV4D.
    - **Low VRAM environment** : To run on GPUs with low VRAM, try setting `--encoding_t=1` (number of frames encoded at a time) and `--decoding_t=1` (number of frames decoded at a time), or lower the video resolution, e.g. `--img_size=512`.

Notes:
- We also train an 8-view model that generates 5 frames x 8 views at a time (same as SV4D).
  - Download the model from huggingface: `huggingface-cli download stabilityai/sv4d2.0 sv4d2_8views.safetensors --local-dir checkpoints`
  - Run inference: `python scripts/sampling/simple_video_sample_4d2.py --model_path checkpoints/sv4d2_8views.safetensors --input_path assets/sv4d_videos/chest.gif --output_folder outputs`
  - The 5x8 model takes 5 frames of input at a time, but the inference scripts for both models take a 21-frame video as input by default (same as SV3D and SV4D), so we run the model autoregressively until we generate 21 frames.
- Install dependencies before running:
```
python3.10 -m venv .generativemodels
source .generativemodels/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # check CUDA version
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
```

  ![tile](assets/sv4d2.gif)


**July 24, 2024**
- We are releasing **[Stable Video 4D (SV4D)](https://huggingface.co/stabilityai/sv4d)**, a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
    - **SV4D** was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
    - To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
    - To run the community-built gradio demo locally, run `python -m scripts.demo.gradio_app_sv4d`.
    - Please check our [project page](https://sv4d.github.io), [tech report](https://sv4d.github.io/static/sv4d_technical_report.pdf) and [video summary](https://www.youtube.com/watch?v=RBP8vdAWTgk) for more details.

**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_videos/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [sv4d.safetensors](https://huggingface.co/stabilityai/sv4d) and [sv3d_u.safetensors](https://huggingface.co/stabilityai/sv3d) from HuggingFace into `checkpoints/`)

To run **SV4D** on a single input video of 21 frames:
- Download SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from [here](https://huggingface.co/stabilityai/sv3d) and SV4D model (`sv4d.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d) to `checkpoints/`
- Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
    - `input_path` : The input video `<path/to/video>` can be
      - a single video file in `gif` or `mp4` format, such as `assets/sv4d_videos/test_video1.mp4`, or
      - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
      - a file name pattern matching images of video frames.
    - `num_steps` : default is 20, can increase to 50 for better quality but longer sampling time.
    - `sv3d_version` : To specify the SV3D model to generate reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
    - `elevations_deg` : To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_videos/test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
    - **Background removal** : For input videos with plain background, (optionally) use [rembg](https://github.com/danielgatis/rembg) to remove background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using [Clipdrop](https://clipdrop.co/) or [SAM2](https://github.com/facebookresearch/segment-anything-2) before running SV4D.
    - **Low VRAM environment** : To run on GPUs with low VRAM, try setting `--encoding_t=1` (number of frames encoded at a time) and `--decoding_t=1` (number of frames decoded at a time), or lower the video resolution, e.g. `--img_size=512`.

  ![tile](assets/sv4d.gif)


**March 18, 2024**
- We are releasing **[SV3D](https://huggingface.co/stabilityai/sv3d)**, an image-to-video model for novel multi-view synthesis, for research purposes:
    - **SV3D** was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.
    - **SV3D_u**: This variant generates orbital videos based on single image inputs without camera conditioning.
    - **SV3D_p**: Extending the capability of **SV3D_u**, this variant accommodates both single images and orbital views, allowing for the creation of 3D video along specified camera paths.
    - We extend the streamlit demo `scripts/demo/video_sampling.py` and the standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
    - Please check our [project page](https://sv3d.github.io), [tech report](https://sv3d.github.io/static/paper.pdf) and [video summary](https://youtu.be/Zqw4-1LcfWg) for more details.

To run **SV3D_u** on a single image:
- Download `sv3d_u.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_u.safetensors`
- Run `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u`

To run **SV3D_p** on a single image:
- Download `sv3d_p.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_p.safetensors`
1. Generate a static orbit at a specified elevation, e.g. 10.0: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg 10.0`
2. Generate a dynamic orbit at specified elevations and azimuths: pass a sequence of 21 elevations (in degrees, within [-90, 90]) to `elevations_deg`, and a sequence of 21 azimuths (in degrees, within [0, 360], in sorted order from 0 to 360) to `azimuths_deg`. For example: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg [<list of 21 elevations in degrees>] --azimuths_deg [<list of 21 azimuths in degrees>]`
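
The two 21-element lists for a dynamic orbit can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper name `make_orbit` and the sinusoidal elevation profile are illustrative assumptions; only the list length (21) and the value ranges come from the description above.

```python
import math

NUM_FRAMES = 21  # SV3D generates 21 frames, so each list needs 21 entries


def make_orbit(num_frames=NUM_FRAMES, base_elev=10.0, elev_amplitude=5.0):
    """Return (elevations_deg, azimuths_deg) for a hypothetical wavy orbit."""
    # Azimuths increase monotonically and end at 360 degrees (sorted order).
    azimuths = [360.0 * (i + 1) / num_frames for i in range(num_frames)]
    # Elevations oscillate around base_elev; keep them inside [-90, 90].
    elevations = [
        base_elev + elev_amplitude * math.sin(math.radians(az)) for az in azimuths
    ]
    return elevations, azimuths


elevations_deg, azimuths_deg = make_orbit()
print("--elevations_deg", [round(e, 2) for e in elevations_deg])
print("--azimuths_deg", [round(a, 2) for a in azimuths_deg])
```

The printed lists can then be pasted into the `--elevations_deg` and `--azimuths_deg` arguments shown above.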

To run SVD or SV3D on a streamlit server:
`streamlit run scripts/demo/video_sampling.py`

  ![tile](assets/sv3d.gif)


**November 28, 2023**
- We are releasing SDXL-Turbo, a lightning-fast text-to-image model.
  Alongside the model, we release a [technical report](https://stability.ai/research/adversarial-diffusion-distillation).
    - Usage:
        - Follow the installation instructions or update the existing environment with `pip install streamlit-keyup`.
        - Download the [weights](https://huggingface.co/stabilityai/sdxl-turbo) and place them in the `checkpoints/` directory.
        - Run `streamlit run scripts/demo/turbo.py`.

  ![tile](assets/turbo_tile.png)


**November 21, 2023**
- We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
    - [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid): This model was trained to generate 14
      frames at resolution 576x1024 given a context frame of the same size.
      We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware `deflickering decoder`.
    - [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt): Same architecture as `SVD` but finetuned
      for 25 frame generation.
    - You can run the community-built gradio demo locally by running `python -m scripts.demo.gradio_app`.
    - We provide a streamlit demo `scripts/demo/video_sampling.py` and a standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
    - Alongside the model, we release a [technical report](https://stability.ai/research/stable-video-diffusion-scaling-latent-video-diffusion-models-to-large-datasets).

  ![tile](assets/tile.gif)

**July 26, 2023**

- We are releasing two new open models with a
  permissive [`CreativeML Open RAIL++-M` license](model_licenses/LICENSE-SDXL1.0) (see [Inference](#inference) for file
  hashes):
    - [SDXL-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0): An improved version
      over `SDXL-base-0.9`.
    - [SDXL-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0): An improved version
      over `SDXL-refiner-0.9`.

![sample2](assets/001_with_eval.png)

**July 4, 2023**

- A technical report on SDXL is now available [here](https://arxiv.org/abs/2307.01952).

**June 22, 2023**

- We are releasing two new diffusion models for research purposes:
    - `SDXL-base-0.9`: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The
      base model uses [OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip)
      and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) for text encoding whereas the refiner model only uses
      the OpenCLIP model.
    - `SDXL-refiner-0.9`: The refiner has been trained to denoise small noise levels of high quality data and as such is
      not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.

If you would like to access these models for your research, please apply using one of the following links:
[SDXL-0.9-Base model](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9),
and [SDXL-0.9-Refiner](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
You can apply through either of the two links; once your request is granted, you can access both models.
Please log in to your Hugging Face Account with your organization email to request access.
**We plan to do a full release soon (July).**

## The codebase

### General Philosophy

Modularity is king. This repo implements a config-driven approach where we build and combine submodules by
calling `instantiate_from_config()` on objects defined in yaml configs. See `configs/` for many examples.
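
To make the config-driven idea concrete, here is a minimal sketch of how a helper like `instantiate_from_config()` could work. It is an illustration under assumptions, not the repository's implementation: it presumes each config node carries a dotted `target` path and an optional `params` mapping (as the yaml configs in `configs/` do), and it uses a standard-library class as a stand-in target.

```python
import importlib


def get_obj_from_str(path):
    """Resolve a dotted path such as 'fractions.Fraction' to a Python object."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config):
    """Build the object named by config['target'], passing config['params'] as kwargs."""
    if "target" not in config:
        raise KeyError("Expected key `target` to instantiate.")
    return get_obj_from_str(config["target"])(**config.get("params", {}))


# Stand-in config using a stdlib class; real configs would point at sgm classes.
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(config)
print(obj)  # 3/4
```

Because every submodule is built this way, swapping a component amounts to changing its `target` (and `params`) in the yaml file rather than editing code.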

### Changelog from the old `ldm` codebase

For training, we use [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/), but it should be easy to use other
training wrappers around the base modules. The core diffusion model class (formerly `LatentDiffusion`,
now `DiffusionEngine`) has been cleaned up:

- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial
  conditionings, and all combinations thereof) in a single class: `GeneralConditioner`,
  see `sgm/modules/encoders/modules.py`.
- We separate guiders (such as classifier-free guidance, see `sgm/modules/diffusionmodules/guiders.py`) from the
  samplers (`sgm/modules/diffusionmodules/sampling.py`), and the samplers are independent of the model.
- We adopt the ["denoiser framework"](https://arxiv.org/abs/2206.00364) for both training and inference (the most notable
  change is probably the option to train continuous-time models):
    * Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers);
      see `sgm/modules/diffusionmodules/denoiser.py`.
    * The following features are now independent: weighting of the diffusion loss
      function (`sgm/modules/diffusionmodules/denoiser_weighting.py`), preconditioning of the
      network (`sgm/modules/diffusionmodules/denoiser_scaling.py`), and sampling of noise levels during
      training (`sgm/modules/diffusionmodules/sigma_sampling.py`).
- Autoencoding models have also been cleaned up.
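
As an illustration of the denoiser framework, the sketch below writes out the preconditioning and loss weighting proposed in the linked paper (Karras et al., 2022) in plain Python. It is not taken from this repository; the function names and the list-based "tensors" are assumptions made to keep the example self-contained, and `sigma_data = 0.5` is the default from the paper.

```python
import math


def edm_scaling(sigma, sigma_data=0.5):
    """Preconditioning coefficients (c_skip, c_out, c_in, c_noise) from the EDM paper."""
    total = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / total
    c_out = sigma * sigma_data / math.sqrt(total)
    c_in = 1.0 / math.sqrt(total)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise


def edm_weighting(sigma, sigma_data=0.5):
    """Loss weight lambda(sigma) from the same paper; equals 1 / c_out**2."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def denoise(network, x, sigma):
    """Wrap a raw network F into a denoiser D(x; sigma) = c_skip*x + c_out*F(c_in*x, c_noise)."""
    c_skip, c_out, c_in, c_noise = edm_scaling(sigma)
    raw = network([c_in * xi for xi in x], c_noise)
    return [c_skip * xi + c_out * ri for xi, ri in zip(x, raw)]


# Toy network that always predicts zeros (a hypothetical stand-in for a UNet).
zero_net = lambda scaled_x, c_noise: [0.0] * len(scaled_x)
print(denoise(zero_net, [1.0, -1.0], sigma=0.5))
```

Keeping scaling, weighting and noise-level sampling as separate functions mirrors the independence of these features described above: each can be replaced without touching the others.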

## Installation:

<a name="installation"></a>

#### 1. Clone the repo

```shell
git clone https://github.com/Stability-AI/generative-models.git
cd generative-models
```

#### 2. Setting up the virtualenv

This is assuming you have navigated to the `generative-models` root after cloning it.

**NOTE:** This is tested under `python3.10`. For other python versions, you might encounter version conflicts.

**PyTorch 2.0**

```shell
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements/pt2.txt
```

#### 3. Install `sgm`

```shell
pip3 install .
```

#### 4. Install `sdata` for training

```shell
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
```

## Packaging

This repository uses PEP 517 compliant packaging using [Hatch](https://hatch.pypa.io/latest/).

To build a distributable wheel, install `hatch` and run `hatch build`
(specifying `-t wheel` will skip building an sdist, which is not necessary).

```
pip install hatch
hatch build -t wheel
```

You will find the built package in `dist/`. You can install the wheel with `pip install dist/*.whl`.

Note that the package does **not** currently specify dependencies; you will need to install the required packages,
depending on your use case and PyTorch version, manually.

## Inference

We provide a [streamlit](https://streamlit.io/) demo for text-to-image and image-to-image sampling
in `scripts/demo/sampling.py`.
We provide file hashes for the complete file as well as for only the saved tensors in the file (
see [Model Spec](https://github.com/Stability-AI/ModelSpec) for a script to evaluate that).
The following models are currently supported:

- [SDXL-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
  ```
  File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
  Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
  ```
- [SDXL-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)
  ```
  File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
  Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
  ```
- [SDXL-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
- [SDXL-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9)
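
The file hashes listed above can be checked locally with a few lines of Python. This is a minimal sketch using only the standard library; the checkpoint filename below is an assumption for illustration, so adjust it to wherever you saved the weights. It covers the whole-file hash only, not the tensordata hash.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# File hash for SDXL-base-1.0 as listed above.
expected = "31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b"
# Assumed local filename; change it to match your download.
checkpoint = "checkpoints/sd_xl_base_1.0.safetensors"

if os.path.exists(checkpoint):
    actual = sha256_of_file(checkpoint)
    print("match" if actual == expected else f"mismatch: {actual}")
```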

**Weights for SDXL**:

**SDXL-1.0:**
The weights of SDXL-1.0 are available (subject to
a [`CreativeML Open RAIL++-M` license](model_licenses/LICENSE-SDXL1.0)) here:

- base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/
- refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/

**SDXL-0.9:**
The weights of SDXL-0.9 are available and subject to a [research license](model_licenses/LICENSE-SDXL0.9).
If you would like to access these models for your research, please apply using one of the following links:
[SDXL-base-0.9 model](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9),
and [SDXL-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
You can apply through either of the two links; once your request is granted, you can access both models.
Please log in to your Hugging Face Account with your organization email to request access.

After obtaining the weights, place them into `checkpoints/`.
Next, start the demo using

```
streamlit run scripts/demo/sampling.py --server.port <your_port>
```

### Invisible Watermark Detection

Images generated with our code use the
[invisible-watermark](https://github.com/ShieldMnt/invisible-watermark/)
library to embed an invisible watermark into the model output. We also provide
a script to easily detect that watermark. Please note that this watermark is
not the same as in previous Stable Diffusion 1.x/2.x versions.

To run the script you need to either have a working installation as above or
try an _experimental_ import using only a minimal amount of packages:

```bash
python -m venv .detect
source .detect/bin/activate

pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
```

Once the environment is set up, the script can be used in the following ways (don't forget to activate your
virtual environment beforehand, e.g. `source .pt2/bin/activate`):

```bash
# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*
```

## Training:

We are providing example training configs in `configs/example_training`. To launch a training, run

```
python main.py --base configs/<config1.yaml> configs/<config2.yaml>
```

where configs are merged from left to right (later configs overwrite the same values).
This can be used to combine model, training and data configs. However, all of them can also be
defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST,
run

```bash
python main.py --base configs/example_training/toy/mnist_cond.yaml
```
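
The left-to-right merge semantics can be pictured with plain nested dictionaries. The sketch below is an illustration only: `merge_configs` is a hypothetical helper, not the function used by `main.py`, and the two example configs are made up.

```python
def merge_configs(*configs):
    """Merge nested dicts left to right; later values overwrite earlier ones."""
    merged = {}
    for cfg in configs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them recursively.
                merged[key] = merge_configs(merged[key], value)
            else:
                # Otherwise the later config wins.
                merged[key] = value
    return merged


# Hypothetical model config and a later override config.
model_cfg = {"model": {"base_learning_rate": 1.0e-4, "params": {"input_key": "jpg"}}}
override_cfg = {"model": {"base_learning_rate": 4.5e-6}}

merged = merge_configs(model_cfg, override_cfg)
print(merged)
```

Here the learning rate comes from the second config, while `params.input_key` survives from the first because the override does not mention it.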

**NOTE 1:** Using the non-toy-dataset
configs `configs/example_training/imagenet-f8_cond.yaml`, `configs/example_training/txt2img-clipl.yaml`
and `configs/example_training/txt2img-clipl-legacy-ucg-training.yaml` for training will require edits depending on the
dataset used (which is expected to be stored in tar files in
the [webdataset-format](https://github.com/webdataset/webdataset)). To find the parts which have to be adapted, search
for comments containing `USER:` in the respective config.

**NOTE 2:** This repository supports both `pytorch1.13` and `pytorch2` for training generative models. However, for
autoencoder training, as e.g. in `configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml`,
only `pytorch1.13` is supported.

**NOTE 3:** Training latent generative models (as e.g. in `configs/example_training/imagenet-f8_cond.yaml`) requires
retrieving the checkpoint from [Hugging Face](https://huggingface.co/stabilityai/sdxl-vae/tree/main) and replacing
the `CKPT_PATH` placeholder in [this line](configs/example_training/imagenet-f8_cond.yaml#81). The same is to be done
for the provided text-to-image configs.

### Building New Diffusion Models

#### Conditioner

The `GeneralConditioner` is configured through the `conditioner_config`. Its only attribute is `emb_models`, a list of
different embedders (all inherited from `AbstractEmbModel`) that are used to condition the generative model.
All embedders should define whether or not they are trainable (`is_trainable`, default `False`), which classifier-free
guidance dropout rate is used (`ucg_rate`, default `0`), and an input key (`input_key`), for example, `txt` for
text-conditioning or `cls` for class-conditioning.
When computing conditionings, the embedder will get `batch[input_key]` as input.
We currently support two- to four-dimensional conditionings, and conditionings of different embedders are concatenated
appropriately.
Note that the order of the embedders in the `conditioner_config` is important.
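
The interplay of `input_key`, `ucg_rate` and embedder order can be sketched in a few lines. This is a toy model in plain Python, not the `GeneralConditioner` itself: `ClassEmbedder` and `ToyConditioner` are hypothetical names, only two-dimensional (batch x features) conditionings are handled, and dropout is modelled by zeroing a sample's embedding.

```python
import random


class ClassEmbedder:
    """Hypothetical embedder: one-hot encodes integer class labels."""

    def __init__(self, input_key, num_classes, ucg_rate=0.0, is_trainable=False):
        self.input_key = input_key
        self.num_classes = num_classes
        self.ucg_rate = ucg_rate
        self.is_trainable = is_trainable

    def __call__(self, labels):
        return [[1.0 if i == y else 0.0 for i in range(self.num_classes)] for y in labels]


class ToyConditioner:
    """Calls each embedder on batch[input_key] and concatenates the results."""

    def __init__(self, emb_models, seed=0):
        self.emb_models = emb_models
        self.rng = random.Random(seed)

    def __call__(self, batch):
        per_embedder = []
        for embedder in self.emb_models:
            emb = embedder(batch[embedder.input_key])
            # Classifier-free guidance dropout: zero a sample with probability ucg_rate.
            emb = [
                [0.0] * len(row) if self.rng.random() < embedder.ucg_rate else row
                for row in emb
            ]
            per_embedder.append(emb)
        # Concatenate along the feature axis, in the order the embedders are listed.
        return [sum(rows, []) for rows in zip(*per_embedder)]


conditioner = ToyConditioner([ClassEmbedder("cls", 3), ClassEmbedder("dom", 2)])
cond = conditioner({"cls": [2, 0], "dom": [1, 1]})
print(cond)
```

Swapping the two embedders in the list would change the layout of the concatenated vector, which is why the order in the `conditioner_config` matters.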

#### Network

The neural network is set through the `network_config`. This used to be called `unet_config`, which is not general
enough as we plan to experiment with transformer-based diffusion backbones.

#### Loss

The loss is configured through `loss_config`. For standard diffusion model training, you will have to
set `sigma_sampler_config`.

#### Sampler config

As discussed above, the sampler is independent of the model. In the `sampler_config`, we set the type of numerical
solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free
guidance.
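
To illustrate how a sampler can stay independent of the model, here is a minimal sketch of an Euler solver that only needs a callable denoiser, combined with a classifier-free guidance wrapper and the noise-level discretization from the EDM paper linked earlier. All names are hypothetical and the toy denoiser is a stand-in; this is not the code in `sgm/modules/diffusionmodules/`.

```python
def karras_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from the EDM paper, decreasing from sigma_max, with a final 0."""
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    ramp = [i / (num_steps - 1) for i in range(num_steps)]
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp] + [0.0]


def cfg_guider(denoiser, cond, uncond, scale):
    """Wrap a conditional denoiser with classifier-free guidance."""

    def guided(x, sigma):
        d_c = denoiser(x, sigma, cond)
        d_u = denoiser(x, sigma, uncond)
        return [u + scale * (c - u) for c, u in zip(d_c, d_u)]

    return guided


def euler_sample(denoise, x, sigmas):
    """Euler steps of the probability-flow ODE; `denoise` is any callable (x, sigma)."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        d = [(xi - di) / sigma for xi, di in zip(x, denoised)]
        x = [xi + di * (sigma_next - sigma) for xi, di in zip(x, d)]
    return x


# Toy denoiser (assumption): the "data" is a single point given by the conditioning.
def toy_denoiser(x, sigma, cond):
    return list(cond)


target = [1.0, -2.0]
guided = cfg_guider(toy_denoiser, cond=target, uncond=[0.0, 0.0], scale=1.0)
sample = euler_sample(guided, x=[80.0, -40.0], sigmas=karras_sigmas(10))
print(sample)
```

The solver never inspects the network: changing the number of steps, the discretization, or the guidance scale only touches the arguments passed to `euler_sample` and `cfg_guider`.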

### Dataset Handling

For large-scale training we recommend using the data pipelines from
our [data pipelines](https://github.com/Stability-AI/datapipelines) project. The project is listed in the requirements
and automatically included when following the steps from the [Installation section](#installation).
Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of
data keys/values,
e.g.,

```python
example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}
```

where we expect images in -1...1, channel-first format.
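
A minimal sketch of such a map-style dataset, using nested Python lists in place of tensors so that it runs without
extra dependencies. `ToyCaptionDataset` and `to_chw_minus_one_to_one` are hypothetical names; a real dataset would
typically return tensors instead.

```python
from typing import Dict, List, Sequence

Image = List[List[List[float]]]  # nested lists indexed as [channel][row][column]


def to_chw_minus_one_to_one(hwc_uint8: Sequence[Sequence[Sequence[int]]]) -> Image:
    """Convert an HWC image with values 0..255 to channel-first floats in -1...1."""
    height, width, channels = len(hwc_uint8), len(hwc_uint8[0]), len(hwc_uint8[0][0])
    return [
        [[hwc_uint8[y][x][c] / 127.5 - 1.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


class ToyCaptionDataset:
    """Map-style dataset: supports len() and integer indexing, returns a dict per item."""

    def __init__(self, images: Sequence, captions: Sequence[str]) -> None:
        assert len(images) == len(captions)
        self.images = images
        self.captions = captions

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Dict[str, object]:
        return {
            "jpg": to_chw_minus_one_to_one(self.images[index]),
            "txt": self.captions[index],
        }


pixels = [[[0, 128, 255], [255, 0, 0]]]  # one row, two RGB pixels, HWC layout
dataset = ToyCaptionDataset([pixels], ["a beautiful image"])
example = dataset[0]
```

The dict keys (`jpg`, `txt`) are what the `input_key` settings of the model and the embedders refer to.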


================================================
FILE: configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: sgm.models.autoencoder.AutoencodingEngine
  params:
    input_key: jpg
    monitor: val/rec_loss

    loss_config:
      target: sgm.modules.autoencoding.losses.GeneralLPIPSWithDiscriminator
      params:
        perceptual_weight: 0.25
        disc_start: 20001
        disc_weight: 0.5
        learn_logvar: True

        regularization_weights:
          kl_loss: 1.0

    regularizer_config:
      target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer

    encoder_config:
      target: sgm.modules.diffusionmodules.model.Encoder
      params:
        attn_type: none
        double_z: True
        z_channels: 4
        resolution: 256
        in_channels: 3
        out_ch: 3
        ch: 128
        ch_mult: [1, 2, 4]
        num_res_blocks: 4
        attn_resolutions: []
        dropout: 0.0

    decoder_config:
      target: sgm.modules.diffusionmodules.model.Decoder
      params: ${model.params.encoder_config.params}

data:
  target: sgm.data.dataset.StableDataModuleFromConfig
  params:
    train:
      datapipeline:
        urls:
          - DATA-PATH
        pipeline_config:
          shardshuffle: 10000
          sample_shuffle: 10000

        decoders:
          - pil

        postprocessors:
          - target: sdata.mappers.TorchVisionImageTransforms
            params:
              key: jpg
              transforms:
                - target: torchvision.transforms.Resize
                  params:
                    size: 256
                    interpolation: 3
                - target: torchvision.transforms.ToTensor
          - target: sdata.mappers.Rescaler
          - target: sdata.mappers.AddOriginalImageSizeAsTupleAndCropToSquare
            params:
              h_key: height
              w_key: width

      loader:
        batch_size: 8
        num_workers: 4


lightning:
  strategy:
    target: pytorch_lightning.strategies.DDPStrategy
    params:
      find_unused_parameters: True

  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 50000

    image_logger:
      target: main.ImageLogger
      params:
        enable_autocast: False
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    devices: 0,
    limit_val_batches: 50
    benchmark: True
    accumulate_grad_batches: 1
    val_check_interval: 10000

================================================
FILE: configs/example_training/autoencoder/kl-f4/imagenet-kl_f8_8chn.yaml
================================================
model:
  base_learning_rate: 4.5e-6
  target: sgm.models.autoencoder.AutoencodingEngine
  params:
    input_key: jpg
    monitor: val/loss/rec
    disc_start_iter: 0

    encoder_config:
      target: sgm.modules.diffusionmodules.model.Encoder
      params:
        attn_type: vanilla-xformers
        double_z: true
        z_channels: 8
        resolution: 256
        in_channels: 3
        out_ch: 3
        ch: 128
        ch_mult: [1, 2, 4, 4]
        num_res_blocks: 2
        attn_resolutions: []
        dropout: 0.0

    decoder_config:
      target: sgm.modules.diffusionmodules.model.Decoder
      params: ${model.params.encoder_config.params}

    regularizer_config:
      target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer

    loss_config:
      target: sgm.modules.autoencoding.losses.GeneralLPIPSWithDiscriminator
      params:
        perceptual_weight: 0.25
        disc_start: 20001
        disc_weight: 0.5
        learn_logvar: True

        regularization_weights:
          kl_loss: 1.0

data:
  target: sgm.data.dataset.StableDataModuleFromConfig
  params:
    train:
      datapipeline:
        urls:
          - DATA-PATH
        pipeline_config:
          shardshuffle: 10000
          sample_shuffle: 10000

        decoders:
          - pil

        postprocessors:
          - target: sdata.mappers.TorchVisionImageTransforms
            params:
              key: jpg
              transforms:
                - target: torchvision.transforms.Resize
                  params:
                    size: 256
                    interpolation: 3
                - target: torchvision.transforms.ToTensor
          - target: sdata.mappers.Rescaler
          - target: sdata.mappers.AddOriginalImageSizeAsTupleAndCropToSquare
            params:
              h_key: height
              w_key: width

      loader:
        batch_size: 8
        num_workers: 4


lightning:
  strategy:
    target: pytorch_lightning.strategies.DDPStrategy
    params:
      find_unused_parameters: True

  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 50000

    image_logger:
      target: main.ImageLogger
      params:
        enable_autocast: False
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True

  trainer:
    devices: 0,
    limit_val_batches: 50
    benchmark: True
    accumulate_grad_batches: 1
    val_check_interval: 10000


================================================
FILE: configs/example_training/imagenet-f8_cond.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True
    log_keys:
      - cls

    scheduler_config:
      target: sgm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [10000]
        cycle_lengths: [10000000000000]
        f_start: [1.e-6]
        f_max: [1.]
        f_min: [1.]

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 256
        attention_resolutions: [1, 2, 4]
        num_res_blocks: 2
        channel_mult: [1, 2, 4]
        num_head_channels: 64
        num_classes: sequential
        adm_in_channels: 1024
        transformer_depth: 1
        context_dim: 1024
        spatial_transformer_attn_type: softmax-xformers

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              add_sequence_dim: True
              embed_dim: 1024
              n_classes: 1000

          - is_trainable: False
            ucg_rate: 0.2
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: crop_coords_top_left
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        ckpt_path: CKPT_PATH
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EpsWeighting
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.DiscreteSampling
          params:
            num_idx: 1000

            discretization_config:
              target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 5.0

data:
  target: sgm.data.dataset.StableDataModuleFromConfig
  params:
    train:
      datapipeline:
        urls:
          # USER: adapt this path to the root of your custom dataset
          - DATA_PATH
        pipeline_config:
          shardshuffle: 10000
          sample_shuffle: 10000 # USER: you might wanna adapt depending on your available RAM

        decoders:
          - pil

        postprocessors:
          - target: sdata.mappers.TorchVisionImageTransforms
            params:
              key: jpg # USER: you might wanna adapt this for your custom dataset
              transforms:
                - target: torchvision.transforms.Resize
                  params:
                    size: 256
                    interpolation: 3
                - target: torchvision.transforms.ToTensor
          - target: sdata.mappers.Rescaler

          - target: sdata.mappers.AddOriginalImageSizeAsTupleAndCropToSquare
            params:
              h_key: height # USER: you might wanna adapt this for your custom dataset
              w_key: width # USER: you might wanna adapt this for your custom dataset

      loader:
        batch_size: 64
        num_workers: 6

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        enable_autocast: False
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 8
          n_rows: 2

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 1000

================================================
FILE: configs/example_training/toy/cifar10_cond.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
          params:
            sigma_data: 1.0

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 3
        out_channels: 3
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32
        num_classes: sequential
        adm_in_channels: 128

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              embed_dim: 128
              n_classes: 10

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
          params:
            sigma_data: 1.0
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 3.0

data:
  target: sgm.data.cifar10.CIFAR10Loader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 64
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 64
          n_rows: 8

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 20

================================================
FILE: configs/example_training/toy/mnist.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
          params:
            sigma_data: 1.0

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 1
        out_channels: 1
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
          params:
            sigma_data: 1.0
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization

data:
  target: sgm.data.mnist.MNISTLoader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 64
        increase_log_steps: False
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 64
          n_rows: 8

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 10

================================================
FILE: configs/example_training/toy/mnist_cond.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
          params:
            sigma_data: 1.0

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 1
        out_channels: 1
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32
        num_classes: sequential
        adm_in_channels: 128

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              embed_dim: 128
              n_classes: 10

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
          params:
            sigma_data: 1.0
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 3.0

data:
  target: sgm.data.mnist.MNISTLoader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 16
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 16
          n_rows: 4

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 20

================================================
FILE: configs/example_training/toy/mnist_cond_discrete_eps.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 1
        out_channels: 1
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32
        num_classes: sequential
        adm_in_channels: 128

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              embed_dim: 128
              n_classes: 10

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.DiscreteSampling
          params:
            num_idx: 1000

            discretization_config:
              target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 5.0

data:
  target: sgm.data.mnist.MNISTLoader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 16
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 16
          n_rows: 4

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 20

================================================
FILE: configs/example_training/toy/mnist_cond_l1_loss.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
          params:
            sigma_data: 1.0

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 1
        out_channels: 1
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32
        num_classes: sequential
        adm_in_channels: 128

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              embed_dim: 128
              n_classes: 10

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_type: l1
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
          params:
            sigma_data: 1.0
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 3.0

data:
  target: sgm.data.mnist.MNISTLoader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 64
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 64
          n_rows: 8

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 20

================================================
FILE: configs/example_training/toy/mnist_cond_with_ema.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    use_ema: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EDMScaling
          params:
            sigma_data: 1.0

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        in_channels: 1
        out_channels: 1
        model_channels: 32
        attention_resolutions: []
        num_res_blocks: 4
        channel_mult: [1, 2, 2]
        num_head_channels: 32
        num_classes: sequential
        adm_in_channels: 128

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: cls
            ucg_rate: 0.2
            target: sgm.modules.encoders.modules.ClassEmbedder
            params:
              embed_dim: 128
              n_classes: 10

    first_stage_config:
      target: sgm.models.autoencoder.IdentityFirstStage

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EDMWeighting
          params:
            sigma_data: 1.0
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 3.0

data:
  target: sgm.data.mnist.MNISTLoader
  params:
    batch_size: 512
    num_workers: 1

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        batch_frequency: 1000
        max_images: 64
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 64
          n_rows: 8

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 20

================================================
FILE: configs/example_training/txt2img-clipl-legacy-ucg-training.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True
    log_keys:
      - txt

    scheduler_config:
      target: sgm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [10000]
        cycle_lengths: [10000000000000]
        f_start: [1.e-6]
        f_max: [1.]
        f_min: [1.]

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [1, 2, 4]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        num_classes: sequential
        adm_in_channels: 1792
        num_heads: 1
        transformer_depth: 1
        context_dim: 768
        spatial_transformer_attn_type: softmax-xformers

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: txt
            ucg_rate: 0.1
            legacy_ucg_value: ""
            target: sgm.modules.encoders.modules.FrozenCLIPEmbedder
            params:
              always_return_pooled: True

          - is_trainable: False
            ucg_rate: 0.1
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: crop_coords_top_left
            ucg_rate: 0.1
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        ckpt_path: CKPT_PATH
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [ 1, 2, 4, 4 ]
          num_res_blocks: 2
          attn_resolutions: [ ]
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EpsWeighting
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.DiscreteSampling
          params:
            num_idx: 1000

            discretization_config:
              target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 7.5

data:
  target: sgm.data.dataset.StableDataModuleFromConfig
  params:
    train:
      datapipeline:
        urls:
          # USER: adapt this path to the root of your custom dataset
          - DATA_PATH
        pipeline_config:
          shardshuffle: 10000
          sample_shuffle: 10000 # USER: you might wanna adapt depending on your available RAM

        decoders:
          - pil

        postprocessors:
          - target: sdata.mappers.TorchVisionImageTransforms
            params:
              key: jpg # USER: you might wanna adapt this for your custom dataset
              transforms:
                - target: torchvision.transforms.Resize
                  params:
                    size: 256
                    interpolation: 3
                - target: torchvision.transforms.ToTensor
          - target: sdata.mappers.Rescaler
          - target: sdata.mappers.AddOriginalImageSizeAsTupleAndCropToSquare
            # USER: you might wanna use non-default parameters due to your custom dataset

      loader:
        batch_size: 64
        num_workers: 6

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        enable_autocast: False
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 8
          n_rows: 2

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 1000

================================================
FILE: configs/example_training/txt2img-clipl.yaml
================================================
model:
  base_learning_rate: 1.0e-4
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True
    log_keys:
      - txt

    scheduler_config:
      target: sgm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [10000]
        cycle_lengths: [10000000000000]
        f_start: [1.e-6]
        f_max: [1.]
        f_min: [1.]

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [1, 2, 4]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        num_classes: sequential
        adm_in_channels: 1792
        num_heads: 1
        transformer_depth: 1
        context_dim: 768
        spatial_transformer_attn_type: softmax-xformers

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: True
            input_key: txt
            ucg_rate: 0.1
            legacy_ucg_value: ""
            target: sgm.modules.encoders.modules.FrozenCLIPEmbedder
            params:
              always_return_pooled: True

          - is_trainable: False
            ucg_rate: 0.1
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: crop_coords_top_left
            ucg_rate: 0.1
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        ckpt_path: CKPT_PATH
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    loss_fn_config:
      target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss
      params:
        loss_weighting_config:
          target: sgm.modules.diffusionmodules.loss_weighting.EpsWeighting
        sigma_sampler_config:
          target: sgm.modules.diffusionmodules.sigma_sampling.DiscreteSampling
          params:
            num_idx: 1000

            discretization_config:
              target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 50

        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.VanillaCFG
          params:
            scale: 7.5

data:
  target: sgm.data.dataset.StableDataModuleFromConfig
  params:
    train:
      datapipeline:
        urls:
          # USER: adapt this path to the root of your custom dataset
          - DATA_PATH
        pipeline_config:
          shardshuffle: 10000
          sample_shuffle: 10000


        decoders:
          - pil

        postprocessors:
          - target: sdata.mappers.TorchVisionImageTransforms
            params:
              key: jpg # USER: you might wanna adapt this for your custom dataset
              transforms:
                - target: torchvision.transforms.Resize
                  params:
                    size: 256
                    interpolation: 3
                - target: torchvision.transforms.ToTensor
          - target: sdata.mappers.Rescaler
            # USER: you might wanna use non-default parameters due to your custom dataset
          - target: sdata.mappers.AddOriginalImageSizeAsTupleAndCropToSquare
            # USER: you might wanna use non-default parameters due to your custom dataset

      loader:
        batch_size: 64
        num_workers: 6

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 5000

  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: 25000

    image_logger:
      target: main.ImageLogger
      params:
        disabled: False
        enable_autocast: False
        batch_frequency: 1000
        max_images: 8
        increase_log_steps: True
        log_first_step: False
        log_images_kwargs:
          use_ema_scope: False
          N: 8
          n_rows: 2

  trainer:
    devices: 0,
    benchmark: True
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    max_epochs: 1000

================================================
FILE: configs/inference/sd_xl_base.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        adm_in_channels: 2816
        num_classes: sequential
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2]
        num_res_blocks: 2
        channel_mult: [1, 2, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: [1, 2, 10]
        context_dim: 2048
        spatial_transformer_attn_type: softmax-xformers

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: False
            input_key: txt
            target: sgm.modules.encoders.modules.FrozenCLIPEmbedder
            params:
              layer: hidden
              layer_idx: 11

          - is_trainable: False
            input_key: txt
            target: sgm.modules.encoders.modules.FrozenOpenCLIPEmbedder2
            params:
              arch: ViT-bigG-14
              version: laion2b_s39b_b160k
              freeze: True
              layer: penultimate
              always_return_pooled: True
              legacy: False

          - is_trainable: False
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: crop_coords_top_left
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: target_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity


================================================
FILE: configs/inference/sd_xl_refiner.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        adm_in_channels: 2560
        num_classes: sequential
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 384
        attention_resolutions: [4, 2]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: 4
        context_dim: [1280, 1280, 1280, 1280]
        spatial_transformer_attn_type: softmax-xformers

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          - is_trainable: False
            input_key: txt
            target: sgm.modules.encoders.modules.FrozenOpenCLIPEmbedder2
            params:
              arch: ViT-bigG-14
              version: laion2b_s39b_b160k
              legacy: False
              freeze: True
              layer: penultimate
              always_return_pooled: True

          - is_trainable: False
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: crop_coords_top_left
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

          - is_trainable: False
            input_key: aesthetic_score
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity


================================================
FILE: configs/inference/sv3d_p.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.18215
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.VScalingWithEDMcNoise

    network_config:
      target: sgm.modules.diffusionmodules.video_model.VideoUNet
      params:
        adm_in_channels: 1280
        num_classes: sequential
        use_checkpoint: True
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2, 1]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        spatial_transformer_attn_type: softmax-xformers
        extra_ff_mix_layer: True
        use_spatial_context: True
        merge_strategy: learned_with_images
        video_kernel_size: [3, 1, 1]

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
        - input_key: cond_frames_without_noise
          is_trainable: False
          target: sgm.modules.encoders.modules.FrozenOpenCLIPImagePredictionEmbedder
          params:
            n_cond_frames: 1
            n_copies: 1
            open_clip_embedding_config:
              target: sgm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder
              params:
                freeze: True

        - input_key: cond_frames
          is_trainable: False
          target: sgm.modules.encoders.modules.VideoPredictionEmbedderWithEncoder
          params:
            disable_encoder_autocast: True
            n_cond_frames: 1
            n_copies: 1
            is_ae: True
            encoder_config:
              target: sgm.models.autoencoder.AutoencoderKLModeOnly
              params:
                embed_dim: 4
                monitor: val/rec_loss
                ddconfig:
                  attn_type: vanilla-xformers
                  double_z: True
                  z_channels: 4
                  resolution: 256
                  in_channels: 3
                  out_ch: 3
                  ch: 128
                  ch_mult: [1, 2, 4, 4]
                  num_res_blocks: 2
                  attn_resolutions: []
                  dropout: 0.0
                lossconfig:
                  target: torch.nn.Identity

        - input_key: cond_aug
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

        - input_key: polars_rad
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 512

        - input_key: azimuths_rad
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 512

    first_stage_config:
      target: sgm.models.autoencoder.AutoencodingEngine
      params:
        loss_config:
          target: torch.nn.Identity
        regularizer_config:
          target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer
        encoder_config:
          target: torch.nn.Identity
        decoder_config:
          target: sgm.modules.diffusionmodules.model.Decoder
          params:
            attn_type: vanilla-xformers
            double_z: True
            z_channels: 4
            resolution: 256
            in_channels: 3
            out_ch: 3
            ch: 128
            ch_mult: [ 1, 2, 4, 4 ]
            num_res_blocks: 2
            attn_resolutions: [ ]
            dropout: 0.0

================================================
FILE: configs/inference/sv3d_u.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.18215
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.VScalingWithEDMcNoise

    network_config:
      target: sgm.modules.diffusionmodules.video_model.VideoUNet
      params:
        adm_in_channels: 256
        num_classes: sequential
        use_checkpoint: True
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2, 1]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        spatial_transformer_attn_type: softmax-xformers
        extra_ff_mix_layer: True
        use_spatial_context: True
        merge_strategy: learned_with_images
        video_kernel_size: [3, 1, 1]

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
        - input_key: cond_frames_without_noise
          is_trainable: False
          target: sgm.modules.encoders.modules.FrozenOpenCLIPImagePredictionEmbedder
          params:
            n_cond_frames: 1
            n_copies: 1
            open_clip_embedding_config:
              target: sgm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder
              params:
                freeze: True

        - input_key: cond_frames
          is_trainable: False
          target: sgm.modules.encoders.modules.VideoPredictionEmbedderWithEncoder
          params:
            disable_encoder_autocast: True
            n_cond_frames: 1
            n_copies: 1
            is_ae: True
            encoder_config:
              target: sgm.models.autoencoder.AutoencoderKLModeOnly
              params:
                embed_dim: 4
                monitor: val/rec_loss
                ddconfig:
                  attn_type: vanilla-xformers
                  double_z: True
                  z_channels: 4
                  resolution: 256
                  in_channels: 3
                  out_ch: 3
                  ch: 128
                  ch_mult: [1, 2, 4, 4]
                  num_res_blocks: 2
                  attn_resolutions: []
                  dropout: 0.0
                lossconfig:
                  target: torch.nn.Identity

        - input_key: cond_aug
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencodingEngine
      params:
        loss_config:
          target: torch.nn.Identity
        regularizer_config:
          target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer
        encoder_config:
          target: torch.nn.Identity
        decoder_config:
          target: sgm.modules.diffusionmodules.model.Decoder
          params:
            attn_type: vanilla-xformers
            double_z: True
            z_channels: 4
            resolution: 256
            in_channels: 3
            out_ch: 3
            ch: 128
            ch_mult: [ 1, 2, 4, 4 ]
            num_res_blocks: 2
            attn_resolutions: [ ]
            dropout: 0.0

================================================
FILE: configs/inference/svd.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.18215
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.VScalingWithEDMcNoise

    network_config:
      target: sgm.modules.diffusionmodules.video_model.VideoUNet
      params:
        adm_in_channels: 768
        num_classes: sequential
        use_checkpoint: True
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2, 1]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        spatial_transformer_attn_type: softmax-xformers
        extra_ff_mix_layer: True
        use_spatial_context: True
        merge_strategy: learned_with_images
        video_kernel_size: [3, 1, 1]

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
        - is_trainable: False
          input_key: cond_frames_without_noise
          target: sgm.modules.encoders.modules.FrozenOpenCLIPImagePredictionEmbedder
          params:
            n_cond_frames: 1
            n_copies: 1
            open_clip_embedding_config:
              target: sgm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder
              params:
                freeze: True

        - input_key: fps_id
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

        - input_key: motion_bucket_id
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

        - input_key: cond_frames
          is_trainable: False
          target: sgm.modules.encoders.modules.VideoPredictionEmbedderWithEncoder
          params:
            disable_encoder_autocast: True
            n_cond_frames: 1
            n_copies: 1
            is_ae: True
            encoder_config:
              target: sgm.models.autoencoder.AutoencoderKLModeOnly
              params:
                embed_dim: 4
                monitor: val/rec_loss
                ddconfig:
                  attn_type: vanilla-xformers
                  double_z: True
                  z_channels: 4
                  resolution: 256
                  in_channels: 3
                  out_ch: 3
                  ch: 128
                  ch_mult: [1, 2, 4, 4]
                  num_res_blocks: 2
                  attn_resolutions: []
                  dropout: 0.0
                lossconfig:
                  target: torch.nn.Identity

        - input_key: cond_aug
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencodingEngine
      params:
        loss_config:
          target: torch.nn.Identity
        regularizer_config:
          target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer
        encoder_config:
          target: sgm.modules.diffusionmodules.model.Encoder
          params:
            attn_type: vanilla
            double_z: True
            z_channels: 4
            resolution: 256
            in_channels: 3
            out_ch: 3
            ch: 128
            ch_mult: [1, 2, 4, 4]
            num_res_blocks: 2
            attn_resolutions: []
            dropout: 0.0
        decoder_config:
          target: sgm.modules.autoencoding.temporal_ae.VideoDecoder
          params:
            attn_type: vanilla
            double_z: True
            z_channels: 4
            resolution: 256
            in_channels: 3
            out_ch: 3
            ch: 128
            ch_mult: [1, 2, 4, 4]
            num_res_blocks: 2
            attn_resolutions: []
            dropout: 0.0
            video_kernel_size: [3, 1, 1]

================================================
FILE: configs/inference/svd_image_decoder.yaml
================================================
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.18215
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.Denoiser
      params:
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.VScalingWithEDMcNoise

    network_config:
      target: sgm.modules.diffusionmodules.video_model.VideoUNet
      params:
        adm_in_channels: 768
        num_classes: sequential
        use_checkpoint: True
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2, 1]
        num_res_blocks: 2
        channel_mult: [1, 2, 4, 4]
        num_head_channels: 64
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        spatial_transformer_attn_type: softmax-xformers
        extra_ff_mix_layer: True
        use_spatial_context: True
        merge_strategy: learned_with_images
        video_kernel_size: [3, 1, 1]

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
        - is_trainable: False
          input_key: cond_frames_without_noise
          target: sgm.modules.encoders.modules.FrozenOpenCLIPImagePredictionEmbedder
          params:
            n_cond_frames: 1
            n_copies: 1
            open_clip_embedding_config:
              target: sgm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder
              params:
                freeze: True

        - input_key: fps_id
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

        - input_key: motion_bucket_id
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

        - input_key: cond_frames
          is_trainable: False
          target: sgm.modules.encoders.modules.VideoPredictionEmbedderWithEncoder
          params:
            disable_encoder_autocast: True
            n_cond_frames: 1
            n_copies: 1
            is_ae: True
            encoder_config:
              target: sgm.models.autoencoder.AutoencoderKLModeOnly
              params:
                embed_dim: 4
                monitor: val/rec_loss
                ddconfig:
                  attn_type: vanilla-xformers
                  double_z: True
                  z_channels: 4
                  resolution: 256
                  in_channels: 3
                  out_ch: 3
                  ch: 128
                  ch_mult: [1, 2, 4, 4]
                  num_res_blocks: 2
                  attn_resolutions: []
                  dropout: 0.0
                lossconfig:
                  target: torch.nn.Identity

        - input_key: cond_aug
          is_trainable: False
          target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
          params:
            outdim: 256

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: True
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

================================================
FILE: main.py
================================================
import argparse
import datetime
import glob
import inspect
import os
import sys
from inspect import Parameter
from typing import Union

import numpy as np
import pytorch_lightning as pl
import torch
import torchvision
import wandb
from matplotlib import pyplot as plt
from natsort import natsorted
from omegaconf import OmegaConf
from packaging import version
from PIL import Image
from pytorch_lightning import seed_everything
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.trainer import Trainer
from pytorch_lightning.utilities import rank_zero_only

from sgm.util import exists, instantiate_from_config, isheatmap

MULTINODE_HACKS = True


def default_trainer_args():
    argspec = dict(inspect.signature(Trainer.__init__).parameters)
    argspec.pop("self")
    default_args = {
        param: argspec[param].default
        for param in argspec
        if argspec[param].default is not Parameter.empty
    }
    return default_args
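

# Illustrative sketch (not part of the original script): how the
# inspect-based default extraction above behaves on a stand-in callable.
# `_example_init` and `_example_signature_defaults` are hypothetical names
# used only for this demonstration; `Trainer.__init__` is not touched.
def _example_init(self, max_epochs=None, devices="auto", benchmark=False):
    pass


def _example_signature_defaults(fn=_example_init):
    import inspect  # local import keeps the sketch self-contained

    params = dict(inspect.signature(fn).parameters)
    params.pop("self", None)
    # keep only parameters that actually declare a default value
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }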


def get_parser(**parser_kwargs):
    def str2bool(v):
        if isinstance(v, bool):
            return v
        if v.lower() in ("yes", "true", "t", "y", "1"):
            return True
        elif v.lower() in ("no", "false", "f", "n", "0"):
            return False
        else:
            raise argparse.ArgumentTypeError("Boolean value expected.")

    parser = argparse.ArgumentParser(**parser_kwargs)
    parser.add_argument(
        "-n",
        "--name",
        type=str,
        const=True,
        default="",
        nargs="?",
        help="postfix for logdir",
    )
    parser.add_argument(
        "--no_date",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="if True, skip date generation for logdir and only use naming via opt.base or opt.name (+ opt.postfix, optionally)",
    )
    parser.add_argument(
        "-r",
        "--resume",
        type=str,
        const=True,
        default="",
        nargs="?",
        help="resume from logdir or checkpoint in logdir",
    )
    parser.add_argument(
        "-b",
        "--base",
        nargs="*",
        metavar="base_config.yaml",
        help="paths to base configs. Loaded from left-to-right. "
        "Parameters can be overwritten or added with command-line options of the form `--key value`.",
        default=list(),
    )
    parser.add_argument(
        "-t",
        "--train",
        type=str2bool,
        const=True,
        default=True,
        nargs="?",
        help="train",
    )
    parser.add_argument(
        "--no-test",
        type=str2bool,
        const=True,
        default=False,
        nargs="?",
        help="disable test",
    )
    parser.add_argument(
        "-p", "--project", help="name of new or path to existing project"
    )
    parser.add_argument(
        "-d",
        "--debug",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="enable post-mortem debugging",
    )
    parser.add_argument(
        "-s",
        "--seed",
        type=int,
        default=23,
        help="seed for seed_everything",
    )
    parser.add_argument(
        "-f",
        "--postfix",
        type=str,
        default="",
        help="post-postfix for default name",
    )
    parser.add_argument(
        "--projectname",
        type=str,
        default="stablediffusion",
    )
    parser.add_argument(
        "-l",
        "--logdir",
        type=str,
        default="logs",
        help="directory for logging data",
    )
    parser.add_argument(
        "--scale_lr",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="scale base-lr by ngpu * batch_size * n_accumulate",
    )
    parser.add_argument(
        "--legacy_naming",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="name run based on config file name if true, else by whole path",
    )
    parser.add_argument(
        "--enable_tf32",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="enables the TensorFloat32 format both for matmuls and cuDNN for pytorch 1.12",
    )
    parser.add_argument(
        "--startup",
        type=str,
        default=None,
        help="Startuptime from distributed script",
    )
    parser.add_argument(
        "--wandb",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,  # TODO: later default to True
        help="log to wandb",
    )
    parser.add_argument(
        "--no_base_name",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,  # TODO: later default to True
        help="omit the base config name from the run name",
    )
    if version.parse(torch.__version__) >= version.parse("2.0.0"):
        parser.add_argument(
            "--resume_from_checkpoint",
            type=str,
            default=None,
            help="single checkpoint file to resume from",
        )
    default_args = default_trainer_args()
    for key in default_args:
        parser.add_argument("--" + key, default=default_args[key])
    return parser
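

# Illustrative sketch (not part of the original script): the flag pattern used
# above (`type=str2bool, nargs="?", const=True`) lets a flag be passed bare
# (`--debug`) or with an explicit value (`--debug false`). The helper names
# below are hypothetical, and the boolean parser is a simplified stand-in.
def _example_bool_flag_parser():
    import argparse  # local import keeps the sketch self-contained

    def _str2bool(v):
        if isinstance(v, bool):
            return v
        if v.lower() in ("yes", "true", "t", "y", "1"):
            return True
        if v.lower() in ("no", "false", "f", "n", "0"):
            return False
        raise argparse.ArgumentTypeError("Boolean value expected.")

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--debug", type=_str2bool, nargs="?", const=True, default=False
    )
    return parser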


def get_checkpoint_name(logdir):
    ckpt = os.path.join(logdir, "checkpoints", "last**.ckpt")
    ckpt = natsorted(glob.glob(ckpt))
    print('available "last" checkpoints:')
    print(ckpt)
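    if not ckpt:
        raise FileNotFoundError(
            f'no "last*.ckpt" checkpoint found in {os.path.join(logdir, "checkpoints")}'
        )
    if len(ckpt) > 1:
        # several "last" checkpoints exist: pick the most recently modified one
        print("got most recent checkpoint")
        ckpt = sorted(ckpt, key=lambda x: os.path.getmtime(x))[-1]
        print(f"Most recent ckpt is {ckpt}")
        with open(os.path.join(logdir, "most_recent_ckpt.txt"), "w") as f:
            f.write(ckpt + "\n")
        try:
            # "last-v3.ckpt" -> 3; a plain "last.ckpt" has no suffix to parse
            version = int(ckpt.split("/")[-1].split("-v")[-1].split(".")[0])
        except ValueError as e:
            print("could not parse a checkpoint version, falling back to 1")
            print(e)
            version = 1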
    else:
        # in this case, we only have one "last.ckpt"
        ckpt = ckpt[0]
        version = 1
    melk_ckpt_name = f"last-v{version}.ckpt"
    print(f"Current melk ckpt name: {melk_ckpt_name}")
    return ckpt, melk_ckpt_name
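

# Illustrative sketch, not part of the upstream file: this helper isolates the
# filename parsing that get_checkpoint_name performs, so the fallback to
# version 1 is easy to see. The name `_example_parse_ckpt_version` is
# hypothetical, and os.path.basename stands in for the split("/") used above.
def _example_parse_ckpt_version(ckpt_path):
    import os  # local import keeps the sketch self-contained

    stem = os.path.basename(ckpt_path).split("-v")[-1].split(".")[0]
    try:
        return int(stem)  # "last-v3.ckpt" -> 3
    except ValueError:
        return 1  # "last.ckpt" has no "-v<N>" suffix, so fall back to 1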


class SetupCallback(Callback):
    def __init__(
        self,
        resume,
        now,
        logdir,
        ckptdir,
        cfgdir,
        config,
        lightning_config,
        debug,
        ckpt_name=None,
    ):
        super().__init__()
        self.resume = resume
        self.now = now
        self.logdir = logdir
        self.ckptdir = ckptdir
        self.cfgdir = cfgdir
        self.config = config
        self.lightning_config = lightning_config
        self.debug = debug
        self.ckpt_name = ckpt_name

    def on_exception(self, trainer: pl.Trainer, pl_module, exception):
        if not self.debug and trainer.global_rank == 0:
            print("Summoning checkpoint.")
            if self.ckpt_name is None:
                ckpt_path = os.path.join(self.ckptdir, "last.ckpt")
            else:
                ckpt_path = os.path.join(self.ckptdir, self.ckpt_name)
            trainer.save_checkpoint(ckpt_path)

    def on_fit_start(self, trainer, pl_module):
        if trainer.global_rank == 0:
            # Create logdirs and save configs
            os.makedirs(self.logdir, exist_ok=True)
            os.makedirs(self.ckptdir, exist_ok=True)
            os.makedirs(self.cfgdir, exist_ok=True)

            if "callbacks" in self.lightning_config:
                if (
                    "metrics_over_trainsteps_checkpoint"
                    in self.lightning_config["callbacks"]
                ):
                    os.makedirs(
                        os.path.join(self.ckptdir, "trainstep_checkpoints"),
                        exist_ok=True,
                    )
            print("Project config")
            print(OmegaConf.to_yaml(self.config))
            if MULTINODE_HACKS:
                import time

                time.sleep(5)
            OmegaConf.save(
                self.config,
                os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)),
            )

            print("Lightning config")
            print(OmegaConf.to_yaml(self.lightning_config))
            OmegaConf.save(
                OmegaConf.create({"lightning": self.lightning_config}),
                os.path.join(self.cfgdir, "{}-lightning.yaml".format(self.now)),
            )

        else:
            # ModelCheckpoint callback created log directory --- remove it
            if not MULTINODE_HACKS and not self.resume and os.path.exists(self.logdir):
                dst, name = os.path.split(self.logdir)
                dst = os.path.join(dst, "child_runs", name)
                os.makedirs(os.path.split(dst)[0], exist_ok=True)
                try:
                    os.rename(self.logdir, dst)
                except FileNotFoundError:
                    pass


class ImageLogger(Callback):
    def __init__(
        self,
        batch_frequency,
        max_images,
        clamp=True,
        increase_log_steps=True,
        rescale=True,
        disabled=False,
        log_on_batch_idx=False,
        log_first_step=False,
        log_images_kwargs=None,
        log_before_first_step=False,
        enable_autocast=True,
    ):
        super().__init__()
        self.enable_autocast = enable_autocast
        self.rescale = rescale
        self.batch_freq = batch_frequency
        self.max_images = max_images
        self.log_steps = [2**n for n in range(int(np.log2(self.batch_freq)) + 1)]
        if not increase_log_steps:
            self.log_steps = [self.batch_freq]
        self.clamp = clamp
        self.disabled = disabled
        self.log_on_batch_idx = log_on_batch_idx
        self.log_images_kwargs = log_images_kwargs if log_images_kwargs else {}
        self.log_first_step = log_first_step
        self.log_before_first_step = log_before_first_step

    @rank_zero_only
    def log_local(
        self,
        save_dir,
        split,
        images,
        global_step,
        current_epoch,
        batch_idx,
        pl_module: Union[None, pl.LightningModule] = None,
    ):
        root = os.path.join(save_dir, "images", split)
        for k in images:
            if isheatmap(images[k]):
                fig, ax = plt.subplots()
                ax = ax.matshow(
                    images[k].cpu().numpy(), cmap="hot", interpolation="lanczos"
                )
                plt.colorbar(ax)
                plt.axis("off")

                filename = "{}_gs-{:06}_e-{:06}_b-{:06}.png".format(
                    k, global_step, current_epoch, batch_idx
                )
                os.makedirs(root, exist_ok=True)
                path = os.path.join(root, filename)
                plt.savefig(path)
                plt.close()
                # TODO: support wandb
            else:
                grid = torchvision.utils.make_grid(images[k], nrow=4)
                if self.rescale:
                    grid = (grid + 1.0) / 2.0  # -1,1 -> 0,1; c,h,w
                grid = grid.transpose(0, 1).transpose(1, 2).squeeze(-1)
                grid = grid.numpy()
                grid = (grid * 255).astype(np.uint8)
                filename = "{}_gs-{:06}_e-{:06}_b-{:06}.png".format(
                    k, global_step, current_epoch, batch_idx
                )
                path = os.path.join(root, filename)
                os.makedirs(os.path.split(path)[0], exist_ok=True)
                img = Image.fromarray(grid)
                img.save(path)
                if exists(pl_module):
                    assert isinstance(
                        pl_module.logger, WandbLogger
                    ), "logger_log_image only supports WandbLogger currently"
                    pl_module.logger.log_image(
                        key=f"{split}/{k}",
                        images=[
                            img,
                        ],
                        step=pl_module.global_step,
                    )

    @rank_zero_only
    def log_img(self, pl_module, batch, batch_idx, split="train"):
        check_idx = batch_idx if self.log_on_batch_idx else pl_module.global_step
        if (
            self.check_frequency(check_idx)
            and hasattr(pl_module, "log_images")  # batch_idx % self.batch_freq == 0
            and callable(pl_module.log_images)
            and
            # batch_idx > 5 and
            self.max_images > 0
        ):
            is_train = pl_module.training
            if is_train:
                pl_module.eval()

            gpu_autocast_kwargs = {
                "enabled": self.enable_autocast,  # torch.is_autocast_enabled(),
                "dtype": torch.get_autocast_gpu_dtype(),
                "cache_enabled": torch.is_autocast_cache_enabled(),
            }
            with torch.no_grad(), torch.cuda.amp.autocast(**gpu_autocast_kwargs):
                images = pl_module.log_images(
                    batch, split=split, **self.log_images_kwargs
                )

            for k in images:
                N = min(images[k].shape[0], self.max_images)
                if not isheatmap(images[k]):
                    images[k] = images[k][:N]
                if isinstance(images[k], torch.Tensor):
                    images[k] = images[k].detach().float().cpu()
                    if self.clamp and not isheatmap(images[k]):
                        images[k] = torch.clamp(images[k], -1.0, 1.0)

            self.log_local(
                pl_module.logger.save_dir,
                split,
                images,
                pl_module.global_step,
                pl_module.current_epoch,
                batch_idx,
                pl_module=pl_module
                if isinstance(pl_module.logger, WandbLogger)
                else None,
            )

            if is_train:
                pl_module.train()

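    def check_frequency(self, check_idx):
        """Return True if images should be logged at `check_idx`.

        A step qualifies if it is a multiple of `batch_freq` or one of the
        remaining warm-up steps in `log_steps`; step 0 only counts when
        `log_first_step` is set. Each hit consumes one warm-up step.
        """
        if ((check_idx % self.batch_freq) == 0 or (check_idx in self.log_steps)) and (
            check_idx > 0 or self.log_first_step
        ):
            if self.log_steps:
                self.log_steps.pop(0)
            return True
        return False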

    @rank_zero_only
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if not self.disabled and (pl_module.global_step > 0 or self.log_first_step):
            self.log_img(pl_module, batch, batch_idx, split="train")

    @rank_zero_only
    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        if self.log_before_first_step and pl_module.global_step == 0:
            print(f"{self.__class__.__name__}: logging before training")
            self.log_img(pl_module, batch, batch_idx, split="train")

    @rank_zero_only
    def on_validation_batch_end(
        self, trainer, pl_module, outputs, batch, batch_idx, *args, **kwargs
    ):
        if not self.disabled and pl_module.global_step > 0:
            self.log_img(pl_module, batch, batch_idx, split="val")
        if hasattr(pl_module, "calibrate_grad_norm"):
            if (
                pl_module.calibrate_grad_norm and batch_idx % 25 == 0
            ) and batch_idx > 0:
                self.log_gradients(trainer, pl_module, batch_idx=batch_idx)
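

# Illustrative sketch, not part of the upstream file: the warm-up schedule that
# ImageLogger.__init__ builds when increase_log_steps=True. The function name
# `_example_warmup_log_steps` is hypothetical; math.log2 stands in for np.log2.
def _example_warmup_log_steps(batch_frequency):
    import math  # local import keeps the sketch self-contained

    # powers of two, up to the largest one that does not exceed batch_frequency
    return [2**n for n in range(int(math.log2(batch_frequency)) + 1)]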


@rank_zero_only
def init_wandb(save_dir, opt, config, group_name, name_str):
    print(f"setting WANDB_DIR to {save_dir}")
    os.makedirs(save_dir, exist_ok=True)

    os.environ["WANDB_DIR"] = save_dir
    if opt.debug:
        wandb.init(project=opt.projectname, mode="offline", group=group_name)
    else:
        wandb.init(
            project=opt.projectname,
            config=config,
            settings=wandb.Settings(code_dir="./sgm"),
            group=group_name,
            name=name_str,
        )


if __name__ == "__main__":
    # custom parser to specify config files, train, test and debug mode,
    # postfix, resume.
    # `--key value` arguments are interpreted as arguments to the trainer.
    # `nested.key=value` arguments are interpreted as config parameters.
    # configs are merged from left-to-right followed by command line parameters.

    # model:
    #   base_learning_rate: float
    #   target: path to lightning module
    #   params:
    #       key: value
    # data:
    #   target: main.DataModuleFromConfig
    #   params:
    #      batch_size: int
    #      wrap: bool
    #      train:
    #          target: path to train dataset
    #          params:
    #              key: value
    #      validation:
    #          target: path to validation dataset
    #          params:
    #              key: value
    #      test:
    #          target: path to test dataset
    #          params:
    #              key: value
    # lightning: (optional, has sane defaults and can be specified on cmdline)
    #   trainer:
    #       additional arguments to trainer
    #   logger:
    #       logger to instantiate
    #   modelcheckpoint:
    #       modelcheckpoint to instantiate
    #   callbacks:
    #       callback1:
    #           target: importpath
    #           params:
    #               key: value

    now = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")

    # add cwd for convenience and to make classes in this file available when
    # running as `python main.py`
    # (in particular `main.DataModuleFromConfig`)
    sys.path.append(os.getcwd())

    parser = get_parser()

    opt, unknown = parser.parse_known_args()

    if opt.name and opt.resume:
        raise ValueError(
            "-n/--name and -r/--resume cannot both be specified. "
            "If you want to resume training in a new log folder, "
            "use -n/--name in combination with --resume_from_checkpoint"
        )
    melk_ckpt_name = None
    name = None
    if opt.resume:
        if not os.path.exists(opt.resume):
            raise ValueError("Cannot find {}".format(opt.resume))
        if os.path.isfile(opt.resume):
            paths = opt.resume.split("/")
            # idx = len(paths)-paths[::-1].index("logs")+1
            # logdir = "/".join(paths[:idx])
            logdir = "/".join(paths[:-2])
            ckpt = opt.resume
            _, melk_ckpt_name = get_checkpoint_name(logdir)
        else:
            assert os.path.isdir(opt.resume), opt.resume
            logdir = opt.resume.rstrip("/")
            ckpt, melk_ckpt_name = get_checkpoint_name(logdir)

        print("#" * 100)
        print(f'Resuming from checkpoint "{ckpt}"')
        print("#" * 100)

        opt.resume_from_checkpoint = ckpt
        base_configs = sorted(glob.glob(os.path.join(logdir, "configs/*.yaml")))
        opt.base = base_configs + opt.base
        _tmp = logdir.split("/")
        nowname = _tmp[-1]
    else:
        if opt.name:
            name = "_" + opt.name
        elif opt.base:
            if opt.no_base_name:
                name = ""
            else:
                if opt.legacy_naming:
                    cfg_fname = os.path.split(opt.base[0])[-1]
                    cfg_name = os.path.splitext(cfg_fname)[0]
                else:
                    assert "configs" in os.path.split(opt.base[0])[0], os.path.split(
                        opt.base[0]
                    )[0]
                    cfg_path = os.path.split(opt.base[0])[0].split(os.sep)[
                        os.path.split(opt.base[0])[0].split(os.sep).index("configs")
                        + 1 :
                    ]  # cut away the first one (we assert all configs are in "configs")
                    cfg_name = os.path.splitext(os.path.split(opt.base[0])[-1])[0]
                    cfg_name = "-".join(cfg_path) + f"-{cfg_name}"
                name = "_" + cfg_name
        else:
            name = ""
        if not opt.no_date:
            nowname = now + name + opt.postfix
        else:
            nowname = name + opt.postfix
            if nowname.startswith("_"):
                nowname = nowname[1:]
        logdir = os.path.join(opt.logdir, nowname)
        print(f"LOGDIR: {logdir}")

    ckptdir = os.path.join(logdir, "checkpoints")
    cfgdir = os.path.join(logdir, "configs")
    seed_everything(opt.seed, workers=True)

    # move before model init, in case a torch.compile(...) is called somewhere
    if opt.enable_tf32:
        # pt_version = version.parse(torch.__version__)
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
        print(f"Enabling TF32 for PyTorch {torch.__version__}")
    else:
        print(f"Using default TF32 settings for PyTorch {torch.__version__}:")
        print(
            f"torch.backends.cuda.matmul.allow_tf32={torch.backends.cuda.matmul.allow_tf32}"
        )
        print(f"torch.backends.cudnn.allow_tf32={torch.backends.cudnn.allow_tf32}")

    try:
        # init and save configs
        configs = [OmegaConf.load(cfg) for cfg in opt.base]
        cli = OmegaConf.from_dotlist(unknown)
        config = OmegaConf.merge(*configs, cli)
        lightning_config = config.pop("lightning", OmegaConf.create())
        # merge trainer cli with config
        trainer_config = lightning_config.get("trainer", OmegaConf.create())

        # default to gpu
        trainer_config["accelerator"] = "gpu"
        #
        standard_args = default_trainer_args()
        for k in standard_args:
            if getattr(opt, k) != standard_args[k]:
                trainer_config[k] = getattr(opt, k)

        ckpt_resume_path = opt.resume_from_checkpoint

        if "devices" not in trainer_config and trainer_config["accelerator"] != "gpu":
            del trainer_config["accelerator"]
            cpu = True
        else:
            gpuinfo = trainer_config["devices"]
            print(f"Running on GPUs {gpuinfo}")
            cpu = False
        trainer_opt = argparse.Namespace(**trainer_config)
        lightning_config.trainer = trainer_config

        # model
        model = instantiate_from_config(config.model)

        # trainer and callbacks
        trainer_kwargs = dict()

        # default logger configs
        default_logger_cfgs = {
            "wandb": {
                "target": "pytorch_lightning.loggers.WandbLogger",
                "params": {
                    "name": nowname,
                    # "save_dir": logdir,
                    "offline": opt.debug,
                    "id": nowname,
                    "project": opt.projectname,
                    "log_model": False,
                    # "dir": logdir,
                },
            },
            "csv": {
                "target": "pytorch_lightning.loggers.CSVLogger",
                "params": {
                    "name": "testtube",  # hack for sbord fanatics
                    "save_dir": logdir,
                },
            },
        }
        default_logger_cfg = default_logger_cfgs["wandb" if opt.wandb else "csv"]
        if opt.wandb:
            # TODO change once leaving "swiffer" config directory
            try:
                group_name = nowname.split(now)[-1].split("-")[1]
            except IndexError:
                group_name = nowname
            default_logger_cfg["params"]["group"] = group_name
            init_wandb(
                os.path.join(os.getcwd(), logdir),
                opt=opt,
                group_name=group_name,
                config=config,
                name_str=nowname,
            )
        if "logger" in lightning_config:
            logger_cfg = lightning_config.logger
        else:
            logger_cfg = OmegaConf.create()
        logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)
        trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)

        # modelcheckpoint - use TrainResult/EvalResult(checkpoint_on=metric) to
        # specify which metric is used to determine best models
        default_modelckpt_cfg = {
            "target": "pytorch_lightning.callbacks.ModelCheckpoint",
            "params": {
                "dirpath": ckptdir,
                "filename": "{epoch:06}",
                "verbose": True,
                "save_last": True,
            },
        }
        if hasattr(model, "monitor"):
            print(f"Monitoring {model.monitor} as checkpoint metric.")
            default_modelckpt_cfg["params"]["monitor"] = model.monitor
            default_modelckpt_cfg["params"]["save_top_k"] = 3

        if "modelcheckpoint" in lightning_config:
            modelckpt_cfg = lightning_config.modelcheckpoint
        else:
            modelckpt_cfg = OmegaConf.create()
        modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
        print(f"Merged modelckpt-cfg: \n{modelckpt_cfg}")

        # https://pytorch-lightning.readthedocs.io/en/stable/extensions/strategy.html
        # default to ddp if not further specified
        default_strategy_config = {"target": "pytorch_lightning.strategies.DDPStrategy"}

        if "strategy" in lightning_config:
            strategy_cfg = lightning_config.strategy
        else:
            strategy_cfg = OmegaConf.create()
            default_strategy_config["params"] = {
                "find_unused_parameters": False,
                # "static_graph": True,
                # "ddp_comm_hook": default.fp16_compress_hook  # TODO: experiment with this, also for DDPSharded
            }
        strategy_cfg = OmegaConf.merge(default_strategy_config, strategy_cfg)
        print(
            f"strategy config: \n ++++++++++++++ \n {strategy_cfg} \n ++++++++++++++ "
        )
        trainer_kwargs["strategy"] = instantiate_from_config(strategy_cfg)

        # add callback which sets up log directory
        default_callbacks_cfg = {
            "setup_callback": {
                "target": "main.SetupCallback",
                "params": {
                    "resume": opt.resume,
                    "now": now,
                    "logdir": logdir,
                    "ckptdir": ckptdir,
                    "cfgdir": cfgdir,
                    "config": config,
                    "lightning_config": lightning_config,
                    "debug": opt.debug,
                    "ckpt_name": melk_ckpt_name,
                },
            },
            "image_logger": {
                "target": "main.ImageLogger",
                "params": {"batch_frequency": 1000, "max_images": 4, "clamp": True},
            },
            "learning_rate_logger": {
                "target": "pytorch_lightning.callbacks.LearningRateMonitor",
                "params": {
                    "logging_interval": "step",
                    # "log_momentum": True
                },
            },
        }
        if version.parse(pl.__version__) >= version.parse("1.4.0"):
            default_callbacks_cfg.update({"checkpoint_callback": modelckpt_cfg})

        if "callbacks" in lightning_config:
            callbacks_cfg = lightning_config.callbacks
        else:
            callbacks_cfg = OmegaConf.create()

        if "metrics_over_trainsteps_checkpoint" in callbacks_cfg:
            print(
                "Caution: Saving checkpoints every n train steps without deleting. This might require some free space."
            )
            default_metrics_over_trainsteps_ckpt_dict = {
                "metrics_over_trainsteps_checkpoint": {
                    "target": "pytorch_lightning.callbacks.ModelCheckpoint",
                    "params": {
                        "dirpath": os.path.join(ckptdir, "trainstep_checkpoints"),
                        "filename": "{epoch:06}-{step:09}",
                        "verbose": True,
                        "save_top_k": -1,
                        "every_n_train_steps": 10000,
                        "save_weights_only": True,
                    },
                }
            }
            default_callbacks_cfg.update(default_metrics_over_trainsteps_ckpt_dict)

        callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
        if "ignore_keys_callback" in callbacks_cfg and ckpt_resume_path is not None:
            callbacks_cfg.ignore_keys_callback.params["ckpt_path"] = ckpt_resume_path
        elif "ignore_keys_callback" in callbacks_cfg:
            del callbacks_cfg["ignore_keys_callback"]

        trainer_kwargs["callbacks"] = [
            instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg
        ]
        if "plugins" not in trainer_kwargs:
            trainer_kwargs["plugins"] = list()

        # command-line trainer args (in trainer_opt) always take priority over
        # config trainer args (in trainer_kwargs)
        trainer_opt = vars(trainer_opt)
        trainer_kwargs = {
            key: val for key, val in trainer_kwargs.items() if key not in trainer_opt
        }
        trainer = Trainer(**trainer_opt, **trainer_kwargs)

        trainer.logdir = logdir  ###

        # data
        data = instantiate_from_config(config.data)
        # NOTE according to https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html
        # calling these ourselves should not be necessary but it is.
        # lightning still takes care of proper multiprocessing though
        data.prepare_data()
        # data.setup()
        print("#### Data #####")
        try:
            for k in data.datasets:
                print(
                    f"{k}, {data.datasets[k].__class__.__name__}, {len(data.datasets[k])}"
                )
        except Exception:
            print("datasets not yet initialized.")

        # configure learning rate
        if "batch_size" in config.data.params:
            bs, base_lr = config.data.params.batch_size, config.model.base_learning_rate
        else:
            bs, base_lr = (
                config.data.params.train.loader.batch_size,
                config.model.base_learning_rate,
            )
        if not cpu:
            ngpu = len(lightning_config.trainer.devices.strip(",").split(","))
        else:
            ngpu = 1
        if "accumulate_grad_batches" in lightning_config.trainer:
            accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches
        else:
            accumulate_grad_batches = 1
        print(f"accumulate_grad_batches = {accumulate_grad_batches}")
        lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
        if opt.scale_lr:
            model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr
            print(
                "Setting learning rate to {:.2e} = {} (accumulate_grad_batches) * {} (num_gpus) * {} (batchsize) * {:.2e} (base_lr)".format(
                    model.learning_rate, accumulate_grad_batches, ngpu, bs, base_lr
                )
            )
        else:
            model.learning_rate = base_lr
            print("++++ NOT USING LR SCALING ++++")
            print(f"Setting learning rate to {model.learning_rate:.2e}")

        # allow checkpointing via USR1
        def melk(*args, **kwargs):
            # run all checkpoint hooks
            if trainer.global_rank == 0:
                print("Summoning checkpoint.")
                if melk_ckpt_name is None:
                    ckpt_path = os.path.join(ckptdir, "last.ckpt")
                else:
                    ckpt_path = os.path.join(ckptdir, melk_ckpt_name)
                trainer.save_checkpoint(ckpt_path)

        def divein(*args, **kwargs):
            if trainer.global_rank == 0:
                import pudb

                pudb.set_trace()

        import signal

        signal.signal(signal.SIGUSR1, melk)
        signal.signal(signal.SIGUSR2, divein)

        # run
        if opt.train:
            try:
                trainer.fit(model, data, ckpt_path=ckpt_resume_path)
            except Exception:
                if not opt.debug:
                    melk()
                raise
        if not opt.no_test and not trainer.interrupted:
            trainer.test(model, data)
    except RuntimeError as err:
        if MULTINODE_HACKS:
            import datetime
            import os
            import socket

            import requests

            device = os.environ.get("CUDA_VISIBLE_DEVICES", "?")
            hostname = socket.gethostname()
            ts = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
            resp = requests.get("http://169.254.169.254/latest/meta-data/instance-id")
            print(
                f"ERROR at {ts} on {hostname}/{resp.text} (CUDA_VISIBLE_DEVICES={device}): {type(err).__name__}: {err}",
                flush=True,
            )
        raise err
    except Exception:
        if opt.debug and trainer.global_rank == 0:
            try:
                import pudb as debugger
            except ImportError:
                import pdb as debugger
            debugger.post_mortem()
        raise
    finally:
        # move newly created debug project to debug_runs
        if opt.debug and not opt.resume and trainer.global_rank == 0:
            dst, name = os.path.split(logdir)
            dst = os.path.join(dst, "debug_runs", name)
            os.makedirs(os.path.split(dst)[0], exist_ok=True)
            os.rename(logdir, dst)

        if opt.wandb:
            wandb.finish()
        # if trainer.global_rank == 0:
        #    print(trainer.profiler.summary())


================================================
FILE: model_licenses/LICENSE-SDXL-Turbo
================================================
STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE AGREEMENT        
Dated: November 28, 2023


By using or distributing any portion or element of the Models, Software, Software Products or Derivative Works, you agree to be bound by this Agreement.


"Agreement" means this Stable Non-Commercial Research Community License Agreement.


“AUP” means the Stability AI Acceptable Use Policy available at https://stability.ai/use-policy, as may be updated from time to time.


"Derivative Work(s)” means (a) any derivative work of the Software Products as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model’s output. For clarity, Derivative Works do not include the output of any Model.


“Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.


"Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.


“Model(s)" means, collectively, Stability AI’s proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing, made available under this Agreement.


“Non-Commercial Uses” means exercising any of the rights granted herein for the purpose of research or non-commercial purposes. Non-Commercial Uses does not include any production use of the Software Products or any Derivative Works. 


"Stability AI" or "we" means Stability AI Ltd. and its affiliates.

"Software" means Stability AI’s proprietary software made available under this Agreement. 


“Software Products” means the Models, Software and Documentation, individually or in any combination. 



1.     License Rights and Redistribution. 

a.  Subject to your compliance with this Agreement, the AUP (which is hereby incorporated herein by reference), and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned or controlled by Stability AI embodied in the Software Products to reproduce the Software Products and produce, reproduce, distribute, and create Derivative Works of the Software Products for Non-Commercial Uses only, respectively. 

b.  You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of your hosted service or via your APIs, whether you are adding substantial additional functionality thereto or not. Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection. If you wish to use the Software Products or any Derivative Works for commercial or production use or you wish to make the Software Products or any Derivative Works available to third parties via your hosted service or your APIs, contact Stability AI at https://stability.ai/contact.    

c.  If you distribute or make the Software Products, or any Derivative Works thereof, available to a third party, the Software Products, Derivative Works, or any portion thereof, respectively, will remain subject to this Agreement and you must (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "This Stability AI Model is licensed under the Stability AI Non-Commercial Research Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.” If you create a Derivative Work of a Software Product, you may add your own attribution notices to the Notice file included with the Software Product, provided that you clearly indicate which attributions apply to the Software Product and you must state in the NOTICE file that you changed the Software Product and how it was modified.

2.     Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS, DERIVATIVE WORKS OR ANY OUTPUT OR RESULTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS, DERIVATIVE WORKS AND ANY OUTPUT AND RESULTS. 

3.     Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 

4.     Intellectual Property.

a.  No trademark licenses are granted under this Agreement, and in connection with the Software Products or Derivative Works, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products or Derivative Works. 

b.  Subject to Stability AI’s ownership of the Software Products and Derivative Works made by or for Stability AI, with respect to any Derivative Works that are made by you, as between you and Stability AI, you are and will be the owner of such Derivative Works.

c.  If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products, Derivative Works or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products or Derivative Works in violation of this Agreement. 

5.      Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of any Software Products or Derivative Works. Sections 2-4 shall survive the termination of this Agreement.


================================================
FILE: model_licenses/LICENSE-SDXL0.9
================================================
SDXL 0.9 RESEARCH LICENSE AGREEMENT
Copyright (c) Stability AI Ltd.
This License Agreement (as may be amended in accordance with this License Agreement, “License”), between you, or your employer or other entity (if you are entering into this agreement on behalf of your employer or other entity) (“Licensee” or “you”) and Stability AI Ltd. (“Stability AI” or “we”) applies to your use of any computer program, algorithm, source code, object code, or software that is made available by Stability AI under this License (“Software”) and any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software (“Documentation”).
By clicking “I Accept” below or by using the Software, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the Software or Documentation (collectively, the “Software Products”), and you must immediately cease using the Software Products. If you are agreeing to be bound by the terms of this License on behalf of your employer or other entity, you represent and warrant to Stability AI that you have full legal authority to bind your employer or such entity to this License. If you do not have the requisite authority, you may not accept the License or access the Software Products on behalf of your employer or other entity.
1. LICENSE GRANT

a. Subject to your compliance with the Documentation and Sections 2, 3, and 5, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Stability AI’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License.

b. You may make a reasonable number of copies of the Documentation solely for use in connection with the license to the Software granted above.

c. The grant of rights expressly set forth in this Section 1 (License Grant) are the complete grant of rights to you in the Software Products, and no other licenses are granted, whether by waiver, estoppel, implication, equity or otherwise. Stability AI and its licensors reserve all rights not expressly granted by this License.


2. RESTRICTIONS

You will not, and will not permit, assist or cause any third party to:

a. use, modify, copy, reproduce, create derivative works of, or distribute the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), in whole or in part, for (i) any commercial or production purposes, (ii) military purposes or in the service of nuclear technology, (iii) purposes of surveillance, including any research or development relating to surveillance, (iv) biometric processing, (v) in any manner that infringes, misappropriates, or otherwise violates any third-party rights, or (vi) in any manner that violates any applicable law and violating any privacy or security laws, rules, regulations, directives, or governmental requirements (including the General Data Privacy Regulation (Regulation (EU) 2016/679), the California Consumer Privacy Act, and any and all laws governing the processing of biometric information), as well as all amendments and successor laws to any of the foregoing;

b. alter or remove copyright and other proprietary notices which appear on or in the Software Products;

c. utilize any equipment, device, software, or other means to circumvent or remove any security or protection used by Stability AI in connection with the Software, or to circumvent or remove any usage restrictions, or to enable functionality disabled by Stability AI; or

d. offer or impose any terms on the Software Products that alter, restrict, or are inconsistent with the terms of this License.

e. 1) violate any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”); 2) directly or indirectly export, re-export, provide, or otherwise transfer Software Products: (a) to any individual, entity, or country prohibited by Export Laws; (b) to anyone on U.S. or non-U.S. government restricted parties lists; or (c) for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications; 3) use or download Software Products if you or they are: (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) for any purpose prohibited by Export Laws; and (4) will not disguise your location through IP proxying or other methods.


3. ATTRIBUTION

Together with any copies of the Software Products (as well as derivative works thereof or works incorporating the Software Products) that you distribute, you must provide (i) a copy of this License, and (ii) the following attribution notice: “SDXL 0.9 is licensed under the SDXL Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.”


4. DISCLAIMERS

THE SOFTWARE PRODUCTS ARE PROVIDED “AS IS” AND “WITH ALL FAULTS” WITH NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. STABILITY AI EXPRESSLY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, EXPRESS OR IMPLIED, WHETHER BY STATUTE, CUSTOM, USAGE OR OTHERWISE AS TO ANY MATTERS RELATED TO THE SOFTWARE PRODUCTS, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, SATISFACTORY QUALITY, OR NON-INFRINGEMENT. STABILITY AI MAKES NO WARRANTIES OR REPRESENTATIONS THAT THE SOFTWARE PRODUCTS WILL BE ERROR FREE OR FREE OF VIRUSES OR OTHER HARMFUL COMPONENTS, OR PRODUCE ANY PARTICULAR RESULTS.


5. LIMITATION OF LIABILITY

TO THE FULLEST EXTENT PERMITTED BY LAW, IN NO EVENT WILL STABILITY AI BE LIABLE TO YOU (A) UNDER ANY THEORY OF LIABILITY, WHETHER BASED IN CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY, WARRANTY, OR OTHERWISE UNDER THIS LICENSE, OR (B) FOR ANY INDIRECT, CONSEQUENTIAL, EXEMPLARY, INCIDENTAL, PUNITIVE OR SPECIAL DAMAGES OR LOST PROFITS, EVEN IF STABILITY AI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE SOFTWARE PRODUCTS, THEIR CONSTITUENT COMPONENTS, AND ANY OUTPUT (COLLECTIVELY, “SOFTWARE MATERIALS”) ARE NOT DESIGNED OR INTENDED FOR USE IN ANY APPLICATION OR SITUATION WHERE FAILURE OR FAULT OF THE SOFTWARE MATERIALS COULD REASONABLY BE ANTICIPATED TO LEAD TO SERIOUS INJURY OF ANY PERSON, INCLUDING POTENTIAL DISCRIMINATION OR VIOLATION OF AN INDIVIDUAL’S PRIVACY RIGHTS, OR TO SEVERE PHYSICAL, PROPERTY, OR ENVIRONMENTAL DAMAGE (EACH, A “HIGH-RISK USE”). IF YOU ELECT TO USE ANY OF THE SOFTWARE MATERIALS FOR A HIGH-RISK USE, YOU DO SO AT YOUR OWN RISK. YOU AGREE TO DESIGN AND IMPLEMENT APPROPRIATE DECISION-MAKING AND RISK-MITIGATION PROCEDURES AND POLICIES IN CONNECTION WITH A HIGH-RISK USE SUCH THAT EVEN IF THERE IS A FAILURE OR FAULT IN ANY OF THE SOFTWARE MATERIALS, THE SAFETY OF PERSONS OR PROPERTY AFFECTED BY THE ACTIVITY STAYS AT A LEVEL THAT IS REASONABLE, APPROPRIATE, AND LAWFUL FOR THE FIELD OF THE HIGH-RISK USE.


6. INDEMNIFICATION

You will indemnify, defend and hold harmless Stability AI and our subsidiaries and affiliates, and each of our respective shareholders, directors, officers, employees, agents, successors, and assigns (collectively, the “Stability AI Parties”) from and against any losses, liabilities, damages, fines, penalties, and expenses (including reasonable attorneys’ fees) incurred by any Stability AI Party in connection with any claim, demand, allegation, lawsuit, proceeding, or investigation (collectively, “Claims”) arising out of or related to: (a) your access to or use of the Software Products (as well as any results or data generated from such access or use), including any High-Risk Use (defined below); (b) your violation of this License; or (c) your violation, misappropriation or infringement of any rights of another (including intellectual property or other proprietary rights and privacy rights). You will promptly notify the Stability AI Parties of any such Claims, and cooperate with Stability AI Parties in defending such Claims. You will also grant the Stability AI Parties sole control of the defense or settlement, at Stability AI’s sole option, of any Claims. This indemnity is in addition to, and not in lieu of, any other indemnities or remedies set forth in a written agreement between you and Stability AI or the other Stability AI Parties.


7. TERMINATION; SURVIVAL

a. This License will automatically terminate upon any breach by you of the terms of this License.

b. We may terminate this License, in whole or in part, at any time upon notice (including electronic) to you.

c. The following sections survive termination of this License: 2 (Restrictions), 3 (Attribution), 4 (Disclaimers), 5 (Limitation on Liability), 6 (Indemnification), 7 (Termination; Survival), 8 (Third Party Materials), 9 (Trademarks), 10 (Applicable Law; Dispute Resolution), and 11 (Miscellaneous).


8. THIRD PARTY MATERIALS

The Software Products may contain third-party software or other components (including free and open source software) (all of the foregoing, “Third Party Materials”), which are subject to the license terms of the respective third-party licensors. Your dealings or correspondence with third parties and your use of or interaction with any Third Party Materials are solely between you and the third party. Stability AI does not control or endorse, and makes no representations or warranties regarding, any Third Party Materials, and your access to and use of such Third Party Materials are at your own risk.


9. TRADEMARKS

Licensee has not been granted any trademark license as part of this License and may not use any name or mark associated with Stability AI without the prior written permission of Stability AI, except to the extent necessary to make the reference required by the “ATTRIBUTION” section of this Agreement.


10. APPLICABLE LAW; DISPUTE RESOLUTION

This License will be governed and construed under the laws of the State of California without regard to conflicts of law provisions. Any suit or proceeding arising out of or relating to this License will be brought in the federal or state courts, as applicable, in San Mateo County, California, and each party irrevocably submits to the jurisdiction and venue of such courts.


11. MISCELLANEOUS

If any provision or part of a provision of this License is unlawful, void or unenforceable, that provision or part of the provision is deemed severed from this License, and will not affect the validity and enforceability of any remaining provisions. The failure of Stability AI to exercise or enforce any right or provision of this License will not operate as a waiver of such right or provision. This License does not confer any third-party beneficiary rights upon any other person or entity. This License, together with the Documentation, contains the entire understanding between you and Stability AI regarding the subject matter of this License, and supersedes all other written or oral agreements and understandings between you and Stability AI regarding such subject matter. No change or addition to any provision of this License will be binding unless it is in writing and signed by an authorized representative of both you and Stability AI.

================================================
FILE: model_licenses/LICENSE-SDXL1.0
================================================
Copyright (c) 2023 Stability AI CreativeML Open RAIL++-M License dated July 26, 2023

Section I: PREAMBLE Multimodal generative models are being widely adopted and used, and
have the potential to transform the way artists, among other individuals, conceive and
benefit from AI or ML technologies as a tool for content creation. Notwithstanding the
current and potential benefits that these artifacts can bring to society at large, there
are also concerns about potential misuses of them, either due to their technical
limitations or ethical considerations. In short, this license strives for both the open
and responsible downstream use of the accompanying model. When it comes to the open
character, we took inspiration from open source permissive licenses regarding the grant
of IP rights. Referring to the downstream responsible use, we added use-based
restrictions not permitting the use of the model in very specific scenarios, in order
for the licensor to be able to enforce the license in case potential misuses of the
Model may occur. At the same time, we strive to promote open and responsible research on
generative models for art and content generation. Even though downstream derivative
versions of the model could be released under different licensing terms, the latter will
always have to include - at minimum - the same use-based restrictions as the ones in the
original license (this license). We believe in the intersection between open and
responsible AI development; thus, this agreement aims to strike a balance between both
in order to enable responsible open-science in the field of AI. This CreativeML Open
RAIL++-M License governs the use of the model (and its derivatives) and is informed by
the model card associated with the model. NOW THEREFORE, You and Licensor agree as
follows: Definitions "License" means the terms and conditions for use, reproduction, and
Distribution as defined in this document. "Data" means a collection of information
and/or content extracted from the dataset used with the Model, including to train,
pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
"Output" means the results of operating a Model as embodied in informational content
resulting therefrom. "Model" means any accompanying machine-learning based assemblies
(including checkpoints), consisting of learnt weights, parameters (including optimizer
states), corresponding to the model architecture as embodied in the Complementary
Material, that have been trained or tuned, in whole or in part on the Data, using the
Complementary Material. "Derivatives of the Model" means all modifications to the Model,
works based on the Model, or any other model which is created or initialized by transfer
of patterns of the weights, parameters, activations or output of the Model, to the other
model, in order to cause the other model to perform similarly to the Model, including -
but not limited to - distillation methods entailing the use of intermediate data
representations or methods based on the generation of synthetic data by the Model for
training the other model. "Complementary Material" means the accompanying source code
and scripts used to define, run, load, benchmark or evaluate the Model, and used to
prepare data for training or evaluation, if any. This includes any accompanying
documentation, tutorials, examples, etc, if any. "Distribution" means any transmission,
reproduction, publication or other sharing of the Model or Derivatives of the Model to a
third party, including providing the Model as a hosted service made available by
electronic or other remote means - e.g. API-based or web access. "Licensor" means the
copyright owner or entity authorized by the copyright owner that is granting the
License, including the persons or entities that may have rights in the Model and/or
distributing the Model. "You" (or "Your") means an individual or Legal Entity exercising
permissions granted by this License and/or making use of the Model for whichever purpose
and in any field of use, including usage of the Model in an end-use application - e.g.
chatbot, translator, image generator. "Third Parties" means individuals or legal
entities that are not under common control with Licensor or You. "Contribution" means
any work of authorship, including the original version of the Model and any
modifications or additions to that Model or Derivatives of the Model thereof, that is
intentionally submitted to Licensor for inclusion in the Model by the copyright owner or
by an individual or Legal Entity authorized to submit on behalf of the copyright owner.
For the purposes of this definition, "submitted" means any form of electronic, verbal,
or written communication sent to the Licensor or its representatives, including but not
limited to communication on electronic mailing lists, source code control systems, and
issue tracking systems that are managed by, or on behalf of, the Licensor for the
purpose of discussing and improving the Model, but excluding communication that is
conspicuously marked or otherwise designated in writing by the copyright owner as "Not a
Contribution." "Contributor" means Licensor and any individual or Legal Entity on behalf
of whom a Contribution has been received by Licensor and subsequently incorporated
within the Model.

Section II: INTELLECTUAL PROPERTY RIGHTS Both copyright and patent grants apply to the
Model, Derivatives of the Model and Complementary Material. The Model and Derivatives of
the Model are subject to additional terms as described in Section III.

Grant of Copyright License. Subject to the terms and conditions of this
License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive,
no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly
display, publicly perform, sublicense, and distribute the Complementary Material, the
Model, and Derivatives of the Model. Grant of Patent License. Subject to the terms and
conditions of this License and where and as applicable, each Contributor hereby grants
to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this paragraph) patent license to make, have made, use, offer to
sell, sell, import, and otherwise transfer the Model and the Complementary Material,
where such license applies only to those patent claims licensable by such Contributor
that are necessarily infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Model to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a cross-claim or counterclaim
in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution
incorporated within the Model and/or Complementary Material constitutes direct or
contributory patent infringement, then any patent licenses granted to You under this
License for the Model and/or Work shall terminate as of the date such litigation is
asserted or filed. Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
Distribution and Redistribution. You may host for Third Party remote access purposes
(e.g. software-as-a-service), reproduce and distribute copies of the Model or
Derivatives of the Model thereof in any medium, with or without modifications, provided
that You meet the following conditions: Use-based restrictions as referenced in
paragraph 5 MUST be included as an enforceable provision by You in any type of legal
agreement (e.g. a license) governing the use and/or distribution of the Model or
Derivatives of the Model, and You shall give notice to subsequent users You Distribute
to, that the Model or Derivatives of the Model are subject to paragraph 5. This
provision does not apply to the use of Complementary Material. You must give any Third
Party recipients of the Model or Derivatives of the Model a copy of this License; You
must cause any modified files to carry prominent notices stating that You changed the
files; You must retain all copyright, patent, trademark, and attribution notices
excluding those notices that do not pertain to any part of the Model, Derivatives of the
Model. You may add Your own copyright statement to Your modifications and may provide
additional or different license terms and conditions - respecting paragraph 4.a. - for
use, reproduction, or Distribution of Your modifications, or for any such Derivatives of
the Model as a whole, provided Your use, reproduction, and Distribution of the Model
otherwise complies with the conditions stated in this License. Use-based restrictions.
The restrictions set forth in Attachment A are considered Use-based restrictions.
Therefore You cannot use the Model and the Derivatives of the Model for the specified
restricted uses. You may use the Model subject to this License, including only for
lawful purposes and in accordance with the License. Use may include creating any content
with, finetuning, updating, running, training, evaluating and/or reparametrizing the
Model. You shall require all of Your users who use the Model or a Derivative of the
Model to comply with the terms of this paragraph (paragraph 5). The Output You Generate.
Except as set forth herein, Licensor claims no rights in the Output You generate using
the Model. You are accountable for the Output you generate and its subsequent uses. No
use of the output can contravene any provision as stated in the License.

Section IV: OTHER PROVISIONS Updates and Runtime Restrictions. To the maximum extent
permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage
of the Model in violation of this License. Trademarks and related. Nothing in this
License permits You to make use of Licensors’ trademarks, trade names, logos or to
otherwise suggest endorsement or misrepresent the relationship between the parties; and
any rights not expressly granted herein are reserved by the Licensors. Disclaimer of
Warranty. Unless required by applicable law or agreed to in writing, Licensor provides
the Model and the Complementary Material (and each Contributor provides its
Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied, including, without limitation, any warranties or conditions of
TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
solely responsible for determining the appropriateness of using or redistributing the
Model, Derivatives of the Model, and the Complementary Material and assume any risks
associated with Your exercise of permissions under this License. Limitation of
Liability. In no event and under no legal theory, whether in tort (including
negligence), contract, or otherwise, unless required by applicable law (such as
deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special, incidental, or
consequential damages of any character arising as a result of this License or out of the
use or inability to use the Model and the Complementary Material (including but not
limited to damages for loss of goodwill, work stoppage, computer failure or malfunction,
or any and all other commercial damages or losses), even if such Contributor has been
advised of the possibility of such damages. Accepting Warranty or Additional Liability.
While redistributing the Model, Derivatives of the Model and the Complementary Material
thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty,
indemnity, or other liability obligations and/or rights consistent with this License.
However, in accepting such obligations, You may act only on Your own behalf and on Your
sole responsibility, not on behalf of any other Contributor, and only if You agree to
indemnify, defend, and hold each Contributor harmless for any liability incurred by, or
claims asserted against, such Contributor by reason of your accepting any such warranty
or additional liability. If any provision of this License is held to be invalid, illegal
or unenforceable, the remaining provisions shall be unaffected thereby and remain valid
as if such provision had not been set forth herein.

END OF TERMS AND CONDITIONS

Attachment A Use Restrictions
You agree not to use the Model or Derivatives of the Model:
In any way that violates any applicable national, federal, state, local or
international law or regulation; For the purpose of exploiting, harming or attempting to
exploit or harm minors in any way; To generate or disseminate verifiably false
information and/or content with the purpose of harming others; To generate or
disseminate personal identifiable information that can be used to harm an individual; To
defame, disparage or otherwise harass others; For fully automated decision making that
adversely impacts an individual’s legal rights or otherwise creates or modifies a
binding, enforceable obligation; For any use intended to or which has the effect of
discriminating against or harming individuals or groups based on online or offline
social behavior or known or predicted personal or personality characteristics; To
exploit any of the vulnerabilities of a specific group of persons based on their age,
social, physical or mental characteristics, in order to materially distort the behavior
of a person pertaining to that group in a manner that causes or is likely to cause that
person or another person physical or psychological harm; For any use intended to or
which has the effect of discriminating against individuals or groups based on legally
protected characteristics or categories; To provide medical advice and medical results
interpretation; To generate or disseminate information for the purpose to be used for
administration of justice, law enforcement, immigration or asylum processes, such as
predicting an individual will commit fraud/crime commitment (e.g. by text profiling,
drawing causal relationships between assertions made in documents, indiscriminate and
arbitrarily-targeted use).


================================================
FILE: model_licenses/LICENSE-SV3D
================================================
STABILITY AI NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT
Dated: March 18, 2024

"Agreement" means this Stable Non-Commercial Research Community License Agreement.

“AUP” means the Stability AI Acceptable Use Policy available at https://stability.ai/use-policy, as may be updated from time to time.

"Derivative Work(s)" means (a) any derivative work of the Software Products as recognized by U.S. copyright laws, (b) any modifications to a Model, and (c) any other model created which is based on or derived from the Model or the Model’s output. For clarity, Derivative Works do not include the output of any Model.

“Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.

"Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

“Model(s)” means, collectively, Stability AI’s proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing, made available under this Agreement.

“Non-Commercial Uses” means exercising any of the rights granted herein for the purpose of research or non-commercial purposes. Non-Commercial Uses does not include any production use of the Software Products or any Derivative Works.

"Stability AI" or "we" means Stability AI Ltd and its affiliates.


"Software" means Stability AI’s proprietary software made available under this Agreement.

“Software Products” means the Models, Software and Documentation, individually or in any combination.



1. 	License Rights and Redistribution.
a.  	Subject to your compliance with this Agreement, the AUP (which is hereby incorporated herein by reference), and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned or controlled by Stability AI embodied in the Software Products to use, reproduce, distribute, and create Derivative Works of, the Software Products, in each case for Non-Commercial Uses only.
b.   You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of your hosted service or via your APIs, whether you are adding substantial additional functionality thereto or not. Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection. If you wish to use the Software Products or any Derivative Works for commercial or production use or you wish to make the Software Products or any Derivative Works available to third parties via your hosted service or your APIs, contact Stability AI at https://stability.ai/contact.
c.	If you distribute or make the Software Products, or any Derivative Works thereof, available to a third party, the Software Products, Derivative Works, or any portion thereof, respectively, will remain subject to this Agreement and you must (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "This Stability AI Model is licensed under the Stability AI Non-Commercial Research Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.” If you create a Derivative Work of a Software Product, you may add your own attribution notices to the Notice file included with the Software Product, provided that you clearly indicate which attributions apply to the Software Product and you must state in the NOTICE file that you changed the Software Product and how it was modified.
2.	Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS, DERIVATIVE WORKS OR ANY OUTPUT OR RESULTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS, DERIVATIVE WORKS AND ANY OUTPUT AND RESULTS.
3.	Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
4.   	Intellectual Property.
a. 	No trademark licenses are granted under this Agreement, and in connection with the Software Products or Derivative Works, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products or Derivative Works.
b.	Subject to Stability AI’s ownership of the Software Products and Derivative Works made by or for Stability AI, with respect to any Derivative Works that are made by you, as between you and Stability AI, you are and will be the owner of such Derivative Works.
c. 	If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products, Derivative Works or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products or Derivative Works in violation of this Agreement.
5. 	Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of any Software Products or Derivative Works. Sections 2-4 shall survive the termination of this Agreement.

6.	Governing Law. This Agreement will be governed by and construed in accordance with the laws of the United States and the State of California without regard to choice of law
principles.



================================================
FILE: model_licenses/LICENSE-SVD
================================================
STABLE VIDEO DIFFUSION NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT	
Dated: November 21, 2023

“AUP” means the Stability AI Acceptable Use Policy available at https://stability.ai/use-policy, as may be updated from time to time.

"Agreement" means the terms and conditions for use, reproduction, distribution and modification of the Software Products set forth herein.
"Derivative Work(s)” means (a) any derivative work of the Software Products as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model’s output. For clarity, Derivative Works do not include the output of any Model.
“Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.

"Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

"Stability AI" or "we" means Stability AI Ltd. 

"Software" means, collectively, Stability AI’s proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing, made available under this Agreement.

“Software Products” means Software and Documentation. 

By using or distributing any portion or element of the Software Products, you agree to be bound by this Agreement.



License Rights and Redistribution. 
Subject to your compliance with this Agreement, the AUP (which is hereby incorporated herein by reference), and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create Derivative Works of the Software Products for purposes other than commercial or production use.     
b.	If you distribute or make the Software Products, or any Derivative Works thereof, available to a third party, the Software Products, Derivative Works, or any portion thereof, respectively, will remain subject to this Agreement and you must (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "Stable Video Diffusion is licensed under the Stable Video Diffusion Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.” If you create a Derivative Work of a Software Product, you may add your own attribution notices to the Notice file included with the Software Product, provided that you clearly indicate which attributions apply to the Software Product and you must state in the NOTICE file that you changed the Software Product and how it was modified.
2. 	  Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS. 
3.   Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 
3.   Intellectual Property.
a. 	No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products. 
Subject to Stability AI’s ownership of the Software Products and Derivative Works made by or for Stability AI, with respect to any Derivative Works that are made by you, as between you and Stability AI, you are and will be the owner of such Derivative Works. 
If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement. 
4.   Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Software Products. Sections 2-4 shall survive the termination of this Agreement. 


================================================
FILE: pyproject.toml
================================================
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "sgm"
dynamic = ["version"]
description = "Stability Generative Models"
readme = "README.md"
license-files = { paths = ["LICENSE-CODE"] }
requires-python = ">=3.8"

[project.urls]
Homepage = "https://github.com/Stability-AI/generative-models"

[tool.hatch.version]
path = "sgm/__init__.py"

[tool.hatch.build]
# This needs to be explicitly set so the configuration files
# grafted into the `sgm` directory get included in the wheel's
# RECORD file.
include = [
    "sgm",
]
# The force-include configurations below make Hatch copy
# the configs/ directory (containing the various YAML files required
# to generatively model) into the source distribution and the wheel.

[tool.hatch.build.targets.sdist.force-include]
"./configs" = "sgm/configs"

[tool.hatch.build.targets.wheel.force-include]
"./configs" = "sgm/configs"

[tool.hatch.envs.ci]
skip-install = false

dependencies = [
    "pytest"
]

[tool.hatch.envs.ci.scripts]
test-inference = [
    "pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118",
    "pip install -r requirements/pt2.txt",    
    "pytest -v tests/inference/test_inference.py {args}",
]


================================================
FILE: pytest.ini
================================================
[pytest]
markers = 
  inference: mark as inference test (deselect with '-m "not inference"')

================================================
FILE: requirements/pt2.txt
================================================
black==23.7.0
chardet==5.1.0
clip @ git+https://github.com/openai/CLIP.git
einops>=0.6.1
fairscale>=0.4.13
fire>=0.5.0
fsspec>=2023.6.0
imageio[ffmpeg]
imageio[pyav]
invisible-watermark>=0.2.0
kornia==0.6.9
matplotlib>=3.7.2
natsort>=8.4.0
ninja>=1.11.1
numpy==2.1
omegaconf>=2.3.0
onnxruntime
open-clip-torch>=2.20.0
opencv-python==4.6.0.66
pandas>=2.0.3
pillow>=9.5.0
pudb>=2022.1.3
pytorch-lightning==2.0.1
pyyaml>=6.0.1
rembg
scipy>=1.10.1
streamlit>=0.73.1
tensorboardx==2.6
timm>=0.9.2
tokenizers==0.12.1
torch>=2.0.1
torchaudio>=2.0.2
torchdata==0.6.1
torchmetrics>=1.0.1
torchvision>=0.15.2
tqdm>=4.65.0
transformers==4.19.1
triton==2.0.0
urllib3<1.27,>=1.25.4
wandb>=0.15.6
webdataset>=0.2.33
wheel>=0.41.0
xformers>=0.0.20
gradio
streamlit-keyup==0.2.0


================================================
FILE: scripts/__init__.py
================================================


================================================
FILE: scripts/demo/__init__.py
================================================


================================================
FILE: scripts/demo/detect.py
================================================
import argparse

import cv2
import numpy as np

try:
    from imwatermark import WatermarkDecoder
except ImportError as e:
    try:
        # Assume some other dependencies (such as torch) are not installed, so
        # import only the needed submodule without loading unnecessary libraries.
        import importlib.util
        import sys

        spec = importlib.util.find_spec("imwatermark.maxDct")
        assert spec is not None
        maxDct = importlib.util.module_from_spec(spec)
        sys.modules["maxDct"] = maxDct
        spec.loader.exec_module(maxDct)

        class WatermarkDecoder(object):
            """A minimal version of
            https://github.com/ShieldMnt/invisible-watermark/blob/main/imwatermark/watermark.py
            to only reconstruct bits using dwtDct"""

            def __init__(self, wm_type="bytes", length=0):
                assert wm_type == "bits", "Only bits defined in minimal import"
                self._wmType = wm_type
                self._wmLen = length

            def reconstruct(self, bits):
                if len(bits) != self._wmLen:
                    raise RuntimeError("bits are not matched with watermark length")

                return bits

            def decode(self, cv2Image, method="dwtDct", **configs):
                (r, c, channels) = cv2Image.shape
                if r * c < 256 * 256:
                    raise RuntimeError("image too small, should be larger than 256x256")

                bits = []
                assert method == "dwtDct"
                embed = maxDct.EmbedMaxDct(watermarks=[], wmLen=self._wmLen, **configs)
                bits = embed.decode(cv2Image)
                return self.reconstruct(bits)

    except Exception:
        raise e


# A fixed 48-bit message that was chosen at random
# WATERMARK_MESSAGE = 0xB3EC907BB19E
WATERMARK_MESSAGE = 0b101100111110110010010000011110111011000110011110
# bin(x)[2:] gives bits of x as str, use int to convert them to 0/1
WATERMARK_BITS = [int(bit) for bit in bin(WATERMARK_MESSAGE)[2:]]
MATCH_VALUES = [
    [27, "No watermark detected"],
    [33, "Partial watermark match. Cannot determine with certainty."],
    [
        35,
        (
            "Likely watermarked. In our test 0.02% of real images were "
            'falsely detected as "Likely watermarked"'
        ),
    ],
    [
        49,
        (
            "Very likely watermarked. In our test no real images were "
            'falsely detected as "Very likely watermarked"'
        ),
    ],
]


class GetWatermarkMatch:
    def __init__(self, watermark):
        self.watermark = watermark
        self.num_bits = len(self.watermark)
        self.decoder = WatermarkDecoder("bits", self.num_bits)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """
        Counts the bits that match the predefined watermark in one or
        multiple images. Images should be in cv2 format, e.g. h x w x c BGR.

        Args:
            x: ([B], h, w, c) in range [0, 255]

        Returns:
           number of matched bits ([B],)
        """
        squeeze = len(x.shape) == 3
        if squeeze:
            x = x[None, ...]

        bs = x.shape[0]
        detected = np.empty((bs, self.num_bits), dtype=bool)
        for k in range(bs):
            detected[k] = self.decoder.decode(x[k], "dwtDct")
        result = np.sum(detected == self.watermark, axis=-1)
        if squeeze:
            return result[0]
        else:
            return result


get_watermark_match = GetWatermarkMatch(WATERMARK_BITS)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "filename",
        nargs="+",
        type=str,
        help="Image files to check for watermarks",
    )
    opts = parser.parse_args()

    print(
        """
        This script tries to detect watermarked images. Please be aware of
        the following:
        - As the watermark is supposed to be invisible, there is the risk that
          watermarked images may not be detected.
        - To maximize the chance of detection make sure that the image has the same
          dimensions as when the watermark was applied (most likely 1024x1024
          or 512x512).
        - Specific image manipulation may drastically decrease the chance that
          watermarks can be detected.
        - There is also the chance that an image has the characteristics of the
          watermark by chance.
        - The watermark script is public, anybody may watermark any images, and
          could therefore claim it to be generated.
        - All numbers below are based on a test using 10,000 images without any
          modifications after applying the watermark.
        """
    )

    for fn in opts.filename:
        image = cv2.imread(fn)
        if image is None:
            print(f"Couldn't read {fn}. Skipping")
            continue

        num_bits = get_watermark_match(image)
        k = 0
        while num_bits > MATCH_VALUES[k][0]:
            k += 1
        print(
            f"{fn}: {MATCH_VALUES[k][1]}",
            f"Bits that matched the watermark {num_bits} from {len(WATERMARK_BITS)}\n",
            sep="\n\t",
        )
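

# Illustrative sketch, not part of the original script: how a bit count is
# mapped to a verdict by walking an ascending threshold table, as the loop
# above does with MATCH_VALUES. The `_sketch_*` names and the shortened labels
# are hypothetical and exist only for this example.
_SKETCH_MESSAGE = 0xB3EC907BB19E  # same value as WATERMARK_MESSAGE above
_SKETCH_THRESHOLDS = [
    (27, "none"),
    (33, "partial"),
    (35, "likely"),
    (49, "very likely"),
]


def _sketch_message_bits(message):
    # bin() drops leading zeros, so the list length is the position of the
    # highest set bit; this message starts with a 1 and yields 48 bits.
    return [int(bit) for bit in bin(message)[2:]]


def _sketch_verdict(num_bits, thresholds=_SKETCH_THRESHOLDS):
    # Return the label of the first threshold that num_bits does not exceed.
    for upper, label in thresholds:
        if num_bits <= upper:
            return label
    raise ValueError("num_bits exceeds the last threshold")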


================================================
FILE: scripts/demo/discretization.py
================================================
import torch

from sgm.modules.diffusionmodules.discretizer import Discretization


class Img2ImgDiscretizationWrapper:
    """
    wraps a discretizer, and prunes the sigmas
    params:
        strength: float between 0.0 and 1.0. 1.0 means full sampling (all sigmas are returned)
    """

    def __init__(self, discretization: Discretization, strength: float = 1.0):
        self.discretization = discretization
        self.strength = strength
        assert 0.0 <= self.strength <= 1.0

    def __call__(self, *args, **kwargs):
        # sigmas are returned in decreasing order (largest first)
        sigmas = self.discretization(*args, **kwargs)
        print("sigmas after discretization, before pruning img2img: ", sigmas)
        sigmas = torch.flip(sigmas, (0,))
        # compute the prune index before slicing so the logged value is correct
        prune_index = max(int(self.strength * len(sigmas)), 1)
        sigmas = sigmas[:prune_index]
        print("prune index:", prune_index)
        sigmas = torch.flip(sigmas, (0,))
        print("sigmas after pruning: ", sigmas)
        return sigmas


class Txt2NoisyDiscretizationWrapper:
    """
    wraps a discretizer, and prunes the sigmas
    params:
        strength: float between 0.0 and 1.0. 0.0 means full sampling (all sigmas are returned)
    """

    def __init__(
        self, discretization: Discretization, strength: float = 0.0, original_steps=None
    ):
        self.discretization = discretization
        self.strength = strength
        self.original_steps = original_steps
        assert 0.0 <= self.strength <= 1.0

    def __call__(self, *args, **kwargs):
        # sigmas are returned in decreasing order (largest first)
        sigmas = self.discretization(*args, **kwargs)
        print("sigmas after discretization, before pruning txt2noisy: ", sigmas)
        sigmas = torch.flip(sigmas, (0,))
        if self.original_steps is None:
            steps = len(sigmas)
        else:
            steps = self.original_steps + 1
        prune_index = max(min(int(self.strength * steps) - 1, steps - 1), 0)
        sigmas = sigmas[prune_index:]
        print("prune index:", prune_index)
        sigmas = torch.flip(sigmas, (0,))
        print("sigmas after pruning: ", sigmas)
        return sigmas
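

# Illustrative sketch, not part of the original module: the pruning arithmetic
# of the two wrappers above, reproduced on plain Python lists so it can be
# checked without torch. The `_sketch_*` names are hypothetical. Both functions
# assume `sigmas` is ordered from largest to smallest, as the comments above say.
def _sketch_img2img_prune(sigmas, strength):
    ascending = sigmas[::-1]
    keep = max(int(strength * len(ascending)), 1)
    # Keep the `keep` smallest sigmas, then restore decreasing order.
    return ascending[:keep][::-1]


def _sketch_txt2noisy_prune(sigmas, strength, original_steps=None):
    ascending = sigmas[::-1]
    steps = len(ascending) if original_steps is None else original_steps + 1
    prune_index = max(min(int(strength * steps) - 1, steps - 1), 0)
    # Drop the `prune_index` smallest sigmas, then restore decreasing order.
    return ascending[prune_index:][::-1]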


================================================
FILE: scripts/demo/gradio_app.py
================================================
# Adding this at the very top of app.py to make 'generative-models' directory discoverable
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), "generative-models"))

import math
import random
import uuid
from glob import glob
from pathlib import Path
from typing import Optional

import cv2
import gradio as gr
import numpy as np
import torch
from einops import rearrange, repeat
from fire import Fire
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf
from PIL import Image
from torchvision.transforms import ToTensor

from scripts.sampling.simple_video_sample import (
    get_batch,
    get_unique_embedder_keys_from_conditioner,
    load_model,
)
from scripts.util.detection.nsfw_and_watermark_dectection import DeepFloydDataFiltering
from sgm.inference.helpers import embed_watermark
from sgm.util import default, instantiate_from_config

# To download all svd models
# hf_hub_download(repo_id="stabilityai/stable-video-diffusion-img2vid-xt", filename="svd_xt.safetensors", local_dir="checkpoints")
# hf_hub_download(repo_id="stabilityai/stable-video-diffusion-img2vid", filename="svd.safetensors", local_dir="checkpoints")
# hf_hub_download(repo_id="stabilityai/stable-video-diffusion-img2vid-xt-1-1", filename="svd_xt_1_1.safetensors", local_dir="checkpoints")


# Define the repo, local directory and filename
repo_id = "stabilityai/stable-video-diffusion-img2vid-xt-1-1"  # replace with "stabilityai/stable-video-diffusion-img2vid-xt" or "stabilityai/stable-video-diffusion-img2vid" for other models
filename = "svd_xt_1_1.safetensors"  # replace with "svd_xt.safetensors" or "svd.safetensors" for other models
local_dir = "checkpoints"
local_file_path = os.path.join(local_dir, filename)

# Check if the file already exists
if not os.path.exists(local_file_path):
    # If the file doesn't exist, download it
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    print("File downloaded.")
else:
    print("File already exists. No need to download.")


version = "svd_xt_1_1"  # replace with 'svd_xt' or 'svd' for other models
device = "cuda"
max_64_bit_int = 2**63 - 1

if version == "svd_xt_1_1":
    num_frames = 25
    num_steps = 30
    model_config = "scripts/sampling/configs/svd_xt_1_1.yaml"
else:
    raise ValueError(f"Version {version} does not exist.")

model, filter = load_model(
    model_config,
    device,
    num_frames,
    num_steps,
)


def sample(
    input_path: str = "assets/test_image.png",  # Can either be image file or folder with image files
    seed: Optional[int] = None,
    randomize_seed: bool = True,
    motion_bucket_id: int = 127,
    fps_id: int = 6,
    version: str = "svd_xt_1_1",
    cond_aug: float = 0.02,
    decoding_t: int = 7,  # Number of frames decoded at a time! This eats most VRAM. Reduce if necessary.
    device: str = "cuda",
    output_folder: str = "outputs",
    progress=gr.Progress(track_tqdm=True),
):
    """
    Simple script to generate a single sample conditioned on an image `input_path` or multiple images, one for each
    image file in folder `input_path`. If you run out of VRAM, try decreasing `decoding_t`.
    """
    fps_id = int(fps_id)  # cast float slider values to int
    if randomize_seed:
        seed = random.randint(0, max_64_bit_int)

    torch.manual_seed(seed)

    path = Path(input_path)
    all_img_paths = []
    if path.is_file():
        if any([input_path.endswith(x) for x in ["jpg", "jpeg", "png"]]):
            all_img_paths = [input_path]
        else:
            raise ValueError("Path is not valid image file.")
    elif path.is_dir():
        all_img_paths = sorted(
            [
                f
                for f in path.iterdir()
                if f.is_file() and f.suffix.lower() in [".jpg", ".jpeg", ".png"]
            ]
        )
        if len(all_img_paths) == 0:
            raise ValueError("Folder does not contain any images.")
    else:
        raise ValueError("Path is neither a file nor a directory.")

    for input_img_path in all_img_paths:
        with Image.open(input_img_path) as image:
            if image.mode == "RGBA":
                image = image.convert("RGB")
            w, h = image.size

            if h % 64 != 0 or w % 64 != 0:
                width, height = map(lambda x: x - x % 64, (w, h))
                image = image.resize((width, height))
                print(
                    f"WARNING: Your image is of size {h}x{w} which is not divisible by 64. We are resizing to {height}x{width}!"
                )

            image = ToTensor()(image)
            image = image * 2.0 - 1.0

        image = image.unsqueeze(0).to(device)
        H, W = image.shape[2:]
        assert image.shape[1] == 3
        F = 8
        C = 4
        shape = (num_frames, C, H // F, W // F)
        if (H, W) != (576, 1024):
            print(
                "WARNING: The conditioning frame you provided is not 576x1024. This leads to suboptimal performance as the model was only trained on 576x1024. Consider increasing `cond_aug`."
            )
        if motion_bucket_id > 255:
            print(
                "WARNING: High motion bucket! This may lead to suboptimal performance."
            )

        if fps_id < 5:
            print("WARNING: Small fps value! This may lead to suboptimal performance.")

        if fps_id > 30:
            print("WARNING: Large fps value! This may lead to suboptimal performance.")

        value_dict = {}
        value_dict["motion_bucket_id"] = motion_bucket_id
        value_dict["fps_id"] = fps_id
        value_dict["cond_aug"] = cond_aug
        value_dict["cond_frames_without_noise"] = image
        value_dict["cond_frames"] = image + cond_aug * torch.randn_like(image)

        with torch.no_grad():
            with torch.autocast(device):
                batch, batch_uc = get_batch(
                    get_unique_embedder_keys_from_conditioner(model.conditioner),
                    value_dict,
                    [1, num_frames],
                    T=num_frames,
                    device=device,
                )
                c, uc = model.conditioner.get_unconditional_conditioning(
                    batch,
                    batch_uc=batch_uc,
                    force_uc_zero_embeddings=[
                        "cond_frames",
                        "cond_frames_without_noise",
                    ],
                )

                for k in ["crossattn", "concat"]:
                    uc[k] = repeat(uc[k], "b ... -> b t ...", t=num_frames)
                    uc[k] = rearrange(uc[k], "b t ... -> (b t) ...", t=num_frames)
                    c[k] = repeat(c[k], "b ... -> b t ...", t=num_frames)
                    c[k] = rearrange(c[k], "b t ... -> (b t) ...", t=num_frames)

                randn = torch.randn(shape, device=device)

                additional_model_inputs = {}
                additional_model_inputs["image_only_indicator"] = torch.zeros(
                    2, num_frames
                ).to(device)
                additional_model_inputs["num_video_frames"] = batch["num_video_frames"]

                def denoiser(input, sigma, c):
                    return model.denoiser(
                        model.model, input, sigma, c, **additional_model_inputs
                    )

                samples_z = model.sampler(denoiser, randn, cond=c, uc=uc)
                model.en_and_decode_n_samples_a_time = decoding_t
                samples_x = model.decode_first_stage(samples_z)
                samples = torch.clamp((samples_x + 1.0) / 2.0, min=0.0, max=1.0)

                os.makedirs(output_folder, exist_ok=True)
                base_count = len(glob(os.path.join(output_folder, "*.mp4")))
                video_path = os.path.join(output_folder, f"{base_count:06d}.mp4")
                writer = cv2.VideoWriter(
                    video_path,
                    cv2.VideoWriter_fourcc(*"mp4v"),
                    fps_id + 1,
                    (samples.shape[-1], samples.shape[-2]),
                )

                samples = embed_watermark(samples)
                samples = filter(samples)
                vid = (
                    (rearrange(samples, "t c h w -> t h w c") * 255)
                    .cpu()
                    .numpy()
                    .astype(np.uint8)
                )
                for frame in vid:
                    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
                    writer.write(frame)
                writer.release()

        return video_path, seed
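

# Illustrative sketch, not part of the original demo: how sample() above derives
# the latent noise shape from an input size. Each side is first rounded down to
# a multiple of 64, then divided by the downsampling factor F = 8, with C = 4
# latent channels. The name `_sketch_latent_shape` is hypothetical.
def _sketch_latent_shape(w, h, num_frames=25, factor=8, channels=4):
    width, height = (x - x % 64 for x in (w, h))
    return (num_frames, channels, height // factor, width // factor)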


def resize_image(image_path, output_size=(1024, 576)):
    image = Image.open(image_path)
    # Calculate aspect ratios
    target_aspect = output_size[0] / output_size[1]  # Aspect ratio of the desired size
    image_aspect = image.width / image.height  # Aspect ratio of the original image

    # Resize, then center-crop; the branch depends on whether the image is wider than the target aspect ratio
    if image_aspect > target_aspect:
        # Resize the image to match the target height, maintaining aspect ratio
        new_height = output_size[1]
        new_width = int(new_height * image_aspect)
        resized_image = image.resize((new_width, new_height), Image.LANCZOS)
        # Calculate coordinates for cropping
        left = (new_width - output_size[0]) / 2
        top = 0
        right = (new_width + output_size[0]) / 2
        bottom = output_size[1]
    else:
        # Resize the image to match the target width, maintaining aspect ratio
        new_width = output_size[0]
        new_height = int(new_width / image_aspect)
        resized_image = image.resize((new_width, new_height), Image.LANCZOS)
        # Calculate coordinates for cropping
        left = 0
        top = (new_height - output_size[1]) / 2
        right = output_size[0]
        bottom = (new_height + output_size[1]) / 2

    # Crop the image
    cropped_image = resized_image.crop((left, top, right, bottom))

    return cropped_image
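

# Illustrative sketch, not part of the original demo: the resize-then-center-crop
# arithmetic of resize_image above, on plain numbers without PIL. It returns the
# intermediate resized size and the crop box. The name `_sketch_crop_box` is
# hypothetical.
def _sketch_crop_box(width, height, output_size=(1024, 576)):
    target_w, target_h = output_size
    image_aspect = width / height
    if image_aspect > target_w / target_h:
        # Wider than the target: match the height, crop the sides.
        new_height = target_h
        new_width = int(new_height * image_aspect)
        box = ((new_width - target_w) / 2, 0, (new_width + target_w) / 2, target_h)
    else:
        # Taller than (or equal to) the target: match the width, crop top and bottom.
        new_width = target_w
        new_height = int(new_width / image_aspect)
        box = (0, (new_height - target_h) / 2, target_w, (new_height + target_h) / 2)
    return (new_width, new_height), box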


with gr.Blocks() as demo:
    gr.Markdown(
        """# Community demo for Stable Video Diffusion - Img2Vid - XT ([model](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt), [paper](https://stability.ai/research/stable-video-diffusion-scaling-latent-video-diffusion-models-to-large-datasets))
#### Research release ([_non-commercial_](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/blob/main/LICENSE)): generate a `4s` video from a single image (`25 frames` at `6 fps`). Generation takes ~60s on an A100. [Join the waitlist for Stability's upcoming web experience](https://stability.ai/contact).
  """
    )
    with gr.Row():
        with gr.Column():
            image = gr.Image(label="Upload your image", type="filepath")
            generate_btn = gr.Button("Generate")
        video = gr.Video()
    with gr.Accordion("Advanced options", open=False):
        seed = gr.Slider(
            label="Seed",
            value=42,
            randomize=True,
            minimum=0,
            maximum=max_64_bit_int,
            step=1,
        )
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        motion_bucket_id = gr.Slider(
            label="Motion bucket id",
            info="Controls how much motion to add/remove from the image",
            value=127,
            minimum=1,
            maximum=255,
        )
        fps_id = gr.Slider(
            label="Frames per second",
            info="The length of your video in seconds will be 25/fps",
            value=6,
            minimum=5,
            maximum=30,
        )

    image.upload(fn=resize_image, inputs=image, outputs=image, queue=False)
    generate_btn.click(
        fn=sample,
        inputs=[image, seed, randomize_seed, motion_bucket_id, fps_id],
        outputs=[video, seed],
        api_name="video",
    )

if __name__ == "__main__":
    demo.queue(max_size=20)
    demo.launch(share=True)


================================================
FILE: scripts/demo/gradio_app_sv4d.py
================================================
# Adding this at the very top of app.py to make 'generative-models' directory discoverable
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), "generative-models"))

from glob import glob
from typing import List, Optional, Union

import gradio as gr
import numpy as np
import torch
import torchvision
from huggingface_hub import hf_hub_download

from sgm.modules.encoders.modules import VideoPredictionEmbedderWithEncoder
from scripts.demo.sv4d_helpers import (
    decode_latents,
    load_model,
    initial_model_load,
    read_video,
    run_img2vid,
    prepare_inputs,
    do_sample_per_step,
    sample_sv3d,
    save_video,
    preprocess_video,
)


# Temp path: if /tmp/gradio is not writable, point GRADIO_TEMP_DIR at a writable directory
# os.environ["GRADIO_TEMP_DIR"] = "gradio_tmp"

version = "sv4d"  # replace with 'sv3d_p' or 'sv3d_u' for other models

# Define the repo, local directory and filename
repo_id = "stabilityai/sv4d"
filename = f"{version}.safetensors"  # replace with "sv3d_u.safetensors" or "sv3d_p.safetensors"
local_dir = "checkpoints"
local_ckpt_path = os.path.join(local_dir, filename)

# Check if the file already exists
if not os.path.exists(local_ckpt_path):
    # If the file doesn't exist, download it
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    print("File downloaded. (sv4d)")
else:
    print("File already exists. No need to download. (sv4d)")

device = "cuda"
max_64_bit_int = 2**63 - 1

num_frames = 21
num_steps = 20
model_config = f"scripts/sampling/configs/{version}.yaml"

# Set model config
T = 5  # number of frames per sample
V = 8  # number of views per sample
F = 8  # vae factor to downsize image->latent
C = 4
H, W = 576, 576
n_frames = 21  # number of input and output video frames
n_views = V + 1  # number of output video views (1 input view + 8 novel views)
n_views_sv3d = 21
subsampled_views = np.array(
    [0, 2, 5, 7, 9, 12, 14, 16, 19]
)  # subsample (V+1=)9 (uniform) views from 21 SV3D views
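
# The hard-coded `subsampled_views` list is consistent with rounding nine evenly spaced
# positions along the 21-view SV3D orbit. The sketch below is one way to reproduce it;
# the source does not say how the list was derived, so treat the formula as an assumption.

```python
import numpy as np

n_views_sv3d = 21  # length of the SV3D orbit, as in the script
n_keep = 9  # V + 1 views kept for SV4D (1 input view + 8 novel views)

# Evenly spaced positions along the orbit, rounded to the nearest view index.
reproduced_views = np.round(np.arange(n_keep) * n_views_sv3d / n_keep).astype(int)
print(reproduced_views.tolist())  # [0, 2, 5, 7, 9, 12, 14, 16, 19]
```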

version_dict = {
    "T": T * V,
    "H": H,
    "W": W,
    "C": C,
    "f": F,
    "options": {
        "discretization": 1,
        "cfg": 3,
        "sigma_min": 0.002,
        "sigma_max": 700.0,
        "rho": 7.0,
        "guider": 5,
        "num_steps": num_steps,
        "force_uc_zero_embeddings": [
            "cond_frames",
            "cond_frames_without_noise",
            "cond_view",
            "cond_motion",
        ],
        "additional_guider_kwargs": {
            "additional_cond_keys": ["cond_view", "cond_motion"]
        },
    },
}

# Load SV4D model
model, filter = load_model(
    model_config,
    device,
    version_dict["T"],
    num_steps,
)
model = initial_model_load(model)

# -----------sv3d config and model loading----------------
# if version == "sv3d_u":
sv3d_model_config = "scripts/sampling/configs/sv3d_u.yaml"
# elif version == "sv3d_p":
#     sv3d_model_config = "scripts/sampling/configs/sv3d_p.yaml"
# else:
#     raise ValueError(f"Version {version} does not exist.")

# Define the repo, local directory and filename
repo_id = "stabilityai/sv3d"
filename = "sv3d_u.safetensors"  # or "sv3d_p.safetensors"
local_dir = "checkpoints"
local_ckpt_path = os.path.join(local_dir, filename)

# Check if the file already exists
if not os.path.exists(local_ckpt_path):
    # If the file doesn't exist, download it
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    print("File downloaded. (sv3d)")
else:
    print("File already exists. No need to download. (sv3d)")

# load sv3d model
sv3d_model, filter = load_model(
    sv3d_model_config,
    device,
    21,
    num_steps,
    verbose=False,
)
sv3d_model = initial_model_load(sv3d_model)
# ------------------

def sample_anchor(
    input_path: str = "assets/test_image.png",  # Can either be image file or folder with image files
    seed: Optional[int] = None,
    encoding_t: int = 8,  # Number of frames encoded at a time! This eats most VRAM. Reduce if necessary.
    decoding_t: int = 4,  # Number of frames decoded at a time! This eats most VRAM. Reduce if necessary.
    num_steps: int = 20,
    sv3d_version: str = "sv3d_u",  # sv3d_u or sv3d_p
    fps_id: int = 6,
    motion_bucket_id: int = 127,
    cond_aug: float = 1e-5,
    device: str = "cuda",
    elevations_deg: Optional[Union[float, List[float]]] = 10.0,
    azimuths_deg: Optional[List[float]] = None,
    verbose: Optional[bool] = False,
):
    """
    Sample the SV3D multi-view images of the first frame and the SV4D anchor frames for the input at `input_path`
    (a video, or a folder with one image file per frame). If you run out of VRAM, try decreasing `encoding_t` or `decoding_t`.
    """
    output_folder = os.path.dirname(input_path)

    if seed is not None:  # torch.manual_seed() raises a TypeError on None
        torch.manual_seed(seed)
    os.makedirs(output_folder, exist_ok=True)

    # Read input video frames i.e. images at view 0
    print(f"Reading {input_path}")
    images_v0 = read_video(
        input_path,
        n_frames=n_frames,
        device=device,
    )

    # Get camera viewpoints
    if isinstance(elevations_deg, (float, int)):
        elevations_deg = [elevations_deg] * n_views_sv3d
    assert (
        len(elevations_deg) == n_views_sv3d
    ), f"Please provide 1 value, or a list of {n_views_sv3d} values for elevations_deg! Given {len(elevations_deg)}"
    if azimuths_deg is None:
        azimuths_deg = np.linspace(0, 360, n_views_sv3d + 1)[1:] % 360
    assert (
        len(azimuths_deg) == n_views_sv3d
    ), f"Please provide a list of {n_views_sv3d} values for azimuths_deg! Given {len(azimuths_deg)}"
    polars_rad = np.array([np.deg2rad(90 - e) for e in elevations_deg])
    azimuths_rad = np.array(
        [np.deg2rad((a - azimuths_deg[-1]) % 360) for a in azimuths_deg]
    )

    # Sample multi-view images of the first frame using SV3D i.e. images at time 0
    sv3d_model.sampler.num_steps = num_steps
    print("sv3d_model.sampler.num_steps", sv3d_model.sampler.num_steps)
    images_t0 = sample_sv3d(
        images_v0[0],
        n_views_sv3d,
        num_steps,
        sv3d_version,
        fps_id,
        motion_bucket_id,
        cond_aug,
        decoding_t,
        device,
        polars_rad,
        azimuths_rad,
        verbose,
        sv3d_model,
    )
    images_t0 = torch.roll(images_t0, 1, 0)  # move conditioning image to first frame

    sv3d_file = os.path.join(output_folder, "t000.mp4")
    save_video(sv3d_file, images_t0.unsqueeze(1))
    
    for emb in model.conditioner.embedders:
        if isinstance(emb, VideoPredictionEmbedderWithEncoder):
            emb.en_and_decode_n_samples_a_time = encoding_t
    model.en_and_decode_n_samples_a_time = decoding_t
    # Initialize image matrix
    img_matrix = [[None] * n_views for _ in range(n_frames)]
    for i, v in enumerate(subsampled_views):
        img_matrix[0][i] = images_t0[v].unsqueeze(0)
    for t in range(n_frames):
        img_matrix[t][0] = images_v0[t]

    # Interleaved sampling for anchor frames
    t0, v0 = 0, 0
    frame_indices = np.arange(T - 1, n_frames, T - 1)  # [4, 8, 12, 16, 20]
    view_indices = np.arange(V) + 1
    print(f"Sampling anchor frames {frame_indices}")
    image = img_matrix[t0][v0]
    cond_motion = torch.cat([img_matrix[t][v0] for t in frame_indices], 0)
    cond_view = torch.cat([img_matrix[t0][v] for v in view_indices], 0)
    polars = polars_rad[subsampled_views[1:]][None].repeat(T, 0).flatten()
    azims = azimuths_rad[subsampled_views[1:]][None].repeat(T, 0).flatten()
    azims = (azims - azimuths_rad[v0]) % (torch.pi * 2)
    model.sampler.num_steps = num_steps
    version_dict["options"]["num_steps"] = num_steps
    samples = run_img2vid(
        version_dict, model, image, seed, polars, azims, cond_motion, cond_view, decoding_t
    )
    samples = samples.view(T, V, 3, H, W)
    for i, t in enumerate(frame_indices):
        for j, v in enumerate(view_indices):
            if img_matrix[t][v] is None:
                img_matrix[t][v] = samples[i, j][None] * 2 - 1

    # concat video
    grid_list = []
    for t in frame_indices:
        imgs_view = torch.cat(img_matrix[t])
        grid_list.append(torchvision.utils.make_grid(imgs_view, nrow=3).unsqueeze(0))
    # save output videos
    anchor_vis_file = os.path.join(output_folder, "anchor_vis.mp4")
    save_video(anchor_vis_file, grid_list, fps=3)
    anchor_file = os.path.join(output_folder, "anchor.mp4")
    image_list = samples.view(T*V, 3, H, W).unsqueeze(1) * 2 - 1
    save_video(anchor_file, image_list)

    return sv3d_file, anchor_vis_file, anchor_file
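
# The anchor-frame bookkeeping in `sample_anchor` can be sketched with plain indices.
# This is a minimal illustration under the script's values (T=5, V=8, n_frames=21);
# the `slots` dictionary is a hypothetical stand-in for filling `img_matrix`.

```python
import numpy as np

T, V, n_frames = 5, 8, 21  # values taken from the script

frame_indices = np.arange(T - 1, n_frames, T - 1)  # anchor frames: every 4th frame
view_indices = np.arange(V) + 1  # novel views; view 0 is the input video

# A batch of T * V samples reshaped to (T, V, ...) puts flat index i * V + j
# at anchor frame frame_indices[i] and view view_indices[j].
slots = {
    i * V + j: (int(t), int(v))
    for i, t in enumerate(frame_indices)
    for j, v in enumerate(view_indices)
}
```

# Frame 0 and view 0 never appear in `slots`: those cells come from SV3D and the input video.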


def sample_all(
    input_path: str = "inputs/test_video1.mp4",  # Can either be video file or folder with image files
    sv3d_path: str = "outputs/sv4d/000000_t000.mp4",
    anchor_path: str = "outputs/sv4d/000000_anchor.mp4",
    seed: Optional[int] = None,
    num_steps: int = 20,
    device: str = "cuda",
    elevations_deg: Optional[Union[float, List[float]]] = 10.0,
    azimuths_deg: Optional[List[float]] = None,
):
    """
    Simple script to generate multiple novel-view videos conditioned on a video `input_path` or multiple frames, one for each
    image file in folder `input_path`. If you run out of VRAM, try decreasing `decoding_t`.
    """
    output_folder = os.path.dirname(input_path)
    if seed is not None:  # torch.manual_seed() raises a TypeError on None
        torch.manual_seed(seed)
    os.makedirs(output_folder, exist_ok=True)

    # Read input video frames i.e. images at view 0
    print(f"Reading {input_path}")
    images_v0 = read_video(
        input_path,
        n_frames=n_frames,
        device=device,
    )

    images_t0 = read_video(
        sv3d_path,
        n_frames=n_views_sv3d,
        device=device,
    )

    # Get camera viewpoints
    if isinstance(elevations_deg, (float, int)):
        elevations_deg = [elevations_deg] * n_views_sv3d
    assert (
        len(elevations_deg) == n_views_sv3d
    ), f"Please provide 1 value, or a list of {n_views_sv3d} values for elevations_deg! Given {len(elevations_deg)}"
    if azimuths_deg is None:
        azimuths_deg = np.linspace(0, 360, n_views_sv3d + 1)[1:] % 360
    assert (
        len(azimuths_deg) == n_views_sv3d
    ), f"Please provide a list of {n_views_sv3d} values for azimuths_deg! Given {len(azimuths_deg)}"
    polars_rad = np.array([np.deg2rad(90 - e) for e in elevations_deg])
    azimuths_rad = np.array(
        [np.deg2rad((a - azimuths_deg[-1]) % 360) for a in azimuths_deg]
    )

    # Initialize image matrix
    img_matrix = [[None] * n_views for _ in range(n_frames)]
    for i, v in enumerate(subsampled_views):
        img_matrix[0][i] = images_t0[v]
    for t in range(n_frames):
        img_matrix[t][0] = images_v0[t]

    # load interleaved sampling for anchor frames
    t0, v0 = 0, 0
    frame_indices = np.arange(T - 1, n_frames, T - 1)  # [4, 8, 12, 16, 20]
    view_indices = np.arange(V) + 1

    anchor_frames = read_video(
        anchor_path,
        n_frames=T * V,
        device=device,
    )
    anchor_frames = torch.cat(anchor_frames).view(T, V, 3, H, W)
    for i, t in enumerate(frame_indices):
        for

SYMBOL INDEX (814 symbols across 56 files)

FILE: main.py
  function default_trainer_args (line 31) | def default_trainer_args():
  function get_parser (line 42) | def get_parser(**parser_kwargs):
  function get_checkpoint_name (line 203) | def get_checkpoint_name(logdir):
  class SetupCallback (line 230) | class SetupCallback(Callback):
    method __init__ (line 231) | def __init__(
    method on_exception (line 254) | def on_exception(self, trainer: pl.Trainer, pl_module, exception):
    method on_fit_start (line 263) | def on_fit_start(self, trainer, pl_module):
  class ImageLogger (line 309) | class ImageLogger(Callback):
    method __init__ (line 310) | def __init__(
    method log_local (line 340) | def log_local(
    method log_img (line 395) | def log_img(self, pl_module, batch, batch_idx, split="train"):
    method check_frequency (line 444) | def check_frequency(self, check_idx):
    method on_train_batch_end (line 457) | def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch...
    method on_train_batch_start (line 462) | def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
    method on_validation_batch_end (line 468) | def on_validation_batch_end(
  function init_wandb (line 481) | def init_wandb(save_dir, opt, config, group_name, name_str):
  function melk (line 876) | def melk(*args, **kwargs):
  function divein (line 886) | def divein(*args, **kwargs):

FILE: scripts/demo/detect.py
  class WatermarkDecoder (line 21) | class WatermarkDecoder(object):
    method __init__ (line 26) | def __init__(self, wm_type="bytes", length=0):
    method reconstruct (line 31) | def reconstruct(self, bits):
    method decode (line 37) | def decode(self, cv2Image, method="dwtDct", **configs):
  class GetWatermarkMatch (line 77) | class GetWatermarkMatch:
    method __init__ (line 78) | def __init__(self, watermark):
    method __call__ (line 83) | def __call__(self, x: np.ndarray) -> np.ndarray:

FILE: scripts/demo/discretization.py
  class Img2ImgDiscretizationWrapper (line 6) | class Img2ImgDiscretizationWrapper:
    method __init__ (line 13) | def __init__(self, discretization: Discretization, strength: float = 1...
    method __call__ (line 18) | def __call__(self, *args, **kwargs):
  class Txt2NoisyDiscretizationWrapper (line 30) | class Txt2NoisyDiscretizationWrapper:
    method __init__ (line 37) | def __init__(
    method __call__ (line 45) | def __call__(self, *args, **kwargs):

FILE: scripts/demo/gradio_app.py
  function sample (line 74) | def sample(
  function resize_image (line 230) | def resize_image(image_path, output_size=(1024, 576)):

FILE: scripts/demo/gradio_app_sv4d.py
  function sample_anchor (line 139) | def sample_anchor(
  function sample_all (line 258) | def sample_all(

FILE: scripts/demo/sampling.py
  function load_img (line 76) | def load_img(display=True, key=None, device="cuda"):
  function run_txt2img (line 94) | def run_txt2img(
  function run_img2img (line 144) | def run_img2img(
  function apply_refiner (line 192) | def apply_refiner(

FILE: scripts/demo/streamlit_helpers.py
  function init_st (line 45) | def init_st(version_dict, load_ckpt=True, load_filter=True):
  function load_model (line 63) | def load_model(model):
  function set_lowvram_mode (line 70) | def set_lowvram_mode(mode):
  function initial_model_load (line 75) | def initial_model_load(model):
  function unload_model (line 84) | def unload_model(model):
  function load_model_from_config (line 91) | def load_model_from_config(config, ckpt=None, verbose=True):
  function get_unique_embedder_keys_from_conditioner (line 126) | def get_unique_embedder_keys_from_conditioner(conditioner):
  function init_embedder_options (line 130) | def init_embedder_options(keys, init_dict, prompt=None, negative_prompt=...
  function perform_save_locally (line 202) | def perform_save_locally(save_path, samples):
  function init_save_locally (line 214) | def init_save_locally(_dir, init_value: bool = False):
  function get_guider (line 224) | def get_guider(options, key):
  function init_sampling (line 306) | def init_sampling(
  function get_discretization (line 364) | def get_discretization(discretization, options, key=1):
  function get_sampler (line 389) | def get_sampler(sampler_name, steps, discretization_config, guider_confi...
  function get_interactive_image (line 465) | def get_interactive_image() -> Image.Image:
  function load_img (line 474) | def load_img(
  function get_init_img (line 501) | def get_init_img(batch_size=1, key=None):
  function do_sample (line 507) | def do_sample(
  function get_batch (line 635) | def get_batch(
  function do_img2img (line 737) | def do_img2img(
  function get_resizing_factor (line 827) | def get_resizing_factor(
  function get_interactive_image (line 852) | def get_interactive_image(key=None) -> Image.Image:
  function load_img_for_prediction (line 861) | def load_img_for_prediction(
  function save_video_as_grid_and_mp4 (line 897) | def save_video_as_grid_and_mp4(

FILE: scripts/demo/sv3d_helpers.py
  function generate_dynamic_cycle_xy_values (line 7) | def generate_dynamic_cycle_xy_values(
  function smooth_data (line 41) | def smooth_data(data, window_size):
  function gen_dynamic_loop (line 59) | def gen_dynamic_loop(length=21, elev_deg=0):
  function plot_3D (line 77) | def plot_3D(azim, polar, save_path, dynamic=True):

FILE: scripts/demo/sv4d_helpers.py
  function load_module_gpu (line 38) | def load_module_gpu(model):
  function unload_module_gpu (line 42) | def unload_module_gpu(model):
  function initial_model_load (line 47) | def initial_model_load(model):
  function get_resizing_factor (line 52) | def get_resizing_factor(
  function read_gif (line 76) | def read_gif(input_path, n_frames):
  function read_mp4 (line 86) | def read_mp4(input_path, n_frames):
  function save_img (line 98) | def save_img(file_name, img):
  function save_video (line 107) | def save_video(file_name, imgs, fps=10):
  function read_video (line 120) | def read_video(
  function preprocess_video (line 167) | def preprocess_video(
  function sample_sv3d (line 292) | def sample_sv3d(
  function decode_latents (line 404) | def decode_latents(
  function init_embedder_options_no_st (line 423) | def init_embedder_options_no_st(keys, init_dict, prompt=None, negative_p...
  function get_discretization_no_st (line 475) | def get_discretization_no_st(discretization, options, key=1):
  function get_guider_no_st (line 495) | def get_guider_no_st(options, key):
  function get_sampler_no_st (line 607) | def get_sampler_no_st(sampler_name, steps, discretization_config, guider...
  function init_sampling_no_st (line 683) | def init_sampling_no_st(
  function run_img2vid (line 716) | def run_img2vid(
  function prepare_inputs_forward_backward (line 800) | def prepare_inputs_forward_backward(
  function prepare_inputs (line 852) | def prepare_inputs(
  function do_sample (line 907) | def do_sample(
  function prepare_sampling_ (line 1025) | def prepare_sampling_(
  function do_sample_per_step (line 1101) | def do_sample_per_step(
  function prepare_sampling (line 1146) | def prepare_sampling(
  function get_unique_embedder_keys_from_conditioner (line 1217) | def get_unique_embedder_keys_from_conditioner(conditioner):
  function get_batch_sv3d (line 1221) | def get_batch_sv3d(keys, value_dict, N, T, device):
  function get_batch (line 1260) | def get_batch(
  function load_model (line 1387) | def load_model(

FILE: scripts/demo/turbo.py
  class SubstepSampler (line 19) | class SubstepSampler(EulerAncestralSampler):
    method __init__ (line 20) | def __init__(self, n_sample_steps=1, *args, **kwargs):
    method prepare_sampling_loop (line 25) | def prepare_sampling_loop(self, x, cond, uc=None, num_steps=None):
  function seeded_randn (line 39) | def seeded_randn(shape, seed):
  class SeededNoise (line 45) | class SeededNoise:
    method __init__ (line 46) | def __init__(self, seed):
    method __call__ (line 49) | def __call__(self, x):
  function init_embedder_options (line 54) | def init_embedder_options(keys, init_dict, prompt=None, negative_prompt=...
  function sample (line 86) | def sample(
  function v_spacer (line 148) | def v_spacer(height) -> None:
  function increment_counter (line 179) | def increment_counter():
  function decrement_counter (line 182) | def decrement_counter():

FILE: scripts/sampling/simple_video_sample.py
  function sample (line 24) | def sample(
  function get_unique_embedder_keys_from_conditioner (line 278) | def get_unique_embedder_keys_from_conditioner(conditioner):
  function get_batch (line 282) | def get_batch(keys, value_dict, N, T, device):
  function load_model (line 321) | def load_model(

FILE: scripts/sampling/simple_video_sample_4d.py
  function sample (line 29) | def sample(

FILE: scripts/sampling/simple_video_sample_4d2.py
  function sample (line 79) | def sample(

FILE: scripts/tests/attention.py
  function benchmark_attn (line 10) | def benchmark_attn():
  function run_model (line 136) | def run_model(model, x, context):
  function benchmark_transformer_blocks (line 140) | def benchmark_transformer_blocks():
  function test01 (line 234) | def test01():
  function test02 (line 263) | def test02():

FILE: scripts/util/detection/nsfw_and_watermark_dectection.py
  function predict_proba (line 12) | def predict_proba(X, weights, biases):
  function load_model_weights (line 20) | def load_model_weights(path: str):
  function clip_process_images (line 25) | def clip_process_images(images: torch.Tensor) -> torch.Tensor:
  class DeepFloydDataFiltering (line 39) | class DeepFloydDataFiltering(object):
    method __init__ (line 40) | def __init__(
    method __call__ (line 58) | def __call__(self, images: torch.Tensor) -> torch.Tensor:
  function load_img (line 78) | def load_img(path: str) -> torch.Tensor:
  function test (line 90) | def test(root):

FILE: sgm/data/cifar10.py
  class CIFAR10DataDictWrapper (line 7) | class CIFAR10DataDictWrapper(Dataset):
    method __init__ (line 8) | def __init__(self, dset):
    method __getitem__ (line 12) | def __getitem__(self, i):
    method __len__ (line 16) | def __len__(self):
  class CIFAR10Loader (line 20) | class CIFAR10Loader(pl.LightningDataModule):
    method __init__ (line 21) | def __init__(self, batch_size, num_workers=0, shuffle=True):
    method prepare_data (line 42) | def prepare_data(self):
    method train_dataloader (line 45) | def train_dataloader(self):
    method test_dataloader (line 53) | def test_dataloader(self):
    method val_dataloader (line 61) | def val_dataloader(self):

FILE: sgm/data/dataset.py
  class StableDataModuleFromConfig (line 20) | class StableDataModuleFromConfig(LightningDataModule):
    method __init__ (line 21) | def __init__(
    method setup (line 59) | def setup(self, stage: str) -> None:
    method train_dataloader (line 72) | def train_dataloader(self) -> torchdata.datapipes.iter.IterDataPipe:
    method val_dataloader (line 76) | def val_dataloader(self) -> wds.DataPipeline:
    method test_dataloader (line 79) | def test_dataloader(self) -> wds.DataPipeline:

FILE: sgm/data/mnist.py
  class MNISTDataDictWrapper (line 7) | class MNISTDataDictWrapper(Dataset):
    method __init__ (line 8) | def __init__(self, dset):
    method __getitem__ (line 12) | def __getitem__(self, i):
    method __len__ (line 16) | def __len__(self):
  class MNISTLoader (line 20) | class MNISTLoader(pl.LightningDataModule):
    method __init__ (line 21) | def __init__(self, batch_size, num_workers=0, prefetch_factor=2, shuff...
    method prepare_data (line 43) | def prepare_data(self):
    method train_dataloader (line 46) | def train_dataloader(self):
    method test_dataloader (line 55) | def test_dataloader(self):
    method val_dataloader (line 64) | def val_dataloader(self):

FILE: sgm/inference/api.py
  class ModelArchitecture (line 19) | class ModelArchitecture(str, Enum):
  class Sampler (line 26) | class Sampler(str, Enum):
  class Discretization (line 35) | class Discretization(str, Enum):
  class Guider (line 40) | class Guider(str, Enum):
  class Thresholder (line 45) | class Thresholder(str, Enum):
  class SamplingParams (line 50) | class SamplingParams:
  class SamplingSpec (line 78) | class SamplingSpec:
  class SamplingPipeline (line 133) | class SamplingPipeline:
    method __init__ (line 134) | def __init__(
    method _load_model (line 151) | def _load_model(self, device="cuda", use_fp16=True):
    method text_to_image (line 162) | def text_to_image(
    method image_to_image (line 190) | def image_to_image(
    method refiner (line 223) | def refiner(
  function get_guider_config (line 258) | def get_guider_config(params: SamplingParams):
  function get_discretization_config (line 284) | def get_discretization_config(params: SamplingParams):
  function get_sampler_config (line 303) | def get_sampler_config(params: SamplingParams):

FILE: sgm/inference/helpers.py
  class WatermarkEmbedder (line 16) | class WatermarkEmbedder:
    method __init__ (line 17) | def __init__(self, watermark):
    method __call__ (line 23) | def __call__(self, image: torch.Tensor) -> torch.Tensor:
  function get_unique_embedder_keys_from_conditioner (line 61) | def get_unique_embedder_keys_from_conditioner(conditioner):
  function perform_save_locally (line 65) | def perform_save_locally(save_path, samples):
  class Img2ImgDiscretizationWrapper (line 77) | class Img2ImgDiscretizationWrapper:
    method __init__ (line 84) | def __init__(self, discretization, strength: float = 1.0):
    method __call__ (line 89) | def __call__(self, *args, **kwargs):
  function do_sample (line 101) | def do_sample(
  function get_batch (line 173) | def get_batch(keys, value_dict, N: Union[List, ListConfig], device="cuda"):
  function get_input_image_tensor (line 230) | def get_input_image_tensor(image: Image.Image, device="cuda"):
  function do_img2img (line 243) | def do_img2img(

FILE: sgm/lr_scheduler.py
  class LambdaWarmUpCosineScheduler (line 4) | class LambdaWarmUpCosineScheduler:
    method __init__ (line 9) | def __init__(
    method schedule (line 26) | def schedule(self, n, **kwargs):
    method __call__ (line 47) | def __call__(self, n, **kwargs):
  class LambdaWarmUpCosineScheduler2 (line 51) | class LambdaWarmUpCosineScheduler2:
    method __init__ (line 57) | def __init__(
    method find_in_interval (line 76) | def find_in_interval(self, n):
    method schedule (line 83) | def schedule(self, n, **kwargs):
    method __call__ (line 109) | def __call__(self, n, **kwargs):
  class LambdaLinearScheduler (line 113) | class LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):
    method schedule (line 114) | def schedule(self, n, **kwargs):

FILE: sgm/models/autoencoder.py
  class AbstractAutoencoder (line 22) | class AbstractAutoencoder(pl.LightningModule):
    method __init__ (line 29) | def __init__(
    method apply_ckpt (line 49) | def apply_ckpt(self, ckpt: Union[None, str, dict]):
    method get_input (line 61) | def get_input(self, batch) -> Any:
    method on_train_batch_end (line 64) | def on_train_batch_end(self, *args, **kwargs):
    method ema_scope (line 70) | def ema_scope(self, context=None):
    method encode (line 85) | def encode(self, *args, **kwargs) -> torch.Tensor:
    method decode (line 89) | def decode(self, *args, **kwargs) -> torch.Tensor:
    method instantiate_optimizer_from_config (line 92) | def instantiate_optimizer_from_config(self, params, lr, cfg):
    method configure_optimizers (line 98) | def configure_optimizers(self) -> Any:
  class AutoencodingEngine (line 102) | class AutoencodingEngine(AbstractAutoencoder):
    method __init__ (line 109) | def __init__(
    method get_input (line 170) | def get_input(self, batch: Dict) -> torch.Tensor:
    method get_autoencoder_params (line 176) | def get_autoencoder_params(self) -> list:
    method get_discriminator_params (line 186) | def get_discriminator_params(self) -> list:
    method get_last_layer (line 193) | def get_last_layer(self):
    method encode (line 196) | def encode(
    method decode (line 210) | def decode(self, z: torch.Tensor, **kwargs) -> torch.Tensor:
    method forward (line 214) | def forward(
    method inner_training_step (line 221) | def inner_training_step(
    method training_step (line 281) | def training_step(self, batch: dict, batch_idx: int):
    method validation_step (line 298) | def validation_step(self, batch: dict, batch_idx: int) -> Dict:
    method _validation_step (line 305) | def _validation_step(self, batch: dict, batch_idx: int, postfix: str =...
    method get_param_groups (line 343) | def get_param_groups(
    method configure_optimizers (line 363) | def configure_optimizers(self) -> List[torch.optim.Optimizer]:
    method log_images (line 395) | def log_images(
  class AutoencodingEngineLegacy (line 437) | class AutoencodingEngineLegacy(AutoencodingEngine):
    method __init__ (line 438) | def __init__(self, embed_dim: int, **kwargs):
    method get_autoencoder_params (line 464) | def get_autoencoder_params(self) -> list:
    method encode (line 468) | def encode(
    method decode (line 490) | def decode(self, z: torch.Tensor, **decoder_kwargs) -> torch.Tensor:
  class AutoencoderKL (line 508) | class AutoencoderKL(AutoencodingEngineLegacy):
    method __init__ (line 509) | def __init__(self, **kwargs):
  class AutoencoderLegacyVQ (line 523) | class AutoencoderLegacyVQ(AutoencodingEngineLegacy):
    method __init__ (line 524) | def __init__(
  class IdentityFirstStage (line 549) | class IdentityFirstStage(AbstractAutoencoder):
    method __init__ (line 550) | def __init__(self, *args, **kwargs):
    method get_input (line 553) | def get_input(self, x: Any) -> Any:
    method encode (line 556) | def encode(self, x: Any, *args, **kwargs) -> Any:
    method decode (line 559) | def decode(self, x: Any, *args, **kwargs) -> Any:
  class AEIntegerWrapper (line 563) | class AEIntegerWrapper(nn.Module):
    method __init__ (line 564) | def __init__(
    method encode (line 580) | def encode(self, x) -> torch.Tensor:
    method decode (line 589) | def decode(
  class AutoencoderKLModeOnly (line 602) | class AutoencoderKLModeOnly(AutoencodingEngineLegacy):
    method __init__ (line 603) | def __init__(self, **kwargs):

FILE: sgm/models/diffusion.py
  class DiffusionEngine (line 19) | class DiffusionEngine(pl.LightningModule):
    method __init__ (line 20) | def __init__(
    method init_from_ckpt (line 85) | def init_from_ckpt(
    method _init_first_stage (line 105) | def _init_first_stage(self, config):
    method get_input (line 112) | def get_input(self, batch):
    method decode_first_stage (line 118) | def decode_first_stage(self, z):
    method encode_first_stage (line 138) | def encode_first_stage(self, x):
    method forward (line 152) | def forward(self, x, batch):
    method shared_step (line 158) | def shared_step(self, batch: Dict) -> Any:
    method training_step (line 165) | def training_step(self, batch, batch_idx):
    method on_train_start (line 189) | def on_train_start(self, *args, **kwargs):
    method on_train_batch_end (line 193) | def on_train_batch_end(self, *args, **kwargs):
    method ema_scope (line 198) | def ema_scope(self, context=None):
    method instantiate_optimizer_from_config (line 212) | def instantiate_optimizer_from_config(self, params, lr, cfg):
    method configure_optimizers (line 217) | def configure_optimizers(self):
    method sample (line 238) | def sample(
    method log_conditionings (line 255) | def log_conditionings(self, batch: Dict, n: int) -> Dict:
    method log_images (line 294) | def log_images(

FILE: sgm/modules/attention.py
  function exists (line 61) | def exists(val):
  function uniq (line 65) | def uniq(arr):
  function default (line 69) | def default(val, d):
  function max_neg_value (line 75) | def max_neg_value(t):
  function init_ (line 79) | def init_(tensor):
  class GEGLU (line 87) | class GEGLU(nn.Module):
    method __init__ (line 88) | def __init__(self, dim_in, dim_out):
    method forward (line 92) | def forward(self, x):
  class FeedForward (line 97) | class FeedForward(nn.Module):
    method __init__ (line 98) | def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.0):
    method forward (line 112) | def forward(self, x):
  function zero_module (line 116) | def zero_module(module):
  function Normalize (line 125) | def Normalize(in_channels):
  class LinearAttention (line 131) | class LinearAttention(nn.Module):
    method __init__ (line 132) | def __init__(self, dim, heads=4, dim_head=32):
    method forward (line 139) | def forward(self, x):
  class SelfAttention (line 154) | class SelfAttention(nn.Module):
    method __init__ (line 157) | def __init__(
    method forward (line 179) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class SpatialSelfAttention (line 210) | class SpatialSelfAttention(nn.Module):
    method __init__ (line 211) | def __init__(self, in_channels):
    method forward (line 229) | def forward(self, x):
  class CrossAttention (line 255) | class CrossAttention(nn.Module):
    method __init__ (line 256) | def __init__(
    method forward (line 281) | def forward(
  class MemoryEfficientCrossAttention (line 347) | class MemoryEfficientCrossAttention(nn.Module):
    method __init__ (line 349) | def __init__(
    method forward (line 373) | def forward(
  class BasicTransformerBlock (line 456) | class BasicTransformerBlock(nn.Module):
    method __init__ (line 462) | def __init__(
    method forward (line 527) | def forward(
    method _forward (line 551) | def _forward(
  class BasicTransformerSingleLayerBlock (line 575) | class BasicTransformerSingleLayerBlock(nn.Module):
    method __init__ (line 582) | def __init__(
    method forward (line 608) | def forward(self, x, context=None):
    method _forward (line 613) | def _forward(self, x, context=None):
  class SpatialTransformer (line 619) | class SpatialTransformer(nn.Module):
    method __init__ (line 629) | def __init__(
    method forward (line 702) | def forward(self, x, context=None):
  class SimpleTransformer (line 726) | class SimpleTransformer(nn.Module):
    method __init__ (line 727) | def __init__(
    method forward (line 752) | def forward(

FILE: sgm/modules/autoencoding/losses/discriminator_loss.py
  class GeneralLPIPSWithDiscriminator (line 17) | class GeneralLPIPSWithDiscriminator(nn.Module):
    method __init__ (line 18) | def __init__(
    method get_trainable_parameters (line 85) | def get_trainable_parameters(self) -> Iterator[nn.Parameter]:
    method get_trainable_autoencoder_parameters (line 88) | def get_trainable_autoencoder_parameters(self) -> Iterator[nn.Parameter]:
    method log_images (line 94) | def log_images(
    method calculate_adaptive_weight (line 196) | def calculate_adaptive_weight(
    method forward (line 207) | def forward(
    method get_nll_loss (line 294) | def get_nll_loss(

FILE: sgm/modules/autoencoding/losses/lpips.py
  class LatentLPIPS (line 8) | class LatentLPIPS(nn.Module):
    method __init__ (line 9) | def __init__(
    method init_decoder (line 27) | def init_decoder(self, config):
    method forward (line 32) | def forward(self, latent_inputs, latent_predictions, image_inputs, spl...

FILE: sgm/modules/autoencoding/lpips/loss/lpips.py
  class LPIPS (line 12) | class LPIPS(nn.Module):
    method __init__ (line 14) | def __init__(self, use_dropout=True):
    method load_from_pretrained (line 28) | def load_from_pretrained(self, name="vgg_lpips"):
    method from_pretrained (line 36) | def from_pretrained(cls, name="vgg_lpips"):
    method forward (line 46) | def forward(self, input, target):
  class ScalingLayer (line 67) | class ScalingLayer(nn.Module):
    method __init__ (line 68) | def __init__(self):
    method forward (line 77) | def forward(self, inp):
  class NetLinLayer (line 81) | class NetLinLayer(nn.Module):
    method __init__ (line 84) | def __init__(self, chn_in, chn_out=1, use_dropout=False):
  class vgg16 (line 99) | class vgg16(torch.nn.Module):
    method __init__ (line 100) | def __init__(self, requires_grad=False, pretrained=True):
    method forward (line 123) | def forward(self, X):
  function normalize_tensor (line 141) | def normalize_tensor(x, eps=1e-10):
  function spatial_average (line 146) | def spatial_average(x, keepdim=True):

FILE: sgm/modules/autoencoding/lpips/model/model.py
  function weights_init (line 8) | def weights_init(m):
  class NLayerDiscriminator (line 17) | class NLayerDiscriminator(nn.Module):
    method __init__ (line 22) | def __init__(self, input_nc=3, ndf=64, n_layers=3, use_actnorm=False):
    method forward (line 86) | def forward(self, input):

FILE: sgm/modules/autoencoding/lpips/util.py
  function download (line 16) | def download(url, local_path, chunk_size=1024):
  function md5_hash (line 28) | def md5_hash(path):
  function get_ckpt_path (line 34) | def get_ckpt_path(name, root, check=False):
  class ActNorm (line 45) | class ActNorm(nn.Module):
    method __init__ (line 46) | def __init__(
    method initialize (line 58) | def initialize(self, input):
    method forward (line 79) | def forward(self, input, reverse=False):
    method reverse (line 107) | def reverse(self, output):

FILE: sgm/modules/autoencoding/lpips/vqperceptual.py
  function hinge_d_loss (line 5) | def hinge_d_loss(logits_real, logits_fake):
  function vanilla_d_loss (line 12) | def vanilla_d_loss(logits_real, logits_fake):

FILE: sgm/modules/autoencoding/regularizers/__init__.py
  class DiagonalGaussianRegularizer (line 13) | class DiagonalGaussianRegularizer(AbstractRegularizer):
    method __init__ (line 14) | def __init__(self, sample: bool = True):
    method get_trainable_parameters (line 18) | def get_trainable_parameters(self) -> Any:
    method forward (line 21) | def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, dict]:

FILE: sgm/modules/autoencoding/regularizers/base.py
  class AbstractRegularizer (line 9) | class AbstractRegularizer(nn.Module):
    method __init__ (line 10) | def __init__(self):
    method forward (line 13) | def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, dict]:
    method get_trainable_parameters (line 17) | def get_trainable_parameters(self) -> Any:
  class IdentityRegularizer (line 21) | class IdentityRegularizer(AbstractRegularizer):
    method forward (line 22) | def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, dict]:
    method get_trainable_parameters (line 25) | def get_trainable_parameters(self) -> Any:
  function measure_perplexity (line 29) | def measure_perplexity(

FILE: sgm/modules/autoencoding/regularizers/quantize.py
  class AbstractQuantizer (line 17) | class AbstractQuantizer(AbstractRegularizer):
    method __init__ (line 18) | def __init__(self):
    method remap_to_used (line 26) | def remap_to_used(self, inds: torch.Tensor) -> torch.Tensor:
    method unmap_to_all (line 43) | def unmap_to_all(self, inds: torch.Tensor) -> torch.Tensor:
    method get_codebook_entry (line 55) | def get_codebook_entry(
    method get_trainable_parameters (line 60) | def get_trainable_parameters(self) -> Iterator[torch.nn.Parameter]:
  class GumbelQuantizer (line 64) | class GumbelQuantizer(AbstractQuantizer):
    method __init__ (line 73) | def __init__(
    method forward (line 119) | def forward(
    method get_codebook_entry (line 158) | def get_codebook_entry(self, indices, shape):
  class VectorQuantizer (line 172) | class VectorQuantizer(AbstractQuantizer):
    method __init__ (line 184) | def __init__(
    method forward (line 234) | def forward(
    method get_codebook_entry (line 302) | def get_codebook_entry(
  class EmbeddingEMA (line 323) | class EmbeddingEMA(nn.Module):
    method __init__ (line 324) | def __init__(self, num_tokens, codebook_dim, decay=0.99, eps=1e-5):
    method forward (line 334) | def forward(self, embed_id):
    method cluster_size_ema_update (line 337) | def cluster_size_ema_update(self, new_cluster_size):
    method embed_avg_ema_update (line 342) | def embed_avg_ema_update(self, new_embed_avg):
    method weight_update (line 345) | def weight_update(self, num_tokens):
  class EMAVectorQuantizer (line 355) | class EMAVectorQuantizer(AbstractQuantizer):
    method __init__ (line 356) | def __init__(
    method forward (line 396) | def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, Dict]:
  class VectorQuantizerWithInputProjection (line 446) | class VectorQuantizerWithInputProjection(VectorQuantizer):
    method __init__ (line 447) | def __init__(
    method forward (line 464) | def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, Dict]:

FILE: sgm/modules/autoencoding/temporal_ae.py
  class VideoResBlock (line 16) | class VideoResBlock(ResnetBlock):
    method __init__ (line 17) | def __init__(
    method get_alpha (line 54) | def get_alpha(self, bs):
    method forward (line 62) | def forward(self, x, temb, skip_video=False, timesteps=None):
  class AE3DConv (line 84) | class AE3DConv(torch.nn.Conv2d):
    method __init__ (line 85) | def __init__(self, in_channels, out_channels, video_kernel_size=3, *ar...
    method forward (line 99) | def forward(self, input, timesteps, skip_video=False):
  class VideoBlock (line 108) | class VideoBlock(AttnBlock):
    method __init__ (line 109) | def __init__(
    method forward (line 140) | def forward(self, x, timesteps, skip_video=False):
    method get_alpha (line 167) | def get_alpha(
  class MemoryEfficientVideoBlock (line 178) | class MemoryEfficientVideoBlock(MemoryEfficientAttnBlock):
    method __init__ (line 179) | def __init__(
    method forward (line 210) | def forward(self, x, timesteps, skip_time_block=False):
    method get_alpha (line 237) | def get_alpha(
  function make_time_attn (line 248) | def make_time_attn(
  class Conv2DWrapper (line 286) | class Conv2DWrapper(torch.nn.Conv2d):
    method forward (line 287) | def forward(self, input: torch.Tensor, **kwargs) -> torch.Tensor:
  class VideoDecoder (line 291) | class VideoDecoder(Decoder):
    method __init__ (line 294) | def __init__(
    method get_last_layer (line 312) | def get_last_layer(self, skip_time_mix=False, **kwargs):
    method _make_attn (line 322) | def _make_attn(self) -> Callable:
    method _make_conv (line 332) | def _make_conv(self) -> Callable:
    method _make_resblock (line 338) | def _make_resblock(self) -> Callable:

FILE: sgm/modules/diffusionmodules/denoiser.py
  class Denoiser (line 11) | class Denoiser(nn.Module):
    method __init__ (line 12) | def __init__(self, scaling_config: Dict):
    method possibly_quantize_sigma (line 17) | def possibly_quantize_sigma(self, sigma: torch.Tensor) -> torch.Tensor:
    method possibly_quantize_c_noise (line 20) | def possibly_quantize_c_noise(self, c_noise: torch.Tensor) -> torch.Te...
    method forward (line 23) | def forward(
  class DiscreteDenoiser (line 42) | class DiscreteDenoiser(Denoiser):
    method __init__ (line 43) | def __init__(
    method sigma_to_idx (line 61) | def sigma_to_idx(self, sigma: torch.Tensor) -> torch.Tensor:
    method idx_to_sigma (line 65) | def idx_to_sigma(self, idx: Union[torch.Tensor, int]) -> torch.Tensor:
    method possibly_quantize_sigma (line 68) | def possibly_quantize_sigma(self, sigma: torch.Tensor) -> torch.Tensor:
    method possibly_quantize_c_noise (line 71) | def possibly_quantize_c_noise(self, c_noise: torch.Tensor) -> torch.Te...

FILE: sgm/modules/diffusionmodules/denoiser_scaling.py
  class DenoiserScaling (line 7) | class DenoiserScaling(ABC):
    method __call__ (line 9) | def __call__(
  class EDMScaling (line 15) | class EDMScaling:
    method __init__ (line 16) | def __init__(self, sigma_data: float = 0.5):
    method __call__ (line 19) | def __call__(
  class EpsScaling (line 29) | class EpsScaling:
    method __call__ (line 30) | def __call__(
  class VScaling (line 40) | class VScaling:
    method __call__ (line 41) | def __call__(
  class VScalingWithEDMcNoise (line 51) | class VScalingWithEDMcNoise(DenoiserScaling):
    method __call__ (line 52) | def __call__(

FILE: sgm/modules/diffusionmodules/denoiser_weighting.py
  class UnitWeighting (line 4) | class UnitWeighting:
    method __call__ (line 5) | def __call__(self, sigma):
  class EDMWeighting (line 9) | class EDMWeighting:
    method __init__ (line 10) | def __init__(self, sigma_data=0.5):
    method __call__ (line 13) | def __call__(self, sigma):
  class VWeighting (line 17) | class VWeighting(EDMWeighting):
    method __init__ (line 18) | def __init__(self):
  class EpsWeighting (line 22) | class EpsWeighting:
    method __call__ (line 23) | def __call__(self, sigma):

FILE: sgm/modules/diffusionmodules/discretizer.py
  function generate_roughly_equally_spaced_steps (line 11) | def generate_roughly_equally_spaced_steps(
  class Discretization (line 17) | class Discretization:
    method __call__ (line 18) | def __call__(self, n, do_append_zero=True, device="cpu", flip=False):
    method get_sigmas (line 24) | def get_sigmas(self, n, device):
  class EDMDiscretization (line 28) | class EDMDiscretization(Discretization):
    method __init__ (line 29) | def __init__(self, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    method get_sigmas (line 34) | def get_sigmas(self, n, device="cpu"):
  class LegacyDDPMDiscretization (line 42) | class LegacyDDPMDiscretization(Discretization):
    method __init__ (line 43) | def __init__(
    method get_sigmas (line 58) | def get_sigmas(self, n, device="cpu"):

FILE: sgm/modules/diffusionmodules/guiders.py
  class Guider (line 13) | class Guider(ABC):
    method __call__ (line 15) | def __call__(self, x: torch.Tensor, sigma: float) -> torch.Tensor:
    method prepare_inputs (line 18) | def prepare_inputs(
  class VanillaCFG (line 24) | class VanillaCFG(Guider):
    method __init__ (line 25) | def __init__(self, scale: float):
    method __call__ (line 28) | def __call__(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    method prepare_inputs (line 33) | def prepare_inputs(self, x, s, c, uc):
  class IdentityGuider (line 45) | class IdentityGuider(Guider):
    method __call__ (line 46) | def __call__(self, x: torch.Tensor, sigma: float) -> torch.Tensor:
    method prepare_inputs (line 49) | def prepare_inputs(
  class LinearPredictionGuider (line 60) | class LinearPredictionGuider(Guider):
    method __init__ (line 61) | def __init__(
    method __call__ (line 78) | def __call__(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    method prepare_inputs (line 88) | def prepare_inputs(
  class TrianglePredictionGuider (line 102) | class TrianglePredictionGuider(LinearPredictionGuider):
    method __init__ (line 103) | def __init__(
    method triangle_wave (line 130) | def triangle_wave(self, values: torch.Tensor, period) -> torch.Tensor:
  class TrapezoidPredictionGuider (line 134) | class TrapezoidPredictionGuider(LinearPredictionGuider):
    method __init__ (line 135) | def __init__(
  class SpatiotemporalPredictionGuider (line 156) | class SpatiotemporalPredictionGuider(LinearPredictionGuider):
    method __init__ (line 157) | def __init__(
    method triangle_wave (line 174) | def triangle_wave(self, values: torch.Tensor, period=1) -> torch.Tensor:

FILE: sgm/modules/diffusionmodules/loss.py
  class StandardDiffusionLoss (line 12) | class StandardDiffusionLoss(nn.Module):
    method __init__ (line 13) | def __init__(
    method get_noised_input (line 42) | def get_noised_input(
    method forward (line 48) | def forward(
    method _forward (line 59) | def _forward(
    method get_loss (line 92) | def get_loss(self, model_output, target, w):

FILE: sgm/modules/diffusionmodules/loss_weighting.py
  class DiffusionLossWeighting (line 6) | class DiffusionLossWeighting(ABC):
    method __call__ (line 8) | def __call__(self, sigma: torch.Tensor) -> torch.Tensor:
  class UnitWeighting (line 12) | class UnitWeighting(DiffusionLossWeighting):
    method __call__ (line 13) | def __call__(self, sigma: torch.Tensor) -> torch.Tensor:
  class EDMWeighting (line 17) | class EDMWeighting(DiffusionLossWeighting):
    method __init__ (line 18) | def __init__(self, sigma_data: float = 0.5):
    method __call__ (line 21) | def __call__(self, sigma: torch.Tensor) -> torch.Tensor:
  class VWeighting (line 25) | class VWeighting(EDMWeighting):
    method __init__ (line 26) | def __init__(self):
  class EpsWeighting (line 30) | class EpsWeighting(DiffusionLossWeighting):
    method __call__ (line 31) | def __call__(self, sigma: torch.Tensor) -> torch.Tensor:

FILE: sgm/modules/diffusionmodules/model.py
  function get_timestep_embedding (line 26) | def get_timestep_embedding(timesteps, embedding_dim):
  function nonlinearity (line 47) | def nonlinearity(x):
  function Normalize (line 52) | def Normalize(in_channels, num_groups=32):
  class Upsample (line 58) | class Upsample(nn.Module):
    method __init__ (line 59) | def __init__(self, in_channels, with_conv):
    method forward (line 67) | def forward(self, x):
  class Downsample (line 74) | class Downsample(nn.Module):
    method __init__ (line 75) | def __init__(self, in_channels, with_conv):
    method forward (line 84) | def forward(self, x):
  class ResnetBlock (line 94) | class ResnetBlock(nn.Module):
    method __init__ (line 95) | def __init__(
    method forward (line 131) | def forward(self, x, temb):
  class LinAttnBlock (line 154) | class LinAttnBlock(LinearAttention):
    method __init__ (line 157) | def __init__(self, in_channels):
  class AttnBlock (line 161) | class AttnBlock(nn.Module):
    method __init__ (line 162) | def __init__(self, in_channels):
    method attention (line 180) | def attention(self, h_: torch.Tensor) -> torch.Tensor:
    method forward (line 197) | def forward(self, x, **kwargs):
  class MemoryEfficientAttnBlock (line 204) | class MemoryEfficientAttnBlock(nn.Module):
    method __init__ (line 212) | def __init__(self, in_channels):
    method attention (line 231) | def attention(self, h_: torch.Tensor) -> torch.Tensor:
    method forward (line 261) | def forward(self, x, **kwargs):
  class MemoryEfficientCrossAttentionWrapper (line 268) | class MemoryEfficientCrossAttentionWrapper(MemoryEfficientCrossAttention):
    method forward (line 269) | def forward(self, x, context=None, mask=None, **unused_kwargs):
  function make_attn (line 277) | def make_attn(in_channels, attn_type="vanilla", attn_kwargs=None):
  class Model (line 312) | class Model(nn.Module):
    method __init__ (line 313) | def __init__(
    method forward (line 434) | def forward(self, x, t=None, context=None):
    method get_last_layer (line 483) | def get_last_layer(self):
  class Encoder (line 487) | class Encoder(nn.Module):
    method __init__ (line 488) | def __init__(
    method forward (line 576) | def forward(self, x):
  class Decoder (line 604) | class Decoder(nn.Module):
    method __init__ (line 605) | def __init__(
    method _make_attn (line 703) | def _make_attn(self) -> Callable:
    method _make_resblock (line 706) | def _make_resblock(self) -> Callable:
    method _make_conv (line 709) | def _make_conv(self) -> Callable:
    method get_last_layer (line 712) | def get_last_layer(self, **kwargs):
    method forward (line 715) | def forward(self, z, **kwargs):

FILE: sgm/modules/diffusionmodules/openaimodel.py
  class AttentionPool2d (line 22) | class AttentionPool2d(nn.Module):
    method __init__ (line 27) | def __init__(
    method forward (line 43) | def forward(self, x: th.Tensor) -> th.Tensor:
  class TimestepBlock (line 54) | class TimestepBlock(nn.Module):
    method forward (line 60) | def forward(self, x: th.Tensor, emb: th.Tensor):
  class TimestepEmbedSequential (line 66) | class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
    method forward (line 72) | def forward(
  class Upsample (line 157) | class Upsample(nn.Module):
    method __init__ (line 166) | def __init__(
    method forward (line 189) | def forward(self, x: th.Tensor) -> th.Tensor:
  class Downsample (line 210) | class Downsample(nn.Module):
    method __init__ (line 219) | def __init__(
    method forward (line 254) | def forward(self, x: th.Tensor) -> th.Tensor:
  class ResBlock (line 260) | class ResBlock(TimestepBlock):
    method __init__ (line 276) | def __init__(
    method forward (line 366) | def forward(self, x: th.Tensor, emb: th.Tensor) -> th.Tensor:
    method _forward (line 378) | def _forward(self, x: th.Tensor, emb: th.Tensor) -> th.Tensor:
  class AttentionBlock (line 407) | class AttentionBlock(nn.Module):
    method __init__ (line 414) | def __init__(
    method forward (line 443) | def forward(self, x: th.Tensor, **kwargs) -> th.Tensor:
    method _forward (line 446) | def _forward(self, x: th.Tensor) -> th.Tensor:
  class QKVAttentionLegacy (line 455) | class QKVAttentionLegacy(nn.Module):
    method __init__ (line 460) | def __init__(self, n_heads: int):
    method forward (line 464) | def forward(self, qkv: th.Tensor) -> th.Tensor:
  class QKVAttention (line 483) | class QKVAttention(nn.Module):
    method __init__ (line 488) | def __init__(self, n_heads: int):
    method forward (line 492) | def forward(self, qkv: th.Tensor) -> th.Tensor:
  class Timestep (line 513) | class Timestep(nn.Module):
    method __init__ (line 514) | def __init__(self, dim: int):
    method forward (line 518) | def forward(self, t: th.Tensor) -> th.Tensor:
  class UNetModel (line 522) | class UNetModel(nn.Module):
    method __init__ (line 552) | def __init__(
    method forward (line 866) | def forward(

FILE: sgm/modules/diffusionmodules/sampling.py
  class BaseDiffusionSampler (line 21) | class BaseDiffusionSampler:
    method __init__ (line 22) | def __init__(
    method prepare_sampling_loop (line 41) | def prepare_sampling_loop(self, x, cond, uc=None, num_steps=None):
    method denoise (line 54) | def denoise(self, x, denoiser, sigma, cond, uc):
    method get_sigma_gen (line 59) | def get_sigma_gen(self, num_sigmas):
  class SingleStepDiffusionSampler (line 74) | class SingleStepDiffusionSampler(BaseDiffusionSampler):
    method sampler_step (line 75) | def sampler_step(self, sigma, next_sigma, denoiser, x, cond, uc, *args...
    method euler_step (line 78) | def euler_step(self, x, d, dt):
  class EDMSampler (line 82) | class EDMSampler(SingleStepDiffusionSampler):
    method __init__ (line 83) | def __init__(
    method sampler_step (line 93) | def sampler_step(self, sigma, next_sigma, denoiser, x, cond, uc=None, ...
    method __call__ (line 109) | def __call__(self, denoiser, x, cond, uc=None, num_steps=None):
  class AncestralSampler (line 133) | class AncestralSampler(SingleStepDiffusionSampler):
    method __init__ (line 134) | def __init__(self, eta=1.0, s_noise=1.0, *args, **kwargs):
    method ancestral_euler_step (line 141) | def ancestral_euler_step(self, x, denoised, sigma, sigma_down):
    method ancestral_step (line 147) | def ancestral_step(self, x, sigma, next_sigma, sigma_up):
    method __call__ (line 155) | def __call__(self, denoiser, x, cond, uc=None, num_steps=None):
  class LinearMultistepSampler (line 173) | class LinearMultistepSampler(BaseDiffusionSampler):
    method __init__ (line 174) | def __init__(
    method __call__ (line 184) | def __call__(self, denoiser, x, cond, uc=None, num_steps=None, **kwargs):
  class EulerEDMSampler (line 211) | class EulerEDMSampler(EDMSampler):
    method possible_correction_step (line 212) | def possible_correction_step(
  class HeunEDMSampler (line 218) | class HeunEDMSampler(EDMSampler):
    method possible_correction_step (line 219) | def possible_correction_step(
  class EulerAncestralSampler (line 237) | class EulerAncestralSampler(AncestralSampler):
    method sampler_step (line 238) | def sampler_step(self, sigma, next_sigma, denoiser, x, cond, uc):
  class DPMPP2SAncestralSampler (line 247) | class DPMPP2SAncestralSampler(AncestralSampler):
    method get_variables (line 248) | def get_variables(self, sigma, sigma_down):
    method get_mult (line 254) | def get_mult(self, h, s, t, t_next):
    method sampler_step (line 262) | def sampler_step(self, sigma, next_sigma, denoiser, x, cond, uc=None, ...
  class DPMPP2MSampler (line 287) | class DPMPP2MSampler(BaseDiffusionSampler):
    method get_variables (line 288) | def get_variables(self, sigma, next_sigma, previous_sigma=None):
    method get_mult (line 299) | def get_mult(self, h, r, t, t_next, previous_sigma):
    method sampler_step (line 310) | def sampler_step(
    method __call__ (line 344) | def __call__(self, denoiser, x, cond, uc=None, num_steps=None, **kwargs):

FILE: sgm/modules/diffusionmodules/sampling_utils.py
  function linear_multistep_coeff (line 7) | def linear_multistep_coeff(order, t, i, j, epsrel=1e-4):
  function get_ancestral_step (line 22) | def get_ancestral_step(sigma_from, sigma_to, eta=1.0):
  function to_d (line 34) | def to_d(x, sigma, denoised):
  function to_neg_log_sigma (line 38) | def to_neg_log_sigma(sigma):
  function to_sigma (line 42) | def to_sigma(neg_log_sigma):

FILE: sgm/modules/diffusionmodules/sigma_sampling.py
  class EDMSampling (line 6) | class EDMSampling:
    method __init__ (line 7) | def __init__(self, p_mean=-1.2, p_std=1.2):
    method __call__ (line 11) | def __call__(self, n_samples, rand=None):
  class DiscreteSampling (line 16) | class DiscreteSampling:
    method __init__ (line 17) | def __init__(self, discretization_config, num_idx, do_append_zero=Fals...
    method idx_to_sigma (line 23) | def idx_to_sigma(self, idx):
    method __call__ (line 26) | def __call__(self, n_samples, rand=None):
  class ZeroSampler (line 34) | class ZeroSampler:
    method __call__ (line 35) | def __call__(

FILE: sgm/modules/diffusionmodules/util.py
  function get_alpha (line 20) | def get_alpha(
  function make_beta_schedule (line 50) | def make_beta_schedule(
  function extract_into_tensor (line 66) | def extract_into_tensor(a, t, x_shape):
  function mixed_checkpoint (line 72) | def mixed_checkpoint(func, inputs: dict, params, flag):
  class MixedCheckpointFunction (line 108) | class MixedCheckpointFunction(torch.autograd.Function):
    method forward (line 110) | def forward(
    method backward (line 150) | def backward(ctx, *output_grads):
  function checkpoint (line 184) | def checkpoint(func, inputs, params, flag):
  class CheckpointFunction (line 201) | class CheckpointFunction(torch.autograd.Function):
    method forward (line 203) | def forward(ctx, run_function, length, *args):
    method backward (line 217) | def backward(ctx, *output_grads):
  function timestep_embedding (line 237) | def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=Fal...
  function zero_module (line 264) | def zero_module(module):
  function scale_module (line 273) | def scale_module(module, scale):
  function mean_flat (line 282) | def mean_flat(tensor):
  function normalization (line 289) | def normalization(channels):
  class SiLU (line 299) | class SiLU(nn.Module):
    method forward (line 300) | def forward(self, x):
  class GroupNorm32 (line 304) | class GroupNorm32(nn.GroupNorm):
    method forward (line 305) | def forward(self, x):
  function conv_nd (line 309) | def conv_nd(dims, *args, **kwargs):
  function linear (line 322) | def linear(*args, **kwargs):
  function avg_pool_nd (line 329) | def avg_pool_nd(dims, *args, **kwargs):
  class AlphaBlender (line 342) | class AlphaBlender(nn.Module):
    method __init__ (line 345) | def __init__(
    method get_alpha (line 371) | def get_alpha(self, image_only_indicator: torch.Tensor) -> torch.Tensor:
    method forward (line 388) | def forward(

FILE: sgm/modules/diffusionmodules/video_model.py
  class VideoResBlock (line 17) | class VideoResBlock(ResBlock):
    method __init__ (line 18) | def __init__(
    method forward (line 67) | def forward(
  class VideoUNet (line 89) | class VideoUNet(nn.Module):
    method __init__ (line 90) | def __init__(
    method forward (line 447) | def forward(
  class PostHocAttentionBlockWithTimeMixing (line 501) | class PostHocAttentionBlockWithTimeMixing(AttentionBlock):
    method __init__ (line 502) | def __init__(
    method forward (line 570) | def forward(
  class PostHocResBlockWithTime (line 615) | class PostHocResBlockWithTime(ResBlock):
    method __init__ (line 616) | def __init__(
    method forward (line 700) | def forward(
  class SpatialUNetModelWithTime (line 729) | class SpatialUNetModelWithTime(nn.Module):
    method __init__ (line 730) | def __init__(
    method forward (line 1183) | def forward(

FILE: sgm/modules/diffusionmodules/wrappers.py
  class IdentityWrapper (line 8) | class IdentityWrapper(nn.Module):
    method __init__ (line 9) | def __init__(self, diffusion_model, compile_model: bool = False):
    method forward (line 19) | def forward(self, *args, **kwargs):
  class OpenAIWrapper (line 23) | class OpenAIWrapper(IdentityWrapper):
    method forward (line 24) | def forward(

FILE: sgm/modules/distributions/distributions.py
  class AbstractDistribution (line 5) | class AbstractDistribution:
    method sample (line 6) | def sample(self):
    method mode (line 9) | def mode(self):
  class DiracDistribution (line 13) | class DiracDistribution(AbstractDistribution):
    method __init__ (line 14) | def __init__(self, value):
    method sample (line 17) | def sample(self):
    method mode (line 20) | def mode(self):
  class DiagonalGaussianDistribution (line 24) | class DiagonalGaussianDistribution(object):
    method __init__ (line 25) | def __init__(self, parameters, deterministic=False):
    method sample (line 37) | def sample(self):
    method kl (line 43) | def kl(self, other=None):
    method nll (line 62) | def nll(self, sample, dims=[1, 2, 3]):
    method mode (line 71) | def mode(self):
  function normal_kl (line 75) | def normal_kl(mean1, logvar1, mean2, logvar2):

FILE: sgm/modules/ema.py
  class LitEma (line 5) | class LitEma(nn.Module):
    method __init__ (line 6) | def __init__(self, model, decay=0.9999, use_num_upates=True):
    method reset_num_updates (line 29) | def reset_num_updates(self):
    method forward (line 33) | def forward(self, model):
    method copy_to (line 56) | def copy_to(self, model):
    method store (line 65) | def store(self, parameters):
    method restore (line 74) | def restore(self, parameters):

FILE: sgm/modules/encoders/modules.py
  class AbstractEmbModel (line 27) | class AbstractEmbModel(nn.Module):
    method __init__ (line 28) | def __init__(self):
    method is_trainable (line 35) | def is_trainable(self) -> bool:
    method ucg_rate (line 39) | def ucg_rate(self) -> Union[float, torch.Tensor]:
    method input_key (line 43) | def input_key(self) -> str:
    method is_trainable (line 47) | def is_trainable(self, value: bool):
    method ucg_rate (line 51) | def ucg_rate(self, value: Union[float, torch.Tensor]):
    method input_key (line 55) | def input_key(self, value: str):
    method is_trainable (line 59) | def is_trainable(self):
    method ucg_rate (line 63) | def ucg_rate(self):
    method input_key (line 67) | def input_key(self):
  class GeneralConditioner (line 71) | class GeneralConditioner(nn.Module):
    method __init__ (line 75) | def __init__(self, emb_models: Union[List, ListConfig]):
    method possibly_get_ucg_val (line 111) | def possibly_get_ucg_val(self, embedder: AbstractEmbModel, batch: Dict...
    method forward (line 120) | def forward(
    method get_unconditional_conditioning (line 170) | def get_unconditional_conditioning(
  class InceptionV3 (line 191) | class InceptionV3(nn.Module):
    method __init__ (line 195) | def __init__(self, normalize_input=False, **kwargs):
    method forward (line 202) | def forward(self, inp):
  class IdentityEncoder (line 211) | class IdentityEncoder(AbstractEmbModel):
    method encode (line 212) | def encode(self, x):
    method forward (line 215) | def forward(self, x):
  class ClassEmbedder (line 219) | class ClassEmbedder(AbstractEmbModel):
    method __init__ (line 220) | def __init__(self, embed_dim, n_classes=1000, add_sequence_dim=False):
    method forward (line 226) | def forward(self, c):
    method get_unconditional_conditioning (line 232) | def get_unconditional_conditioning(self, bs, device="cuda"):
  class ClassEmbedderForMultiCond (line 241) | class ClassEmbedderForMultiCond(ClassEmbedder):
    method forward (line 242) | def forward(self, batch, key=None, disable_dropout=False):
  class FrozenT5Embedder (line 253) | class FrozenT5Embedder(AbstractEmbModel):
    method __init__ (line 256) | def __init__(
    method freeze (line 267) | def freeze(self):
    method forward (line 273) | def forward(self, text):
    method encode (line 289) | def encode(self, text):
  class FrozenByT5Embedder (line 293) | class FrozenByT5Embedder(AbstractEmbModel):
    method __init__ (line 298) | def __init__(
    method freeze (line 309) | def freeze(self):
    method forward (line 315) | def forward(self, text):
    method encode (line 331) | def encode(self, text):
  class FrozenCLIPEmbedder (line 335) | class FrozenCLIPEmbedder(AbstractEmbModel):
    method __init__ (line 340) | def __init__(
    method freeze (line 365) | def freeze(self):
    method forward (line 372) | def forward(self, text):
    method encode (line 396) | def encode(self, text):
  class FrozenOpenCLIPEmbedder2 (line 400) | class FrozenOpenCLIPEmbedder2(AbstractEmbModel):
    method __init__ (line 407) | def __init__(
    method freeze (line 442) | def freeze(self):
    method forward (line 448) | def forward(self, text):
    method encode_with_transformer (line 458) | def encode_with_transformer(self, text):
    method pool (line 475) | def pool(self, x, text):
    method text_transformer_forward (line 483) | def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    method encode (line 498) | def encode(self, text):
  class FrozenOpenCLIPEmbedder (line 502) | class FrozenOpenCLIPEmbedder(AbstractEmbModel):
    method __init__ (line 509) | def __init__(
    method freeze (line 538) | def freeze(self):
    method forward (line 543) | def forward(self, text):
    method encode_with_transformer (line 548) | def encode_with_transformer(self, text):
    method text_transformer_forward (line 557) | def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    method encode (line 570) | def encode(self, text):
  class FrozenOpenCLIPImageEmbedder (line 574) | class FrozenOpenCLIPImageEmbedder(AbstractEmbModel):
    method __init__ (line 579) | def __init__(
    method preprocess (line 624) | def preprocess(self, x):
    method freeze (line 638) | def freeze(self):
    method forward (line 644) | def forward(self, image, no_dropout=False):
    method encode_with_vision_transformer (line 697) | def encode_with_vision_transformer(self, img):
    method encode (line 731) | def encode(self, text):
  class FrozenCLIPT5Encoder (line 735) | class FrozenCLIPT5Encoder(AbstractEmbModel):
    method __init__ (line 736) | def __init__(
    method encode (line 754) | def encode(self, text):
    method forward (line 757) | def forward(self, text):
  class SpatialRescaler (line 763) | class SpatialRescaler(nn.Module):
    method __init__ (line 764) | def __init__(
    method forward (line 803) | def forward(self, x):
    method encode (line 819) | def encode(self, x):
  class LowScaleEncoder (line 823) | class LowScaleEncoder(nn.Module):
    method __init__ (line 824) | def __init__(
    method register_schedule (line 843) | def register_schedule(
    method q_sample (line 891) | def q_sample(self, x_start, t, noise=None):
    method forward (line 899) | def forward(self, x):
    method decode (line 912) | def decode(self, z):
  class ConcatTimestepEmbedderND (line 917) | class ConcatTimestepEmbedderND(AbstractEmbModel):
    method __init__ (line 920) | def __init__(self, outdim):
    method forward (line 925) | def forward(self, x):
  class GaussianEncoder (line 936) | class GaussianEncoder(Encoder, AbstractEmbModel):
    method __init__ (line 937) | def __init__(
    method forward (line 945) | def forward(self, x) -> Tuple[Dict, torch.Tensor]:
  class VideoPredictionEmbedderWithEncoder (line 955) | class VideoPredictionEmbedderWithEncoder(AbstractEmbModel):
    method __init__ (line 956) | def __init__(
    method forward (line 988) | def forward(
  class FrozenOpenCLIPImagePredictionEmbedder (line 1036) | class FrozenOpenCLIPImagePredictionEmbedder(AbstractEmbModel):
    method __init__ (line 1037) | def __init__(
    method forward (line 1049) | def forward(self, vid):

FILE: sgm/modules/spacetime_attention.py
  class TimeMixSequential (line 16) | class TimeMixSequential(nn.Sequential):
    method forward (line 17) | def forward(self, x, context=None, timesteps=None):
  class BasicTransformerTimeMixBlock (line 24) | class BasicTransformerTimeMixBlock(nn.Module):
    method __init__ (line 30) | def __init__(
    method forward (line 110) | def forward(
    method _forward (line 118) | def _forward(self, x, context=None, timesteps=None):
    method get_last_layer (line 151) | def get_last_layer(self):
  class PostHocSpatialTransformerWithTimeMixing (line 155) | class PostHocSpatialTransformerWithTimeMixing(SpatialTransformer):
    method __init__ (line 156) | def __init__(
    method forward (line 262) | def forward(
  class PostHocSpatialTransformerWithTimeMixingAndMotion (line 352) | class PostHocSpatialTransformerWithTimeMixingAndMotion(SpatialTransformer):
    method __init__ (line 353) | def __init__(
    method forward (line 503) | def forward(

FILE: sgm/modules/video_attention.py
  class TimeMixSequential (line 8) | class TimeMixSequential(nn.Sequential):
    method forward (line 9) | def forward(self, x, context=None, timesteps=None):
  class VideoTransformerBlock (line 16) | class VideoTransformerBlock(nn.Module):
    method __init__ (line 22) | def __init__(
    method forward (line 102) | def forward(
    method _forward (line 110) | def _forward(self, x, context=None, timesteps=None):
    method get_last_layer (line 143) | def get_last_layer(self):
  class SpatialVideoTransformer (line 147) | class SpatialVideoTransformer(SpatialTransformer):
    method __init__ (line 148) | def __init__(
    method forward (line 231) | def forward(

FILE: sgm/util.py
  function disabled_train (line 14) | def disabled_train(self, mode=True):
  function get_string_from_tuple (line 20) | def get_string_from_tuple(s):
  function is_power_of_two (line 36) | def is_power_of_two(n):
  function autocast (line 52) | def autocast(f, enabled=True):
  function load_partial_from_config (line 64) | def load_partial_from_config(config):
  function log_txt_as_img (line 68) | def log_txt_as_img(wh, xc, size=10):
  function partialclass (line 98) | def partialclass(cls, *args, **kwargs):
  function make_path_absolute (line 105) | def make_path_absolute(path):
  function ismap (line 112) | def ismap(x):
  function isimage (line 118) | def isimage(x):
  function isheatmap (line 124) | def isheatmap(x):
  function isneighbors (line 131) | def isneighbors(x):
  function exists (line 137) | def exists(x):
  function expand_dims_like (line 141) | def expand_dims_like(x, y):
  function default (line 147) | def default(val, d):
  function mean_flat (line 153) | def mean_flat(tensor):
  function count_params (line 161) | def count_params(model, verbose=False):
  function instantiate_from_config (line 168) | def instantiate_from_config(config):
  function get_obj_from_str (line 178) | def get_obj_from_str(string, reload=False, invalidate_cache=True):
  function append_zero (line 188) | def append_zero(x):
  function append_dims (line 192) | def append_dims(x, target_dims):
  function load_model_from_config (line 202) | def load_model_from_config(config, ckpt, verbose=True, freeze=True):
  function get_configs_path (line 233) | def get_configs_path() -> str:
  function get_nested_attribute (line 251) | def get_nested_attribute(obj, attribute_path, depth=None, return_key=Fal...

FILE: tests/inference/test_inference.py
  class TestInference (line 19) | class TestInference:
    method pipeline (line 21) | def pipeline(self, request) -> SamplingPipeline:
    method sdxl_pipelines (line 35) | def sdxl_pipelines(self, request) -> Tuple[SamplingPipeline, SamplingP...
    method create_init_image (line 43) | def create_init_image(self, h, w):
    method test_txt2img (line 49) | def test_txt2img(self, pipeline: SamplingPipeline, sampler_enum):
    method test_img2img (line 60) | def test_img2img(self, pipeline: SamplingPipeline, sampler_enum):
    method test_sdxl_with_refiner (line 74) | def test_sdxl_with_refiner(

Condensed preview: 120 files, each showing path, character count, and a content snippet. The full structured content is 834K chars.

[
  {
    "path": ".github/workflows/black.yml",
    "chars": 346,
    "preview": "name: Run black\non: [pull_request]\n\njobs:\n  lint:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v"
  },
  {
    "path": ".github/workflows/test-build.yaml",
    "chars": 667,
    "preview": "name: Build package\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n\njobs:\n  build:\n    name: Build\n    runs-on: ubu"
  },
  {
    "path": ".github/workflows/test-inference.yml",
    "chars": 1056,
    "preview": "name: Test inference\r\n\r\non:\r\n  pull_request:\r\n  push:\r\n    branches:\r\n      - main\r\n\r\njobs:\r\n  test:\r\n    name: \"Test in"
  },
  {
    "path": ".gitignore",
    "chars": 133,
    "preview": "# extensions\n*.egg-info\n*.py[cod]\n\n# envs\n.pt13\n.pt2\n\n# directories\n/checkpoints\n/dist\n/outputs\n/build\n/src\n/.vscode\n**/"
  },
  {
    "path": "CODEOWNERS",
    "chars": 36,
    "preview": ".github @Stability-AI/infrastructure"
  },
  {
    "path": "LICENSE-CODE",
    "chars": 1068,
    "preview": "MIT License\n\nCopyright (c) 2023 Stability AI\n\nPermission is hereby granted, free of charge, to any person obtaining a co"
  },
  {
    "path": "README.md",
    "chars": 23497,
    "preview": "# Generative Models by Stability AI\n\n![sample1](assets/000.jpg)\n\n## News\n\n\n**May 20, 2025**\n- We are releasing **[Stable"
  },
  {
    "path": "configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml",
    "chars": 2480,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: sgm.models.autoencoder.AutoencodingEngine\n  params:\n    input_key: jpg\n   "
  },
  {
    "path": "configs/example_training/autoencoder/kl-f4/imagenet-kl_f8_8chn.yaml",
    "chars": 2519,
    "preview": "model:\n  base_learning_rate: 4.5e-6\n  target: sgm.models.autoencoder.AutoencodingEngine\n  params:\n    input_key: jpg\n   "
  },
  {
    "path": "configs/example_training/imagenet-f8_cond.yaml",
    "chars": 5292,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.13025\n "
  },
  {
    "path": "configs/example_training/toy/cifar10_cond.yaml",
    "chars": 2524,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    denoiser_config:\n      "
  },
  {
    "path": "configs/example_training/toy/mnist.yaml",
    "chars": 2001,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    denoiser_config:\n      "
  },
  {
    "path": "configs/example_training/toy/mnist_cond.yaml",
    "chars": 2520,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    denoiser_config:\n      "
  },
  {
    "path": "configs/example_training/toy/mnist_cond_discrete_eps.yaml",
    "chars": 2754,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    denoiser_config:\n      "
  },
  {
    "path": "configs/example_training/toy/mnist_cond_l1_loss.yaml",
    "chars": 2542,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    denoiser_config:\n      "
  },
  {
    "path": "configs/example_training/toy/mnist_cond_with_ema.yaml",
    "chars": 2539,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    use_ema: True\n\n    deno"
  },
  {
    "path": "configs/example_training/txt2img-clipl-legacy-ucg-training.yaml",
    "chars": 5188,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.13025\n "
  },
  {
    "path": "configs/example_training/txt2img-clipl.yaml",
    "chars": 5214,
    "preview": "model:\n  base_learning_rate: 1.0e-4\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.13025\n "
  },
  {
    "path": "configs/inference/sd_xl_base.yaml",
    "chars": 2784,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.13025\n    disable_first_stage_autoca"
  },
  {
    "path": "configs/inference/sd_xl_refiner.yaml",
    "chars": 2589,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.13025\n    disable_first_stage_autoca"
  },
  {
    "path": "configs/inference/sv3d_p.yaml",
    "chars": 3756,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "configs/inference/sv3d_u.yaml",
    "chars": 3399,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "configs/inference/svd.yaml",
    "chars": 4138,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "configs/inference/svd_image_decoder.yaml",
    "chars": 3518,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "main.py",
    "chars": 33355,
    "preview": "import argparse\nimport datetime\nimport glob\nimport inspect\nimport os\nimport sys\nfrom inspect import Parameter\nfrom typin"
  },
  {
    "path": "model_licenses/LICENSE-SDXL-Turbo",
    "chars": 7309,
    "preview": "STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE AGREEMENT        \nDated: November 28, 2023\n\n\nBy using or distribu"
  },
  {
    "path": "model_licenses/LICENSE-SDXL0.9",
    "chars": 11565,
    "preview": "SDXL 0.9 RESEARCH LICENSE AGREEMENT\nCopyright (c) Stability AI Ltd.\nThis License Agreement (as may be amended in accorda"
  },
  {
    "path": "model_licenses/LICENSE-SDXL1.0",
    "chars": 14103,
    "preview": "Copyright (c) 2023 Stability AI CreativeML Open RAIL++-M License dated July 26, 2023\n\nSection I: PREAMBLE Multimodal gen"
  },
  {
    "path": "model_licenses/LICENSE-SV3D",
    "chars": 7237,
    "preview": "STABILITY AI NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT\nDated: March 18, 2024\n\n\"Agreement\" means this Stable Non-Commerc"
  },
  {
    "path": "model_licenses/LICENSE-SVD",
    "chars": 5897,
    "preview": "STABLE VIDEO DIFFUSION NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT\t\nDated: November 21, 2023\n\n“AUP” means the Stability A"
  },
  {
    "path": "pyproject.toml",
    "chars": 1284,
    "preview": "[build-system]\nrequires = [\"hatchling\"]\nbuild-backend = \"hatchling.build\"\n\n[project]\nname = \"sgm\"\ndynamic = [\"version\"]\n"
  },
  {
    "path": "pytest.ini",
    "chars": 92,
    "preview": "[pytest]\nmarkers = \n  inference: mark as inference test (deselect with '-m \"not inference\"')"
  },
  {
    "path": "requirements/pt2.txt",
    "chars": 763,
    "preview": "black==23.7.0\nchardet==5.1.0\nclip @ git+https://github.com/openai/CLIP.git\neinops>=0.6.1\nfairscale>=0.4.13\nfire>=0.5.0\nf"
  },
  {
    "path": "scripts/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "scripts/demo/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "scripts/demo/detect.py",
    "chars": 5172,
    "preview": "import argparse\n\nimport cv2\nimport numpy as np\n\ntry:\n    from imwatermark import WatermarkDecoder\nexcept ImportError as "
  },
  {
    "path": "scripts/demo/discretization.py",
    "chars": 2180,
    "preview": "import torch\n\nfrom sgm.modules.diffusionmodules.discretizer import Discretization\n\n\nclass Img2ImgDiscretizationWrapper:\n"
  },
  {
    "path": "scripts/demo/gradio_app.py",
    "chars": 11905,
    "preview": "# Adding this at the very top of app.py to make 'generative-models' directory discoverable\nimport os\nimport sys\n\nsys.pat"
  },
  {
    "path": "scripts/demo/gradio_app_sv4d.py",
    "chars": 17854,
    "preview": "# Adding this at the very top of app.py to make 'generative-models' directory discoverable\nimport os\nimport sys\n\nsys.pat"
  },
  {
    "path": "scripts/demo/sampling.py",
    "chars": 9521,
    "preview": "from pytorch_lightning import seed_everything\n\nfrom scripts.demo.streamlit_helpers import *\n\nSAVE_PATH = \"outputs/demo/t"
  },
  {
    "path": "scripts/demo/streamlit_helpers.py",
    "chars": 31366,
    "preview": "import copy\nimport math\nimport os\nfrom glob import glob\nfrom typing import Dict, List, Optional, Tuple, Union\n\nimport cv"
  },
  {
    "path": "scripts/demo/sv3d_helpers.py",
    "chars": 4149,
    "preview": "import os\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n\ndef generate_dynamic_cycle_xy_values(\n    length=21,\n   "
  },
  {
    "path": "scripts/demo/sv4d_helpers.py",
    "chars": 46228,
    "preview": "import math\nimport os\nfrom glob import glob\nfrom pathlib import Path\nfrom typing import Dict, List, Optional, Tuple, Uni"
  },
  {
    "path": "scripts/demo/turbo.py",
    "chars": 6422,
    "preview": "from st_keyup import st_keyup\nfrom streamlit_helpers import *\n\nfrom sgm.modules.diffusionmodules.sampling import EulerAn"
  },
  {
    "path": "scripts/demo/video_sampling.py",
    "chars": 8672,
    "preview": "import os\nimport sys\n\nsys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), \"../../\")))\nfrom pytorch_"
  },
  {
    "path": "scripts/sampling/configs/sv3d_p.yaml",
    "chars": 4210,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/sv3d_u.yaml",
    "chars": 3853,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/sv4d.yaml",
    "chars": 6735,
    "preview": "N_TIME: 5\nN_VIEW: 8\nN_FRAMES: 40\n\nmodel:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18"
  },
  {
    "path": "scripts/sampling/configs/sv4d2.yaml",
    "chars": 6929,
    "preview": "N_TIME: 12\nN_VIEW: 4\nN_FRAMES: 48\n\nmodel:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.1"
  },
  {
    "path": "scripts/sampling/configs/sv4d2_8views.yaml",
    "chars": 6936,
    "preview": "N_TIME: 5\nN_VIEW: 8\nN_FRAMES: 40\n\nmodel:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18"
  },
  {
    "path": "scripts/sampling/configs/svd.yaml",
    "chars": 4613,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/svd_image_decoder.yaml",
    "chars": 4007,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/svd_xt.yaml",
    "chars": 4616,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/svd_xt_1_1.yaml",
    "chars": 4621,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/configs/svd_xt_image_decoder.yaml",
    "chars": 4010,
    "preview": "model:\n  target: sgm.models.diffusion.DiffusionEngine\n  params:\n    scale_factor: 0.18215\n    disable_first_stage_autoca"
  },
  {
    "path": "scripts/sampling/simple_video_sample.py",
    "chars": 13706,
    "preview": "import math\nimport os\nimport sys\nfrom glob import glob\nfrom pathlib import Path\nfrom typing import List, Optional\n\nsys.p"
  },
  {
    "path": "scripts/sampling/simple_video_sample_4d.py",
    "chars": 9212,
    "preview": "import os\nimport sys\nfrom glob import glob\nfrom typing import List, Optional, Union\n\nfrom tqdm import tqdm\n\nsys.path.app"
  },
  {
    "path": "scripts/sampling/simple_video_sample_4d2.py",
    "chars": 8200,
    "preview": "import os\nimport sys\nfrom glob import glob\nfrom typing import List, Optional\n\nfrom tqdm import tqdm\n\nsys.path.append(os."
  },
  {
    "path": "scripts/tests/attention.py",
    "chars": 10164,
    "preview": "import einops\nimport torch\nimport torch.nn.functional as F\nimport torch.utils.benchmark as benchmark\nfrom torch.backends"
  },
  {
    "path": "scripts/util/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "scripts/util/detection/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "scripts/util/detection/nsfw_and_watermark_dectection.py",
    "chars": 3641,
    "preview": "import os\n\nimport clip\nimport numpy as np\nimport torch\nimport torchvision.transforms as T\nfrom PIL import Image\n\nRESOURC"
  },
  {
    "path": "sgm/__init__.py",
    "chars": 139,
    "preview": "from .models import AutoencodingEngine, DiffusionEngine\nfrom .util import get_configs_path, instantiate_from_config\n\n__v"
  },
  {
    "path": "sgm/data/__init__.py",
    "chars": 48,
    "preview": "from .dataset import StableDataModuleFromConfig\n"
  },
  {
    "path": "sgm/data/cifar10.py",
    "chars": 1869,
    "preview": "import pytorch_lightning as pl\nimport torchvision\nfrom torch.utils.data import DataLoader, Dataset\nfrom torchvision impo"
  },
  {
    "path": "sgm/data/dataset.py",
    "chars": 2969,
    "preview": "from typing import Optional\n\nimport torchdata.datapipes.iter\nimport webdataset as wds\nfrom omegaconf import DictConfig\nf"
  },
  {
    "path": "sgm/data/mnist.py",
    "chars": 2450,
    "preview": "import pytorch_lightning as pl\nimport torchvision\nfrom torch.utils.data import DataLoader, Dataset\nfrom torchvision impo"
  },
  {
    "path": "sgm/inference/api.py",
    "chars": 11522,
    "preview": "import pathlib\r\nfrom dataclasses import asdict, dataclass\r\nfrom enum import Enum\r\nfrom typing import Optional\r\n\r\nfrom om"
  },
  {
    "path": "sgm/inference/helpers.py",
    "chars": 10676,
    "preview": "import math\nimport os\nfrom typing import List, Optional, Union\n\nimport numpy as np\nimport torch\nfrom einops import rearr"
  },
  {
    "path": "sgm/lr_scheduler.py",
    "chars": 4286,
    "preview": "import numpy as np\n\n\nclass LambdaWarmUpCosineScheduler:\n    \"\"\"\n    note: use with a base_lr of 1.0\n    \"\"\"\n\n    def __i"
  },
  {
    "path": "sgm/models/__init__.py",
    "chars": 83,
    "preview": "from .autoencoder import AutoencodingEngine\nfrom .diffusion import DiffusionEngine\n"
  },
  {
    "path": "sgm/models/autoencoder.py",
    "chars": 22678,
    "preview": "import logging\nimport math\nimport re\nfrom abc import abstractmethod\nfrom contextlib import contextmanager\nfrom typing im"
  },
  {
    "path": "sgm/models/diffusion.py",
    "chars": 12498,
    "preview": "import math\nfrom contextlib import contextmanager\nfrom typing import Any, Dict, List, Optional, Tuple, Union\n\nimport pyt"
  },
  {
    "path": "sgm/modules/__init__.py",
    "chars": 159,
    "preview": "from .encoders.modules import GeneralConditioner\n\nUNCONDITIONAL_CONFIG = {\n    \"target\": \"sgm.modules.GeneralConditioner"
  },
  {
    "path": "sgm/modules/attention.py",
    "chars": 25025,
    "preview": "import logging\nimport math\nfrom inspect import isfunction\nfrom typing import Any, Optional\n\nimport torch\nimport torch.nn"
  },
  {
    "path": "sgm/modules/autoencoding/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/autoencoding/losses/__init__.py",
    "chars": 164,
    "preview": "__all__ = [\n    \"GeneralLPIPSWithDiscriminator\",\n    \"LatentLPIPS\",\n]\n\nfrom .discriminator_loss import GeneralLPIPSWithD"
  },
  {
    "path": "sgm/modules/autoencoding/losses/discriminator_loss.py",
    "chars": 12077,
    "preview": "from typing import Dict, Iterator, List, Optional, Tuple, Union\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\ni"
  },
  {
    "path": "sgm/modules/autoencoding/losses/lpips.py",
    "chars": 2915,
    "preview": "import torch\nimport torch.nn as nn\n\nfrom ....util import default, instantiate_from_config\nfrom ..lpips.loss.lpips import"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/autoencoding/lpips/loss/.gitignore",
    "chars": 7,
    "preview": "vgg.pth"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/loss/LICENSE",
    "chars": 1355,
    "preview": "Copyright (c) 2018, Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang\nAll rights reserved.\n\nRedi"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/loss/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/autoencoding/lpips/loss/lpips.py",
    "chars": 5116,
    "preview": "\"\"\"Stripped version of https://github.com/richzhang/PerceptualSimilarity/tree/master/models\"\"\"\n\nfrom collections import "
  },
  {
    "path": "sgm/modules/autoencoding/lpips/model/LICENSE",
    "chars": 3564,
    "preview": "Copyright (c) 2017, Jun-Yan Zhu and Taesung Park\nAll rights reserved.\n\nRedistribution and use in source and binary forms"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/model/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/autoencoding/lpips/model/model.py",
    "chars": 2850,
    "preview": "import functools\n\nimport torch.nn as nn\n\nfrom ..util import ActNorm\n\n\ndef weights_init(m):\n    classname = m.__class__._"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/util.py",
    "chars": 3954,
    "preview": "import hashlib\nimport os\n\nimport requests\nimport torch\nimport torch.nn as nn\nfrom tqdm import tqdm\n\nURL_MAP = {\"vgg_lpip"
  },
  {
    "path": "sgm/modules/autoencoding/lpips/vqperceptual.py",
    "chars": 480,
    "preview": "import torch\nimport torch.nn.functional as F\n\n\ndef hinge_d_loss(logits_real, logits_fake):\n    loss_real = torch.mean(F."
  },
  {
    "path": "sgm/modules/autoencoding/regularizers/__init__.py",
    "chars": 877,
    "preview": "from abc import abstractmethod\nfrom typing import Any, Tuple\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functio"
  },
  {
    "path": "sgm/modules/autoencoding/regularizers/base.py",
    "chars": 1254,
    "preview": "from abc import abstractmethod\nfrom typing import Any, Tuple\n\nimport torch\nimport torch.nn.functional as F\nfrom torch im"
  },
  {
    "path": "sgm/modules/autoencoding/regularizers/quantize.py",
    "chars": 17424,
    "preview": "import logging\nfrom abc import abstractmethod\nfrom typing import Dict, Iterator, Literal, Optional, Tuple, Union\n\nimport"
  },
  {
    "path": "sgm/modules/autoencoding/temporal_ae.py",
    "chars": 11747,
    "preview": "from typing import Callable, Iterable, Union\n\nimport torch\nfrom einops import rearrange, repeat\n\nfrom sgm.modules.diffus"
  },
  {
    "path": "sgm/modules/diffusionmodules/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/diffusionmodules/denoiser.py",
    "chars": 2458,
    "preview": "from typing import Dict, Union\n\nimport torch\nimport torch.nn as nn\n\nfrom ...util import append_dims, instantiate_from_co"
  },
  {
    "path": "sgm/modules/diffusionmodules/denoiser_scaling.py",
    "chars": 1869,
    "preview": "from abc import ABC, abstractmethod\nfrom typing import Tuple\n\nimport torch\n\n\nclass DenoiserScaling(ABC):\n    @abstractme"
  },
  {
    "path": "sgm/modules/diffusionmodules/denoiser_weighting.py",
    "chars": 516,
    "preview": "import torch\n\n\nclass UnitWeighting:\n    def __call__(self, sigma):\n        return torch.ones_like(sigma, device=sigma.de"
  },
  {
    "path": "sgm/modules/diffusionmodules/discretizer.py",
    "chars": 2314,
    "preview": "from abc import abstractmethod\nfrom functools import partial\n\nimport numpy as np\nimport torch\n\nfrom ...modules.diffusion"
  },
  {
    "path": "sgm/modules/diffusionmodules/guiders.py",
    "chars": 5870,
    "preview": "import logging\nfrom abc import ABC, abstractmethod\nfrom typing import Dict, List, Literal, Optional, Tuple, Union\n\nimpor"
  },
  {
    "path": "sgm/modules/diffusionmodules/loss.py",
    "chars": 3505,
    "preview": "from typing import Dict, List, Optional, Tuple, Union\n\nimport torch\nimport torch.nn as nn\n\nfrom ...modules.autoencoding."
  },
  {
    "path": "sgm/modules/diffusionmodules/loss_weighting.py",
    "chars": 855,
    "preview": "from abc import ABC, abstractmethod\n\nimport torch\n\n\nclass DiffusionLossWeighting(ABC):\n    @abstractmethod\n    def __cal"
  },
  {
    "path": "sgm/modules/diffusionmodules/model.py",
    "chars": 24004,
    "preview": "# pytorch_diffusion + derived encoder decoder\nimport logging\nimport math\nfrom typing import Any, Callable, Optional\n\nimp"
  },
  {
    "path": "sgm/modules/diffusionmodules/openaimodel.py",
    "chars": 33447,
    "preview": "import logging\nimport math\nfrom abc import abstractmethod\nfrom typing import Iterable, List, Optional, Tuple, Union\n\nimp"
  },
  {
    "path": "sgm/modules/diffusionmodules/sampling.py",
    "chars": 12022,
    "preview": "\"\"\"\n    Partially ported from https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py\n\"\"\"\n\n\nfrom ty"
  },
  {
    "path": "sgm/modules/diffusionmodules/sampling_utils.py",
    "chars": 1029,
    "preview": "import torch\nfrom scipy import integrate\n\nfrom ...util import append_dims\n\n\ndef linear_multistep_coeff(order, t, i, j, e"
  },
  {
    "path": "sgm/modules/diffusionmodules/sigma_sampling.py",
    "chars": 1151,
    "preview": "import torch\nfrom typing import Optional, Union\nfrom ...util import default, instantiate_from_config\n\n\nclass EDMSampling"
  },
  {
    "path": "sgm/modules/diffusionmodules/util.py",
    "chars": 13386,
    "preview": "\"\"\"\npartially adopted from\nhttps://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion."
  },
  {
    "path": "sgm/modules/diffusionmodules/video_model.py",
    "chars": 46678,
    "preview": "from functools import partial\nfrom typing import List, Optional, Union\n\nfrom einops import rearrange\n\nfrom ...modules.di"
  },
  {
    "path": "sgm/modules/diffusionmodules/wrappers.py",
    "chars": 1431,
    "preview": "import torch\nimport torch.nn as nn\nfrom packaging import version\n\nOPENAIUNETWRAPPER = \"sgm.modules.diffusionmodules.wrap"
  },
  {
    "path": "sgm/modules/distributions/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/distributions/distributions.py",
    "chars": 3095,
    "preview": "import numpy as np\nimport torch\n\n\nclass AbstractDistribution:\n    def sample(self):\n        raise NotImplementedError()\n"
  },
  {
    "path": "sgm/modules/ema.py",
    "chars": 3207,
    "preview": "import torch\nfrom torch import nn\n\n\nclass LitEma(nn.Module):\n    def __init__(self, model, decay=0.9999, use_num_upates="
  },
  {
    "path": "sgm/modules/encoders/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "sgm/modules/encoders/modules.py",
    "chars": 35240,
    "preview": "import math\nfrom contextlib import nullcontext\nfrom functools import partial\nfrom typing import Dict, List, Optional, Tu"
  },
  {
    "path": "sgm/modules/spacetime_attention.py",
    "chars": 21720,
    "preview": "from functools import partial\n\nimport torch\nimport torch.nn.functional as F\n\nfrom ..modules.attention import *\nfrom ..mo"
  },
  {
    "path": "sgm/modules/video_attention.py",
    "chars": 9588,
    "preview": "import torch\n\nfrom ..modules.attention import *\nfrom ..modules.diffusionmodules.util import (AlphaBlender, linear,\n     "
  },
  {
    "path": "sgm/util.py",
    "chars": 8466,
    "preview": "import functools\nimport importlib\nimport os\nfrom functools import partial\nfrom inspect import isfunction\n\nimport fsspec\n"
  },
  {
    "path": "tests/inference/test_inference.py",
    "chars": 3967,
    "preview": "import numpy\nfrom PIL import Image\nimport pytest\nfrom pytest import fixture\nimport torch\nfrom typing import Tuple\n\nfrom "
  }
]

// ... and 2 more files (download for full content)

About this extraction

This page presents the source code of the Stability-AI/generative-models GitHub repository, extracted and formatted as plain text. The extraction includes 120 files (778.4 KB), approximately 190.8k tokens, and a symbol index with 814 extracted functions, classes, methods, constants, and types.

