---
This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper exploring
fast training of diffusion models with transformers. You can find more visualizations on our [project page](https://pixart-alpha.github.io/).

**PixArt-α Community**: Join our PixArt-α Discord channels for discussions. Coders are welcome to contribute.

> [**PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis**](https://pixart-alpha.github.io/)
> [Junsong Chen*](https://lawrence-cj.github.io/), [Jincheng Yu*](https://lovesykun.cn/about.html),
> [Chongjian Ge*](https://chongjiange.github.io/), [Lewei Yao*](https://scholar.google.com/citations?user=hqDyTg8AAAAJ&hl=zh-CN&oi=ao),
> [Enze Xie](https://xieenze.github.io/)†,
> [Yue Wu](https://yuewuhkust.github.io/), [Zhongdao Wang](https://zhongdao.github.io/),
> [James Kwok](https://www.cse.ust.hk/~jamesk/), [Ping Luo](http://luoping.me/),
> [Huchuan Lu](https://scholar.google.com/citations?hl=en&user=D3nE0agAAAAJ),
> [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ)
>
Huawei Noah’s Ark Lab, Dalian University of Technology, HKU, HKUST
> [**PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models**](https://pixart-alpha.github.io/)
> [Junsong Chen](https://lawrence-cj.github.io/), [Yue Wu](https://yuewuhkust.github.io/), [Simian Luo](https://luosiallen.github.io/), [Enze Xie](https://xieenze.github.io/)†,
> [Sayak Paul](https://sayak.dev/), [Ping Luo](http://luoping.me/), [Hang Zhao](), [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ)
>
Huawei Noah’s Ark Lab, DLUT, Tsinghua University, HKU, Hugging Face
---
## Breaking News 🔥🔥!!
- (🔥 New) Apr. 12, 2024. 💥 A better version, [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma): training & inference code and checkpoints are all released!!!
Contributions are welcome. Star 🌟 us if you find it helpful!!
- (🔥 New) Jan. 19, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252) ControlNet [app_controlnet.py](app/app_controlnet.py) and [Checkpoint](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) are released!!!
- (🔥 New) Jan. 16, 2024. 💥 Glad to announce that [PixArt-α](https://arxiv.org/abs/2310.00426) is accepted by ICLR 2024 (Spotlight).
- (🔥 New) Dec. 17, 2023. 💥 PixArt supports [ComfyUI](https://github.com/comfyanonymous/ComfyUI#manual-install-windows-linux). Thanks to [@city96](https://github.com/city96/ComfyUI_ExtraModels) for his great work.
- (🔥 New) Nov. 30, 2023. 💥 PixArt collaborates with the [LCMs](https://github.com/luosiallen/latent-consistency-model) team to make the **fastest** [Training & Inference Text-to-Image Generation System](https://github.com/PixArt-alpha/PixArt-alpha).
Here, [Training code](train_scripts/train_pixart_lcm.py) & [Inference code](scripts/inference_lcm.py) & [Weights](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS) & [HF Demo](https://huggingface.co/spaces/PixArt-alpha/PixArt-LCM) & [OpenXLab Demo](https://openxlab.org.cn/apps/detail/houshaowei/PixArt-LCM) are all released. We hope users will enjoy them.
Detailed **inference speed** and **code guidance** can be found in [docs](asset/docs/pixart_lcm.md). We have also updated the codebase for a better user experience and fixed some bugs in the newest version.
---
## 🚩 **New Features/Updates**
- ✅ Jan. 11, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252): We are excited to announce the release of the [PixArt-δ](https://arxiv.org/abs/2401.05252) technical report!!!
This report offers valuable insights into the training of LCM and ControlNet-like modules in Transformer Models. Along with the report, we have also released all the training and inference code for LCM & ControlNet [in this repository](https://github.com/PixArt-alpha/PixArt-alpha).
We encourage you to try them out and warmly welcome any Pull Requests from our users. Your contributions and feedback are highly appreciated!
- ✅ Feb. 07, 2024. [train_diffusers.py](train_scripts/train_diffusers.py) can directly train with diffusers model and visualize during training.
- ✅ Jan. 26, 2024. 💥 All checkpoints of [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), including the 256px checkpoints, are available in [Download Models](#-download-models).
- ✅ Jan. 19, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252) ControlNet [app_controlnet.py](app/app_controlnet.py) and [Checkpoint](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) are released!!!
- ✅ Jan. 12, 2024. 💥 We release the [SAM-LLaVA-Captions](https://huggingface.co/datasets/PixArt-alpha/SAM-LLaVA-Captions10M) used in PixArt-α training.
- ✅ Dec. 27, 2023. [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) is incorporated into [ControlLLM](https://github.com/OpenGVLab/ControlLLM)!
- ✅ Dec. 17, 2023. [PixArt-LCM-Lora](train_scripts/train_pixart_lcm_lora.py) & [PixArt-Lora](train_scripts/train_pixart_lora_hf.py) training scripts in Hugging Face style are released.
- ✅ Dec. 13, 2023. Add multi-scale vae feature extraction in [tools/extract_features.py](https://github.com/PixArt-alpha/PixArt-alpha/blob/3b4f0afdbe39def80b41ab05c664c963edeebbcd/tools/extract_features.py#L276).
- ✅ Dec. 01, 2023. Add a [Notebook folder](./notebooks) to help users get started with PixArt quickly! Thanks to [@kopyl](https://github.com/kopyl) for his contribution!
- ✅ Nov. 27, 2023. 💥 **PixArt-α Community**: Join our PixArt-α Discord channels for discussions. Coders are welcome to contribute.
- ✅ Nov. 21, 2023. 💥 [SA-Solver](https://arxiv.org/abs/2309.05019) official code is first released [here](asset/docs/sasolver.md).
- ✅ Nov. 19, 2023. Release `PixArt + Dreambooth` training scripts.
- ✅ Nov. 16, 2023. Diffusers now supports `random resolution` and `batch images` generation. In addition,
running `PixArt` in under 8GB of GPU VRAM is available in 🧨 [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart).
- ✅ Nov. 10, 2023. Support DALL-E 3 Consistency Decoder in 🧨 diffusers.
- ✅ Nov. 06, 2023. Release pretrained weights with 🧨 diffusers integration, Hugging Face demo, and Google Colab example.
- ✅ Nov. 03, 2023. Release the LLaVA-captioning inference code.
- ✅ Oct. 27, 2023. Release the training & feature extraction code.
- ✅ Oct. 20, 2023. Collaborate with Hugging Face & Diffusers team to co-release the code and weights. (plz stay tuned.)
- ✅ Oct. 15, 2023. Release the inference code.
---
## Contents
* [Training](#-how-to-train)
* [Inference](#-how-to-test)
* [Download Models](#-download-models)
* [Use diffusers](#1---using-in--diffusers)
* [Data Processing](#-how-to-extract-t5-and-vae-features)
* [PixArt-**α** Demo](#3---gradio-with-diffusers--faster-)
* [PixArt-**α** 8GB VRAM](asset/docs/pixart.md)
* [PixArt-**δ** (LCM)](asset/docs/pixart_lcm.md)
* [PixArt-**δ** (ControlNet)](asset/docs/pixart_controlnet.md)
* [PixArt-**δ** (Dreambooth)](asset/docs/pixart-dreambooth.md)
* [Acknowledgement](#acknowledgements)
* [Citation](#bibtex)
* [PixArt-**Σ** Releasing](https://github.com/PixArt-alpha/PixArt-sigma)
---
## 🐱 Abstract
TL; DR: PixArt-α is a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), and the training speed markedly surpasses existing large-scale T2I models, e.g., PixArt-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days).

Full abstract:

The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours),
seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions.
This paper introduces PixArt-α, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney),
reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost.
To achieve this goal, three core designs are proposed:
(1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality;
(2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch;
(3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning.
As a result, PixArt-α's training speed markedly surpasses existing large-scale T2I models,
e.g., PixArt-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days),
saving nearly $300,000 ($26,000 vs. $320,000) and reducing CO2 emissions by 90%. Moreover, compared with a larger SOTA model, RAPHAEL,
our training cost is merely 1%. Extensive experiments demonstrate that PixArt-α excels in image quality, artistry, and semantic control.
We hope PixArt-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.
---

# 🔥🔥🔥 Why PixArt-α?
## Training Efficiency
PixArt-α takes only 12% of Stable Diffusion v1.5's training time (753 vs. 6,250 A100 GPU days), saving nearly $300,000 ($28,000 vs. $320,000) and reducing CO2 emissions by 90%. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%.

| Method | Type | #Params | #Images| FID-30K ↓ | A100 GPU days |
|-----------|------|---------|--------|------------------|---------------|
| DALL·E | Diff | 12.0B | 250M | 27.50 | |
| GLIDE | Diff | 5.0B | 250M | 12.24 | |
| LDM | Diff | 1.4B | 400M | 12.64 | |
| DALL·E 2 | Diff | 6.5B | 650M | 10.39 | 41,667 |
| SDv1.5 | Diff | 0.9B | 2000M | 9.62 | 6,250 |
| GigaGAN | GAN | 0.9B | 2700M | 9.09 | 4,783 |
| Imagen | Diff | 3.0B | 860M | 7.27 | 7,132 |
| RAPHAEL | Diff | 3.0B | 5000M+ | 6.61 | 60,000 |
| PixArt-α | Diff | 0.6B | 25M | 7.32 (zero-shot) | 753 |
| PixArt-α | Diff | 0.6B | 25M | 5.51 (COCO FT) | 753 |
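
The percentages quoted above follow from the GPU-day column. The snippet below is a minimal check that uses only values copied from the table; `relative_cost` is an illustrative helper name, not part of this repo.

```python
# A100 GPU days, copied from the table above.
gpu_days = {"SDv1.5": 6250, "RAPHAEL": 60000, "PixArt-α": 753}


def relative_cost(model: str, baseline: str) -> float:
    """Training cost of `model` as a fraction of `baseline`'s cost."""
    return gpu_days[model] / gpu_days[baseline]


for baseline in ("SDv1.5", "RAPHAEL"):
    print(f"PixArt-α vs {baseline}: {relative_cost('PixArt-α', baseline):.1%}")
```

The two ratios round to the "12%" and "1%" figures stated in the text.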
## Inference Efficiency
PIXART-δ successfully generates **1024x1024 high resolution** images within **0.5 seconds** on an A100. With the implementation
of 8-bit inference technology, PIXART-δ requires **less than 8GB of GPU VRAM**.
Let us stress again how liberating it is to explore image generation so easily with PixArt-LCM.
| Hardware | PIXART-δ (4 steps) | SDXL LoRA LCM (4 steps) | PixArt-α (14 steps) | SDXL standard (25 steps) |
|-----------------------------|--------------------|-------------------------|---------------------|---------------------------|
| T4 (Google Colab Free Tier) | 3.3s | 8.4s | 16.0s | 26.5s |
| V100 (32 GB) | 0.8s | 1.2s | 5.5s | 7.7s |
| A100 (80 GB) | 0.51s | 1.2s | 2.2s | 3.8s |

These tests were run with a batch size of 1 in all cases.
For cards with a lot of capacity, such as A100, performance increases significantly when generating multiple images at once, which is usually the case for production workloads.
## High-quality Generation from PixArt-α
- More samples
- PixArt + [Dreambooth](https://dreambooth.github.io/)
- PixArt + [ControlNet](https://github.com/lllyasviel/ControlNet)
# 🔧 Dependencies and Installation
- Python >= 3.9 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- [PyTorch >= 1.13.0+cu11.7](https://pytorch.org/)
```bash
conda create -n pixart python=3.9
conda activate pixart
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/PixArt-alpha/PixArt-alpha.git
cd PixArt-alpha
pip install -r requirements.txt
```
# ⏬ Download Models
All models will be automatically downloaded. You can also choose to download manually from this [url](https://huggingface.co/PixArt-alpha/PixArt-alpha).
| Model | #Params | url | Download in OpenXLab |
|:----------------------------|:--------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------|
| T5 | 4.3B | [T5](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) | [T5](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/t5-v1_1-xxl.zip) |
| VAE | 80M | [VAE](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema) | [VAE](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/sd-vae-ft-ema.zip) |
| PixArt-α-SAM-256 | 0.6B | [PixArt-XL-2-SAM-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-SAM-256x256.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-SAM-256x256) | [256-SAM](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-SAM-256x256.pth) |
| PixArt-α-256 | 0.6B | [PixArt-XL-2-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-256x256.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-256x256) | [256](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-256x256.pth) |
| PixArt-α-256-MSCOCO-FID7.32 | 0.6B | [PixArt-XL-2-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-256x256-MSCOCO-FID732.pth) | [256]() |
| PixArt-α-512 | 0.6B | [PixArt-XL-2-512x512.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-512x512.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512) | [512](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-512x512.pth) |
| PixArt-α-1024 | 0.6B | [PixArt-XL-2-1024-MS.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-1024-MS.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) | [1024](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-1024-MS.pth) |
| PixArt-δ-1024-LCM | 0.6B | [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS) | |
| ControlNet-HED-Encoder | 30M | [ControlNetHED.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/ControlNetHED.pth) | |
| PixArt-δ-512-ControlNet | 0.9B | [PixArt-XL-2-512-ControlNet.pth](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) | [512](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-ControlNet) |
| PixArt-δ-1024-ControlNet | 0.9B | [PixArt-XL-2-1024-ControlNet.pth](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) | [1024](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-ControlNet) |

You can also find all models on [OpenXLab_PixArt-alpha](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-alpha).
# 🔥 How to Train
## 1. PixArt Training
**First of all.**
Thanks to [@kopyl](https://github.com/kopyl), you can reproduce the full fine-tuning flow on the [Pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) from Hugging Face with notebooks:
1. Train with [notebooks/train.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/53dac066f60fe5fdbdde4f0360145ca96d4cc38c/notebooks/train.ipynb).
2. Convert to Diffusers with [notebooks/convert-checkpoint-to-diffusers.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/master/notebooks/convert-checkpoint-to-diffusers.ipynb).
3. Run inference with the checkpoint converted in step 2 using [notebooks/infer.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/master/notebooks/infer.ipynb).
**Then, for more details.**
Here we take the SAM dataset training config as an example, but you can also prepare your own dataset following this method.
You **ONLY** need to change the **config** file in [config](./configs/pixart_config) and **dataloader** in [dataset](./diffusion/data/datasets).
```bash
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train.py configs/pixart_config/PixArt_xl2_img256_SAM.py --work-dir output/train_SAM_256
```
The directory structure for SAM dataset is:
```
cd ./data
SA1B
├──images/ (images are saved here)
│ ├──sa_xxxxx.jpg
│ ├──sa_xxxxx.jpg
│ ├──......
├──captions/ (corresponding captions are saved here, same name as images)
│ ├──sa_xxxxx.txt
│ ├──sa_xxxxx.txt
├──partition/ (image names are stored in txt files, one image name per line)
│ ├──part0.txt
│ ├──part1.txt
│ ├──......
├──caption_feature_wmask/ (run tools/extract_caption_feature.py to generate caption T5 features, same name as images except .npz extension)
│ ├──sa_xxxxx.npz
│ ├──sa_xxxxx.npz
│ ├──......
├──img_vae_feature/ (run tools/extract_img_vae_feature.py to generate image VAE features, same name as images except .npy extension)
│ ├──train_vae_256/
│ │ ├──noflip/
│ │ │ ├──sa_xxxxx.npy
│ │ │ ├──sa_xxxxx.npy
│ │ │ ├──......
```
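
Before launching training, it can help to confirm that every image has its caption and pre-extracted features. The sketch below assumes the sub-directory names and extensions shown in the tree above, and that images use the `.jpg` extension. `find_missing` and `EXPECTED` are hypothetical names, not part of this repo.

```python
from pathlib import Path

# Sub-directories and file extensions taken from the directory tree above.
EXPECTED = {
    "captions": ".txt",
    "caption_feature_wmask": ".npz",
    "img_vae_feature/train_vae_256/noflip": ".npy",
}


def find_missing(root: Path) -> dict[str, list[str]]:
    """Return, per sub-directory, the image stems that have no matching file."""
    stems = sorted(p.stem for p in (root / "images").glob("*.jpg"))
    missing = {}
    for subdir, ext in EXPECTED.items():
        absent = [s for s in stems if not (root / subdir / f"{s}{ext}").is_file()]
        if absent:
            missing[subdir] = absent
    return missing


if __name__ == "__main__":
    # "data/SA1B" matches the layout above; adjust it to your dataset root.
    for subdir, stems in find_missing(Path("data/SA1B")).items():
        print(f"{subdir}: {len(stems)} file(s) missing, e.g. {stems[0]}")
```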
**We provide data_toy as a concrete example:**
```bash
cd ./data
git lfs install
git clone https://huggingface.co/datasets/PixArt-alpha/data_toy
```
[Here](https://huggingface.co/datasets/PixArt-alpha/data_toy/blob/main/part0.txt) is an example of a partition/part0.txt file.

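
For your own dataset, partition files can be generated from the image folder. This is a rough sketch under stated assumptions: each line holds the image file name including its `.jpg` extension (compare against the example part0.txt before relying on this), and the `part_size` split is arbitrary. `write_partitions` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path


def write_partitions(root: Path, part_size: int = 1000) -> list[Path]:
    """Write partition/partN.txt files, one image name per line."""
    names = sorted(p.name for p in (root / "images").glob("*.jpg"))
    out_dir = root / "partition"
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(names), part_size):
        part = out_dir / f"part{start // part_size}.txt"
        part.write_text("\n".join(names[start:start + part_size]) + "\n")
        written.append(part)
    return written
```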
---
In addition, for JSON-file-guided [training](https://github.com/PixArt-alpha/PixArt-alpha/blob/fe0cb78065d64c18ecd8955a04e4f29138d47946/configs/pixart_config/PixArt_xl2_img1024_internalms.py#L3C2-L3C2),
[here](https://huggingface.co/datasets/PixArt-alpha/data_toy/blob/main/data_info.json) is a toy JSON file for reference.

---
## 2. PixArt + DreamBooth Training
Follow the `PixArt + DreamBooth` [training guide](asset/docs/pixart-dreambooth.md).
## 3. PixArt + LCM / LCM-LoRA Training
Follow the `PixArt + LCM` [training guide](asset/docs/pixart_lcm.md).
## 4. PixArt + ControlNet Training
Follow the `PixArt + ControlNet` [training guide](asset/docs/pixart_controlnet.md).
## 5. PixArt + LoRA Training
```bash
pip install peft==0.6.2
accelerate launch --num_processes=1 --main_process_port=36667 train_scripts/train_pixart_lora_hf.py --mixed_precision="fp16" \
--pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS \
--dataset_name=lambdalabs/pokemon-blip-captions --caption_column="text" \
--resolution=1024 --random_flip \
--train_batch_size=16 \
--num_train_epochs=200 --checkpointing_steps=100 \
--learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="pixart-pokemon-model" \
--validation_prompt="cute dragon creature" --report_to="tensorboard" \
--gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 \
--rank=16
```
# 💻 How to Test
Inference requires at least `23GB` of GPU memory with this repo, versus `11GB` or `8GB` with 🧨 [diffusers](#using-in--diffusers).
Currently supported samplers:
- [x] [IDDPM](https://arxiv.org/abs/2102.09672)
- [x] [DPM-Solver](https://arxiv.org/abs/2206.00927)
- [x] [SA-Solver](https://arxiv.org/abs/2309.05019)
- [ ] [DPM-Solver-v3](https://arxiv.org/abs/2310.13268v2)
## 1. Quick start with [Gradio](https://www.gradio.app/guides/quickstart)
To get started, first install the required dependencies. Make sure you've downloaded the [models](https://huggingface.co/PixArt-alpha/PixArt-alpha) to the output/pretrained_models folder, and then run on your local machine:
```bash
DEMO_PORT=12345 python app/app.py
```
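
`DEMO_PORT` is read from the environment by the demo script. The sketch below mirrors how the bundled `app/app.py` parses it and the related flags, with the same variable names and defaults. `read_demo_settings` is a hypothetical helper written for illustration, not a function in this repo.

```python
import os
from typing import Mapping


def read_demo_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the demo's environment variables with the defaults used in app/app.py."""
    return {
        "port": int(env.get("DEMO_PORT", "15432")),
        "max_image_size": int(env.get("MAX_IMAGE_SIZE", "2048")),
        # Boolean flags are enabled only by the exact string "1".
        "use_torch_compile": env.get("USE_TORCH_COMPILE", "0") == "1",
        "enable_cpu_offload": env.get("ENABLE_CPU_OFFLOAD", "0") == "1",
    }


print(read_demo_settings({"DEMO_PORT": "12345"}))
```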
As an alternative, a sample [Dockerfile](Dockerfile) is provided to make a runtime container that starts the Gradio app.
```bash
docker build . -t pixart
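# /path/to/huggingface_cache is a placeholder: use your local Hugging Face cache directory.
docker run --gpus all -it -p 12345:12345 -v /path/to/huggingface_cache:/root/.cache/huggingface pixart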
```
Or use docker-compose. Note: to switch the app from the 1024 version to the 512 or LCM version, change the `APP_CONTEXT` env variable in the docker-compose.yml file. The default is 1024.
```bash
docker compose build
docker compose up
```
Then open `http://your-server-ip:12345` in a browser to try a simple example.
## 2. Integration in diffusers
### 1). Using in 🧨 diffusers
Make sure you have the updated versions of the following libraries:
```bash
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
```
And then:
```python
import torch
from diffusers import PixArtAlphaPipeline, ConsistencyDecoderVAE, AutoencoderKL
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# You can replace the checkpoint id with "PixArt-alpha/PixArt-XL-2-512x512" too.
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16, use_safetensors=True)
# If use DALL-E 3 Consistency Decoder
# pipe.vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
# If use SA-Solver sampler
# from diffusion.sa_solver_diffusers import SASolverScheduler
# pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction')
# If loading a LoRA model (needs these extra imports)
# from diffusers import Transformer2DModel
# from peft import PeftModel
# transformer = Transformer2DModel.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16)
# transformer = PeftModel.from_pretrained(transformer, "Your-LoRA-Model-Path")
# pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", transformer=transformer, torch_dtype=torch.float16, use_safetensors=True)
# del transformer
# Enable memory optimizations.
# pipe.enable_model_cpu_offload()
pipe.to(device)
prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
image.save("./cactus.png")
```
Check out the [documentation](./asset/docs/sasolver.md) for more information about SA-Solver Sampler.
This integration allows running the pipeline with a batch size of 4 in under 11 GB of GPU VRAM.
Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.
### 2). Running the `PixArtAlphaPipeline` in under 8GB GPU VRAM
GPU VRAM consumption under 8 GB is now supported. Please refer to the [documentation](asset/docs/pixart.md) for more information.
### 3). Gradio with diffusers (Faster)
To get started, first install the required dependencies, then run on your local machine:
```bash
# diffusers version
DEMO_PORT=12345 python app/app.py
```
Then open `http://your-server-ip:12345` in a browser to try a simple example.
You can also click [here](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing) to have a free trial on Google Colab.
### 4). Convert .pth checkpoint into diffusers version
```bash
python tools/convert_pixart_alpha_to_diffusers.py --image_size your_img_size --multi_scale_train (True if you use PixArtMS else False) --orig_ckpt_path path/to/pth --dump_path path/to/diffusers --only_transformer=True
```
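
The placeholders in the command above can also be filled in from Python. This is a minimal sketch, assuming the script accepts the flags exactly as written above and takes the literal strings `True`/`False` for `--multi_scale_train`; how the script parses them has not been verified here. `build_convert_command` is a hypothetical helper, not part of this repo.

```python
import shlex


def build_convert_command(image_size: int, multi_scale: bool,
                          orig_ckpt_path: str, dump_path: str) -> list[str]:
    """Assemble the argv list for tools/convert_pixart_alpha_to_diffusers.py."""
    return [
        "python", "tools/convert_pixart_alpha_to_diffusers.py",
        "--image_size", str(image_size),
        # True for PixArtMS (multi-scale) checkpoints, False otherwise.
        "--multi_scale_train", str(multi_scale),
        "--orig_ckpt_path", orig_ckpt_path,
        "--dump_path", dump_path,
        "--only_transformer=True",
    ]


cmd = build_convert_command(1024, True, "path/to/pth", "path/to/diffusers")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.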
## 3. Online Demo [Hugging Face Space](https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha)

# ✏️ How to LLaVA captioning
Thanks to the code base of [LLaVA-Lightning-MPT](https://huggingface.co/liuhaotian/LLaVA-Lightning-MPT-7B-preview),
we can caption the LAION and SAM datasets with the following launch command:
```bash
python tools/VLM_caption_lightning.py --output output/dir/ --data-root data/root/path --index path/to/data.json
```
We present auto-labeling with custom prompts for LAION (left) and SAM (right). The words highlighted in green represent the original caption in LAION, while those marked in red indicate the detailed captions labeled by LLaVA.

# ✏️ How to extract T5 and VAE features
Preparing the T5 text features and VAE image features in advance speeds up training and saves GPU memory.
```bash
python tools/extract_features.py --img_size=1024 \
--json_path "data/data_info.json" \
--t5_save_root "data/SA1B/caption_feature_wmask" \
--vae_save_root "data/SA1B/img_vae_features" \
--pretrained_models_dir "output/pretrained_models" \
--dataset_root "data/SA1B/Images/"
```
## 💪To-Do List (Congratulations🎉)
- [x] Inference code
- [x] Training code
- [x] T5 & VAE feature extraction code
- [x] LLaVA captioning code
- [x] Model zoo
- [x] Diffusers version & Hugging Face demo
- [x] Google Colab example
- [x] DALLE3 VAE integration
- [x] Inference under 8GB GPU VRAM with diffusers
- [x] Dreambooth Training code
- [x] SA-Solver code
- [x] PixArt-α-LCM will release soon
- [x] Multi-scale vae feature extraction code
- [x] PixArt-α-LCM-LoRA scripts will release soon
- [x] PixArt-α-LoRA training scripts will release soon
- [x] ControlNet code will be released
- [x] SAM-LLaVA caption dataset
- [x] ControlNet checkpoint
- [x] 256px pre-trained models
- [x] PixArt-Σ: Next version model with much better ability is training!
# Other Source
We made a video comparing PixArt with the current most powerful text-to-image models.
[Watch the comparison video on YouTube](https://www.youtube.com/watch?v=7_6KsIITgWY)
# 📖BibTeX

    @misc{chen2023pixartalpha,
          title={PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis},
          author={Junsong Chen and Jincheng Yu and Chongjian Ge and Lewei Yao and Enze Xie and Yue Wu and Zhongdao Wang and James Kwok and Ping Luo and Huchuan Lu and Zhenguo Li},
          year={2023},
          eprint={2310.00426},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
    @misc{chen2024pixartdelta,
          title={PIXART-{\delta}: Fast and Controllable Image Generation with Latent Consistency Models},
          author={Junsong Chen and Yue Wu and Simian Luo and Enze Xie and Sayak Paul and Ping Luo and Hang Zhao and Zhenguo Li},
          year={2024},
          eprint={2401.05252},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }

# 🤗Acknowledgements
- Thanks to [Diffusers](https://github.com/huggingface/diffusers) for their wonderful technical support and awesome collaboration!
- Thanks to [Hugging Face](https://github.com/huggingface) for sponsoring the nice demo!
- Thanks to [DiT](https://github.com/facebookresearch/DiT) for their wonderful work and codebase!
## Star History
[Star History Chart](https://star-history.com/#PixArt-alpha/PixArt-alpha&Date)

================================================
FILE: PixArt-alpha-ToCa/app/app.py
================================================
#!/usr/bin/env python
from __future__ import annotations
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import random
import gradio as gr
import numpy as np
import uuid
from diffusers import ConsistencyDecoderVAE, PixArtAlphaPipeline, DPMSolverMultistepScheduler
import torch
from typing import Tuple
from datetime import datetime
from diffusion.sa_solver_diffusers import SASolverScheduler
DESCRIPTION = """
# PixArt-Alpha 1024px
#### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) checkpoint.
#### English prompts ONLY; 提示词仅限英文
Don't want to queue? Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).
### You may change the DPM-Solver inference steps from 14 to 20, if you didn't get satisfied results.
"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo does not work on CPU."
MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = torch.cuda.is_available() and os.getenv("CACHE_EXAMPLES", "1") == "1"
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "2048"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
PORT = int(os.getenv("DEMO_PORT", "15432"))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
style_list = [
    {
        "name": "(No style)",
        "prompt": "{prompt}",
        "negative_prompt": "",
    },
    {
        "name": "Cinematic",
        "prompt": "cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
        "negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
    },
    {
        "name": "Photographic",
        "prompt": "cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed",
        "negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly",
    },
    {
        "name": "Anime",
        "prompt": "anime artwork {prompt} . anime style, key visual, vibrant, studio anime, highly detailed",
        "negative_prompt": "photo, deformed, black and white, realism, disfigured, low contrast",
    },
    {
        "name": "Manga",
        "prompt": "manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style",
        "negative_prompt": "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style",
    },
    {
        "name": "Digital Art",
        "prompt": "concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed",
        "negative_prompt": "photo, photorealistic, realism, ugly",
    },
    {
        "name": "Pixel art",
        "prompt": "pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics",
        "negative_prompt": "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic",
    },
    {
        "name": "Fantasy art",
        "prompt": "ethereal fantasy concept art of {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
        "negative_prompt": "photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white",
    },
    {
        "name": "Neonpunk",
        "prompt": "neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
        "negative_prompt": "painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
    },
    {
        "name": "3D Model",
        "prompt": "professional 3d model {prompt} . octane render, highly detailed, volumetric, dramatic lighting",
        "negative_prompt": "ugly, deformed, noisy, low poly, blurry, painting",
    },
]
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"
SCHEDULE_NAME = ["DPM-Solver", "SA-Solver"]
DEFAULT_SCHEDULE_NAME = "DPM-Solver"
NUM_IMAGES_PER_PROMPT = 1


def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    if not negative:
        negative = ""
    return p.replace("{prompt}", positive), n + negative


if torch.cuda.is_available():
    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-XL-2-1024-MS",
        torch_dtype=torch.float16,
        use_safetensors=True,
    )

    if os.getenv('CONSISTENCY_DECODER', False):
        print("Using DALL-E 3 Consistency Decoder")
        pipe.vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)

    if ENABLE_CPU_OFFLOAD:
        pipe.enable_model_cpu_offload()
    else:
        pipe.to(device)
        print("Loaded on Device!")

    # speed-up T5
    pipe.text_encoder.to_bettertransformer()

    if USE_TORCH_COMPILE:
        pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
        print("Model Compiled!")


def save_image(img):
    unique_name = f'{str(uuid.uuid4())}.png'
    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')
    os.makedirs(save_path, exist_ok=True)
    unique_name = os.path.join(save_path, unique_name)
    img.save(unique_name)
    return unique_name


def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed
def generate(
prompt: str,
negative_prompt: str = "",
style: str = DEFAULT_STYLE_NAME,
use_negative_prompt: bool = False,
seed: int = 0,
width: int = 1024,
height: int = 1024,
schedule: str = 'DPM-Solver',
dpms_guidance_scale: float = 4.5,
sas_guidance_scale: float = 3,
dpms_inference_steps: int = 20,
sas_inference_steps: int = 25,
randomize_seed: bool = False,
use_resolution_binning: bool = True,
progress=gr.Progress(track_tqdm=True),
):
seed = int(randomize_seed_fn(seed, randomize_seed))
generator = torch.Generator().manual_seed(seed)
if schedule == 'DPM-Solver':
if not isinstance(pipe.scheduler, DPMSolverMultistepScheduler):
pipe.scheduler = DPMSolverMultistepScheduler()
num_inference_steps = dpms_inference_steps
guidance_scale = dpms_guidance_scale
elif schedule == "SA-Solver":
if not isinstance(pipe.scheduler, SASolverScheduler):
pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction', tau_func=lambda t: 1 if 200 <= t <= 800 else 0, predictor_order=2, corrector_order=2)
num_inference_steps = sas_inference_steps
guidance_scale = sas_guidance_scale
else:
raise ValueError(f"Unknown schedule: {schedule}")
if not use_negative_prompt:
negative_prompt = None # type: ignore
prompt, negative_prompt = apply_style(style, prompt, negative_prompt)
images = pipe(
prompt=prompt,
width=width,
height=height,
negative_prompt=negative_prompt,
guidance_scale=guidance_scale,
num_inference_steps=num_inference_steps,
generator=generator,
num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
use_resolution_binning=use_resolution_binning,
output_type="pil",
).images
image_paths = [save_image(img) for img in images]
print(image_paths)
return image_paths, seed
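# Illustrative sketch (not part of the original app): the sampler branch in
# `generate` above amounts to picking a (steps, guidance scale) pair by
# schedule name. `_demo_sampler_settings` is a hypothetical helper; its
# default values copy the keyword defaults of `generate`.
def _demo_sampler_settings(schedule, dpms_steps=20, dpms_scale=4.5, sas_steps=25, sas_scale=3.0):
    settings = {
        "DPM-Solver": (dpms_steps, dpms_scale),
        "SA-Solver": (sas_steps, sas_scale),
    }
    if schedule not in settings:
        # Same error message as the `else` branch in `generate`.
        raise ValueError(f"Unknown schedule: {schedule}")
    return settings[schedule]


# Usage (assumed inputs):
#   steps, scale = _demo_sampler_settings("SA-Solver")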
examples = [
"A small cactus with a happy face in the Sahara desert.",
"an astronaut sitting in a diner, eating fries, cinematic, analog film",
"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.",
"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.",
"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background",
"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism",
"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur",
"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8",
]
with gr.Blocks(css="app/style.css") as demo:
gr.Markdown(DESCRIPTION)
gr.DuplicateButton(
value="Duplicate Space for private use",
elem_id="duplicate-button",
visible=os.getenv("SHOW_DUPLICATE_BUTTON") == "1",
)
with gr.Group():
with gr.Row():
prompt = gr.Text(
label="Prompt",
show_label=False,
max_lines=1,
placeholder="Enter your prompt",
container=False,
)
run_button = gr.Button("Run", scale=0)
result = gr.Gallery(label="Result", columns=NUM_IMAGES_PER_PROMPT, show_label=False)
with gr.Accordion("Advanced options", open=False):
with gr.Row():
use_negative_prompt = gr.Checkbox(label="Use negative prompt", value=False, visible=True)
schedule = gr.Radio(
show_label=True,
container=True,
interactive=True,
choices=SCHEDULE_NAME,
value=DEFAULT_SCHEDULE_NAME,
label="Sampler Schedule",
visible=True,
)
style_selection = gr.Radio(
show_label=True,
container=True,
interactive=True,
choices=STYLE_NAMES,
value=DEFAULT_STYLE_NAME,
label="Image Style",
)
negative_prompt = gr.Text(
label="Negative prompt",
max_lines=1,
placeholder="Enter a negative prompt",
visible=True,
)
seed = gr.Slider(
label="Seed",
minimum=0,
maximum=MAX_SEED,
step=1,
value=0,
)
randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
with gr.Row(visible=True):
width = gr.Slider(
label="Width",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=1024,
)
height = gr.Slider(
label="Height",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=1024,
)
with gr.Row():
dpms_guidance_scale = gr.Slider(
label="DPM-Solver Guidance scale",
minimum=1,
maximum=10,
step=0.1,
value=4.5,
)
dpms_inference_steps = gr.Slider(
label="DPM-Solver inference steps",
minimum=5,
maximum=40,
step=1,
value=14,
)
with gr.Row():
sas_guidance_scale = gr.Slider(
label="SA-Solver Guidance scale",
minimum=1,
maximum=10,
step=0.1,
value=3,
)
sas_inference_steps = gr.Slider(
label="SA-Solver inference steps",
minimum=10,
maximum=40,
step=1,
value=25,
)
gr.Examples(
examples=examples,
inputs=prompt,
outputs=[result, seed],
fn=generate,
cache_examples=CACHE_EXAMPLES,
)
use_negative_prompt.change(
fn=lambda x: gr.update(visible=x),
inputs=use_negative_prompt,
outputs=negative_prompt,
api_name=False,
)
gr.on(
triggers=[
prompt.submit,
negative_prompt.submit,
run_button.click,
],
fn=generate,
inputs=[
prompt,
negative_prompt,
style_selection,
use_negative_prompt,
seed,
width,
height,
schedule,
dpms_guidance_scale,
sas_guidance_scale,
dpms_inference_steps,
sas_inference_steps,
randomize_seed,
],
outputs=[result, seed],
api_name="run",
)
if __name__ == "__main__":
demo.queue(max_size=20).launch(server_name="0.0.0.0", server_port=PORT, debug=True)
================================================
FILE: PixArt-alpha-ToCa/app/app_512.py
================================================
#!/usr/bin/env python
from __future__ import annotations
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import random
import gradio as gr
import numpy as np
import uuid
from diffusers import PixArtAlphaPipeline, ConsistencyDecoderVAE, DPMSolverMultistepScheduler
import torch
from typing import Tuple
from datetime import datetime
from diffusion.data.datasets import ASPECT_RATIO_512_TEST
from diffusion.model.utils import resize_and_crop_img
from diffusion.sa_solver_diffusers import SASolverScheduler
DESCRIPTION = """
# PixArt-Alpha 512px
#### [PixArt-Alpha 512px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-XL-2-512x512](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512) checkpoint.
#### English prompts ONLY; 提示词仅限英文
Don't want to queue? Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).
"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo does not work on CPU."
MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = torch.cuda.is_available() and os.getenv("CACHE_EXAMPLES", "1") == "1"
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "1024"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
PORT = int(os.getenv("DEMO_PORT", "15432"))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
style_list = [
{
"name": "(No style)",
"prompt": "{prompt}",
"negative_prompt": "",
},
{
"name": "Cinematic",
"prompt": "cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
"negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
},
{
"name": "Photographic",
"prompt": "cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed",
"negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly",
},
{
"name": "Anime",
"prompt": "anime artwork {prompt} . anime style, key visual, vibrant, studio anime, highly detailed",
"negative_prompt": "photo, deformed, black and white, realism, disfigured, low contrast",
},
{
"name": "Manga",
"prompt": "manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style",
"negative_prompt": "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style",
},
{
"name": "Digital Art",
"prompt": "concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed",
"negative_prompt": "photo, photorealistic, realism, ugly",
},
{
"name": "Pixel art",
"prompt": "pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics",
"negative_prompt": "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic",
},
{
"name": "Fantasy art",
"prompt": "ethereal fantasy concept art of {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
"negative_prompt": "photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white",
},
{
"name": "Neonpunk",
"prompt": "neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
"negative_prompt": "painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
},
{
"name": "3D Model",
"prompt": "professional 3d model {prompt} . octane render, highly detailed, volumetric, dramatic lighting",
"negative_prompt": "ugly, deformed, noisy, low poly, blurry, painting",
},
]
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"
SCHEDULE_NAME = ["DPM-Solver", "SA-Solver"]
DEFAULT_SCHEDULE_NAME = "DPM-Solver"
NUM_IMAGES_PER_PROMPT = 2
def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
if not negative:
negative = ""
return p.replace("{prompt}", positive), n + negative
if torch.cuda.is_available():
pipe = PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-512x512",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
)
if os.getenv('CONSISTENCY_DECODER', False):
print("Using DALL-E 3 Consistency Decoder")
pipe.vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
if ENABLE_CPU_OFFLOAD:
pipe.enable_model_cpu_offload()
else:
pipe.to(device)
print("Loaded on Device!")
# speed-up T5
pipe.text_encoder.to_bettertransformer()
if USE_TORCH_COMPILE:
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
print("Model Compiled!")
def prepare_prompt_hw(height, width, ratios):
    # Same logic as `classify_height_width_bin` below.
    ar = float(height/width)
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    default_hw = ratios[closest_ratio]
    return int(default_hw[0]), int(default_hw[1])
def save_image(img):
    unique_name = f'{str(uuid.uuid4())}.png'
    save_path = os.path.join(f'output/online_demo_img512/{datetime.now().date()}')
    os.makedirs(save_path, exist_ok=True)
    unique_name = os.path.join(save_path, unique_name)
    img.save(unique_name)
    return unique_name
def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed
def classify_height_width_bin(height: int, width: int, ratios: dict):
    # Pick the predefined (height, width) bin whose aspect ratio is closest to the request.
    ar = float(height / width)
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    default_hw = ratios[closest_ratio]
    return int(default_hw[0]), int(default_hw[1])
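# Illustrative sketch (not part of the original app): aspect-ratio binning as
# done by `classify_height_width_bin` above. `_DEMO_RATIOS` is a made-up
# three-entry table for illustration only; it is NOT the real
# `ASPECT_RATIO_512_TEST`, which is imported from `diffusion.data.datasets`.
_DEMO_RATIOS = {
    "0.5": [352.0, 704.0],
    "1.0": [512.0, 512.0],
    "2.0": [704.0, 352.0],
}


def _demo_closest_bin(height, width, ratios=_DEMO_RATIOS):
    ar = height / width
    # Keys are ratio strings, so convert each to float before comparing.
    closest = min(ratios, key=lambda ratio: abs(float(ratio) - ar))
    bin_h, bin_w = ratios[closest]
    return int(bin_h), int(bin_w)


# Usage (assumed inputs): a 600x512 request has ratio ~1.17, which is nearest
# to the "1.0" key, so `_demo_closest_bin(600, 512)` returns (512, 512).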
def generate(
prompt: str,
negative_prompt: str = "",
style: str = DEFAULT_STYLE_NAME,
use_negative_prompt: bool = False,
seed: int = 0,
width: int = 512,
height: int = 512,
schedule: str = 'DPM-Solver',
dpms_guidance_scale: float = 4.5,
sas_guidance_scale: float = 3,
dpms_inference_steps: int = 20,
sas_inference_steps: int = 25,
randomize_seed: bool = False,
use_resolution_binning: bool = True,
progress=gr.Progress(track_tqdm=True),
):
seed = int(randomize_seed_fn(seed, randomize_seed))
generator = torch.Generator().manual_seed(seed)
if schedule == 'DPM-Solver':
if not isinstance(pipe.scheduler, DPMSolverMultistepScheduler):
pipe.scheduler = DPMSolverMultistepScheduler()
num_inference_steps = dpms_inference_steps
guidance_scale = dpms_guidance_scale
elif schedule == "SA-Solver":
if not isinstance(pipe.scheduler, SASolverScheduler):
pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction', tau_func=lambda t: 1 if 200 <= t <= 800 else 0, predictor_order=2, corrector_order=2)
num_inference_steps = sas_inference_steps
guidance_scale = sas_guidance_scale
else:
raise ValueError(f"Unknown schedule: {schedule}")
if not use_negative_prompt:
negative_prompt = None # type: ignore
prompt, negative_prompt = apply_style(style, prompt, negative_prompt)
if use_resolution_binning:
orig_height, orig_width = height, width
height, width = classify_height_width_bin(height, width, ratios=ASPECT_RATIO_512_TEST)
images = pipe(
prompt=prompt,
width=width,
height=height,
negative_prompt=negative_prompt,
guidance_scale=guidance_scale,
num_inference_steps=num_inference_steps,
generator=generator,
use_resolution_binning=False,
num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
output_type="pil",
).images
if use_resolution_binning:
images = [resize_and_crop_img(img, orig_width, orig_height) for img in images]
image_paths = [save_image(img) for img in images]
print(image_paths)
return image_paths, seed
examples = [
"A small cactus with a happy face in the Sahara desert.",
"an astronaut sitting in a diner, eating fries, cinematic, analog film",
"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.",
"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.",
"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background",
"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism",
"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur",
"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8",
]
with gr.Blocks(css="scripts/style.css") as demo:
gr.Markdown(DESCRIPTION)
gr.DuplicateButton(
value="Duplicate Space for private use",
elem_id="duplicate-button",
visible=os.getenv("SHOW_DUPLICATE_BUTTON") == "1",
)
with gr.Group():
with gr.Row():
prompt = gr.Text(
label="Prompt",
show_label=False,
max_lines=1,
placeholder="Enter your prompt",
container=False,
)
run_button = gr.Button("Run", scale=0)
result = gr.Gallery(label="Result", columns=NUM_IMAGES_PER_PROMPT, show_label=False)
with gr.Accordion("Advanced options", open=False):
with gr.Row():
use_negative_prompt = gr.Checkbox(label="Use negative prompt", value=False, visible=False)
schedule = gr.Radio(
show_label=True,
container=True,
interactive=True,
choices=SCHEDULE_NAME,
value=DEFAULT_SCHEDULE_NAME,
label="Sampler Schedule",
visible=True,
)
style_selection = gr.Radio(
show_label=True,
container=True,
interactive=True,
choices=STYLE_NAMES,
value=DEFAULT_STYLE_NAME,
label="Image Style",
)
negative_prompt = gr.Text(
label="Negative prompt (currently unused)",
max_lines=1,
placeholder="Enter a negative prompt",
visible=False,
)
seed = gr.Slider(
label="Seed",
minimum=0,
maximum=MAX_SEED,
step=1,
value=0,
)
randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
with gr.Row(visible=True):
width = gr.Slider(
label="Width",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=512,
)
height = gr.Slider(
label="Height",
minimum=256,
maximum=MAX_IMAGE_SIZE,
step=32,
value=512,
)
with gr.Row():
dpms_guidance_scale = gr.Slider(
label="DPM-Solver Guidance scale",
minimum=1,
maximum=10,
step=0.1,
value=4.5,
)
dpms_inference_steps = gr.Slider(
label="DPM-Solver inference steps",
minimum=5,
maximum=40,
step=1,
value=20,
)
with gr.Row():
sas_guidance_scale = gr.Slider(
label="SA-Solver Guidance scale",
minimum=1,
maximum=10,
step=0.1,
value=3,
)
sas_inference_steps = gr.Slider(
label="SA-Solver inference steps",
minimum=10,
maximum=40,
step=1,
value=25,
)
gr.Examples(
examples=examples,
inputs=prompt,
outputs=[result, seed],
fn=generate,
cache_examples=CACHE_EXAMPLES,
)
use_negative_prompt.change(
fn=lambda x: gr.update(visible=x),
inputs=use_negative_prompt,
outputs=negative_prompt,
api_name=False,
)
gr.on(
triggers=[
prompt.submit,
negative_prompt.submit,
run_button.click,
],
fn=generate,
inputs=[
prompt,
negative_prompt,
style_selection,
use_negative_prompt,
seed,
width,
height,
schedule,
dpms_guidance_scale,
sas_guidance_scale,
dpms_inference_steps,
sas_inference_steps,
randomize_seed,
],
outputs=[result, seed],
api_name="run",
)
if __name__ == "__main__":
demo.queue(max_size=20).launch(server_name="0.0.0.0", server_port=PORT, debug=True)
================================================
FILE: PixArt-alpha-ToCa/app/app_controlnet.py
================================================
#!/usr/bin/env python
from __future__ import annotations
import argparse
import os
import random
import sys
import uuid
from datetime import datetime
from pathlib import Path
from typing import List, Tuple, Union
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import gradio as gr
import numpy as np
import torch
from PIL import Image as PILImage
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from torchvision.utils import _log_api_usage_once, make_grid, save_image
from diffusers import PixArtAlphaPipeline
from diffusion import DPMS, SASolverSampler
from diffusion.data.datasets import *
from diffusion.model.hed import HEDdetector
from diffusion.model.nets import PixArt_XL_2, PixArtMS_XL_2, ControlPixArtHalf, ControlPixArtMSHalf
from diffusion.model.utils import resize_and_crop_tensor
from diffusion.utils.misc import read_config
from tools.download import find_model
DESCRIPTION = """
# PixArt-Delta (ControlNet)
#### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5.
#### This demo uses the [PixArt-alpha/PixArt-XL-2-1024-ControlNet](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) checkpoint.
#### This demo uses the [PixArt-alpha/PixArt-XL-2-512-ControlNet](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) checkpoint.
#### English prompts ONLY; 提示词仅限英文
### For best results, use an input image size that matches the model (e.g., 1024px for PixArt-XL-2-1024-ControlNet.pth).
"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo does not work on CPU."
MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = torch.cuda.is_available() and os.getenv("CACHE_EXAMPLES", "1") == "1"
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "2048"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
PORT = int(os.getenv("DEMO_PORT", "15432"))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
@torch.no_grad()
def ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs) -> np.ndarray:
    """Arrange a tensor batch into a grid and return it as an HWC uint8 array."""
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(save_image)
    grid = make_grid(tensor, **kwargs)
    # Scale to [0, 255], round by adding 0.5, and move channels last.
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to("cpu", torch.uint8).numpy()
    return ndarr
style_list = [
{
"name": "(No style)",
"prompt": "{prompt}",
"negative_prompt": "",
},
{
"name": "Cinematic",
"prompt": "cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
"negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
},
{
"name": "Photographic",
"prompt": "cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed",
"negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly",
},
{
"name": "Anime",
"prompt": "anime artwork {prompt} . anime style, key visual, vibrant, studio anime, highly detailed",
"negative_prompt": "photo, deformed, black and white, realism, disfigured, low contrast",
},
{
"name": "Manga",
"prompt": "manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style",
"negative_prompt": "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style",
},
{
"name": "Digital Art",
"prompt": "concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed",
"negative_prompt": "photo, photorealistic, realism, ugly",
},
{
"name": "Pixel art",
"prompt": "pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics",
"negative_prompt": "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic",
},
{
"name": "Fantasy art",
"prompt": "ethereal fantasy concept art of {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
"negative_prompt": "photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white",
},
{
"name": "Neonpunk",
"prompt": "neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
"negative_prompt": "painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
},
{
"name": "3D Model",
"prompt": "professional 3d model {prompt} . octane render, highly detailed, volumetric, dramatic lighting",
"negative_prompt": "ugly, deformed, noisy, low poly, blurry, painting",
},
]
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"
SCHEDULE_NAME = ["DPM-Solver", "SA-Solver"]
DEFAULT_SCHEDULE_NAME = "DPM-Solver"
def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
if not negative:
negative = ""
return p.replace("{prompt}", positive), n + negative
def save_image(img):
    unique_name = str(uuid.uuid4()) + '.png'
    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')
    os.makedirs(save_path, exist_ok=True)
    unique_name = os.path.join(save_path, unique_name)
    img.save(unique_name)
    return unique_name
def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed
@torch.inference_mode()
def generate(
prompt: str,
given_image = None,
negative_prompt: str = "",
style: str = DEFAULT_STYLE_NAME,
use_negative_prompt: bool = False,
seed: int = 0,
width: int = 1024,
height: int = 1024,
schedule: str = 'DPM-Solver',
dpms_guidance_scale: float = 4.5,
sas_guidance_scale: float = 3,
dpms_inference_steps: int = 14,
sas_inference_steps: int = 25,
randomize_seed: bool = False,
):
seed = int(randomize_seed_fn(seed, randomize_seed))
torch.manual_seed(seed)
torch.cuda.empty_cache()
strength = 1.0
c_vis = given_image
if not use_negative_prompt:
negative_prompt = None # type: ignore
prompt, negative_prompt = apply_style(style, prompt, negative_prompt)
prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask\
= pipe.encode_prompt(prompt=prompt, negative_prompt=negative_prompt)
prompt_embeds, negative_prompt_embeds = prompt_embeds[:, None], negative_prompt_embeds[:, None]
torch.cuda.empty_cache()
# condition process
if given_image is not None:
ar = torch.tensor([given_image.size[1] / given_image.size[0]], device=device)[None]
custom_hw = torch.tensor([given_image.size[1], given_image.size[0]], device=device)[None]
closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]
hw = torch.tensor(closest_hw, device=device)[None]
condition_transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(int(min(closest_hw))),
T.CenterCrop([int(closest_hw[0]), int(closest_hw[1])]),
T.ToTensor(),
])
given_image = condition_transform(given_image).unsqueeze(0).to(device)
hed_edge = hed(given_image) * strength
hed_edge = TF.normalize(hed_edge, [.5], [.5])
hed_edge = hed_edge.repeat(1, 3, 1, 1).to(weight_dtype)
posterior = vae.encode(hed_edge).latent_dist
condition = posterior.sample()
c = condition * config.scale_factor
c_vis = vae.decode(condition)['sample']
c_vis = torch.clamp(127.5 * c_vis + 128.0, 0, 255).permute(0, 2, 3, 1).to("cpu", dtype=torch.uint8).numpy()[0]
else:
c = None
ar = torch.tensor([int(height) / int(width)], device=device)[None]
custom_hw = torch.tensor([int(height), int(width)], device=device)[None]
closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]
hw = torch.tensor(closest_hw, device=device)[None]
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
# Sample images:
if schedule == 'DPM-Solver':
# Create sampling noise:
n = prompt_embeds.shape[0]
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=prompt_attention_mask, c=c)
dpm_solver = DPMS(model.forward_with_dpmsolver,
condition=prompt_embeds,
uncondition=negative_prompt_embeds,
cfg_scale=dpms_guidance_scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=dpms_inference_steps,
order=2,
skip_type="time_uniform",
method="multistep",
).to(weight_dtype)
elif schedule == "SA-Solver":
# Create sampling noise:
n = prompt_embeds.shape[0]
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=prompt_attention_mask, c=c)
sas_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)
samples = sas_solver.sample(
S=sas_inference_steps,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=prompt_embeds,
unconditional_conditioning=negative_prompt_embeds,
unconditional_guidance_scale=sas_guidance_scale,
model_kwargs=model_kwargs,
)[0].to(weight_dtype)
samples = vae.decode(samples / config.scale_factor).sample
torch.cuda.empty_cache()
samples = resize_and_crop_tensor(samples, custom_hw[0, 1], custom_hw[0, 0])
samples = PILImage.fromarray(ndarr_image(samples, normalize=True, value_range=(-1, 1)))
image_paths = [save_image(samples)]
c_vis = PILImage.fromarray(c_vis) if c_vis is not None else samples
c_paths = [save_image(c_vis)]
print(image_paths)
return image_paths, c_paths, seed
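# Illustrative sketch (not part of the original app): `generate` above sizes
# the sampling noise by integer-dividing the binned pixel height and width by
# 8. `_demo_latent_hw` is a hypothetical helper that isolates that step; the
# default factor of 8 is taken from the `// 8` in the code above.
def _demo_latent_hw(height, width, downsample=8):
    return int(height) // downsample, int(width) // downsample


# Usage (assumed inputs): a 1024x1024 bin gives a 128x128 latent grid, so the
# noise tensor for one prompt would have shape (1, 4, 128, 128).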
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("config", type=str, help="path to the config file")
    parser.add_argument('--image_size', default=1024, type=int)
    parser.add_argument('--model_path', type=str)
    return parser.parse_args()
args = get_args()
config = read_config(args.config)
device = "cuda" if torch.cuda.is_available() else "cpu"
assert args.image_size in [512, 1024], "We only provide pre-trained models for 512x512 and 1024x1024 resolutions."
lewei_scale = {512: 1, 1024: 2}
latent_size = args.image_size // 8
weight_dtype = torch.float16
print(f"Inference with {weight_dtype}")
if torch.cuda.is_available():
    hed = HEDdetector(False).to(device)
    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-XL-2-1024-MS",
        transformer=None,
        torch_dtype=weight_dtype,
        use_safetensors=True,
    )
    pipe.to(device)
    print("Loaded on Device!")
    vae = pipe.vae
    text_encoder = pipe.text_encoder
    tokenizer = pipe.tokenizer
    assert args.image_size == config.image_size
    if config.image_size == 512:
        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[config.image_size])
        print('model architecture ControlPixArtHalf and image size is 512')
        model = ControlPixArtHalf(model).to(device)
    elif config.image_size == 1024:
        model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[config.image_size])
        print('model architecture ControlPixArtMSHalf and image size is 1024')
        model = ControlPixArtMSHalf(model).to(device)
    state_dict = find_model(args.model_path)['state_dict']
    # The positional embedding is rebuilt by the model, so drop it from the checkpoint.
    if 'pos_embed' in state_dict:
        del state_dict['pos_embed']
    elif 'base_model.pos_embed' in state_dict:
        del state_dict['base_model.pos_embed']
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print('Missing keys (missing pos_embed is normal): ', missing)
    print('Unexpected keys', unexpected)
    model.eval()
    model.to(weight_dtype)
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
with gr.Blocks(css="app/style_controlnet.css") as demo:
    gr.Markdown(DESCRIPTION)
    gr.DuplicateButton(
        value="Duplicate Space for private use",
        elem_id="duplicate-button",
        visible=os.getenv("SHOW_DUPLICATE_BUTTON") == "1",
    )
    image_input = gr.Image(
        label="Image",
        height=360,
        width=360,
        show_label=False,
        sources="upload",
        type="pil",
    )
    with gr.Group():
        with gr.Row():
            prompt = gr.Text(
                label="Prompt",
                show_label=False,
                max_lines=1,
                placeholder="Enter your prompt",
                container=False,
            )
            run_button = gr.Button("Run", scale=0)
    with gr.Group():
        with gr.Row():
            hed_result = gr.Gallery(label="Hed Result", show_label=False)
            result = gr.Gallery(label="Result", show_label=False)
    with gr.Accordion("Advanced options", open=False):
        with gr.Row():
            use_negative_prompt = gr.Checkbox(label="Use negative prompt", value=False, visible=True)
        schedule = gr.Radio(
            show_label=True,
            container=True,
            interactive=True,
            choices=SCHEDULE_NAME,
            value=DEFAULT_SCHEDULE_NAME,
            label="Sampler Schedule",
            visible=True,
        )
        style_selection = gr.Radio(
            show_label=True,
            container=True,
            interactive=True,
            choices=STYLE_NAMES,
            value=DEFAULT_STYLE_NAME,
            label="Image Style",
        )
        negative_prompt = gr.Text(
            label="Negative prompt",
            max_lines=1,
            placeholder="Enter a negative prompt",
            visible=True,
        )
        seed = gr.Slider(
            label="Seed",
            minimum=0,
            maximum=MAX_SEED,
            step=1,
            value=0,
        )
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        with gr.Row(visible=True):
            width = gr.Slider(
                label="Width",
                minimum=256,
                maximum=MAX_IMAGE_SIZE,
                step=32,
                value=config.image_size,
            )
            height = gr.Slider(
                label="Height",
                minimum=256,
                maximum=MAX_IMAGE_SIZE,
                step=32,
                value=config.image_size,
            )
        with gr.Row():
            dpms_guidance_scale = gr.Slider(
                label="DPM-Solver Guidance scale",
                minimum=1,
                maximum=10,
                step=0.1,
                value=4.5,
            )
            dpms_inference_steps = gr.Slider(
                label="DPM-Solver inference steps",
                minimum=5,
                maximum=40,
                step=1,
                value=14,
            )
        with gr.Row():
            sas_guidance_scale = gr.Slider(
                label="SA-Solver Guidance scale",
                minimum=1,
                maximum=10,
                step=0.1,
                value=3,
            )
            sas_inference_steps = gr.Slider(
                label="SA-Solver inference steps",
                minimum=10,
                maximum=40,
                step=1,
                value=25,
            )
    gr.Examples(
        examples=[
            [
                "anime superman in action",
                "asset/images/controlnet/0_0.png",
            ],
            [
                "illustration of A loving couple standing in the open kitchen of the living room, cooking ,Couples have a full body, with characters accounting for a quarter of the screen, and the composition of the living room has a large perspective, resulting in a larger space.",
                "asset/images/controlnet/0_3.png",
            ],
            [
                "A Electric 4 seats mini VAN,simple design stylel,led headlight,front 45 angle view,sunlight,clear sky.",
                "asset/images/controlnet/0_2.png",
            ],
        ],
        inputs=[prompt, image_input],
        outputs=[result, hed_result, seed],
        fn=generate,
        cache_examples=CACHE_EXAMPLES,
    )
    use_negative_prompt.change(
        fn=lambda x: gr.update(visible=x),
        inputs=use_negative_prompt,
        outputs=negative_prompt,
        api_name=False,
    )
    gr.on(
        triggers=[
            prompt.submit,
            negative_prompt.submit,
            run_button.click,
        ],
        fn=generate,
        inputs=[
            prompt,
            image_input,
            negative_prompt,
            style_selection,
            use_negative_prompt,
            seed,
            width,
            height,
            schedule,
            dpms_guidance_scale,
            sas_guidance_scale,
            dpms_inference_steps,
            sas_inference_steps,
            randomize_seed,
        ],
        outputs=[result, hed_result, seed],
        api_name="run",
    )
if __name__ == "__main__":
    demo.queue(max_size=20).launch(server_name="0.0.0.0", server_port=PORT, debug=True)
================================================
FILE: PixArt-alpha-ToCa/app/app_lcm.py
================================================
#!/usr/bin/env python
from __future__ import annotations
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import random
import gradio as gr
import numpy as np
import uuid
from diffusers import PixArtAlphaPipeline, Transformer2DModel
from peft import PeftModel
import torch
from typing import Tuple
from datetime import datetime
import argparse
DESCRIPTION = """
# PixArt-LCM 1024px
#### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-LCM-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS) checkpoint.
#### [LCMs](https://github.com/luosiallen/latent-consistency-model) is a diffusion distillation method that predicts the PF-ODE's solution directly in latent space, achieving super fast inference with few steps.
#### English prompts ONLY; 提示词仅限英文
Don't want to queue? Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).
"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo does not work on CPU."
MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = torch.cuda.is_available() and os.getenv("CACHE_EXAMPLES", "1") == "1"
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "2048"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
PORT = int(os.getenv("DEMO_PORT", "15432"))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
style_list = [
{
"name": "(No style)",
"prompt": "{prompt}",
"negative_prompt": "",
},
{
"name": "Cinematic",
"prompt": "cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
"negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
},
{
"name": "Photographic",
"prompt": "cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed",
"negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly",
},
{
"name": "Anime",
"prompt": "anime artwork {prompt} . anime style, key visual, vibrant, studio anime, highly detailed",
"negative_prompt": "photo, deformed, black and white, realism, disfigured, low contrast",
},
{
"name": "Manga",
"prompt": "manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style",
"negative_prompt": "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style",
},
{
"name": "Digital Art",
"prompt": "concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed",
"negative_prompt": "photo, photorealistic, realism, ugly",
},
{
"name": "Pixel art",
"prompt": "pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics",
"negative_prompt": "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic",
},
{
"name": "Fantasy art",
"prompt": "ethereal fantasy concept art of {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
"negative_prompt": "photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white",
},
{
"name": "Neonpunk",
"prompt": "neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
"negative_prompt": "painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured",
},
{
"name": "3D Model",
"prompt": "professional 3d model {prompt} . octane render, highly detailed, volumetric, dramatic lighting",
"negative_prompt": "ugly, deformed, noisy, low poly, blurry, painting",
},
]
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"
NUM_IMAGES_PER_PROMPT = 1
def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    # Fall back to the default style when the requested name is unknown.
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    if not negative:
        negative = ""
    return p.replace("{prompt}", positive), n + negative
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--is_lora', action='store_true', help='enable lora ckpt loading')
    parser.add_argument('--repo_id', default="PixArt-alpha/PixArt-LCM-XL-2-1024-MS", type=str)
    parser.add_argument('--lora_repo_id', default="PixArt-alpha/PixArt-LCM-LoRA-XL-2-1024-MS", type=str)
    return parser.parse_args()
args = get_args()
if torch.cuda.is_available():
    if not args.is_lora:
        pipe = PixArtAlphaPipeline.from_pretrained(
            args.repo_id,
            torch_dtype=torch.float16,
            use_safetensors=True,
        )
    else:
        assert args.lora_repo_id is not None
        transformer = Transformer2DModel.from_pretrained(args.repo_id, subfolder="transformer", torch_dtype=torch.float16)
        transformer = PeftModel.from_pretrained(transformer, args.lora_repo_id)
        pipe = PixArtAlphaPipeline.from_pretrained(
            args.repo_id,
            transformer=transformer,
            torch_dtype=torch.float16,
            use_safetensors=True,
        )
        del transformer
    if ENABLE_CPU_OFFLOAD:
        pipe.enable_model_cpu_offload()
    else:
        pipe.to(device)
        print("Loaded on Device!")
    # speed-up T5
    pipe.text_encoder.to_bettertransformer()
    if USE_TORCH_COMPILE:
        pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
        print("Model Compiled!")
def save_image(img):
    # Save under a per-day folder with a random file name and return the full path.
    unique_name = f'{str(uuid.uuid4())}.png'
    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')
    os.makedirs(save_path, exist_ok=True)
    unique_name = os.path.join(save_path, unique_name)
    img.save(unique_name)
    return unique_name
def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed
def generate(
    prompt: str,
    negative_prompt: str = "",
    style: str = DEFAULT_STYLE_NAME,
    use_negative_prompt: bool = False,
    seed: int = 0,
    width: int = 1024,
    height: int = 1024,
    inference_steps: int = 4,
    randomize_seed: bool = False,
    use_resolution_binning: bool = True,
    progress=gr.Progress(track_tqdm=True),
):
    seed = int(randomize_seed_fn(seed, randomize_seed))
    generator = torch.Generator().manual_seed(seed)
    if not use_negative_prompt:
        negative_prompt = None  # type: ignore
    prompt, negative_prompt = apply_style(style, prompt, negative_prompt)
    images = pipe(
        prompt=prompt,
        width=width,
        height=height,
        negative_prompt=negative_prompt,
        guidance_scale=0.,
        num_inference_steps=inference_steps,
        generator=generator,
        num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
        use_resolution_binning=use_resolution_binning,
        output_type="pil",
    ).images
    image_paths = [save_image(img) for img in images]
    print(image_paths)
    return image_paths, seed
examples = [
"A small cactus with a happy face in the Sahara desert.",
"an astronaut sitting in a diner, eating fries, cinematic, analog film",
"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.",
"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.",
"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background",
"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism",
"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur",
"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8",
]
with gr.Blocks(css="scripts/style.css") as demo:
    gr.Markdown(DESCRIPTION)
    gr.DuplicateButton(
        value="Duplicate Space for private use",
        elem_id="duplicate-button",
        visible=os.getenv("SHOW_DUPLICATE_BUTTON") == "1",
    )
    with gr.Group():
        with gr.Row():
            prompt = gr.Text(
                label="Prompt",
                show_label=False,
                max_lines=1,
                placeholder="Enter your prompt",
                container=False,
            )
            run_button = gr.Button("Run", scale=0)
        result = gr.Gallery(label="Result", columns=NUM_IMAGES_PER_PROMPT, show_label=False)
    with gr.Accordion("Advanced options", open=False):
        with gr.Row():
            use_negative_prompt = gr.Checkbox(label="Use negative prompt", value=False, visible=True)
        negative_prompt = gr.Text(
            label="Negative prompt",
            max_lines=1,
            placeholder="Enter a negative prompt",
            visible=True,
        )
        style_selection = gr.Radio(
            show_label=True,
            container=True,
            interactive=True,
            choices=STYLE_NAMES,
            value=DEFAULT_STYLE_NAME,
            label="Image Style",
        )
        seed = gr.Slider(
            label="Seed",
            minimum=0,
            maximum=MAX_SEED,
            step=1,
            value=0,
        )
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        with gr.Row(visible=True):
            width = gr.Slider(
                label="Width",
                minimum=256,
                maximum=MAX_IMAGE_SIZE,
                step=32,
                value=1024,
            )
            height = gr.Slider(
                label="Height",
                minimum=256,
                maximum=MAX_IMAGE_SIZE,
                step=32,
                value=1024,
            )
        with gr.Row():
            inference_steps = gr.Slider(
                label="LCM inference steps",
                minimum=1,
                maximum=30,
                step=1,
                value=4,
            )
    gr.Examples(
        examples=examples,
        inputs=prompt,
        outputs=[result, seed],
        fn=generate,
        cache_examples=CACHE_EXAMPLES,
    )
    use_negative_prompt.change(
        fn=lambda x: gr.update(visible=x),
        inputs=use_negative_prompt,
        outputs=negative_prompt,
        api_name=False,
    )
    gr.on(
        triggers=[
            prompt.submit,
            negative_prompt.submit,
            run_button.click,
        ],
        fn=generate,
        inputs=[
            prompt,
            negative_prompt,
            style_selection,
            use_negative_prompt,
            seed,
            width,
            height,
            inference_steps,
            randomize_seed,
        ],
        outputs=[result, seed],
        api_name="run",
    )
if __name__ == "__main__":
    demo.queue(max_size=20).launch(server_name="0.0.0.0", server_port=PORT, debug=True)
================================================
FILE: PixArt-alpha-ToCa/app/style.css
================================================
.gradio-container{width:680px!important}
================================================
FILE: PixArt-alpha-ToCa/app/style_controlnet.css
================================================
.gradio-container{width:768px!important}
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart-dreambooth.md
================================================
# 🔥 How to Train PixArt + Dreambooth
- PixArt + [Dreambooth](https://dreambooth.github.io/)
You **ONLY** need to change the **config** file in [config](../../configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py) and **dataloader** in [dataset](../../diffusion/data/datasets/Dreambooth.py).
The directory structure for Dreambooth dataset is:
```
cd ./data/dreambooth
dataset
├──dog6/
│ ├──00.jpg
│ ├──01.jpg
│ ├──......
├──cat/
│ ├──00.jpg
│ ├──01.jpg
│ ├──......
```
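As a quick sanity check of this layout, the sketch below walks a dataset root and collects the image files found in each subject folder. It is only an illustration: `list_subjects` and the accepted extensions are assumptions and are not part of the repo's Dreambooth dataloader.

```python
from pathlib import Path

# Hypothetical helper, not part of the repo's Dreambooth dataloader.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def list_subjects(root):
    """Map each subject folder (e.g. dog6, cat) to its sorted image file names."""
    subjects = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        images = sorted(p.name for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        if images:
            subjects[folder.name] = images
    return subjects
```

Calling `list_subjects` on the `dataset` directory should return one entry per subject; an empty result usually means the root path is wrong.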
To get started, first install the required dependencies, then run on your local machine:
```bash
cd data/
git clone https://github.com/google/dreambooth.git
python -m torch.distributed.launch --nproc_per_node=1 --master_port=26666 train_scripts/train_dreambooth.py configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py --work-dir output/path
```
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart.md
================================================
[//]: # ((reference from [hugging Face](https://github.com/huggingface/diffusers/blob/docs/8bit-inference-pixart/docs/source/en/api/pipelines/pixart.md)))
## Running the `PixArtAlphaPipeline` in under 8GB GPU VRAM
It is possible to run the [`PixArtAlphaPipeline`] under 8GB GPU VRAM by loading the text encoder in 8-bit numerical precision. Let's walk through a full-fledged example.
First, install the `bitsandbytes` library:
```bash
pip install -U bitsandbytes
```
Then load the text encoder in 8-bit:
```python
import torch
from transformers import T5EncoderModel
from diffusers import PixArtAlphaPipeline

# Load only the T5 text encoder, quantized to 8-bit via bitsandbytes.
text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    subfolder="text_encoder",
    load_in_8bit=True,
    device_map="auto",
)
# transformer=None skips loading the diffusion transformer at this stage.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    text_encoder=text_encoder,
    transformer=None,
    device_map="auto"
)
```
Now, use the `pipe` to encode a prompt:
```python
with torch.no_grad():
    prompt = "cute cat"
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)

del text_encoder
del pipe
flush()
```
`flush()` is just a utility function to clear the GPU VRAM and is implemented like so:
```python
import gc

def flush():
    gc.collect()
    torch.cuda.empty_cache()
```
Then compute the latents providing the prompt embeddings as inputs:
```python
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    text_encoder=None,
    torch_dtype=torch.float16,
).to("cuda")

latents = pipe(
    negative_prompt=None,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    prompt_attention_mask=prompt_attention_mask,
    negative_prompt_attention_mask=negative_prompt_attention_mask,
    num_images_per_prompt=1,
    output_type="latent",
).images

del pipe.transformer
flush()
```
Notice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded.
Once the latents are computed, pass them to the VAE to decode them into a real image:
```python
with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
# postprocess returns a list of PIL images; take the first one.
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("cat.png")
```
All of this, put together, should allow you to run [`PixArtAlphaPipeline`] under 8GB GPU VRAM.

Find the script [here](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e) that can be run end-to-end to report the memory being used.
Text embeddings computed in 8-bit can have an impact on the quality of the generated images because of the information loss in the representation space induced by the reduced precision. It's recommended to compare the outputs with and without 8-bit.
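To get a feel for the precision loss mentioned above, here is a toy sketch of absmax 8-bit quantization applied to a small vector. It is a simplified stand-in and not the algorithm `bitsandbytes` actually uses; the function names and the sample values are made up for illustration.

```python
def quantize_absmax(values):
    """Scale floats into the int8 range [-127, 127] using the largest magnitude."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(q_values, scale):
    """Map int8 codes back to floats; the rounding error is not recoverable."""
    return [q * scale for q in q_values]


embedding = [0.013, -0.52, 1.27, 0.0004]  # made-up values, not real T5 outputs
codes, scale = quantize_absmax(embedding)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
print(codes, max_error)
```

In this toy case the smallest entry collapses to the code `0`, which is the kind of information loss that can show up as a quality difference in generated images.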
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart_comfyui.md
================================================
## 🔥 How to use PixArt in ComfyUI
### 1. Prepare the PixArt running environment
```bash
cd /workspace
conda create -n pixart python==3.9.0
conda activate pixart
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
git clone https://github.com/PixArt-alpha/PixArt-alpha.git
cd PixArt-alpha
pip install -r requirements.txt
```
### 2. Install ComfyUI related dependencies
```bash
cd /workspace
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
git clone https://github.com/city96/ComfyUI_ExtraModels custom_nodes/ComfyUI_ExtraModels
```
### 3. Download all the checkpoints: PixArt, VAE, T5 with script
```bash
cd /workspace/PixArt-alpha
python tools/download.py --model_names "PixArt-XL-2-1024-MS.pth"
```
Or download them directly from these URLs: [PixArt ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-1024-MS.pth), [VAE ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema),
[T5 ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl).
### 4. Put Checkpoints into corresponding folders
```bash
cd /workspace/ComfyUI
mv /path/to/PixArt-XL-2-1024-MS.pth ./models/checkpoints/
mv /path/to/sd-vae-ft-ema ./models/VAE/
mv /path/to/t5-v1_1-xxl ./models/t5/
```
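Before launching ComfyUI, it can help to confirm that the files ended up where the `mv` commands above put them. This is a minimal sketch, assuming exactly that folder layout; `missing_checkpoints` is a hypothetical helper and not part of ComfyUI or this repo.

```python
from pathlib import Path

# Paths relative to the ComfyUI root, taken from the `mv` commands above.
EXPECTED = [
    "models/checkpoints/PixArt-XL-2-1024-MS.pth",
    "models/VAE/sd-vae-ft-ema",
    "models/t5/t5-v1_1-xxl",
]


def missing_checkpoints(comfy_root):
    """Return the expected checkpoint paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `missing_checkpoints("/workspace/ComfyUI")` returns an empty list once all three items are in place.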
### 5. Run the ComfyUI website
```bash
cd /workspace/ComfyUI
python main.py --port 11111 --listen 0.0.0.0
```
Open http://your-server-ip:11111 to play with PixArt.
### 6. Create your own custom nodes
Here we prepare two examples for better understanding:
1) [PixArt Text-to-Image workflow](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-image-to-image-workflow.json)
2) [PixArt Image-to-Image workflow](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-image-to-image-workflow.json)
Once you have downloaded these JSON files, open your server website at `http://your-server-ip:11111` and drop a JSON file into the browser window to start the PixArt-ComfyUI playground.
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart_controlnet.md
================================================
## 🔥 ControlNet
We incorporate a [ControlNet](https://github.com/lllyasviel/ControlNet)-like module that enables fine-grained control over text-to-image diffusion models. We introduce a novel ControlNet-Transformer architecture, specifically tailored for Transformers, achieving explicit controllability alongside high-quality image generation.
For more details about PixArt-ControlNet, please check the technical report [PixArt-δ](https://arxiv.org/abs/2401.05252).
## Training the `PixArt + ControlNet` on your machine
```bash
# Train on 1024px
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_controlnet.py configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py --work-dir output/pixartcontrolnet-xl2-img1024
# Train on 512px
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_controlnet.py configs/pixart_app_config/PixArt_xl2_img512_controlHed.py --work-dir output/pixartcontrolnet-xl2-img512
```
## Testing the `PixArt + ControlNet`
```bash
# Test on 1024px
DEMO_PORT=12345 python app/app_controlnet.py configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py --model_path path/to/1024px/PixArt-XL-2-1024-ControlNet.pth
# Test on 512px
DEMO_PORT=12345 python app/app_controlnet.py configs/pixart_app_config/PixArt_xl2_img512_controlHed.py --model_path path/to/512px/pixart_controlnet_ckpt
```
Then open http://your-server-ip:12345 to try a simple example.
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart_inpaint.md
================================================
```python
import torch
from scripts.pipeline_pixart_inpaint import PixArtAlphaInpaintPipeline
from PIL import Image
pipe = PixArtAlphaInpaintPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
prompt = ""
image = Image.open('')
mask_image = Image.open('')
out = pipe(prompt, image=image, mask_image=mask_image, strength=1.0).images[0]
out.save('./cactus_removed.png')
```
================================================
FILE: PixArt-alpha-ToCa/asset/docs/pixart_lcm.md
================================================
## 🔥 Why Need PixArt-LCM
Following [LCM LoRA](https://huggingface.co/blog/lcm_lora), we illustrate the generation speed we achieve on various hardware. Let us stress again how liberating it is to explore image generation so easily with PixArt-LCM.
| Hardware | PixArt-LCM (4 steps) | SDXL LoRA LCM (4 steps) | PixArt standard (14 steps) | SDXL standard (25 steps) |
|-----------------------------|----------------------|-------------------------|----------------------------|---------------------------|
| T4 (Google Colab Free Tier) | 3.3s | 8.4s | 16.0s | 26.5s |
| A100 (80 GB) | 0.51s | 1.2s | 2.2s | 3.8s |
| V100 (32 GB) | 0.8s | 1.2s | 5.5s | 7.7s |
These tests were run with a batch size of 1 in all cases.
For cards with a lot of capacity, such as A100, performance increases significantly when generating multiple images at once, which is usually the case for production workloads.
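The speedups implied by the table can be worked out directly. The snippet below only reuses the timings listed above (in seconds) and divides them; no new measurements are involved, and the dictionary keys are just labels chosen for this example.

```python
# Seconds per image, copied from the table above.
timings = {
    "T4": {"pixart_lcm": 3.3, "sdxl_lcm": 8.4, "pixart": 16.0, "sdxl": 26.5},
    "A100": {"pixart_lcm": 0.51, "sdxl_lcm": 1.2, "pixart": 2.2, "sdxl": 3.8},
    "V100": {"pixart_lcm": 0.8, "sdxl_lcm": 1.2, "pixart": 5.5, "sdxl": 7.7},
}


def speedups(row):
    """Speedup of PixArt-LCM over standard PixArt and over SDXL LoRA LCM."""
    return {
        "vs_pixart": round(row["pixart"] / row["pixart_lcm"], 1),
        "vs_sdxl_lcm": round(row["sdxl_lcm"] / row["pixart_lcm"], 1),
    }


for gpu, row in timings.items():
    print(gpu, speedups(row))
```

On these numbers, PixArt-LCM at 4 steps is roughly 4x to 7x faster than the 14-step PixArt baseline, depending on the GPU.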
## Training the `PixArt + LCM` on your machine
```bash
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_pixart_lcm.py configs/pixart_config/PixArt_xl2_img1024_lcm.py --work-dir output/pixartlcm-xl2-img1024_ft
```
## Training the `PixArt + LCM-LoRA`
```bash
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_pixart_lcm_lora.py configs/pixart_config/PixArt_xl2_img1024_lcm.py --work-dir output/pixartlcm-lora-xl2-img1024_ft
```
## Testing the `PixArt + LCM` on your machine
```bash
DEMO_PORT=12345 python app/app_lcm.py
```
Then open http://your-server-ip:12345 to try a simple example.
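The `DEMO_PORT` variable is read in `app/app_lcm.py` with `int(os.getenv("DEMO_PORT", "15432"))`. The sketch below mirrors that line in a small helper to show the fallback behavior; `resolve_port` is a hypothetical name used only for this illustration.

```python
import os


def resolve_port(env=None):
    """Return the demo port: DEMO_PORT if set, otherwise the default 15432."""
    env = os.environ if env is None else env
    return int(env.get("DEMO_PORT", "15432"))


print(resolve_port({"DEMO_PORT": "12345"}))  # 12345
print(resolve_port({}))                      # 15432
```

If `DEMO_PORT` is not set, the demo is served on port 15432 rather than 12345.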
## Testing the `PixArt + LCM-LoRA`
```bash
DEMO_PORT=12345 python app/app_lcm.py --is_lora --lora_repo_id output/pixartlcm-lora-xl2-img1024_ft/checkpoint-xxx
```
Then open http://your-server-ip:12345 to try a simple example.
## Integration in diffusers
### Using in 🧨 diffusers
Make sure you have the updated versions of the following libraries:
```bash
pip install -U transformers accelerate diffusers
```
And then:
```python
import torch
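from diffusers import PixArtAlphaPipeline
# The PixArt-LCM-LoRA variant below additionally needs:
# from diffusers import Transformer2DModel
# from peft import PeftModel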
# for PixArt-LCM
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", torch_dtype=torch.float16, use_safetensors=True)
# for PixArt-LCM-LoRA
# transformer = Transformer2DModel.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16)
# transformer = PeftModel.from_pretrained(transformer, "PixArt-alpha/PixArt-LCM-LoRA-XL-2-1024-MS")
# pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", transformer=transformer, torch_dtype=torch.float16, use_safetensors=True)
# del transformer
# Enable memory optimizations.
pipe.enable_model_cpu_offload()
prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt, guidance_scale=0., num_inference_steps=4).images[0]
```
This integration allows running the pipeline with a batch size of 4 under 11 GB of GPU VRAM.
Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.
# Keeping updating
================================================
FILE: PixArt-alpha-ToCa/asset/docs/sasolver.md
================================================
## SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (NeurIPS 2023)
> [**SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (NeurIPS 2023)**](https://arxiv.org/pdf/2309.05019.pdf)
> [Shuchen Xue*](https://github.com/scxue), [Mingyang Yi]()†,
> [Weijian Luo](), [Shifeng Zhang](), [Jiacheng Sun](),
> [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ),
> [Zhi-Ming Ma]()
>
University of Chinese Academy of Sciences, Huawei Noah’s Ark Lab, Peking University
---
## 🐱 Abstract
SA-Solver is a stochastic diffusion sampler based on the Stochastic Adams Method. It is training-free and can be applied to pretrained diffusion models. It is a multistep SDE solver that enables fast stochastic sampling.
1. The parameter 'tau function' controls the stochasticity of the sampling process. Inspired by EDM, we choose the 'tau function' to be a piecewise constant function that is greater than 0 in the middle stage of the sampling process and equal to zero in the start and end stages. Specifically, the default value of this parameter is
```python
tau_func = lambda t: 1 if t >= 200 and t <= 800 else 0
```
in the diffusers library and
```python
tau_t = lambda t: eta if 0.2 <= t <= 0.8 else 0
```
in the ldm library. (The thresholds differ because the diffusers timesteps are scaled by 1000.)
The value '1' (or `eta`) sets the magnitude of stochasticity. Higher values are recommended with more NFEs (number of function evaluations).
If you want to employ deterministic sampling (solving diffusion ODE) in SA-Solver, please set
```python
tau_func = lambda t: 0
```
If you want to employ original stochastic sampling (solving original diffusion SDE) in SA-Solver, please set
```python
tau_func = lambda t: 1
```
2. The parameters 'predictor_order' and 'corrector_order' control the orders of the 'SA-Predictor' and 'SA-Corrector'. For unconditional generation and conditional generation with a small classifier-free guidance scale, the recommended orders are 'predictor_order = 3' and 'corrector_order = 4'; for conditional generation with a large classifier-free guidance scale (e.g. t2i), the recommended orders are 'predictor_order = 2' and 'corrector_order = 2'.
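
The piecewise-constant 'tau function' from item 1 can be written as a small factory. This is a sketch under assumptions: `make_tau_func` and its arguments are illustrative names and not part of the diffusers or ldm code. Here `scale=1000` corresponds to the diffusers-style timesteps and `scale=1` to the ldm-style ones.

```python
def make_tau_func(eta=1.0, start=0.2, end=0.8, scale=1.0):
    """Return tau(t): eta while start <= t / scale <= end, and 0 elsewhere."""

    def tau(t):
        return eta if start <= t / scale <= end else 0.0

    return tau


tau_ldm = make_tau_func(eta=1.0)                    # t in [0, 1]
tau_diffusers = make_tau_func(eta=1.0, scale=1000)  # t in [0, 1000]
tau_ode = make_tau_func(eta=0.0)                    # deterministic sampling (tau is always 0)

print(tau_ldm(0.5), tau_ldm(0.9), tau_diffusers(500), tau_ode(0.5))
```

Setting `start=0` and `end=1` with `eta=1` gives the constant function used for the original diffusion SDE.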
================================================
FILE: PixArt-alpha-ToCa/asset/examples.py
================================================
examples = [
[
"A small cactus with a happy face in the Sahara desert.",
"dpm-solver", 20, 4.5,
"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/carousel/carousel1.png",
"Prompt: A small cactus with a happy face in the Sahara desert. \nSize: --ar 1:1.",
"Model path: PixArt-XL-2-1024x1024.pt.\nBase image size: 1024, \nSampling Algo: dpm-solver"],
[
"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, "
"spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, "
"intricate detail. --ar 6144:4096.",
"dpm-solver", 20, 4.5,
"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/15.png",
"Prompt: Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, "
"spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, "
"intricate detail.\nSize: --ar 6144:4096.",
"Model path: PixArt-XL-2-1024x1024.pt.\nBase image size: 1024, \nSampling Algo: dpm-solver"],
[
"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, "
"blue and pink, brilliantly illuminated in the background.",
"dpm-solver", 20, 4.5,
"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/13.png",
"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.",
"Model path: PixArt-XL-2-1024x1024.pt.\nBase image size: 1024, \nSampling Algo: dpm-solver"],
[
"nature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.",
"dpm-solver", 20, 4.5,
"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/14.png",
"nature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.",
"Model path: PixArt-XL-2-1024x1024.pt.\nBase image size: 1024, \nSampling Algo: dpm-solver"],
]
================================================
FILE: PixArt-alpha-ToCa/asset/samples.txt
================================================
A small cactus with a happy face in the Sahara desert.
Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.
beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background
stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.
nature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.
Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism
anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur
The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8
Bright scene, aerial view, ancient city, fantasy, gorgeous light, mirror reflection, high detail, wide angle lens.
8k uhd A man looks up at the starry sky, lonely and ethereal, Minimalism, Chaotic composition Op Art
A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.
A 4k dslr image of a lemur wearing a red magician hat and a blue coat performing magic tricks with cards in a garden.
A alpaca made of colorful building blocks, cyberpunk
A baby painter trying to draw very simple picture, white background
A boy and a girl fall in love
A dog that has been meditating all the time
A man is sitting in a chair with his chin resting on his hand. The chair, along with the man's feet, are submerged in the sea. Strikingly, the man's back is on fire.
A painter study hard to learn how to draw with many concepts in the air, white background
A painter with low quality, white background, pixel art
A person standing on the desert, desert waves, gossip illustration, half red, half blue, abstract image of sand, clear style, trendy illustration, outdoor, top view, clear style, precision art, ultra high definition image
A silhouette of a grand piano overlooking a dusky cityscape viewed from a top-floor penthouse, rendered in the bold and vivid sytle of a vintage travel poster.
A sureal parallel world where mankind avoid extinction by preserving nature, epic trees, water streams, various flowers, intricate details, rich colors, rich vegetation, cinematic, symmetrical, beautiful lighting, V-Ray render, sun rays, magical lights, photography
A woman is shopping for fresh produce at the farmer's market.
A worker that looks like a mixture of cow and horse is working hard to type code
A young man dressed in ancient Chinese clothing, Asian people, White robe, Handsome, Hand gestures forming a spell, Martial arts and fairy-like vibe, Carrying a legendary-level giant sword on the back, Game character, Surrounded by runes, Cyberpunk style, neon lights, best quality, masterpiece, cg, hdr, high-definition, extremely detailed, photorealistic, epic, character design, detailed face, superhero, hero, detailed UHD, real-time, vfx, 3D rendering, 8k
An alien octopus floats through a protal reading a newspaper
An epressive oil painting of a basketbal player dunking, depicted as an explosion of a nebula
art collection style and fashion shoot, in the style of made of glass, dark blue and light pink, paul rand, solarpunk, camille vivier, beth didonato hair, barbiecore, hyper-realistic
artistic
beautiful secen
Crocodile in a sweater
Design a letter A, 3D stereoscopic Ice material Interior light blue Conceptual product design Futuristic Blind box toy Handcrafted Exquisite 3D effect Full body display Ultra-high precision Ultra-detailed Perfect lighting OC Renderer Blender 8k Ultra-sharp Ultra-noise reduction
Floating,colossal,futuristic statue in the sky, awe-inspiring and serenein the style of Stuart Lippincott:2with detailed composition and subtle geometric elements.This sanctuary-ike atmosphere features crisp clarity and soft amber tones.In contrasttiny human figures surround the statueThe pieceincorporates flowing draperiesreminiscent of Shwedoff and Philip McKay's stylesemphasizing thejuxtaposition between the powerful presence of the statue and thevulnerability of the minuscule human figuresshwedoff
knolling of a drawing tools for painter
Leonardo da Vinci's Last Supper content, Van Goph's Starry Night Style
Luffy from ONEPIECE, handsome face, fantasy
photography shot through an outdoor window of a coffee shop with neon sign lighting, window glares and reflections, depth of field, {little girl with red hair sitting at a table, portrait, kodak portra 800,105 mm f1.8
poster of a mechanical cat, techical Schematics viewed from front and side view on light white blueprint paper, illustartion drafting style, illustation, typography, conceptual art, dark fantasy steampunk, cinematic, dark fantasy
The girl in the car is filled with goldfish and flowers, goldfish can fly, Kawaguchi Renko's art, natural posture, holiday dadcore, youthful energy and pressure, body stretching, goldfish simulation movies in the sky, super details, and dreamy high photography. Colorful. Covered by water and goldfish, indoor scene, close-up shot in XT4 movie
The image features a woman wearing a red shirt with an icon. She appears to be posing for the camera, and her outfit includes a pair of jeans. The woman seems to be in a good mood, as she is smiling. The background of the image is blurry, focusing more on the woman and her attire.
The towel was on top of the hard counter.
A vast landscape made entirely of various meats spreads out before the viewer. tender, succulent hills of roast beef, chicken drumstick trees, bacon rivers, and ham boulders create a surreal, yet appetizing scene. the sky is adorned with pepperoni sun and salami clouds.
I want to supplement vitamin c, please help me paint related food.
A vibrant yellow banana-shaped couch sits in a cozy living room, its curve cradling a pile of colorful cushions. on the wooden floor, a patterned rug adds a touch of eclectic charm, and a potted plant sits in the corner, reaching towards the sunlight filtering through the window.
A transparent sculpture of a duck made out of glass. The sculpture is in front of a painting of a landscape.
A blue jay standing on a large basket of rainbow macarons.
A bucket bag made of blue suede. The bag is decorated with intricate golden paisley patterns. The handle of the bag is made of rubies and pearls.
An alien octopus floats through a portal reading a newspaper.
bird's eye view of a city.
beautiful scene
A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon.
In front of a deep black backdrop, a figure of middle years, her Tongan skin rich and glowing, is captured mid-twirl, her curly hair flowing like a storm behind her. Her attire resembles a whirlwind of marble and porcelain fragments. Illuminated by the gleam of scattered porcelain shards, creating a dreamlike atmosphere, the dancer manages to appear fragmented, yet maintains a harmonious and fluid form.
Digital illustration of a beach scene crafted from yarn. The sandy beach is depicted with beige yarn, waves are made of blue and white yarn crashing onto the shore. A yarn sun sets on the horizon, casting a warm glow. Yarn palm trees sway gently, and little yarn seashells dot the shoreline.
Illustration of a chic chair with a design reminiscent of a pumpkin’s form, with deep orange cushioning, in a stylish loft setting.
A detailed oil painting of an old sea captain, steering his ship through a storm. Saltwater is splashing against his weathered face, determination in his eyes. Twirling malevolent clouds are seen above and stern waves threaten to submerge the ship while seagulls dive and twirl through the chaotic landscape. Thunder and lights embark in the distance, illuminating the scene with an eerie green glow.
An illustration of a human heart made of translucent glass, standing on a pedestal amidst a stormy sea. Rays of sunlight pierce the clouds, illuminating the heart, revealing a tiny universe within. The quote 'Find the universe within you' is etched in bold letters across the horizon.
A modern architectural building with large glass windows, situated on a cliff overlooking a serene ocean at sunset
photo of an ancient shipwreck nestled on the ocean floor. Marine plants have claimed the wooden structure, and fish swim in and out of its hollow spaces. Sunken treasures and old cannons are scattered around, providing a glimpse into the past
A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere.A minimap diorama of a cafe adorned with indoor plants. Wooden beams crisscross above, and a cold brew station stands out with tiny bottles and glasses.
An antique botanical illustration drawn with fine lines and a touch of watercolour whimsy, depicting a strange lily crossed with a Venus flytrap, its petals poised as if ready to snap shut on any unsuspecting insects.An illustration inspired by old-world botanical sketches blends a cactus with lilac blooms into a Möbius strip, using detailed lines and subtle watercolor touches to capture nature's diverse beauty and mathematical intrigue.
An ink sketch style illustration of a small hedgehog holding a piece of watermelon with its tiny paws, taking little bites with its eyes closed in delight.Photo of a lychee-inspired spherical chair, with a bumpy white exterior and plush interior, set against a tropical wallpaper.
3d digital art of an adorable ghost, glowing within, holding a heart shaped pumpkin, Halloween, super cute, spooky haunted house background
professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.
an astronaut sitting in a diner, eating fries, cinematic, analog film
================================================
FILE: PixArt-alpha-ToCa/configs/PixArt_xl2_internal.py
================================================
data_root = '/data/data'
data = dict(type='InternalData', root='images', image_list_json=['data_info.json'], transform='default_train', load_vae_feat=True)
image_size = 256 # the generated image resolution
train_batch_size = 32
eval_batch_size = 16
use_fsdp=False # if use FSDP mode
valid_num=0 # take as valid aspect-ratio when sample number >= valid_num
# model setting
model = 'PixArt_XL_2'
aspect_ratio_type = None # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]
multi_scale = False # if use multiscale dataset model training
lewei_scale = 1.0 # lewei_scale for positional embedding interpolation
# training setting
num_workers=4
train_sampling_steps = 1000
eval_sampling_steps = 250
model_max_length = 120
lora_rank = 4
num_epochs = 80
gradient_accumulation_steps = 1
grad_checkpointing = False
gradient_clip = 1.0
gc_step = 1
auto_lr = dict(rule='sqrt')
# we use a different weight decay from the official implementation since it gives better results
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=3e-2, eps=1e-10)
lr_schedule = 'constant'
lr_schedule_args = dict(num_warmup_steps=500)
save_image_epochs = 1
save_model_epochs = 1
save_model_steps=1000000
sample_posterior = True
mixed_precision = 'fp16'
scale_factor = 0.18215
ema_rate = 0.9999
tensorboard_mox_interval = 50
log_interval = 50
cfg_scale = 4
mask_type='null'
num_group_tokens=0
mask_loss_coef=0.
load_mask_index=False # load prepared mask_type index
# load model settings
vae_pretrained = "/cache/pretrained_models/sd-vae-ft-ema"
load_from = None
resume_from = dict(checkpoint=None, load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
snr_loss=False
# work dir settings
work_dir = '/cache/exps/'
s3_work_dir = None
seed = 43
================================================
FILE: PixArt-alpha-ToCa/configs/PixArt_xl2_sam.py
================================================
data_root = '/data/data'
data = dict(type='SAM', root='images', image_list_txt='part0.txt', transform='default_train', load_vae_feat=True)
image_size = 256 # the generated image resolution
train_batch_size = 32
eval_batch_size = 16
use_fsdp=False # if use FSDP mode
# model setting
model = 'PixArt_XL_2'
aspect_ratio_type = None # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_1024]
multi_scale = False # if use multiscale dataset model training
lewei_scale = 1.0
model_max_length = 120
lora_rank = 4
# training setting
num_workers=4
train_sampling_steps = 1000
eval_sampling_steps = 250
num_epochs = 80
gradient_accumulation_steps = 1
grad_checkpointing = False
gc_step = 1
gradient_clip = 1.0
auto_lr = dict(rule='sqrt')
# we use a different weight decay from the official implementation since it gives better results
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=3e-2, eps=1e-10)
lr_schedule = 'constant'
lr_schedule_args = dict(num_warmup_steps=500)
save_image_epochs = 1
save_model_epochs = 1
save_model_steps=1000000
sample_posterior = True
mixed_precision = 'fp16'
scale_factor = 0.18215
ema_rate = 0.9999
tensorboard_mox_interval = 50
log_interval = 50
cfg_scale = 4
mask_type='null'
num_group_tokens=0
mask_loss_coef=0.
load_mask_index=False # load prepared mask_type index
# load model settings
vae_pretrained = "/cache/pretrained_models/sd-vae-ft-ema"
load_from = None
resume_from = dict(checkpoint=None, load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
snr_loss=False
# work dir settings
work_dir = '/cache/exps/'
s3_work_dir = None
seed = 43
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalDataHed', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 1024
# model setting
model = 'PixArtMS_XL_2'
fp32_attention = False # Set to True if you get a NaN loss
load_from = 'path-to-pixart-checkpoints'
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
lewei_scale = 2.0
# training setting
num_workers=10
train_batch_size = 4 # set the batch size according to your VRAM
num_epochs = 10 # 3
gradient_accumulation_steps = 4
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=0)
save_model_epochs=5
save_model_steps=1000
log_interval = 20
eval_sampling_steps = 200
work_dir = 'output_debug/debug'
# controlnet related params
copy_blocks_num = 13
class_dropout_prob = 0.5
train_ratio = 1
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data/dreambooth/dataset'
data = dict(type='DreamBooth', root='dog6', prompt=['a photo of sks dog'], transform='default_train', load_vae_feat=True)
image_size = 1024
# model setting
model = 'PixArtMS_XL_2' # model for multi-scale training
fp32_attention = True
load_from = 'Path/to/PixArt-XL-2-1024-MS.pth'
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
aspect_ratio_type = 'ASPECT_RATIO_1024' # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]
multi_scale = True # if use multiscale dataset model training
lewei_scale = 2.0
# training setting
num_workers=1
train_batch_size = 1
num_epochs = 200
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=5e-6, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=0)
auto_lr = None
log_interval = 1
save_model_epochs=10000
save_model_steps=100
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img512_controlHed.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalDataHed', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 512
# model setting
model = 'PixArt_XL_2'
fp32_attention = False # Set to True if you get a NaN loss
load_from = 'path-to-pixart-checkpoints'
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
lewei_scale = 1.0
# training setting
num_workers=10
train_batch_size = 12 # 32 # max 96 for DiT-L/4 when grad_checkpoint
num_epochs = 1000 # 3
gradient_accumulation_steps = 4
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=0)
save_model_epochs=5
save_model_steps=1000
log_interval = 20
eval_sampling_steps = 200
work_dir = 'output_debug/debug'
# controlnet related params
copy_blocks_num = 13
class_dropout_prob = 0.5
train_ratio = 0.1
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_internal.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 1024
# model setting
window_block_indexes = []
window_size=0
use_rel_pos=False
model = 'PixArt_XL_2'
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
lewei_scale = 2.0
# training setting
num_workers=10
train_batch_size = 2 # 32
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
eval_sampling_steps = 200
log_interval = 20
save_model_epochs=1
save_model_steps=2000
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_internalms.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 1024
# model setting
model = 'PixArtMS_XL_2' # model for multi-scale training
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
aspect_ratio_type = 'ASPECT_RATIO_1024' # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]
multi_scale = True # if use multiscale dataset model training
lewei_scale = 2.0
# training setting
num_workers=10
train_batch_size = 12 # max 14 for PixArt-xL/2 when grad_checkpoint
num_epochs = 10 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
save_model_epochs=1
save_model_steps=2000
log_interval = 20
eval_sampling_steps = 200
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_lcm.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 1024
# model setting
model = 'PixArtMS_XL_2' # model for multi-scale training
fp32_attention = False # Set to True if you get a NaN loss
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
aspect_ratio_type = 'ASPECT_RATIO_1024' # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]
multi_scale = True # if use multiscale dataset model training
lewei_scale = 2.0
# training setting
num_workers=4
train_batch_size = 16 # max 12 for PixArt-xL/2 when grad_checkpoint 16 for LCM-LoRA
num_epochs = 10 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=0.0, eps=1e-10)
# optimizer = dict(type='CAMEWrapper', lr=1e-7, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=100)
save_model_epochs=1
save_model_steps=200
valid_num=0 # take as valid aspect-ratio when sample number >= valid_num
log_interval = 10
eval_sampling_steps = 200
work_dir = 'output/debug'
# LCM
loss_type = 'huber'
huber_c = 0.001
num_ddim_timesteps=50
w_max = 15.0
w_min = 3.0
ema_decay = 0.95
cfg_scale = 4.5
class_dropout_prob = 0.
lora_rank = 32
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img256_SAM.py
================================================
_base_ = ['../PixArt_xl2_sam.py']
data_root = 'data'
image_list_txt = ['part0.txt', 'part1.txt', 'part2.txt', 'part3.txt', 'part4.txt', 'part5.txt', 'part6.txt', 'part7.txt', 'part8.txt',
'part9.txt', 'part10.txt', 'part11.txt', 'part12.txt', 'part13.txt', 'part14.txt','part15.txt','part16.txt',
'part17.txt','part18.txt','part19.txt','part20.txt','part21.txt', 'part22.txt', 'part23.txt', 'part24.txt',
'part25.txt', 'part26.txt', 'part27.txt', 'part28.txt', 'part29.txt', 'part30.txt', 'part31.txt']
data = dict(type='SAM', root='SA1B', image_list_txt=image_list_txt, transform='default_train', load_vae_feat=True)
image_size = 256
# model setting
window_block_indexes=[]
window_size=0
use_rel_pos=False
model = 'PixArt_XL_2'
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
# training setting
use_fsdp=False # if use FSDP mode
num_workers=10
train_batch_size = 176 # 32
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
eval_sampling_steps = 200
log_interval = 20
save_model_epochs=2
save_model_steps=20000
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img256_internal.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 256
# model setting
window_block_indexes=[]
window_size=0
use_rel_pos=False
model = 'PixArt_XL_2'
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
# training setting
eval_sampling_steps = 200
num_workers=10
train_batch_size = 176 # 32 # max 96 for PixArt-L/4 when grad_checkpoint
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
log_interval = 20
save_model_epochs=5
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img512_internal.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 512
# model setting
window_block_indexes = []
window_size=0
use_rel_pos=False
model = 'PixArt_XL_2'
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
lewei_scale = 1.0
# training setting
use_fsdp=False # if use FSDP mode
num_workers=10
train_batch_size = 38 # 32
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
eval_sampling_steps = 200
log_interval = 20
save_model_epochs=1
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img512_internalms.py
================================================
_base_ = ['../PixArt_xl2_internal.py']
data_root = 'data'
image_list_json = ['data_info.json',]
data = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 512
# model setting
model = 'PixArtMS_XL_2' # model for multi-scale training
fp32_attention = True
load_from = None
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
window_block_indexes = []
window_size=0
use_rel_pos=False
aspect_ratio_type = 'ASPECT_RATIO_512' # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]
multi_scale = True # if use multiscale dataset model training
lewei_scale = 1.0
# training setting
num_workers=10
train_batch_size = 40 # max 40 for PixArt-xL/2 when grad_checkpoint
num_epochs = 20 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
save_model_epochs=1
save_model_steps=2000
log_interval = 20
eval_sampling_steps = 200
work_dir = 'output/debug'
================================================
FILE: PixArt-alpha-ToCa/diffusion/__init__.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
from .iddpm import IDDPM
from .dpm_solver import DPMS
from .sa_sampler import SASolverSampler
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/__init__.py
================================================
from .datasets import *
from .transforms import get_transform
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/builder.py
================================================
import os
import time
from mmcv import Registry, build_from_cfg
from torch.utils.data import DataLoader
from diffusion.data.transforms import get_transform
from diffusion.utils.logger import get_root_logger
DATASETS = Registry('datasets')
DATA_ROOT = '/cache/data'
def set_data_root(data_root):
    global DATA_ROOT
    DATA_ROOT = data_root
def get_data_path(data_dir):
    if os.path.isabs(data_dir):
        return data_dir
    global DATA_ROOT
    return os.path.join(DATA_ROOT, data_dir)
def build_dataset(cfg, resolution=224, **kwargs):
    logger = get_root_logger()
    dataset_type = cfg.get('type')
    logger.info(f"Constructing dataset {dataset_type}...")
    t = time.time()
    transform = cfg.pop('transform', 'default_train')
    transform = get_transform(transform, resolution)
    dataset = build_from_cfg(cfg, DATASETS, default_args=dict(transform=transform, resolution=resolution, **kwargs))
    logger.info(f"Dataset {dataset_type} constructed. time: {(time.time() - t):.2f} s, length (use/ori): {len(dataset)}/{dataset.ori_imgs_nums}")
    return dataset
def build_dataloader(dataset, batch_size=256, num_workers=4, shuffle=True, **kwargs):
    return (
        DataLoader(
            dataset,
            batch_sampler=kwargs['batch_sampler'],
            num_workers=num_workers,
            pin_memory=True,
        )
        if 'batch_sampler' in kwargs
        else DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=shuffle,
            num_workers=num_workers,
            pin_memory=True,
            **kwargs
        )
    )
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/Dreambooth.py
================================================
from PIL import Image
import numpy as np
import torch
from torchvision.datasets.folder import default_loader, IMG_EXTENSIONS
from torch.utils.data import Dataset
from diffusers.utils.torch_utils import randn_tensor
from torchvision import transforms as T
import pathlib
from diffusers.models import AutoencoderKL
from diffusion.data.builder import get_data_path, DATASETS
from diffusion.data.datasets.utils import *
IMAGE_EXTENSIONS = {'bmp', 'jpg', 'jpeg', 'pgm', 'png', 'ppm', 'tif', 'tiff', 'webp', 'JPEG'}
@DATASETS.register_module()
class DreamBooth(Dataset):
def __init__(self,
root,
transform=None,
resolution=1024,
**kwargs):
self.root = get_data_path(root)
path = pathlib.Path(self.root)
self.transform = transform
self.resolution = resolution
self.img_samples = sorted(
[file for ext in IMAGE_EXTENSIONS for file in path.glob(f'*.{ext}')]
)
self.ori_imgs_nums = len(self)
self.loader = default_loader
self.base_size = int(kwargs['aspect_ratio_type'].split('_')[-1])
self.aspect_ratio = eval(kwargs.pop('aspect_ratio_type')) # base aspect ratio
self.ratio_nums = {}
for k, v in self.aspect_ratio.items():
self.ratio_nums[float(k)] = 0 # used for batch-sampler
self.data_info = {'img_hw': torch.tensor([resolution, resolution], dtype=torch.float32), 'aspect_ratio': 1.}
# image related
with torch.inference_mode():
vae = AutoencoderKL.from_pretrained("output/pretrained_models/sd-vae-ft-ema")
imgs = []
for img_path in self.img_samples:
img = self.loader(img_path)
self.ratio_nums[1.0] += 1
if self.transform is not None:
imgs.append(self.transform(img))
imgs = torch.stack(imgs, dim=0)
self.img_vae = vae.encode(imgs).latent_dist.sample()
del vae
def __getitem__(self, index):
return self.img_vae[index], self.data_info
@staticmethod
def vae_feat_loader(path):
# [mean, std]
mean, std = torch.from_numpy(np.load(path)).chunk(2)
sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)
return mean + std * sample
def load_ori_img(self, img_path):
# Load the image and convert it to a Tensor
transform = T.Compose([
T.Resize(256), # Image.BICUBIC
T.CenterCrop(256),
T.ToTensor(),
])
return transform(Image.open(img_path))
def __len__(self):
return len(self.img_samples)
def __getattr__(self, name):
if name == "set_epoch":
return lambda epoch: None
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
def get_data_info(self, idx):
return {'height': self.resolution, 'width': self.resolution}
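# Illustrative sketch (hypothetical class name): __getattr__ is only consulted
# after normal attribute lookup fails, so a dataset can answer `set_epoch`
# with a no-op while every other unknown name still raises AttributeError.
class _SetEpochFallbackSketch:
    def __getattr__(self, name):
        if name == "set_epoch":
            return lambda epoch: None
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")


_ds = _SetEpochFallbackSketch()
_ds.set_epoch(3)  # accepted and ignored
_has_other = hasattr(_ds, "not_defined")  # AttributeError is raised, so hasattr reports False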
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/InternalData.py
================================================
import os
import random
from PIL import Image
import numpy as np
import torch
from torchvision.datasets.folder import default_loader, IMG_EXTENSIONS
from torch.utils.data import Dataset
from diffusers.utils.torch_utils import randn_tensor
from torchvision import transforms as T
from diffusion.data.builder import get_data_path, DATASETS
from diffusion.utils.logger import get_root_logger
import json
@DATASETS.register_module()
class InternalData(Dataset):
def __init__(self,
root,
image_list_json='data_info.json',
transform=None,
resolution=256,
sample_subset=None,
load_vae_feat=False,
input_size=32,
patch_size=2,
mask_ratio=0.0,
load_mask_index=False,
max_length=120,
config=None,
**kwargs):
self.root = get_data_path(root)
self.transform = transform
self.load_vae_feat = load_vae_feat
self.ori_imgs_nums = 0
self.resolution = resolution
self.N = int(resolution // (input_size // patch_size))
self.mask_ratio = mask_ratio
self.load_mask_index = load_mask_index
self.max_lenth = max_length
self.meta_data_clean = []
self.img_samples = []
self.txt_feat_samples = []
self.vae_feat_samples = []
self.mask_index_samples = []
self.prompt_samples = []
image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]
for json_file in image_list_json:
meta_data = self.load_json(os.path.join(self.root, 'partition', json_file))
self.ori_imgs_nums += len(meta_data)
meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]
self.meta_data_clean.extend(meta_data_clean)
self.img_samples.extend([os.path.join(self.root.replace('InternData', "InternImgs"), item['path']) for item in meta_data_clean])
self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])
self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_features_{resolution}resolution/noflip', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])
self.prompt_samples.extend([item['prompt'] for item in meta_data_clean])
# Set loader and extensions
if load_vae_feat:
self.transform = None
self.loader = self.vae_feat_loader
else:
self.loader = default_loader
if sample_subset is not None:
self.sample_subset(sample_subset) # sample dataset for local debug
logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
logger.info(f"T5 max token length: {self.max_lenth}")
def getdata(self, index):
img_path = self.img_samples[index]
npz_path = self.txt_feat_samples[index]
npy_path = self.vae_feat_samples[index]
prompt = self.prompt_samples[index]
data_info = {
'img_hw': torch.tensor([torch.tensor(self.resolution), torch.tensor(self.resolution)], dtype=torch.float32),
'aspect_ratio': torch.tensor(1.)
}
img = self.loader(npy_path) if self.load_vae_feat else self.loader(img_path)
txt_info = np.load(npz_path)
txt_fea = torch.from_numpy(txt_info['caption_feature']) # 1xTx4096
attention_mask = torch.ones(1, 1, txt_fea.shape[1]) # 1x1xT
if 'attention_mask' in txt_info.keys():
attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]
if txt_fea.shape[1] != self.max_lenth:
txt_fea = torch.cat([txt_fea, txt_fea[:, -1:].repeat(1, self.max_lenth-txt_fea.shape[1], 1)], dim=1)
attention_mask = torch.cat([attention_mask, torch.zeros(1, 1, self.max_lenth-attention_mask.shape[-1])], dim=-1)
if self.transform:
img = self.transform(img)
data_info['prompt'] = prompt
return img, txt_fea, attention_mask, data_info
def __getitem__(self, idx):
for _ in range(20):
try:
return self.getdata(idx)
except Exception as e:
print(f"Error details: {str(e)}")
idx = np.random.randint(len(self))
raise RuntimeError('Too many bad data.')
def get_data_info(self, idx):
data_info = self.meta_data_clean[idx]
return {'height': data_info['height'], 'width': data_info['width']}
@staticmethod
def vae_feat_loader(path):
# [mean, std]
mean, std = torch.from_numpy(np.load(path)).chunk(2)
sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)
return mean + std * sample
def load_ori_img(self, img_path):
# Load the image and convert it to a Tensor
transform = T.Compose([
T.Resize(256), # Image.BICUBIC
T.CenterCrop(256),
T.ToTensor(),
])
return transform(Image.open(img_path))
def load_json(self, file_path):
with open(file_path, 'r') as f:
meta_data = json.load(f)
return meta_data
def sample_subset(self, ratio):
sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))
self.img_samples = [self.img_samples[i] for i in sampled_idx]
def __len__(self):
return len(self.img_samples)
def __getattr__(self, name):
if name == "set_epoch":
return lambda epoch: None
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
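# Illustrative sketch in plain Python lists (the dataset itself uses torch
# tensors): caption features shorter than max_length are padded by repeating
# the last token embedding, and the attention mask is extended with zeros so
# the padded positions are ignored. `pad_caption_sketch` is a hypothetical
# helper; returning over-long inputs unchanged is an assumption of this sketch.
def pad_caption_sketch(tokens, mask, max_length):
    missing = max_length - len(tokens)
    if missing <= 0:
        return tokens, mask
    return tokens + [tokens[-1]] * missing, mask + [0] * missing


_tokens, _mask = pad_caption_sketch([[0.1, 0.2], [0.3, 0.4]], [1, 1], max_length=4)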
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/InternalData_ms.py
================================================
import os
import numpy as np
import torch
import random
from torchvision.datasets.folder import default_loader
from diffusion.data.datasets.InternalData import InternalData
from diffusion.data.builder import get_data_path, DATASETS
from diffusion.utils.logger import get_root_logger
import torchvision.transforms as T
from torchvision.transforms.functional import InterpolationMode
from diffusion.data.datasets.utils import *
def get_closest_ratio(height: float, width: float, ratios: dict):
    aspect_ratio = height / width
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))
    return ratios[closest_ratio], float(closest_ratio)
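# Illustrative usage sketch with a three-bucket excerpt of ASPECT_RATIO_1024.
# The standalone copy below (hypothetical name) exists only so that the
# snippet runs on its own.
def _closest_ratio_sketch(height, width, ratios):
    aspect_ratio = height / width
    key = min(ratios.keys(), key=lambda r: abs(float(r) - aspect_ratio))
    return ratios[key], float(key)


_buckets = {'0.5': [704., 1408.], '1.0': [1024., 1024.], '2.0': [1408., 704.]}
# Aspect ratio 0.9 is nearest to the '1.0' bucket.
_size, _ratio = _closest_ratio_sketch(900, 1000, _buckets)
# Aspect ratio 2.4 is nearest to the '2.0' bucket.
_tall_size, _tall_ratio = _closest_ratio_sketch(1200, 500, _buckets)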
@DATASETS.register_module()
class InternalDataMS(InternalData):
def __init__(self,
root,
image_list_json='data_info.json',
transform=None,
resolution=256,
sample_subset=None,
load_vae_feat=False,
input_size=32,
patch_size=2,
mask_ratio=0.0,
mask_type='null',
load_mask_index=False,
max_length=120,
config=None,
**kwargs):
self.root = get_data_path(root)
self.transform = transform
self.load_vae_feat = load_vae_feat
self.ori_imgs_nums = 0
self.resolution = resolution
self.N = int(resolution // (input_size // patch_size))
self.mask_ratio = mask_ratio
self.load_mask_index = load_mask_index
self.mask_type = mask_type
self.base_size = int(kwargs['aspect_ratio_type'].split('_')[-1])
self.max_lenth = max_length
self.aspect_ratio = eval(kwargs.pop('aspect_ratio_type')) # base aspect ratio
self.meta_data_clean = []
self.img_samples = []
self.txt_feat_samples = []
self.vae_feat_samples = []
self.mask_index_samples = []
self.ratio_index = {}
self.ratio_nums = {}
for k, v in self.aspect_ratio.items():
self.ratio_index[float(k)] = [] # used for self.getitem
self.ratio_nums[float(k)] = 0 # used for batch-sampler
image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]
for json_file in image_list_json:
meta_data = self.load_json(os.path.join(self.root, 'partition_filter', json_file))
self.ori_imgs_nums += len(meta_data)
meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]
self.meta_data_clean.extend(meta_data_clean)
self.img_samples.extend([os.path.join(self.root.replace('InternData', "InternImgs"), item['path']) for item in meta_data_clean])
self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])
self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_fatures_{resolution}_multiscale/ms', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])
# Set loader and extensions
if load_vae_feat:
self.transform = None
self.loader = self.vae_feat_loader
else:
self.loader = default_loader
if sample_subset is not None:
self.sample_subset(sample_subset) # sample dataset for local debug
# scan the dataset to collect aspect-ratio statistics
for i, info in enumerate(self.meta_data_clean[:len(self.meta_data_clean)//3]):
ori_h, ori_w = info['height'], info['width']
closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)
self.ratio_nums[closest_ratio] += 1
if len(self.ratio_index[closest_ratio]) == 0:
self.ratio_index[closest_ratio].append(i)
# print(self.ratio_nums)
logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
logger.info(f"T5 max token length: {self.max_lenth}")
def getdata(self, index):
img_path = self.img_samples[index]
npz_path = self.txt_feat_samples[index]
npy_path = self.vae_feat_samples[index]
ori_h, ori_w = self.meta_data_clean[index]['height'], self.meta_data_clean[index]['width']
# Calculate the closest aspect ratio and resize & crop image[w, h]
closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)
closest_size = list(map(lambda x: int(x), closest_size))
self.closest_ratio = closest_ratio
if self.load_vae_feat:
try:
img = self.loader(npy_path)
if index not in self.ratio_index[closest_ratio]:
self.ratio_index[closest_ratio].append(index)
except Exception:
index = random.choice(self.ratio_index[closest_ratio])
return self.getdata(index)
h, w = (img.shape[1], img.shape[2])
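# NOTE: this parses as `assert h, <message>`, so it only checks that h is non-zero; the shapes are not compared.
assert h, w == (ori_h//8, ori_w//8)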
else:
img = self.loader(img_path)
h, w = (img.size[1], img.size[0])
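# NOTE: this parses as `assert h, <message>`, so only h is checked; the size comparison is never the condition.
assert h, w == (ori_h, ori_w)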
data_info = {'img_hw': torch.tensor([ori_h, ori_w], dtype=torch.float32)}
data_info['aspect_ratio'] = closest_ratio
data_info["mask_type"] = self.mask_type
txt_info = np.load(npz_path)
txt_fea = torch.from_numpy(txt_info['caption_feature'])
attention_mask = torch.ones(1, 1, txt_fea.shape[1])
if 'attention_mask' in txt_info.keys():
attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]
if not self.load_vae_feat:
if closest_size[0] / ori_h > closest_size[1] / ori_w:
resize_size = closest_size[0], int(ori_w * closest_size[0] / ori_h)
else:
resize_size = int(ori_h * closest_size[1] / ori_w), closest_size[1]
self.transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(resize_size, interpolation=InterpolationMode.BICUBIC), # Image.BICUBIC
T.CenterCrop(closest_size),
T.ToTensor(),
T.Normalize([.5], [.5]),
])
if self.transform:
img = self.transform(img)
return img, txt_fea, attention_mask, data_info
def __getitem__(self, idx):
for _ in range(20):
try:
return self.getdata(idx)
except Exception as e:
print(f"Error details: {str(e)}")
idx = random.choice(self.ratio_index[self.closest_ratio])
raise RuntimeError('Too many bad data.')
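# Illustrative sketch: `assert a, b == c` parses as `assert a, (b == c)`, so
# the comparison becomes the failure message rather than the condition.
# Comparing both values requires parenthesising the tuple. The helper names
# are hypothetical.
def _loose_check(h, w, expected):
    assert h, w == expected      # passes whenever h is truthy
    return True


def _strict_check(h, w, expected):
    assert (h, w) == expected    # compares both dimensions
    return True


_loose_ok = _loose_check(64, 999, (64, 64))
try:
    _strict_ok = _strict_check(64, 999, (64, 64))
except AssertionError:
    _strict_ok = False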
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/SA.py
================================================
import os
import random
import time
import numpy as np
import torch
from torchvision.datasets.folder import default_loader, IMG_EXTENSIONS
from torch.utils.data import Dataset
from diffusers.utils.torch_utils import randn_tensor
from diffusion.data.builder import get_data_path, DATASETS
@DATASETS.register_module()
class SAM(Dataset):
def __init__(self,
root,
image_list_txt='part0.txt',
transform=None,
resolution=256,
sample_subset=None,
load_vae_feat=False,
mask_ratio=0.0,
mask_type='null',
**kwargs):
self.root = get_data_path(root)
self.transform = transform
self.load_vae_feat = load_vae_feat
self.mask_type = mask_type
self.mask_ratio = mask_ratio
self.resolution = resolution
self.img_samples = []
self.txt_feat_samples = []
self.vae_feat_samples = []
image_list_txt = image_list_txt if isinstance(image_list_txt, list) else [image_list_txt]
if image_list_txt == ['all']:  # image_list_txt was wrapped in a list on the previous line
image_list_txts = os.listdir(os.path.join(self.root, 'partition'))
for txt in image_list_txts:
image_list = os.path.join(self.root, 'partition', txt)
with open(image_list, 'r') as f:
lines = [line.strip() for line in f.readlines()]
self.img_samples.extend([os.path.join(self.root, 'images', i+'.jpg') for i in lines])
self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', i+'.npz') for i in lines])
self.vae_feat_samples.extend([os.path.join(self.root, 'img_vae_feature/train_vae_256/noflip', i+'.npy') for i in lines])
elif isinstance(image_list_txt, list):
for txt in image_list_txt:
image_list = os.path.join(self.root, 'partition', txt)
with open(image_list, 'r') as f:
lines = [line.strip() for line in f.readlines()]
self.img_samples.extend([os.path.join(self.root, 'images', i + '.jpg') for i in lines])
self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', i + '.npz') for i in lines])
self.vae_feat_samples.extend([os.path.join(self.root, 'img_vae_feature/train_vae_256/noflip', i + '.npy') for i in lines])
self.ori_imgs_nums = len(self)
# self.img_samples = self.img_samples[:10000]
# Set loader and extensions
if load_vae_feat:
self.transform = None
self.loader = self.vae_feat_loader
else:
self.loader = default_loader
if sample_subset is not None:
self.sample_subset(sample_subset) # sample dataset for local debug
def getdata(self, idx):
img_path = self.img_samples[idx]
npz_path = self.txt_feat_samples[idx]
npy_path = self.vae_feat_samples[idx]
data_info = {'img_hw': torch.tensor([self.resolution, self.resolution], dtype=torch.float32),
'aspect_ratio': torch.tensor(1.)}
img = self.loader(npy_path) if self.load_vae_feat else self.loader(img_path)
npz_info = np.load(npz_path)
txt_fea = torch.from_numpy(npz_info['caption_feature'])
attention_mask = torch.ones(1, 1, txt_fea.shape[1])
if 'attention_mask' in npz_info.keys():
attention_mask = torch.from_numpy(npz_info['attention_mask'])[None]
if self.transform:
img = self.transform(img)
data_info["mask_type"] = self.mask_type
return img, txt_fea, attention_mask, data_info
def __getitem__(self, idx):
for _ in range(20):
try:
return self.getdata(idx)
except Exception:
print(self.img_samples[idx], ' info is not correct')
idx = np.random.randint(len(self))
raise RuntimeError('Too many bad data.')
@staticmethod
def vae_feat_loader(path):
# [mean, std]
mean, std = torch.from_numpy(np.load(path)).chunk(2)
sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)
return mean + std * sample
# return mean
def sample_subset(self, ratio):
sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))
self.img_samples = [self.img_samples[i] for i in sampled_idx]
self.txt_feat_samples = [self.txt_feat_samples[i] for i in sampled_idx]
def __len__(self):
return len(self.img_samples)
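# Illustrative sketch (hypothetical helper): when a dataset keeps several
# index-aligned lists (image paths, caption features, VAE features),
# subsampling has to apply the same indices to every list. Otherwise index i
# no longer refers to the same sample in each of them.
import random


def subset_parallel_sketch(lists, ratio, seed=0):
    n = len(lists[0])
    rng = random.Random(seed)
    idx = rng.sample(range(n), int(n * ratio))
    return [[lst[i] for i in idx] for lst in lists]


_imgs = [f"img_{i}.jpg" for i in range(10)]
_caps = [f"img_{i}.npz" for i in range(10)]
_imgs_sub, _caps_sub = subset_parallel_sketch([_imgs, _caps], ratio=0.5)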
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/__init__.py
================================================
from .SA import SAM
from .InternalData import InternalData
from .InternalData_ms import InternalDataMS
from .Dreambooth import DreamBooth
from .pixart_control import InternalDataHed
from .utils import *
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/pixart_control.py
================================================
import os
import random
from PIL import Image
import numpy as np
import torch
from torchvision.datasets.folder import default_loader, IMG_EXTENSIONS
from torch.utils.data import Dataset
from diffusers.utils.torch_utils import randn_tensor
from torchvision import transforms as T
from diffusion.data.builder import get_data_path, DATASETS
import json, time
@DATASETS.register_module()
class InternalDataHed(Dataset):
def __init__(self,
root,
image_list_json='data_info.json',
transform=None,
resolution=256,
sample_subset=None,
load_vae_feat=False,
input_size=32,
patch_size=2,
mask_ratio=0.0,
load_mask_index=False,
train_ratio=1.0,
mode='train',
**kwargs):
self.root = get_data_path(root)
self.transform = transform
self.load_vae_feat = load_vae_feat
self.ori_imgs_nums = 0
self.resolution = resolution
self.N = int(resolution // (input_size // patch_size))
self.mask_ratio = mask_ratio
self.load_mask_index = load_mask_index
self.meta_data_clean = []
self.img_samples = []
self.txt_feat_samples = []
self.vae_feat_samples = []
self.hed_feat_samples = []
self.prompt_samples = []
image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]
for json_file in image_list_json:
meta_data = self.load_json(os.path.join(self.root, 'partition_filter', json_file))
self.ori_imgs_nums += len(meta_data)
meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]
self.meta_data_clean.extend(meta_data_clean)
self.img_samples.extend([os.path.join(self.root.replace('InternData', "InternImgs"), item['path']) for item in meta_data_clean])
self.txt_feat_samples.extend([os.path.join(self.root, 'caption_features', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])
self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_features_{resolution}resolution/noflip', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])
self.hed_feat_samples.extend([os.path.join(self.root, f'hed_feature_{resolution}', item['path'].replace('.png', '.npz')) for item in meta_data_clean])
self.prompt_samples.extend([item['prompt'] for item in meta_data_clean])
total_sample = len(self.img_samples)
used_sample_num = int(total_sample * train_ratio)
print("using mode", mode)
if mode == 'train':
self.img_samples = self.img_samples[:used_sample_num]
self.txt_feat_samples = self.txt_feat_samples[:used_sample_num]
self.vae_feat_samples = self.vae_feat_samples[:used_sample_num]
self.hed_feat_samples = self.hed_feat_samples[:used_sample_num]
self.prompt_samples = self.prompt_samples[:used_sample_num]
else:
self.img_samples = self.img_samples[-used_sample_num:]
self.txt_feat_samples = self.txt_feat_samples[-used_sample_num:]
self.vae_feat_samples = self.vae_feat_samples[-used_sample_num:]
self.hed_feat_samples = self.hed_feat_samples[-used_sample_num:]
self.prompt_samples = self.prompt_samples[-used_sample_num:]
# Set loader and extensions
if load_vae_feat:
self.transform = None
self.loader = self.vae_feat_loader
else:
self.loader = default_loader
if sample_subset is not None:
self.sample_subset(sample_subset) # sample dataset for local debug
def getdata(self, index):
img_path = self.img_samples[index]
npz_path = self.txt_feat_samples[index]
npy_path = self.vae_feat_samples[index]
hed_npz_path = self.hed_feat_samples[index]
prompt = self.prompt_samples[index]
# only trained on single-scale 1024 res data
data_info = {'img_hw': torch.tensor([1024., 1024.], dtype=torch.float32), 'aspect_ratio': torch.tensor(1.)}
if self.load_vae_feat:
img = self.loader(npy_path)
else:
img = self.loader(img_path)
hed_fea = self.vae_feat_loader_npz(hed_npz_path)
txt_info = np.load(npz_path)
txt_fea = torch.from_numpy(txt_info['caption_feature'])
attention_mask = torch.ones(1, 1, txt_fea.shape[1])
if 'attention_mask' in txt_info.keys():
attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]
if self.transform:
img = self.transform(img)
data_info['condition'] = hed_fea
data_info['prompt'] = prompt
return img, txt_fea, attention_mask, data_info
def __getitem__(self, idx):
for i in range(20):
try:
data = self.getdata(idx)
return data
except Exception as e:
print(f"Error details: {str(e)}")
idx = np.random.randint(len(self))
raise RuntimeError('Too many bad data.')
def get_data_info(self, idx):
data_info = self.meta_data_clean[idx]
return {'height': data_info['height'], 'width': data_info['width']}
@staticmethod
def vae_feat_loader(path):
# [mean, std]
mean, std = torch.from_numpy(np.load(path)).chunk(2)
sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)
return mean + std * sample
@staticmethod
def vae_feat_loader_npz(path):
# [mean, std]
mean, std = torch.from_numpy(np.load(path)['arr_0']).chunk(2)
sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)
return mean + std * sample
def load_json(self, file_path):
with open(file_path, 'r') as f:
meta_data = json.load(f)
return meta_data
def sample_subset(self, ratio):
sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))
self.img_samples = [self.img_samples[i] for i in sampled_idx]
def __len__(self):
return len(self.img_samples)
def __getattr__(self, name):
if name == "set_epoch":
return lambda epoch: None
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
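# Illustrative sketch of the train/eval slicing used by InternalDataHed
# (hypothetical helper name). Note a plain-Python edge case: when the computed
# count is 0, `items[-0:]` is the same as `items[0:]` and returns the whole
# list rather than an empty one.
def split_sketch(items, train_ratio, mode="train"):
    used = int(len(items) * train_ratio)
    return items[:used] if mode == "train" else items[-used:]


_items = list(range(10))
_train = split_sketch(_items, 0.8, "train")  # first 8 items
_val = split_sketch(_items, 0.2, "val")      # last 2 items
_edge = split_sketch(_items, 0.0, "val")     # whole list, because -0 == 0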
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/datasets/utils.py
================================================
ASPECT_RATIO_1024 = {
'0.25': [512., 2048.], '0.26': [512., 1984.], '0.27': [512., 1920.], '0.28': [512., 1856.],
'0.32': [576., 1792.], '0.33': [576., 1728.], '0.35': [576., 1664.], '0.4': [640., 1600.],
'0.42': [640., 1536.], '0.48': [704., 1472.], '0.5': [704., 1408.], '0.52': [704., 1344.],
'0.57': [768., 1344.], '0.6': [768., 1280.], '0.68': [832., 1216.], '0.72': [832., 1152.],
'0.78': [896., 1152.], '0.82': [896., 1088.], '0.88': [960., 1088.], '0.94': [960., 1024.],
'1.0': [1024., 1024.], '1.07': [1024., 960.], '1.13': [1088., 960.], '1.21': [1088., 896.],
'1.29': [1152., 896.], '1.38': [1152., 832.], '1.46': [1216., 832.], '1.67': [1280., 768.],
'1.75': [1344., 768.], '2.0': [1408., 704.], '2.09': [1472., 704.], '2.4': [1536., 640.],
'2.5': [1600., 640.], '2.89': [1664., 576.], '3.0': [1728., 576.], '3.11': [1792., 576.],
'3.62': [1856., 512.], '3.75': [1920., 512.], '3.88': [1984., 512.], '4.0': [2048., 512.],
}
ASPECT_RATIO_512 = {
'0.25': [256.0, 1024.0], '0.26': [256.0, 992.0], '0.27': [256.0, 960.0], '0.28': [256.0, 928.0],
'0.32': [288.0, 896.0], '0.33': [288.0, 864.0], '0.35': [288.0, 832.0], '0.4': [320.0, 800.0],
'0.42': [320.0, 768.0], '0.48': [352.0, 736.0], '0.5': [352.0, 704.0], '0.52': [352.0, 672.0],
'0.57': [384.0, 672.0], '0.6': [384.0, 640.0], '0.68': [416.0, 608.0], '0.72': [416.0, 576.0],
'0.78': [448.0, 576.0], '0.82': [448.0, 544.0], '0.88': [480.0, 544.0], '0.94': [480.0, 512.0],
'1.0': [512.0, 512.0], '1.07': [512.0, 480.0], '1.13': [544.0, 480.0], '1.21': [544.0, 448.0],
'1.29': [576.0, 448.0], '1.38': [576.0, 416.0], '1.46': [608.0, 416.0], '1.67': [640.0, 384.0],
'1.75': [672.0, 384.0], '2.0': [704.0, 352.0], '2.09': [736.0, 352.0], '2.4': [768.0, 320.0],
'2.5': [800.0, 320.0], '2.89': [832.0, 288.0], '3.0': [864.0, 288.0], '3.11': [896.0, 288.0],
'3.62': [928.0, 256.0], '3.75': [960.0, 256.0], '3.88': [992.0, 256.0], '4.0': [1024.0, 256.0]
}
ASPECT_RATIO_256 = {
'0.25': [128.0, 512.0], '0.26': [128.0, 496.0], '0.27': [128.0, 480.0], '0.28': [128.0, 464.0],
'0.32': [144.0, 448.0], '0.33': [144.0, 432.0], '0.35': [144.0, 416.0], '0.4': [160.0, 400.0],
'0.42': [160.0, 384.0], '0.48': [176.0, 368.0], '0.5': [176.0, 352.0], '0.52': [176.0, 336.0],
'0.57': [192.0, 336.0], '0.6': [192.0, 320.0], '0.68': [208.0, 304.0], '0.72': [208.0, 288.0],
'0.78': [224.0, 288.0], '0.82': [224.0, 272.0], '0.88': [240.0, 272.0], '0.94': [240.0, 256.0],
'1.0': [256.0, 256.0], '1.07': [256.0, 240.0], '1.13': [272.0, 240.0], '1.21': [272.0, 224.0],
'1.29': [288.0, 224.0], '1.38': [288.0, 208.0], '1.46': [304.0, 208.0], '1.67': [320.0, 192.0],
'1.75': [336.0, 192.0], '2.0': [352.0, 176.0], '2.09': [368.0, 176.0], '2.4': [384.0, 160.0],
'2.5': [400.0, 160.0], '2.89': [416.0, 144.0], '3.0': [432.0, 144.0], '3.11': [448.0, 144.0],
'3.62': [464.0, 128.0], '3.75': [480.0, 128.0], '3.88': [496.0, 128.0], '4.0': [512.0, 128.0]
}
ASPECT_RATIO_256_TEST = {
'0.25': [128.0, 512.0], '0.28': [128.0, 464.0],
'0.32': [144.0, 448.0], '0.33': [144.0, 432.0], '0.35': [144.0, 416.0], '0.4': [160.0, 400.0],
'0.42': [160.0, 384.0], '0.48': [176.0, 368.0], '0.5': [176.0, 352.0], '0.52': [176.0, 336.0],
'0.57': [192.0, 336.0], '0.6': [192.0, 320.0], '0.68': [208.0, 304.0], '0.72': [208.0, 288.0],
'0.78': [224.0, 288.0], '0.82': [224.0, 272.0], '0.88': [240.0, 272.0], '0.94': [240.0, 256.0],
'1.0': [256.0, 256.0], '1.07': [256.0, 240.0], '1.13': [272.0, 240.0], '1.21': [272.0, 224.0],
'1.29': [288.0, 224.0], '1.38': [288.0, 208.0], '1.46': [304.0, 208.0], '1.67': [320.0, 192.0],
'1.75': [336.0, 192.0], '2.0': [352.0, 176.0], '2.09': [368.0, 176.0], '2.4': [384.0, 160.0],
'2.5': [400.0, 160.0], '3.0': [432.0, 144.0],
'4.0': [512.0, 128.0]
}
ASPECT_RATIO_512_TEST = {
'0.25': [256.0, 1024.0], '0.28': [256.0, 928.0],
'0.32': [288.0, 896.0], '0.33': [288.0, 864.0], '0.35': [288.0, 832.0], '0.4': [320.0, 800.0],
'0.42': [320.0, 768.0], '0.48': [352.0, 736.0], '0.5': [352.0, 704.0], '0.52': [352.0, 672.0],
'0.57': [384.0, 672.0], '0.6': [384.0, 640.0], '0.68': [416.0, 608.0], '0.72': [416.0, 576.0],
'0.78': [448.0, 576.0], '0.82': [448.0, 544.0], '0.88': [480.0, 544.0], '0.94': [480.0, 512.0],
'1.0': [512.0, 512.0], '1.07': [512.0, 480.0], '1.13': [544.0, 480.0], '1.21': [544.0, 448.0],
'1.29': [576.0, 448.0], '1.38': [576.0, 416.0], '1.46': [608.0, 416.0], '1.67': [640.0, 384.0],
'1.75': [672.0, 384.0], '2.0': [704.0, 352.0], '2.09': [736.0, 352.0], '2.4': [768.0, 320.0],
'2.5': [800.0, 320.0], '3.0': [864.0, 288.0],
'4.0': [1024.0, 256.0]
}
ASPECT_RATIO_1024_TEST = {
'0.25': [512., 2048.], '0.28': [512., 1856.],
'0.32': [576., 1792.], '0.33': [576., 1728.], '0.35': [576., 1664.], '0.4': [640., 1600.],
'0.42': [640., 1536.], '0.48': [704., 1472.], '0.5': [704., 1408.], '0.52': [704., 1344.],
'0.57': [768., 1344.], '0.6': [768., 1280.], '0.68': [832., 1216.], '0.72': [832., 1152.],
'0.78': [896., 1152.], '0.82': [896., 1088.], '0.88': [960., 1088.], '0.94': [960., 1024.],
'1.0': [1024., 1024.], '1.07': [1024., 960.], '1.13': [1088., 960.], '1.21': [1088., 896.],
'1.29': [1152., 896.], '1.38': [1152., 832.], '1.46': [1216., 832.], '1.67': [1280., 768.],
'1.75': [1344., 768.], '2.0': [1408., 704.], '2.09': [1472., 704.], '2.4': [1536., 640.],
'2.5': [1600., 640.], '3.0': [1728., 576.],
'4.0': [2048., 512.],
}
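# Illustrative check on a small excerpt of ASPECT_RATIO_512 (values copied
# from the table above): each key is height / width to roughly two decimals,
# and in this excerpt both sides are multiples of 32.
_excerpt_512 = {
    '0.25': [256.0, 1024.0], '0.57': [384.0, 672.0], '1.0': [512.0, 512.0],
    '1.75': [672.0, 384.0], '4.0': [1024.0, 256.0],
}
_ratio_errors = {k: abs(h / w - float(k)) for k, (h, w) in _excerpt_512.items()}
_all_multiples_of_32 = all(h % 32 == 0 and w % 32 == 0 for h, w in _excerpt_512.values())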
def get_chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
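# Illustrative usage sketch; `_chunks_sketch` is a standalone copy of the
# generator so that the snippet runs on its own.
def _chunks_sketch(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Seven items in chunks of three: the last chunk is shorter.
_chunks = list(_chunks_sketch([1, 2, 3, 4, 5, 6, 7], 3))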
================================================
FILE: PixArt-alpha-ToCa/diffusion/data/transforms.py
================================================
import torchvision.transforms as T

TRANSFORMS = {}


def register_transform(transform):
    name = transform.__name__
    if name in TRANSFORMS:
        raise RuntimeError(f'Transform {name} has already been registered.')
    TRANSFORMS.update({name: transform})


def get_transform(type, resolution):
    transform = TRANSFORMS[type](resolution)
    transform = T.Compose(transform)
    transform.image_size = resolution
    return transform


@register_transform
def default_train(n_px):
    return [
        T.Lambda(lambda img: img.convert('RGB')),
        T.Resize(n_px),  # Image.BICUBIC
        T.CenterCrop(n_px),
        # T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize([0.5], [0.5]),
    ]
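# Illustrative sketch of the decorator-registry pattern in plain Python, with
# hypothetical names. One difference from register_transform: this decorator
# returns the function, so the decorated name stays callable instead of being
# rebound to None.
_REGISTRY_SKETCH = {}


def register_sketch(fn):
    if fn.__name__ in _REGISTRY_SKETCH:
        raise RuntimeError(f"Transform {fn.__name__} is already registered.")
    _REGISTRY_SKETCH[fn.__name__] = fn
    return fn


@register_sketch
def center_steps(n_px):
    # Stand-in for a list of transform steps.
    return [("resize", n_px), ("center_crop", n_px)]


_steps = _REGISTRY_SKETCH["center_steps"](256)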
================================================
FILE: PixArt-alpha-ToCa/diffusion/dpm_solver.py
================================================
import torch
from .model import gaussian_diffusion as gd
from .model.dpm_solver import model_wrapper, DPM_Solver, NoiseScheduleVP


def DPMS(model, condition, uncondition, cfg_scale, model_type='noise', noise_schedule="linear", guidance_type='classifier-free', model_kwargs=None, diffusion_steps=1000):
    if model_kwargs is None:
        model_kwargs = {}
    betas = torch.tensor(gd.get_named_beta_schedule(noise_schedule, diffusion_steps))

    ## 1. Define the noise schedule.
    noise_schedule = NoiseScheduleVP(schedule='discrete', betas=betas)

    ## 2. Convert the discrete-time `model` to a continuous-time noise
    ##    prediction model. `model_type` defaults to "noise", i.e. the
    ##    model predicts the added noise.
    model_fn = model_wrapper(
        model,
        noise_schedule,
        model_type=model_type,
        model_kwargs=model_kwargs,
        guidance_type=guidance_type,
        condition=condition,
        unconditional_condition=uncondition,
        guidance_scale=cfg_scale,
    )

    ## 3. Build the multistep DPM-Solver++ sampler; the caller runs the sampling.
    return DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver++")
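# Illustrative sketch of classifier-free guidance, the mode selected by
# guidance_type='classifier-free'. Assumption: the wrapped model evaluates
# both the conditional and the unconditional input and blends the two noise
# predictions as uncond + scale * (cond - uncond). Plain Python lists stand in
# for tensors, and the helper name is hypothetical.
def cfg_combine_sketch(noise_cond, noise_uncond, cfg_scale):
    return [u + cfg_scale * (c - u) for c, u in zip(noise_cond, noise_uncond)]


_guided = cfg_combine_sketch([1.0, 2.0], [0.0, 1.0], cfg_scale=4.5)
# A scale of 1 reproduces the conditional prediction.
_same = cfg_combine_sketch([1.0, 2.0], [0.0, 1.0], cfg_scale=1.0)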
================================================
FILE: PixArt-alpha-ToCa/diffusion/iddpm.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
from diffusion.model.respace import SpacedDiffusion, space_timesteps
from .model import gaussian_diffusion as gd


def IDDPM(
    timestep_respacing,
    noise_schedule="linear",
    use_kl=False,
    sigma_small=False,
    predict_xstart=False,
    learn_sigma=True,
    pred_sigma=True,
    rescale_learned_sigmas=False,
    diffusion_steps=1000,
    snr=False,
    return_startx=False,
):
    betas = gd.get_named_beta_schedule(noise_schedule, diffusion_steps)
    if use_kl:
        loss_type = gd.LossType.RESCALED_KL
    elif rescale_learned_sigmas:
        loss_type = gd.LossType.RESCALED_MSE
    else:
        loss_type = gd.LossType.MSE
    if timestep_respacing is None or timestep_respacing == "":
        timestep_respacing = [diffusion_steps]
    return SpacedDiffusion(
        use_timesteps=space_timesteps(diffusion_steps, timestep_respacing),
        betas=betas,
        model_mean_type=(
            gd.ModelMeanType.START_X if predict_xstart else gd.ModelMeanType.EPSILON
        ),
        model_var_type=(
            (gd.ModelVarType.LEARNED_RANGE if learn_sigma else (
                gd.ModelVarType.FIXED_LARGE
                if not sigma_small
                else gd.ModelVarType.FIXED_SMALL
            )
            )
            if pred_sigma
            else None
        ),
        loss_type=loss_type,
        snr=snr,
        return_startx=return_startx,
        # rescale_timesteps=rescale_timesteps,
    )
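# Illustrative sketch of a "linear" beta schedule. Assumption: as in OpenAI's
# improved-diffusion code that this file is modified from, the endpoints 1e-4
# and 0.02 are scaled by 1000 / diffusion_steps and interpolated linearly.
# gd.get_named_beta_schedule itself is not shown here and may differ; the
# helper name is hypothetical and expects diffusion_steps >= 2.
def linear_betas_sketch(diffusion_steps=1000):
    scale = 1000 / diffusion_steps
    start, end = scale * 1e-4, scale * 0.02
    step = (end - start) / (diffusion_steps - 1)
    return [start + i * step for i in range(diffusion_steps)]


_betas = linear_betas_sketch(1000)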
================================================
FILE: PixArt-alpha-ToCa/diffusion/lcm_scheduler.py
================================================
# Copyright 2023 Stanford University Team and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# DISCLAIMER: This code is strongly influenced by https://github.com/pesser/pytorch_diffusion
# and https://github.com/hojonathanho/diffusion
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union
import numpy as np
import torch
from diffusers import ConfigMixin, SchedulerMixin
from diffusers.configuration_utils import register_to_config
from diffusers.utils import BaseOutput
@dataclass
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->DDIM
class LCMSchedulerOutput(BaseOutput):
"""
Output class for the scheduler's `step` function output.
Args:
prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
denoising loop.
denoised (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images, *optional*):
The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
`denoised` can be used to preview progress or for guidance.
"""
prev_sample: torch.FloatTensor
denoised: Optional[torch.FloatTensor] = None
# Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
def betas_for_alpha_bar(
num_diffusion_timesteps,
max_beta=0.999,
alpha_transform_type="cosine",
):
"""
Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
(1-beta) over time from t = [0,1].
Contains a function alpha_bar that takes an argument t and transforms it to the cumulative product of (1-beta) up
to that part of the diffusion process.
Args:
num_diffusion_timesteps (`int`): the number of betas to produce.
max_beta (`float`): the maximum beta to use; use values lower than 1 to
prevent singularities.
alpha_transform_type (`str`, *optional*, default to `cosine`): the type of noise schedule for alpha_bar.
Choose from `cosine` or `exp`
Returns:
betas (`torch.Tensor`): the betas used by the scheduler to step the model outputs
"""
if alpha_transform_type == "cosine":
def alpha_bar_fn(t):
return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
elif alpha_transform_type == "exp":
def alpha_bar_fn(t):
return math.exp(t * -12.0)
else:
raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
t1 = i / num_diffusion_timesteps
t2 = (i + 1) / num_diffusion_timesteps
betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
return torch.tensor(betas, dtype=torch.float32)
def rescale_zero_terminal_snr(betas):
"""
Rescales betas to have zero terminal SNR, based on https://arxiv.org/pdf/2305.08891.pdf (Algorithm 1).
Args:
betas (`torch.FloatTensor`):
the betas that the scheduler is being initialized with.
Returns:
`torch.FloatTensor`: rescaled betas with zero terminal SNR
"""
# Convert betas to alphas_bar_sqrt
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
alphas_bar_sqrt = alphas_cumprod.sqrt()
# Store old values.
alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone()
alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone()
# Shift so the last timestep is zero.
alphas_bar_sqrt -= alphas_bar_sqrt_T
# Scale so the first timestep is back to the old value.
alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T)
# Convert alphas_bar_sqrt to betas
alphas_bar = alphas_bar_sqrt ** 2 # Revert sqrt
alphas = alphas_bar[1:] / alphas_bar[:-1] # Revert cumprod
alphas = torch.cat([alphas_bar[:1], alphas])
betas = 1 - alphas
return betas
class LCMScheduler(SchedulerMixin, ConfigMixin):
"""
`LCMScheduler` extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with
non-Markovian guidance.
This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
methods the library implements for all schedulers such as loading and saving.
Args:
num_train_timesteps (`int`, defaults to 1000):
The number of diffusion steps to train the model.
beta_start (`float`, defaults to 0.0001):
The starting `beta` value of inference.
beta_end (`float`, defaults to 0.02):
The final `beta` value.
beta_schedule (`str`, defaults to `"linear"`):
The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
`linear`, `scaled_linear`, or `squaredcos_cap_v2`.
trained_betas (`np.ndarray`, *optional*):
Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
clip_sample (`bool`, defaults to `True`):
Clip the predicted sample for numerical stability.
clip_sample_range (`float`, defaults to 1.0):
The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
set_alpha_to_one (`bool`, defaults to `True`):
Each diffusion step uses the alphas product value at that step and at the previous one. For the final step
there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`,
otherwise it uses the alpha value at step 0.
steps_offset (`int`, defaults to 0):
An offset added to the inference steps. You can use a combination of `offset=1` and
`set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
Diffusion.
prediction_type (`str`, defaults to `epsilon`, *optional*):
Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
`sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen
Video](https://imagen.research.google/video/paper.pdf) paper).
thresholding (`bool`, defaults to `False`):
Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
as Stable Diffusion.
dynamic_thresholding_ratio (`float`, defaults to 0.995):
The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
sample_max_value (`float`, defaults to 1.0):
The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
timestep_spacing (`str`, defaults to `"leading"`):
The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.
rescale_betas_zero_snr (`bool`, defaults to `False`):
Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and
dark samples instead of limiting it to samples with medium brightness. Loosely related to
[`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).
"""
# _compatibles = [e.name for e in KarrasDiffusionSchedulers]
order = 1
@register_to_config
def __init__(
self,
num_train_timesteps: int = 1000,
beta_start: float = 0.0001,
beta_end: float = 0.02,
beta_schedule: str = "linear",
trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
clip_sample: bool = True,
set_alpha_to_one: bool = True,
steps_offset: int = 0,
prediction_type: str = "epsilon",
thresholding: bool = False,
dynamic_thresholding_ratio: float = 0.995,
clip_sample_range: float = 1.0,
sample_max_value: float = 1.0,
timestep_spacing: str = "leading",
rescale_betas_zero_snr: bool = False,
):
if trained_betas is not None:
self.betas = torch.tensor(trained_betas, dtype=torch.float32)
elif beta_schedule == "linear":
self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
elif beta_schedule == "scaled_linear":
# this schedule is very specific to the latent diffusion model.
self.betas = (
torch.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=torch.float32) ** 2
)
elif beta_schedule == "squaredcos_cap_v2":
# Glide cosine schedule
self.betas = betas_for_alpha_bar(num_train_timesteps)
else:
raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")
# Rescale for zero SNR
if rescale_betas_zero_snr:
self.betas = rescale_zero_terminal_snr(self.betas)
self.alphas = 1.0 - self.betas
self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
# At every step in ddim, we are looking into the previous alphas_cumprod
# For the final step, there is no previous alphas_cumprod because we are already at 0
# `set_alpha_to_one` decides whether we set this parameter simply to one or
# whether we use the final alpha of the "non-previous" one.
self.final_alpha_cumprod = torch.tensor(1.0) if set_alpha_to_one else self.alphas_cumprod[0]
# standard deviation of the initial noise distribution
self.init_noise_sigma = 1.0
# settable values
self.num_inference_steps = None
self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))
def scale_model_input(self, sample: torch.FloatTensor, timestep: Optional[int] = None) -> torch.FloatTensor:
"""
Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
current timestep.
Args:
sample (`torch.FloatTensor`):
The input sample.
timestep (`int`, *optional*):
The current timestep in the diffusion chain.
Returns:
`torch.FloatTensor`:
A scaled input sample.
"""
return sample
def _get_variance(self, timestep, prev_timestep):
alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev
return (beta_prod_t_prev / beta_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample
def _threshold_sample(self, sample: torch.FloatTensor) -> torch.FloatTensor:
"""
"Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the
prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by
s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing
pixels from saturation at each step. We find that dynamic thresholding results in significantly better
photorealism as well as better image-text alignment, especially when using very large guidance weights."
https://arxiv.org/abs/2205.11487
"""
dtype = sample.dtype
batch_size, channels, height, width = sample.shape
if dtype not in (torch.float32, torch.float64):
sample = sample.float() # upcast for quantile calculation, and clamp not implemented for cpu half
# Flatten sample for doing quantile calculation along each image
sample = sample.reshape(batch_size, channels * height * width)
abs_sample = sample.abs() # "a certain percentile absolute pixel value"
s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)
s = torch.clamp(
s, min=1, max=self.config.sample_max_value
) # When clamped to min=1, equivalent to standard clipping to [-1, 1]
s = s.unsqueeze(1) # (batch_size, 1) because clamp will broadcast along dim=0
sample = torch.clamp(sample, -s, s) / s # "we threshold xt0 to the range [-s, s] and then divide by s"
sample = sample.reshape(batch_size, channels, height, width)
sample = sample.to(dtype)
return sample
def set_timesteps(self, num_inference_steps: int, lcm_origin_steps: int, device: Union[str, torch.device] = None):
"""
Sets the discrete timesteps used for the diffusion chain (to be run before inference).
Args:
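num_inference_steps (`int`):
The number of diffusion steps used when generating samples with a pre-trained model.
lcm_origin_steps (`int`):
The number of steps in the LCM training schedule; the inference timesteps are taken as an evenly
strided subset of these.
device (`str` or `torch.device`, *optional*):
The device to which the timesteps should be moved.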
"""
if num_inference_steps > self.config.num_train_timesteps:
raise ValueError(
f"`num_inference_steps`: {num_inference_steps} cannot be larger than `self.config.num_train_timesteps`:"
f" {self.config.num_train_timesteps} as the unet model trained with this scheduler can only handle"
f" maximal {self.config.num_train_timesteps} timesteps."
)
self.num_inference_steps = num_inference_steps
# LCM Timesteps Setting: # Linear Spacing
c = self.config.num_train_timesteps // lcm_origin_steps
lcm_origin_timesteps = np.asarray(list(range(1, lcm_origin_steps + 1))) * c - 1 # LCM Training Steps Schedule
skipping_step = len(lcm_origin_timesteps) // num_inference_steps
timesteps = lcm_origin_timesteps[::-skipping_step][:num_inference_steps] # LCM Inference Steps Schedule
self.timesteps = torch.from_numpy(timesteps.copy()).to(device)
def get_scalings_for_boundary_condition_discrete(self, t):
self.sigma_data = 0.5 # Default: 0.5
# Dividing t by 0.1 makes c_skip almost a delta function at t=0.
c_skip = self.sigma_data ** 2 / ((t / 0.1) ** 2 + self.sigma_data ** 2)
c_out = ((t / 0.1) / ((t / 0.1) ** 2 + self.sigma_data ** 2) ** 0.5)
return c_skip, c_out
def step(
self,
model_output: torch.FloatTensor,
timeindex: int,
timestep: int,
sample: torch.FloatTensor,
eta: float = 0.0,
use_clipped_model_output: bool = False,
generator=None,
variance_noise: Optional[torch.FloatTensor] = None,
return_dict: bool = True,
) -> Union[LCMSchedulerOutput, Tuple]:
"""
Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
process from the learned model outputs (most often the predicted noise).
Args:
model_output (`torch.FloatTensor`):
The direct output from learned diffusion model.
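timeindex (`int`):
The index of the current timestep in `self.timesteps`; the next entry supplies the previous timestep.
timestep (`int`):
The current discrete timestep in the diffusion chain.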
sample (`torch.FloatTensor`):
A current instance of a sample created by the diffusion process.
eta (`float`):
The weight of noise for added noise in diffusion step.
use_clipped_model_output (`bool`, defaults to `False`):
If `True`, computes "corrected" `model_output` from the clipped predicted original sample. Necessary
because predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`. If no
clipping has happened, "corrected" `model_output` would coincide with the one provided as input and
`use_clipped_model_output` has no effect.
generator (`torch.Generator`, *optional*):
A random number generator.
variance_noise (`torch.FloatTensor`):
Alternative to generating noise with `generator` by directly providing the noise for the variance
itself. Useful for methods such as [`CycleDiffusion`].
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~schedulers.scheduling_lcm.LCMSchedulerOutput`] or `tuple`.
Returns:
[`~schedulers.scheduling_utils.LCMSchedulerOutput`] or `tuple`:
If return_dict is `True`, [`~schedulers.scheduling_lcm.LCMSchedulerOutput`] is returned, otherwise a
tuple is returned where the first element is the sample tensor.
"""
if self.num_inference_steps is None:
raise ValueError(
"Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
)
# 1. get previous step value
prev_timeindex = timeindex + 1
if prev_timeindex < len(self.timesteps):
prev_timestep = self.timesteps[prev_timeindex]
else:
prev_timestep = timestep
# 2. compute alphas, betas
alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev
# 3. Get scalings for boundary conditions
c_skip, c_out = self.get_scalings_for_boundary_condition_discrete(timestep)
# 4. Different Parameterization:
parameterization = self.config.prediction_type
if parameterization == "epsilon": # noise-prediction
pred_x0 = (sample - beta_prod_t.sqrt() * model_output) / alpha_prod_t.sqrt()
elif parameterization == "sample": # x-prediction
pred_x0 = model_output
elif parameterization == "v_prediction": # v-prediction
pred_x0 = alpha_prod_t.sqrt() * sample - beta_prod_t.sqrt() * model_output
else:
raise ValueError(f"Unsupported prediction_type: {parameterization}")
# 5. Denoise model output using boundary conditions
denoised = c_out * pred_x0 + c_skip * sample
# 6. Sample z ~ N(0, I), For MultiStep Inference
# Noise is not used for one-step sampling.
if len(self.timesteps) > 1:
noise = torch.randn(model_output.shape).to(model_output.device)
prev_sample = alpha_prod_t_prev.sqrt() * denoised + beta_prod_t_prev.sqrt() * noise
else:
prev_sample = denoised
if not return_dict:
return (prev_sample, denoised)
return LCMSchedulerOutput(prev_sample=prev_sample, denoised=denoised)
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise
def add_noise(
self,
original_samples: torch.FloatTensor,
noise: torch.FloatTensor,
timesteps: torch.IntTensor,
) -> torch.FloatTensor:
# Make sure alphas_cumprod and timestep have same device and dtype as original_samples
alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)
timesteps = timesteps.to(original_samples.device)
sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
sqrt_alpha_prod = sqrt_alpha_prod.flatten()
while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.get_velocity
def get_velocity(
self, sample: torch.FloatTensor, noise: torch.FloatTensor, timesteps: torch.IntTensor
) -> torch.FloatTensor:
# Make sure alphas_cumprod and timestep have same device and dtype as sample
alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)
timesteps = timesteps.to(sample.device)
sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
sqrt_alpha_prod = sqrt_alpha_prod.flatten()
while len(sqrt_alpha_prod.shape) < len(sample.shape):
sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
return sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
def __len__(self):
return self.config.num_train_timesteps
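# Illustrative sketch (not part of the scheduler): a pure-Python mirror of the
# linear-spacing schedule in `set_timesteps` and of the boundary-condition
# scalings in `get_scalings_for_boundary_condition_discrete`. The function
# names and the `timestep_scaling` parameter are hypothetical; the constants
# are copied from the code above.

```python
def lcm_inference_timesteps(num_train_timesteps, lcm_origin_steps, num_inference_steps):
    """Return the inference timesteps, largest first, as a plain list."""
    c = num_train_timesteps // lcm_origin_steps
    # LCM training schedule: c - 1, 2c - 1, ..., lcm_origin_steps * c - 1
    origin = [i * c - 1 for i in range(1, lcm_origin_steps + 1)]
    # Note: this is 0 (and the slice below fails) if
    # num_inference_steps > lcm_origin_steps.
    skipping_step = len(origin) // num_inference_steps
    return origin[::-skipping_step][:num_inference_steps]


def boundary_scalings(t, sigma_data=0.5, timestep_scaling=0.1):
    """Return (c_skip, c_out) for timestep t."""
    scaled = t / timestep_scaling
    c_skip = sigma_data ** 2 / (scaled ** 2 + sigma_data ** 2)
    c_out = scaled / (scaled ** 2 + sigma_data ** 2) ** 0.5
    return c_skip, c_out


print(lcm_inference_timesteps(1000, 50, 4))  # [999, 759, 519, 279]
print(boundary_scalings(0))                  # (1.0, 0.0)
```

# At t = 0 the sample is passed through unchanged (c_skip = 1, c_out = 0);
# for large t the output is dominated by the predicted x0.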
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/__init__.py
================================================
from .nets import *
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/builder.py
================================================
from mmcv import Registry
from diffusion.model.utils import set_grad_checkpoint
MODELS = Registry('models')
def build_model(cfg, use_grad_checkpoint=False, use_fp32_attention=False, gc_step=1, **kwargs):
if isinstance(cfg, str):
cfg = dict(type=cfg)
model = MODELS.build(cfg, default_args=kwargs)
if use_grad_checkpoint:
set_grad_checkpoint(model, use_fp32_attention=use_fp32_attention, gc_step=gc_step)
return model
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/__init__.py
================================================
from .cache_cutfresh import cache_cutfresh
from .fresh_ratio_scheduler import fresh_ratio_scheduler
from .score_evaluate import score_evaluate
from .global_force_fresh import global_force_fresh
from .update_cache import update_cache
from .force_init import force_init
from .attention import cached_attention_forward
from .cache_init import cache_init
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/attention.py
================================================
# Besides, re-arrange the attention module
from torch.jit import Final
import torch
import torch.nn as nn
import torch.nn.functional as F
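from typing import Optional, Tuple, Union
from xformers.ops.fmha.attn_bias import BlockDiagonalMask
def cached_attention_forward(
query: torch.Tensor,
key: torch.Tensor,
value: torch.Tensor,
attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,
p: float = 0.0,
scale: Optional[float] = None
) -> Tuple[torch.Tensor, torch.Tensor]:
# Returns the attention output and the head-averaged attention map.
# Fall back to the standard 1/sqrt(head_dim) scaling when no scale is given.
if scale is None:
scale = 1.0 / query.shape[-1] ** 0.5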
query = query * scale
query = query.transpose(1, 2)
key = key.transpose(1, 2)
value = value.transpose(1, 2)
attn = query @ key.transpose(-2, -1)
if attn_bias is not None:
attn_bias = attn_bias.materialize(shape=attn.shape, dtype=attn.dtype, device=attn.device)
attn = attn + attn_bias
#out_map = attn
attn_map = attn.softmax(-1)
attn = F.dropout(attn_map, p)
attn = attn @ value
return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/cache_cutfresh.py
================================================
from .fresh_ratio_scheduler import fresh_ratio_scheduler
from .score_evaluate import score_evaluate
#from .token_merge import token_merge
import torch
def cache_cutfresh(cache_dic, tokens, current):
'''
Cut fresh tokens from the input tokens and update the cache counter.
cache_dic: dict, the cache dictionary containing the cache (the main extra memory cost), indices, and other bookkeeping information.
tokens: torch.Tensor, the input tokens to be cut.
current: dict, the current step, layer, and module information. Particularly convenient for debugging.
'''
step = current['step']
layer = current['layer']
module = current['module']
fresh_ratio = fresh_ratio_scheduler(cache_dic, current)
fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1)
# Generate the index tensor for fresh tokens
score = score_evaluate(cache_dic, tokens, current) # s1, s2, s3 mentioned in the paper
score = local_selection_with_bonus(score, 0.4, 4) # Uniform Spatial Distribution s4 mentioned in the paper
indices = score.argsort(dim=-1, descending=True)
topk = int(fresh_ratio * score.shape[1])
fresh_indices = indices[:, :topk]
stale_indices = indices[:, topk:]
# (B, fresh_ratio *N)
# Updating the Cache Frequency Score s3 mentioned in the paper
# stale tokens index + 1 in each ***module***, fresh tokens index = 0
cache_dic['cache_index'][-1][layer][module] += 1
cache_dic['cache_index'][-1][layer][module].scatter_(dim=1, index=fresh_indices,
src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))
cache_dic['cache_index']['layer_index'][module] += 1
cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices,
src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))
fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
if module in ['mlp', 'attn', 'cross-attn']:
fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)
return fresh_indices, fresh_tokens
else:
raise ValueError(f"Unrecognized module: {module}")
def local_selection_with_bonus(score, bonus_ratio, grid_size=2):
batch_size, num_tokens = score.shape
image_size = int(num_tokens ** 0.5)
block_size = grid_size * grid_size
assert num_tokens % block_size == 0, "The number of tokens must be divisible by the block size."
# Step 1: Reshape score to group it by blocks
score_reshaped = score.view(batch_size, image_size // grid_size, grid_size, image_size // grid_size, grid_size)
score_reshaped = score_reshaped.permute(0, 1, 3, 2, 4).contiguous()
score_reshaped = score_reshaped.view(batch_size, -1, block_size) # [batch_size, num_blocks, block_size]
# Step 2: Find the max token in each block
max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True) # [batch_size, num_blocks, 1]
# Step 3: Create a mask to identify max score tokens
mask = torch.zeros_like(score_reshaped)
mask.scatter_(-1, max_indices, 1) # Set mask to 1 at the max indices
# Step 4: Apply the bonus only to the max score tokens
score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio) # Apply bonus only to max tokens
# Step 5: Reshape the score back to its original shape
score_modified = score_reshaped.view(batch_size, image_size // grid_size, image_size // grid_size, grid_size, grid_size)
score_modified = score_modified.permute(0, 1, 3, 2, 4).contiguous()
score_modified = score_modified.view(batch_size, num_tokens)
return score_modified
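# Illustrative sketch (not part of the module): the block-wise bonus applied by
# `local_selection_with_bonus`, written for a single image on nested lists so
# the effect is easy to inspect. `block_max_bonus` is a hypothetical name; the
# real function works on a flattened (B, N) tensor and assumes a square grid.

```python
def block_max_bonus(score, bonus_ratio, grid_size):
    """Add bonus_ratio * max to the highest-scoring cell of each block.

    score is a list of rows (H x W); a new H x W list is returned.
    """
    height, width = len(score), len(score[0])
    assert height % grid_size == 0 and width % grid_size == 0
    out = [row[:] for row in score]
    for top in range(0, height, grid_size):
        for left in range(0, width, grid_size):
            cells = [
                (r, c)
                for r in range(top, top + grid_size)
                for c in range(left, left + grid_size)
            ]
            # Only the per-block maximum receives the bonus.
            r, c = max(cells, key=lambda rc: score[rc[0]][rc[1]])
            out[r][c] = score[r][c] + bonus_ratio * score[r][c]
    return out


grid = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [5.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
boosted = block_max_bonus(grid, bonus_ratio=0.5, grid_size=2)
```

# Boosting one token per block raises the chance that every spatial region
# contributes at least one token to the top-k "fresh" selection.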
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/cache_init.py
================================================
def cache_init(model_kwargs, num_steps):
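'''
Initialize the cache dictionary and the `current` state dictionary.
Per-layer entries are created for 28 layers (indices 0-27); the caching options are read from `model_kwargs`.
'''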
cache_dic = {}
cache = {}
cache_index = {}
cache[-1]={}
cache_index[-1]={}
cache_index['layer_index']={}
cache_dic['attn_map'] = {}
cache_dic['attn_map'][-1] = {}
cache_dic['cross_attn_map'] = {}
cache_dic['cross_attn_map'][-1] = {}
for j in range(28):
cache[-1][j] = {}
cache_index[-1][j] = {}
cache_dic['attn_map'][-1][j] = {}
cache_dic['cross_attn_map'][-1][j] = {}
cache_dic['cache_type'] = model_kwargs['cache_type']
cache_dic['cache_index'] = cache_index
cache_dic['cache'] = cache
cache_dic['fresh_ratio_schedule'] = model_kwargs['ratio_scheduler']
cache_dic['fresh_ratio'] = model_kwargs['fresh_ratio']
cache_dic['fresh_threshold'] = model_kwargs['fresh_threshold']
cache_dic['force_fresh'] = model_kwargs['force_fresh']
cache_dic['soft_fresh_weight'] = model_kwargs['soft_fresh_weight']
#cache_dic['merge_weight'] = merge_weight
current = {}
current['num_steps'] = num_steps
return cache_dic, current
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/force_init.py
================================================
import torch
from .force_scheduler import force_scheduler
def force_init(cache_dic, current, tokens):
'''
Initialization for Force Activation step.
'''
cache_dic['cache_index'][-1][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)
force_scheduler(cache_dic, current)
if current['layer'] == 0:
cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/force_scheduler.py
================================================
import torch
def force_scheduler(cache_dic, current):
if cache_dic['fresh_ratio'] == 0:
# FORA
linear_step_weight = 0.0
else:
# TokenCache
linear_step_weight = 0.2
step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])
threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)
# No forced-refresh constraint is applied to sensitive steps, because performance is already good enough.
# You may want to experiment with adding one.
cache_dic['cal_threshold'] = threshold
#return threshold
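# Illustrative sketch (not part of the module): how the forced-refresh
# threshold computed by `force_scheduler` changes over the sampling run.
# `forced_refresh_threshold` is a hypothetical name, and Python's built-in
# `round` stands in for `torch.round`.

```python
def forced_refresh_threshold(fresh_threshold, step, num_steps, fresh_ratio):
    """Return the rounded threshold for the given step."""
    # fresh_ratio == 0 corresponds to the FORA branch (no step dependence).
    linear_step_weight = 0.0 if fresh_ratio == 0 else 0.2
    step_factor = 1 - linear_step_weight + 2 * linear_step_weight * step / num_steps
    return round(fresh_threshold / step_factor)


for step in (0, 10):
    print(step, forced_refresh_threshold(3, step, 20, fresh_ratio=0.3))
```

# With fresh_threshold=3 and 20 steps, the threshold is 4 at step 0 and 3 at
# step 10, so the interval shrinks as `step` grows.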
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/fresh_ratio_scheduler.py
================================================
import torch
def fresh_ratio_scheduler(cache_dic, current):
'''
Return the fresh ratio for the current step.
'''
fresh_ratio = cache_dic['fresh_ratio']
fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']
step = current['step']
num_steps = current['num_steps']
threshold = cache_dic['fresh_threshold']
weight = 0.9
if fresh_ratio_schedule == 'constant':
return fresh_ratio
elif fresh_ratio_schedule == 'linear':
return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)
elif fresh_ratio_schedule == 'exp':
#return 0.5 * (0.052 ** (step/num_steps))
return fresh_ratio * (weight ** (step / num_steps))
elif fresh_ratio_schedule == 'linear-mode':
mode = (step % threshold)/threshold - 0.5
mode_weight = 0.1
return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)
elif fresh_ratio_schedule == 'layerwise':
return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)
elif fresh_ratio_schedule == 'linear-layerwise':
step_weight = -0.9 #0.9
step_factor = 1 - step_weight + 2 * step_weight * step / num_steps
#if current['layer'] == 2:
# return 1.0
#sigmoid
#sigmoid_weight = 0.13
#layer_factor = 2 * torch.sigmoid(torch.tensor([sigmoid_weight * (13.5 - current['layer'])]))
layer_weight = 0.6
layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27
module_weight = 1.0 #TokenCache N=8 2.5 N=6 2.5 #N=4 2.1
module_time_weight = 0.6
module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)
return fresh_ratio * layer_factor * step_factor * module_factor
elif fresh_ratio_schedule == 'ToCa':
step_weight = -0.9 #0.9
step_factor = 1 - step_weight + 2 * step_weight * step / num_steps
layer_weight = 0.6
layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27
module_weight = 1.0
module_time_weight = 0.6
# With these weights, cross-attn refreshes 0.6x the base ratio and the other modules (e.g. mlp) 1.6x.
# Cross-attn has the highest temporal redundancy, so it is recomputed less; mlp has the lowest, so it is recomputed more.
module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)
return fresh_ratio * layer_factor * step_factor * module_factor
else:
raise ValueError("unrecognized fresh ratio schedule", fresh_ratio_schedule)
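# Illustrative sketch (not part of the module): the 'ToCa' branch of
# `fresh_ratio_scheduler` as a standalone function, followed by the [0, 1]
# clamp that `cache_cutfresh` applies. `toca_fresh_ratio` and `num_layers` are
# hypothetical names; the weights are copied from the code above.

```python
def toca_fresh_ratio(fresh_ratio, step, num_steps, layer, module, num_layers=28):
    """Return the clamped fresh ratio for one (step, layer, module)."""
    step_weight = -0.9
    step_factor = 1 - step_weight + 2 * step_weight * step / num_steps
    layer_weight = 0.6
    layer_factor = 1 + layer_weight - 2 * layer_weight * layer / (num_layers - 1)
    module_weight = 1.0
    module_time_weight = 0.6
    if module == "cross-attn":
        module_factor = 1 - (1 - module_time_weight) * module_weight
    else:
        module_factor = 1 + module_time_weight * module_weight
    raw = fresh_ratio * layer_factor * step_factor * module_factor
    return min(max(raw, 0.0), 1.0)


early_mlp = toca_fresh_ratio(0.3, step=0, num_steps=20, layer=0, module="mlp")
late_cross = toca_fresh_ratio(0.3, step=20, num_steps=20, layer=27, module="cross-attn")
```

# The step factor runs from 1.9 at step 0 down to 0.1 at step == num_steps and
# the layer factor from 1.6 at layer 0 down to 0.4 at layer 27, so the product
# can exceed 1 at step 0 in shallow layers; the clamp then means every token
# is recomputed.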
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/global_force_fresh.py
================================================
from .force_scheduler import force_scheduler
def global_force_fresh(cache_dic, current):
'''
Return whether to force fresh tokens globally.
'''
first_step = (current['step'] == 0)
force_fresh = cache_dic['force_fresh']
if not first_step:
fresh_threshold = cache_dic['cal_threshold']
else:
fresh_threshold = cache_dic['fresh_threshold']
if force_fresh == 'global':
return (first_step or (current['step']% fresh_threshold == 0))
elif force_fresh == 'local':
return first_step
elif force_fresh == 'none':
return first_step
else:
raise ValueError("unrecognized force fresh strategy", force_fresh)
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/score_evaluate.py
================================================
import torch
import torch.nn as nn
from .scores import attn_score, similarity_score, norm_score
def score_evaluate(cache_dic, tokens, current) -> torch.Tensor:
'''
Return the score tensor (B, N) for the given tokens.
'''
#if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):
# # abandoned branch, if you want to explore the local force fresh strategy, this may help.
# force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module
# force_len = force_fresh_mask.sum(dim=1)
# force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]
# force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]
# See the DiT-ToCa version for more explanation if needed.
if cache_dic['cache_type'] == 'random':
score = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1], device=tokens.device)
score = torch.cat([score, score], dim=0).to(tokens.device)
elif cache_dic['cache_type'] == 'straight':
score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)
elif cache_dic['cache_type'] == 'attention':
# cache_dic['attn_map'][step][layer] has shape (B, N, N); the last dimension has been softmaxed
score = attn_score(cache_dic, current)
#score = score + 0.0 * torch.rand_like(score, device= score.device)
elif cache_dic['cache_type'] == 'similarity':
score = similarity_score(cache_dic, current, tokens)
elif cache_dic['cache_type'] == 'norm':
score = norm_score(cache_dic, current, tokens)
elif cache_dic['cache_type'] == 'compress':
score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])
score1 = torch.cat([score1, score1], dim=0).to(tokens.device)
score2 = cache_dic['attn_map'][-1][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)
# normalize
score2 = score2 / score2.max(dim=1, keepdim=True)[0]
score = 0.5 * score1 + 0.5 * score2
# Abandoned branch; it may help if you want to explore the local force fresh strategy.
#if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed
# #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)
# score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32,
# device=force_indices.device))
if (True and (cache_dic['force_fresh'] == 'global')):
soft_step_score = cache_dic['cache_index'][-1][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])
soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)
score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score
return score.to(tokens.device)
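The global force-fresh term at the end of the function above can be looked at in isolation. The following is a minimal sketch, not the repository's code: `add_staleness_bonus` is a hypothetical helper, and `stale_steps` is an assumed stand-in for `cache_dic['cache_index'][-1][layer][module]` (how many steps each token has stayed cached). It shows how a long-cached token can overtake tokens with a higher base score.

```python
import torch

def add_staleness_bonus(score, stale_steps, fresh_threshold, soft_fresh_weight):
    # Hypothetical helper mirroring `score + soft_fresh_weight * soft_step_score`.
    soft_step_score = stale_steps.float() / fresh_threshold
    return score + soft_fresh_weight * soft_step_score

base = torch.tensor([[0.9, 0.5, 0.1]])   # (B, N) base importance scores (toy values)
stale = torch.tensor([[0, 1, 3]])        # (B, N) steps since each token was last refreshed
out = add_staleness_bonus(base, stale, fresh_threshold=3, soft_fresh_weight=1.0)

# Tokens sorted from most to least urgent to recompute.
ranking = out.argsort(dim=-1, descending=True)
```

With these toy values the token that has been cached for three steps ends up ranked first even though its base score is the lowest.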
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/scores.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
def attn_score(cache_dic, current):
#self_attn_score = 1- cache_dic['attn_map'][-1][current['layer']].diagonal(dim1=1, dim2=2)
#self_attn_score = F.normalize(self_attn_score, dim=1, p=2)
#attention_score = F.normalize(cache_dic['attn_map'][-1][current['layer']].sum(dim=1), dim=1, p=2)
#cross_attn_map = F.threshold(cache_dic['cross_attn_map'][-1][current['layer']],threshold=0.0, value=0.0)
#cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)
# Note: It is important to use the same token selection for the cfg and no-cfg branches,
# because the influence of **Cross-Attention** in text-conditional models makes the two branches differ a lot.
# Same selection for cfg and no cfg
cond_cmap, uncond_cmap = torch.split(cache_dic['cross_attn_map'][-1][current['layer']], len(cache_dic['cross_attn_map'][-1][current['layer']]) // 2, dim=0)
cond_weight = 0.5
cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap
# Entropy score
cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)
cross_attention_score = F.normalize(1 + cross_attention_entropy, dim=1, p=2) # Note: the "1" here does not change the sorted order, but provides stability.
score = cross_attention_score.repeat(2, 1)
# In PixArt, the cross_attention_score (s2) is used as the score, for a better text-image alignment.
# You can try combining the self_attention_score (s1) and cross_attention_score (s2) as the final score; there is a trade-off between the two.
#cross_weight = 0.0
#score = (1-cross_weight) * attention_score + cross_weight * cross_attention_score
return score
def similarity_score(cache_dic, current, tokens):
cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][-1][current['layer']][current['module']], dim=-1)
return F.normalize(1- cosine_sim, dim=-1, p=2)
def norm_score(cache_dic, current, tokens):
norm = tokens.norm(dim=-1, p=2)
return F.normalize(norm, dim=-1, p=2)
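The entropy-based cross-attention score in `attn_score` can be tried on a toy tensor. This is a minimal sketch under stated assumptions: `entropy_score` is a hypothetical standalone function, the input is assumed to have shape (2B, N, T) with rows softmaxed over the last dimension, and the two halves of the batch are assumed to be the two CFG branches.

```python
import torch
import torch.nn.functional as F

def entropy_score(cross_attn_map, cond_weight=0.5, eps=1e-7):
    # Split the batch into its two CFG halves and average them so that
    # both branches share one token selection.
    first, second = torch.split(cross_attn_map, cross_attn_map.shape[0] // 2, dim=0)
    cmap = cond_weight * first + (1 - cond_weight) * second
    # Per-token entropy over the text dimension, shape (B, N).
    entropy = -torch.sum(cmap * torch.log(cmap + eps), dim=-1)
    # L2-normalize across tokens; the "+ 1" keeps the values away from zero.
    score = F.normalize(1 + entropy, dim=1, p=2)
    # Duplicate so the score matches the (2B, N) batch layout.
    return score.repeat(2, 1)

torch.manual_seed(0)
attn = torch.softmax(torch.randn(4, 6, 5), dim=-1)  # toy (2B=4, N=6, T=5) attention map
score = entropy_score(attn)
```

Both halves of the returned score are identical by construction, which is the property the comments in `attn_score` ask for.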
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/token_merge.py
================================================
import torch
def token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):
'''
An abandoned branch exploring whether token merging helps. The answer is no, at least not for a training-free strategy.
'''
if (current['layer'] % 1 == 0):
fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
method = 'similarity'
if method == 'distance':
descending = False
distance = torch.cdist(stale_tokens, fresh_tokens, p=1)
stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)
elif method == 'similarity':
descending = True
fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)
stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)
similarity = stale_tokens @ fresh_tokens.transpose(1, 2)
stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)
saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())
merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]
stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)
merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)
merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)
cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices
cache_dic['merged_stale_sequence'] = merged_stale_sequence
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/cache_functions/update_cache.py
================================================
import torch
def update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):
'''
Update the cache with the fresh tokens.
'''
step = current['step']
layer = current['layer']
module = current['module']
# Update the cached tokens at the positions
if module == 'attn':
# this branch is not used in the final version, but if you explore the partial fresh strategy of attention, it works (probably a few bugs).
indices = fresh_indices#.sort(dim=1, descending=False)[0]
cache_dic['attn_map'][-1][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)
elif module == 'cross-attn':
indices = fresh_indices#.sort(dim=1, descending=False)[0]
cache_dic['cross_attn_map'][-1][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)
elif module == 'mlp':
indices = fresh_indices
cache_dic['cache'][-1][layer][module].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)
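The `scatter_` pattern used in `update_cache` can be checked on a tiny tensor. This is a minimal sketch with toy shapes; the names `cache`, `fresh_indices` and `fresh_tokens` are stand-ins for the entries of `cache_dic` and are assumptions for illustration.

```python
import torch

cache = torch.zeros(1, 4, 2)                          # (B, N, D) cached features
fresh_indices = torch.tensor([[2, 0]])                # (B, K) positions that were recomputed
fresh_tokens = torch.tensor([[[1., 1.], [2., 2.]]])   # (B, K, D) recomputed features

# Expand the token indices over the feature dimension so that every
# feature of a fresh token is written to the same row of the cache.
index = fresh_indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1])
cache.scatter_(dim=1, index=index, src=fresh_tokens)
```

Only rows 2 and 0 of the cache are overwritten; the remaining rows keep their stale values.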
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/diffusion_utils.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
import numpy as np
import torch as th
def normal_kl(mean1, logvar1, mean2, logvar2):
"""
Compute the KL divergence between two gaussians.
Shapes are automatically broadcasted, so batches can be compared to
scalars, among other use cases.
"""
tensor = next(
(
obj
for obj in (mean1, logvar1, mean2, logvar2)
if isinstance(obj, th.Tensor)
),
None,
)
assert tensor is not None, "at least one argument must be a Tensor"
# Force variances to be Tensors. Broadcasting helps convert scalars to
# Tensors, but it does not work for th.exp().
logvar1, logvar2 = [
x if isinstance(x, th.Tensor) else th.tensor(x, device=tensor.device)
for x in (logvar1, logvar2)
]
return 0.5 * (
-1.0
+ logvar2
- logvar1
+ th.exp(logvar1 - logvar2)
+ ((mean1 - mean2) ** 2) * th.exp(-logvar2)
)
def approx_standard_normal_cdf(x):
"""
A fast approximation of the cumulative distribution function of the
standard normal.
"""
return 0.5 * (1.0 + th.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * th.pow(x, 3))))
def continuous_gaussian_log_likelihood(x, *, means, log_scales):
"""
Compute the log-likelihood of a continuous Gaussian distribution.
:param x: the targets
:param means: the Gaussian mean Tensor.
:param log_scales: the Gaussian log stddev Tensor.
:return: a tensor like x of log probabilities (in nats).
"""
centered_x = x - means
inv_stdv = th.exp(-log_scales)
normalized_x = centered_x * inv_stdv
return th.distributions.Normal(th.zeros_like(x), th.ones_like(x)).log_prob(
normalized_x
)
def discretized_gaussian_log_likelihood(x, *, means, log_scales):
"""
Compute the log-likelihood of a Gaussian distribution discretizing to a
given image.
:param x: the target images. It is assumed that this was uint8 values,
rescaled to the range [-1, 1].
:param means: the Gaussian mean Tensor.
:param log_scales: the Gaussian log stddev Tensor.
:return: a tensor like x of log probabilities (in nats).
"""
assert x.shape == means.shape == log_scales.shape
centered_x = x - means
inv_stdv = th.exp(-log_scales)
plus_in = inv_stdv * (centered_x + 1.0 / 255.0)
cdf_plus = approx_standard_normal_cdf(plus_in)
min_in = inv_stdv * (centered_x - 1.0 / 255.0)
cdf_min = approx_standard_normal_cdf(min_in)
log_cdf_plus = th.log(cdf_plus.clamp(min=1e-12))
log_one_minus_cdf_min = th.log((1.0 - cdf_min).clamp(min=1e-12))
cdf_delta = cdf_plus - cdf_min
log_probs = th.where(
x < -0.999,
log_cdf_plus,
th.where(x > 0.999, log_one_minus_cdf_min, th.log(cdf_delta.clamp(min=1e-12))),
)
assert log_probs.shape == x.shape
return log_probs
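The closed form used by `normal_kl` can be cross-checked against `torch.distributions`. This is a minimal sketch: `normal_kl_closed_form` is a hypothetical copy of the formula that skips the scalar-to-tensor handling, and the input values are arbitrary.

```python
import torch as th
from torch.distributions import Normal, kl_divergence

def normal_kl_closed_form(mean1, logvar1, mean2, logvar2):
    # Same expression as in normal_kl, for tensor inputs only.
    return 0.5 * (
        -1.0
        + logvar2
        - logvar1
        + th.exp(logvar1 - logvar2)
        + ((mean1 - mean2) ** 2) * th.exp(-logvar2)
    )

mean1, logvar1 = th.tensor([0.0, 1.0]), th.tensor([0.0, 0.5])
mean2, logvar2 = th.tensor([0.0, -1.0]), th.tensor([0.0, -0.3])

kl = normal_kl_closed_form(mean1, logvar1, mean2, logvar2)
# Reference value: Normal takes a standard deviation, i.e. exp(0.5 * logvar).
ref = kl_divergence(
    Normal(mean1, th.exp(0.5 * logvar1)),
    Normal(mean2, th.exp(0.5 * logvar2)),
)
```

The first pair of Gaussians is identical, so its KL divergence is zero; the second pair should agree with the reference up to floating-point error.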
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/dpm_solver.py
================================================
import torch
from tqdm import tqdm
from ..model.cache_functions import cache_init
class NoiseScheduleVP:
def __init__(
self,
schedule='discrete',
betas=None,
alphas_cumprod=None,
continuous_beta_0=0.1,
continuous_beta_1=20.,
dtype=torch.float32,
):
"""Create a wrapper class for the forward SDE (VP type).
***
Update: We support discrete-time diffusion models by implementing a piecewise linear interpolation for log_alpha_t.
We recommend using schedule='discrete' for discrete-time diffusion models, especially for high-resolution images.
***
The forward SDE ensures that the conditional distribution q_{t|0}(x_t | x_0) = N ( alpha_t * x_0, sigma_t^2 * I ).
We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper).
Therefore, we implement the functions for computing alpha_t, sigma_t and lambda_t. For t in [0, T], we have:
log_alpha_t = self.marginal_log_mean_coeff(t)
sigma_t = self.marginal_std(t)
lambda_t = self.marginal_lambda(t)
Moreover, as lambda(t) is an invertible function, we also support its inverse function:
t = self.inverse_lambda(lambda_t)
===============================================================
We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).
1. For discrete-time DPMs:
For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by:
t_i = (i + 1) / N
e.g. for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1.
We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.
Args:
betas: A `torch.Tensor`. The beta array for the discrete-time DPM. (See the original DDPM paper for details)
alphas_cumprod: A `torch.Tensor`. The cumprod alphas for the discrete-time DPM. (See the original DDPM paper for details)
Note that we always have alphas_cumprod = cumprod(1 - betas). Therefore, we only need to set one of `betas` and `alphas_cumprod`.
**Important**: Please pay special attention to the `alphas_cumprod` argument:
The `alphas_cumprod` is the \hat{alpha_n} arrays in the notations of DDPM. Specifically, DDPMs assume that
q_{t_n | 0}(x_{t_n} | x_0) = N ( \sqrt{\hat{alpha_n}} * x_0, (1 - \hat{alpha_n}) * I ).
Therefore, the notation \hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, we have
alpha_{t_n} = \sqrt{\hat{alpha_n}},
and
log(alpha_{t_n}) = 0.5 * log(\hat{alpha_n}).
2. For continuous-time DPMs:
We support the linear VPSDE for the continuous time setting. The hyperparameters for the noise
schedule are the default settings in Yang Song's ScoreSDE:
Args:
beta_min: A `float` number. The smallest beta for the linear schedule.
beta_max: A `float` number. The largest beta for the linear schedule.
T: A `float` number. The ending time of the forward process.
===============================================================
Args:
schedule: A `str`. The noise schedule of the forward SDE. 'discrete' for discrete-time DPMs,
'linear' for continuous-time DPMs.
Returns:
A wrapper object of the forward SDE (VP type).
===============================================================
Example:
# For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):
>>> ns = NoiseScheduleVP('discrete', betas=betas)
# For discrete-time DPMs, given alphas_cumprod (the \hat{alpha_n} array for n = 0, 1, ..., N - 1):
>>> ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)
# For continuous-time DPMs (VPSDE), linear schedule:
>>> ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)
"""
if schedule not in ['discrete', 'linear']:
raise ValueError(
f"Unsupported noise schedule {schedule}. The schedule needs to be 'discrete' or 'linear'"
)
self.schedule = schedule
if schedule == 'discrete':
if betas is not None:
log_alphas = 0.5 * torch.log(1 - betas).cumsum(dim=0)
else:
assert alphas_cumprod is not None
log_alphas = 0.5 * torch.log(alphas_cumprod)
self.T = 1.
self.log_alpha_array = self.numerical_clip_alpha(log_alphas).reshape((1, -1,)).to(dtype=dtype)
self.total_N = self.log_alpha_array.shape[1]
self.t_array = torch.linspace(0., 1., self.total_N + 1)[1:].reshape((1, -1)).to(dtype=dtype)
else:
self.T = 1.
self.total_N = 1000
self.beta_0 = continuous_beta_0
self.beta_1 = continuous_beta_1
def numerical_clip_alpha(self, log_alphas, clipped_lambda=-5.1):
"""
For some beta schedules such as the cosine schedule, the log-SNR has numerical issues.
We clip the log-SNR near t=T at -5.1 to ensure stability.
Such a trick is very useful for diffusion models with the cosine schedule, such as i-DDPM, guided-diffusion and GLIDE.
"""
log_sigmas = 0.5 * torch.log(1. - torch.exp(2. * log_alphas))
lambs = log_alphas - log_sigmas
idx = torch.searchsorted(torch.flip(lambs, [0]), clipped_lambda)
if idx > 0:
log_alphas = log_alphas[:-idx]
return log_alphas
def marginal_log_mean_coeff(self, t):
"""
Compute log(alpha_t) of a given continuous-time label t in [0, T].
"""
if self.schedule == 'discrete':
return interpolate_fn(t.reshape((-1, 1)), self.t_array.to(t.device),
self.log_alpha_array.to(t.device)).reshape((-1))
elif self.schedule == 'linear':
return -0.25 * t ** 2 * (self.beta_1 - self.beta_0) - 0.5 * t * self.beta_0
def marginal_alpha(self, t):
"""
Compute alpha_t of a given continuous-time label t in [0, T].
"""
return torch.exp(self.marginal_log_mean_coeff(t))
def marginal_std(self, t):
"""
Compute sigma_t of a given continuous-time label t in [0, T].
"""
return torch.sqrt(1. - torch.exp(2. * self.marginal_log_mean_coeff(t)))
def marginal_lambda(self, t):
"""
Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].
"""
log_mean_coeff = self.marginal_log_mean_coeff(t)
log_std = 0.5 * torch.log(1. - torch.exp(2. * log_mean_coeff))
return log_mean_coeff - log_std
def inverse_lambda(self, lamb):
"""
Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.
"""
if self.schedule == 'linear':
tmp = 2. * (self.beta_1 - self.beta_0) * torch.logaddexp(-2. * lamb, torch.zeros((1,)).to(lamb))
Delta = self.beta_0 ** 2 + tmp
return tmp / (torch.sqrt(Delta) + self.beta_0) / (self.beta_1 - self.beta_0)
elif self.schedule == 'discrete':
log_alpha = -0.5 * torch.logaddexp(torch.zeros((1,)).to(lamb.device), -2. * lamb)
t = interpolate_fn(log_alpha.reshape((-1, 1)), torch.flip(self.log_alpha_array.to(lamb.device), [1]),
torch.flip(self.t_array.to(lamb.device), [1]))
return t.reshape((-1,))
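For the 'linear' schedule, `marginal_lambda` and `inverse_lambda` should invert each other. The following is a minimal sketch that copies the two formulas into hypothetical free functions (`log_alpha`, `marginal_lambda`, `inverse_lambda` at module level, with the default `beta_0` and `beta_1`) and checks the round trip numerically in float64.

```python
import torch

beta_0, beta_1 = 0.1, 20.0  # defaults of the linear VP schedule

def log_alpha(t):
    # log(alpha_t) for the linear schedule.
    return -0.25 * t ** 2 * (beta_1 - beta_0) - 0.5 * t * beta_0

def marginal_lambda(t):
    # Half-logSNR: log(alpha_t) - log(sigma_t), with sigma_t^2 = 1 - alpha_t^2.
    la = log_alpha(t)
    log_sigma = 0.5 * torch.log(1.0 - torch.exp(2.0 * la))
    return la - log_sigma

def inverse_lambda(lamb):
    # Solve the quadratic in t implied by log_alpha(t) for a given lambda.
    tmp = 2.0 * (beta_1 - beta_0) * torch.logaddexp(-2.0 * lamb, torch.zeros((1,)).to(lamb))
    delta = beta_0 ** 2 + tmp
    return tmp / (torch.sqrt(delta) + beta_0) / (beta_1 - beta_0)

t = torch.linspace(1e-3, 1.0, 5, dtype=torch.float64)
lamb = marginal_lambda(t)
t_back = inverse_lambda(lamb)
```

The half-logSNR decreases monotonically in `t`, which is what makes the inverse well defined.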
def model_wrapper(
model,
noise_schedule,
model_type="noise",
model_kwargs={},
guidance_type="uncond",
condition=None,
unconditional_condition=None,
guidance_scale=1.,
classifier_fn=None,
classifier_kwargs={},
):
"""Create a wrapper function for the noise prediction model.
DPM-Solver needs to solve the continuous-time diffusion ODEs. For DPMs trained on discrete-time labels, we need to
first wrap the model function to a noise prediction model that accepts the continuous time as the input.
We support four types of the diffusion model by setting `model_type`:
1. "noise": noise prediction model. (Trained by predicting noise).
2. "x_start": data prediction model. (Trained by predicting the data x_0 at time 0).
3. "v": velocity prediction model. (Trained by predicting the velocity).
The derivation of the "v" prediction is detailed in Appendix D of [1], and it is used in Imagen-Video [2].
[1] Salimans, Tim, and Jonathan Ho. "Progressive distillation for fast sampling of diffusion models."
arXiv preprint arXiv:2202.00512 (2022).
[2] Ho, Jonathan, et al. "Imagen Video: High Definition Video Generation with Diffusion Models."
arXiv preprint arXiv:2210.02303 (2022).
4. "score": marginal score function. (Trained by denoising score matching).
Note that the score function and the noise prediction model follow a simple relationship:
```
noise(x_t, t) = -sigma_t * score(x_t, t)
```
We support three types of guided sampling by DPMs by setting `guidance_type`:
1. "uncond": unconditional sampling by DPMs.
The input `model` has the following format:
``
model(x, t_input, **model_kwargs) -> noise | x_start | v | score
``
2. "classifier": classifier guidance sampling [3] by DPMs and another classifier.
The input `model` has the following format:
``
model(x, t_input, **model_kwargs) -> noise | x_start | v | score
``
The input `classifier_fn` has the following format:
``
classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)
``
[3] P. Dhariwal and A. Q. Nichol, "Diffusion models beat GANs on image synthesis,"
in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.
3. "classifier-free": classifier-free guidance sampling by conditional DPMs.
The input `model` has the following format:
``
model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score
``
And if cond == `unconditional_condition`, the model output is the unconditional DPM output.
[4] Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance."
arXiv preprint arXiv:2207.12598 (2022).
The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)
or continuous-time labels (i.e. epsilon to T).
We wrap the model function to accept only `x` and `t_continuous` as inputs, and outputs the predicted noise:
``
def model_fn(x, t_continuous) -> noise:
t_input = get_model_input_time(t_continuous)
return noise_pred(model, x, t_input, **model_kwargs)
``
where `t_continuous` is the continuous time labels (i.e. epsilon to T). And we use `model_fn` for DPM-Solver.
===============================================================
Args:
model: A diffusion model with the corresponding format described above.
noise_schedule: A noise schedule object, such as NoiseScheduleVP.
model_type: A `str`. The parameterization type of the diffusion model.
"noise" or "x_start" or "v" or "score".
model_kwargs: A `dict`. A dict for the other inputs of the model function.
guidance_type: A `str`. The type of the guidance for sampling.
"uncond" or "classifier" or "classifier-free".
condition: A pytorch tensor. The condition for the guided sampling.
Only used for "classifier" or "classifier-free" guidance type.
unconditional_condition: A pytorch tensor. The condition for the unconditional sampling.
Only used for "classifier-free" guidance type.
guidance_scale: A `float`. The scale for the guided sampling.
classifier_fn: A classifier function. Only used for the classifier guidance.
classifier_kwargs: A `dict`. A dict for the other inputs of the classifier function.
Returns:
A noise prediction model that accepts the noised data and the continuous time as the inputs.
"""
def get_model_input_time(t_continuous):
"""
Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.
For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].
For continuous-time DPMs, we just use `t_continuous`.
"""
if noise_schedule.schedule == 'discrete':
return (t_continuous - 1. / noise_schedule.total_N) * 1000.
else:
return t_continuous
def noise_pred_fn(x, t_continuous, current, cache_dic, cond=None):
t_input = get_model_input_time(t_continuous)
if cond is None:
output = model(x, t_input, current, cache_dic, **model_kwargs)
else:
output = model(x, t_input, current, cache_dic, cond, **model_kwargs)
if model_type == "noise":
return output
elif model_type == "x_start":
alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
return (x - expand_dims(alpha_t, x.dim()) * output) / expand_dims(sigma_t, x.dim())
elif model_type == "v":
alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
return expand_dims(alpha_t, x.dim()) * output + expand_dims(sigma_t, x.dim()) * x
elif model_type == "score":
sigma_t = noise_schedule.marginal_std(t_continuous)
return -expand_dims(sigma_t, x.dim()) * output
def cond_grad_fn(x, t_input):
"""
Compute the gradient of the classifier, i.e. nabla_{x} log p_t(cond | x_t).
"""
with torch.enable_grad():
x_in = x.detach().requires_grad_(True)
log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)
return torch.autograd.grad(log_prob.sum(), x_in)[0]
def model_fn(x, t_continuous, current, cache_dic):
"""
The noise prediction model function that is used for DPM-Solver.
"""
if guidance_type == "uncond":
return noise_pred_fn(x, t_continuous, current, cache_dic)
elif guidance_type == "classifier":
assert classifier_fn is not None
t_input = get_model_input_time(t_continuous)
cond_grad = cond_grad_fn(x, t_input)
sigma_t = noise_schedule.marginal_std(t_continuous)
noise = noise_pred_fn(x, t_continuous, current, cache_dic)
return noise - guidance_scale * expand_dims(sigma_t, x.dim()) * cond_grad
elif guidance_type == "classifier-free":
if guidance_scale == 1. or unconditional_condition is None:
return noise_pred_fn(x, t_continuous, current, cache_dic, cond=condition)
x_in = torch.cat([x] * 2)
t_in = torch.cat([t_continuous] * 2)
c_in = torch.cat([unconditional_condition, condition])
noise_uncond, noise = noise_pred_fn(x_in, t_in, current, cache_dic, cond=c_in).chunk(2)
return noise_uncond + guidance_scale * (noise - noise_uncond)
assert model_type in ["noise", "x_start", "v", "score"]
assert guidance_type in ["uncond", "classifier", "classifier-free"]
return model_fn
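The classifier-free branch of `model_fn` reduces to one linear combination of the two halves of a batched prediction. This is a minimal sketch with toy numbers; `cfg_combine` is a hypothetical helper, and the batch layout (unconditional half first) follows the `torch.cat([unconditional_condition, condition])` order used above.

```python
import torch

def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    # Extrapolate from the unconditional prediction towards the conditional one.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy batched output: first row is the unconditional half, second row the conditional half.
batched = torch.tensor([[0.0, 1.0], [1.0, 3.0]])
noise_uncond, noise_cond = batched.chunk(2)

guided = cfg_combine(noise_uncond, noise_cond, guidance_scale=4.5)
# A scale of 1 returns the conditional prediction unchanged.
unguided = cfg_combine(noise_uncond, noise_cond, guidance_scale=1.0)
```

This also shows why the wrapper can skip the doubled batch when `guidance_scale == 1.`: the result equals the conditional prediction.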
class DPM_Solver:
def __init__(
self,
model_fn,
noise_schedule,
algorithm_type="dpmsolver++",
correcting_x0_fn=None,
correcting_xt_fn=None,
thresholding_max_val=1.,
dynamic_thresholding_ratio=0.995,
):
"""Construct a DPM-Solver.
We support both DPM-Solver (`algorithm_type="dpmsolver"`) and DPM-Solver++ (`algorithm_type="dpmsolver++"`).
We also support the "dynamic thresholding" method in Imagen[1]. For pixel-space diffusion models, you
can set both `algorithm_type="dpmsolver++"` and `correcting_x0_fn="dynamic_thresholding"` to use the
dynamic thresholding. The "dynamic thresholding" can greatly improve the sample quality for pixel-space
DPMs with large guidance scales. Note that the thresholding method is **unsuitable** for latent-space
DPMs (such as stable-diffusion).
To support advanced algorithms in image-to-image applications, we also support corrector functions for
both x0 and xt.
Args:
model_fn: A noise prediction model function which accepts the continuous-time input (t in [epsilon, T]):
``
def model_fn(x, t_continuous, current, cache_dic):
return noise
``
The shape of `x` is `(batch_size, **shape)`, and the shape of `t_continuous` is `(batch_size,)`.
noise_schedule: A noise schedule object, such as NoiseScheduleVP.
algorithm_type: A `str`. Either "dpmsolver" or "dpmsolver++".
correcting_x0_fn: A `str` or a function with the following format:
```
def correcting_x0_fn(x0, t):
x0_new = ...
return x0_new
```
This function is to correct the outputs of the data prediction model at each sampling step. e.g.,
```
x0_pred = data_pred_model(xt, t)
if correcting_x0_fn is not None:
x0_pred = correcting_x0_fn(x0_pred, t)
xt_1 = update(x0_pred, xt, t)
```
If `correcting_x0_fn="dynamic_thresholding"`, we use the dynamic thresholding proposed in Imagen[1].
correcting_xt_fn: A function with the following format:
```
def correcting_xt_fn(xt, t, step):
x_new = ...
return x_new
```
This function is to correct the intermediate samples xt at each sampling step. e.g.,
```
xt = ...
xt = correcting_xt_fn(xt, t, step)
```
thresholding_max_val: A `float`. The max value for thresholding.
Valid only when using `dpmsolver++` and `correcting_x0_fn="dynamic_thresholding"`.
dynamic_thresholding_ratio: A `float`. The ratio for dynamic thresholding (see Imagen[1] for details).
Valid only when using `dpmsolver++` and `correcting_x0_fn="dynamic_thresholding"`.
[1] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour,
Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models
with deep language understanding. arXiv preprint arXiv:2205.11487, 2022b.
"""
self.model = lambda x, t, current, cache_dic: model_fn(x, t.expand((x.shape[0])), current, cache_dic)
self.noise_schedule = noise_schedule
assert algorithm_type in ["dpmsolver", "dpmsolver++"]
self.algorithm_type = algorithm_type
if correcting_x0_fn == "dynamic_thresholding":
self.correcting_x0_fn = self.dynamic_thresholding_fn
else:
self.correcting_x0_fn = correcting_x0_fn
self.correcting_xt_fn = correcting_xt_fn
self.dynamic_thresholding_ratio = dynamic_thresholding_ratio
self.thresholding_max_val = thresholding_max_val
def dynamic_thresholding_fn(self, x0, t):
"""
The dynamic thresholding method.
"""
dims = x0.dim()
p = self.dynamic_thresholding_ratio
s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)
s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)
x0 = torch.clamp(x0, -s, s) / s
return x0
def noise_prediction_fn(self, x, t, current, cache_dic):
"""
Return the noise prediction model.
"""
return self.model(x, t, current, cache_dic)
def data_prediction_fn(self, x, t, current, cache_dic):
"""
Return the data prediction model (with corrector).
"""
noise = self.noise_prediction_fn(x, t, current, cache_dic)
alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)
x0 = (x - sigma_t * noise) / alpha_t
if self.correcting_x0_fn is not None:
x0 = self.correcting_x0_fn(x0, t)
return x0
def model_fn(self, x, t, current, cache_dic):
"""
Convert the model to the noise prediction model or the data prediction model.
"""
if self.algorithm_type == "dpmsolver++":
return self.data_prediction_fn(x, t, current, cache_dic)
else:
return self.noise_prediction_fn(x, t, current, cache_dic)
def get_time_steps(self, skip_type, t_T, t_0, N, device):
"""Compute the intermediate time steps for sampling.
Args:
skip_type: A `str`. The type for the spacing of the time steps. We support three types:
- 'logSNR': uniform logSNR for the time steps.
- 'time_uniform': uniform time for the time steps. (**Recommended for high-resolution data**.)
- 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolution data.)
t_T: A `float`. The starting time of the sampling (default is T).
t_0: A `float`. The ending time of the sampling (default is epsilon).
N: An `int`. The total number of intervals between the time steps.
device: A torch device.
Returns:
A pytorch tensor of the time steps, with the shape (N + 1,).
"""
if skip_type == 'logSNR':
lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))
lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))
logSNR_steps = torch.linspace(lambda_T.cpu().item(), lambda_0.cpu().item(), N + 1).to(device)
return self.noise_schedule.inverse_lambda(logSNR_steps)
elif skip_type == 'time_uniform':
return torch.linspace(t_T, t_0, N + 1).to(device)
elif skip_type == 'time_quadratic':
t_order = 2
return (
torch.linspace(
t_T ** (1.0 / t_order), t_0 ** (1.0 / t_order), N + 1
)
.pow(t_order)
.to(device)
)
else:
raise ValueError(
f"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time_uniform' or 'time_quadratic'"
)
def get_orders_and_timesteps_for_singlestep_solver(self, steps, order, skip_type, t_T, t_0, device):
"""
Get the order of each step for sampling by the singlestep DPM-Solver.
We combine both DPM-Solver-1,2,3 to use all the function evaluations, which is named as "DPM-Solver-fast".
Given a fixed number of function evaluations by `steps`, the sampling procedure by DPM-Solver-fast is:
- If order == 1:
We take `steps` of DPM-Solver-1 (i.e. DDIM).
- If order == 2:
- Denote K = (steps // 2). We take K or (K + 1) intermediate time steps for sampling.
- If steps % 2 == 0, we use K steps of DPM-Solver-2.
- If steps % 2 == 1, we use K steps of DPM-Solver-2 and 1 step of DPM-Solver-1.
- If order == 3:
- Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.
- If steps % 3 == 0, we use (K - 2) steps of DPM-Solver-3, and 1 step of DPM-Solver-2 and 1 step of DPM-Solver-1.
- If steps % 3 == 1, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-1.
- If steps % 3 == 2, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-2.
============================================
Args:
order: An `int`. The max order for the solver (1, 2 or 3).
steps: An `int`. The total number of function evaluations (NFE).
skip_type: A `str`. The type for the spacing of the time steps. We support three types:
- 'logSNR': uniform logSNR for the time steps.
- 'time_uniform': uniform time for the time steps. (**Recommended for high-resolution data**.)
- 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolution data.)
t_T: A `float`. The starting time of the sampling (default is T).
t_0: A `float`. The ending time of the sampling (default is epsilon).
device: A torch device.
Returns:
orders: A list of the solver order of each step.
"""
if order == 3:
K = steps // 3 + 1
if steps % 3 == 0:
orders = [3, ] * (K - 2) + [2, 1]
elif steps % 3 == 1:
orders = [3, ] * (K - 1) + [1]
else:
orders = [3, ] * (K - 1) + [2]
elif order == 2:
if steps % 2 == 0:
K = steps // 2
orders = [2, ] * K
else:
K = steps // 2 + 1
orders = [2, ] * (K - 1) + [1]
elif order == 1:
K = 1
orders = [1, ] * steps
else:
raise ValueError("'order' must be '1' or '2' or '3'.")
if skip_type == 'logSNR':
# To reproduce the results in DPM-Solver paper
timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, K, device)
else:
timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, steps, device)[
torch.cumsum(torch.tensor([0, ] + orders), 0).to(device)]
return timesteps_outer, orders
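The order schedule built by the method above can be checked in isolation. This is a minimal sketch: `singlestep_orders` is a hypothetical pure-Python copy of the branching logic, without the time-step computation. The orders in each schedule should sum to the requested number of function evaluations.

```python
def singlestep_orders(steps, order):
    # Hypothetical standalone version of the order-selection branches.
    if order == 3:
        K = steps // 3 + 1
        if steps % 3 == 0:
            return [3] * (K - 2) + [2, 1]
        if steps % 3 == 1:
            return [3] * (K - 1) + [1]
        return [3] * (K - 1) + [2]
    if order == 2:
        if steps % 2 == 0:
            return [2] * (steps // 2)
        return [2] * (steps // 2) + [1]
    if order == 1:
        return [1] * steps
    raise ValueError("'order' must be 1, 2 or 3.")

orders_20 = singlestep_orders(20, 3)   # six third-order steps, then one second-order step
orders_9 = singlestep_orders(9, 3)     # two third-order steps, then orders 2 and 1
orders_7 = singlestep_orders(7, 2)     # three second-order steps, then one first-order step
```

Each entry is the number of model evaluations spent on that step, so the sum of a schedule is the NFE budget.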
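def denoise_to_zero_fn(self, x, s, current=None, cache_dic=None):
"""
Denoise at the final step, which is equivalent to solving the ODE from lambda_s to infty by first-order discretization.
`current` and `cache_dic` are forwarded to the model, as data_prediction_fn requires them.
"""
return self.data_prediction_fn(x, s, current, cache_dic)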
def dpm_solver_first_update(self, x, s, t, current, cache_dic, model_s=None, return_intermediate=False):
"""
DPM-Solver-1 (equivalent to DDIM) from time `s` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `s`.
s: A pytorch tensor. The starting time, with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
model_s: A pytorch tensor. The model function evaluated at time `s`.
If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
return_intermediate: A `bool`. If true, also return the model value at time `s`.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
ns = self.noise_schedule
dims = x.dim()
lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
h = lambda_t - lambda_s
log_alpha_s, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(t)
sigma_s, sigma_t = ns.marginal_std(s), ns.marginal_std(t)
alpha_t = torch.exp(log_alpha_t)
if self.algorithm_type == "dpmsolver++":
phi_1 = torch.expm1(-h)
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
x_t = (
sigma_t / sigma_s * x
- alpha_t * phi_1 * model_s
)
else:
phi_1 = torch.expm1(h)
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
x_t = (
torch.exp(log_alpha_t - log_alpha_s) * x
- (sigma_t * phi_1) * model_s
)
return (x_t, {'model_s': model_s}) if return_intermediate else x_t
def singlestep_dpm_solver_second_update(self, x, s, t, current, cache_dic, r1=0.5, model_s=None, return_intermediate=False,
solver_type='dpmsolver'):
"""
Singlestep solver DPM-Solver-2 from time `s` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `s`.
s: A pytorch tensor. The starting time, with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
r1: A `float`. The hyperparameter of the second-order solver.
model_s: A pytorch tensor. The model function evaluated at time `s`.
If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
return_intermediate: A `bool`. If true, also return the model value at time `s` and `s1` (the intermediate time).
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
if solver_type not in ['dpmsolver', 'taylor']:
raise ValueError(
f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}"
)
if r1 is None:
r1 = 0.5
ns = self.noise_schedule
lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
h = lambda_t - lambda_s
lambda_s1 = lambda_s + r1 * h
s1 = ns.inverse_lambda(lambda_s1)
log_alpha_s, log_alpha_s1, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(
s1), ns.marginal_log_mean_coeff(t)
sigma_s, sigma_s1, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(t)
alpha_s1, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_t)
if self.algorithm_type == "dpmsolver++":
phi_11 = torch.expm1(-r1 * h)
phi_1 = torch.expm1(-h)
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
x_s1 = (
(sigma_s1 / sigma_s) * x
- (alpha_s1 * phi_11) * model_s
)
model_s1 = self.model_fn(x_s1, s1, current, cache_dic)
if solver_type == 'dpmsolver':
x_t = (
(sigma_t / sigma_s) * x
- (alpha_t * phi_1) * model_s
- (0.5 / r1) * (alpha_t * phi_1) * (model_s1 - model_s)
)
elif solver_type == 'taylor':
x_t = (
(sigma_t / sigma_s) * x
- (alpha_t * phi_1) * model_s
+ (1. / r1) * (alpha_t * (phi_1 / h + 1.)) * (model_s1 - model_s)
)
else:
phi_11 = torch.expm1(r1 * h)
phi_1 = torch.expm1(h)
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
x_s1 = (
torch.exp(log_alpha_s1 - log_alpha_s) * x
- (sigma_s1 * phi_11) * model_s
)
model_s1 = self.model_fn(x_s1, s1, current, cache_dic)
if solver_type == 'dpmsolver':
x_t = (
torch.exp(log_alpha_t - log_alpha_s) * x
- (sigma_t * phi_1) * model_s
- (0.5 / r1) * (sigma_t * phi_1) * (model_s1 - model_s)
)
elif solver_type == 'taylor':
x_t = (
torch.exp(log_alpha_t - log_alpha_s) * x
- (sigma_t * phi_1) * model_s
- (1. / r1) * (sigma_t * (phi_1 / h - 1.)) * (model_s1 - model_s)
)
if return_intermediate:
return x_t, {'model_s': model_s, 'model_s1': model_s1}
else:
return x_t
def singlestep_dpm_solver_third_update(self, x, s, t, current, cache_dic, r1=1. / 3., r2=2. / 3., model_s=None, model_s1=None,
return_intermediate=False, solver_type='dpmsolver'):
"""
Singlestep solver DPM-Solver-3 from time `s` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `s`.
s: A pytorch tensor. The starting time, with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
r1: A `float`. The hyperparameter of the third-order solver.
r2: A `float`. The hyperparameter of the third-order solver.
model_s: A pytorch tensor. The model function evaluated at time `s`.
If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
model_s1: A pytorch tensor. The model function evaluated at time `s1` (the intermediate time given by `r1`).
If `model_s1` is None, we evaluate the model at `s1`; otherwise we directly use it.
return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
if solver_type not in ['dpmsolver', 'taylor']:
raise ValueError(
f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}"
)
if r1 is None:
r1 = 1. / 3.
if r2 is None:
r2 = 2. / 3.
ns = self.noise_schedule
lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
h = lambda_t - lambda_s
lambda_s1 = lambda_s + r1 * h
lambda_s2 = lambda_s + r2 * h
s1 = ns.inverse_lambda(lambda_s1)
s2 = ns.inverse_lambda(lambda_s2)
log_alpha_s, log_alpha_s1, log_alpha_s2, log_alpha_t = ns.marginal_log_mean_coeff(
s), ns.marginal_log_mean_coeff(s1), ns.marginal_log_mean_coeff(s2), ns.marginal_log_mean_coeff(t)
sigma_s, sigma_s1, sigma_s2, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(
s2), ns.marginal_std(t)
alpha_s1, alpha_s2, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_s2), torch.exp(log_alpha_t)
if self.algorithm_type == "dpmsolver++":
phi_11 = torch.expm1(-r1 * h)
phi_12 = torch.expm1(-r2 * h)
phi_1 = torch.expm1(-h)
phi_22 = torch.expm1(-r2 * h) / (r2 * h) + 1.
phi_2 = phi_1 / h + 1.
phi_3 = phi_2 / h - 0.5
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
if model_s1 is None:
x_s1 = (
(sigma_s1 / sigma_s) * x
- (alpha_s1 * phi_11) * model_s
)
model_s1 = self.model_fn(x_s1, s1, current, cache_dic)
x_s2 = (
(sigma_s2 / sigma_s) * x
- (alpha_s2 * phi_12) * model_s
+ r2 / r1 * (alpha_s2 * phi_22) * (model_s1 - model_s)
)
model_s2 = self.model_fn(x_s2, s2, current, cache_dic)
if solver_type == 'dpmsolver':
x_t = (
(sigma_t / sigma_s) * x
- (alpha_t * phi_1) * model_s
+ (1. / r2) * (alpha_t * phi_2) * (model_s2 - model_s)
)
elif solver_type == 'taylor':
D1_0 = (1. / r1) * (model_s1 - model_s)
D1_1 = (1. / r2) * (model_s2 - model_s)
D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)
D2 = 2. * (D1_1 - D1_0) / (r2 - r1)
x_t = (
(sigma_t / sigma_s) * x
- (alpha_t * phi_1) * model_s
+ (alpha_t * phi_2) * D1
- (alpha_t * phi_3) * D2
)
else:
phi_11 = torch.expm1(r1 * h)
phi_12 = torch.expm1(r2 * h)
phi_1 = torch.expm1(h)
phi_22 = torch.expm1(r2 * h) / (r2 * h) - 1.
phi_2 = phi_1 / h - 1.
phi_3 = phi_2 / h - 0.5
if model_s is None:
model_s = self.model_fn(x, s, current, cache_dic)
if model_s1 is None:
x_s1 = (
(torch.exp(log_alpha_s1 - log_alpha_s)) * x
- (sigma_s1 * phi_11) * model_s
)
model_s1 = self.model_fn(x_s1, s1, current, cache_dic)
x_s2 = (
(torch.exp(log_alpha_s2 - log_alpha_s)) * x
- (sigma_s2 * phi_12) * model_s
- r2 / r1 * (sigma_s2 * phi_22) * (model_s1 - model_s)
)
model_s2 = self.model_fn(x_s2, s2, current, cache_dic)
if solver_type == 'dpmsolver':
x_t = (
(torch.exp(log_alpha_t - log_alpha_s)) * x
- (sigma_t * phi_1) * model_s
- (1. / r2) * (sigma_t * phi_2) * (model_s2 - model_s)
)
elif solver_type == 'taylor':
D1_0 = (1. / r1) * (model_s1 - model_s)
D1_1 = (1. / r2) * (model_s2 - model_s)
D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)
D2 = 2. * (D1_1 - D1_0) / (r2 - r1)
x_t = (
(torch.exp(log_alpha_t - log_alpha_s)) * x
- (sigma_t * phi_1) * model_s
- (sigma_t * phi_2) * D1
- (sigma_t * phi_3) * D2
)
if return_intermediate:
return x_t, {'model_s': model_s, 'model_s1': model_s1, 'model_s2': model_s2}
else:
return x_t
def multistep_dpm_solver_second_update(self, x, model_prev_list, t_prev_list, t, solver_type="dpmsolver"):
"""
Multistep solver DPM-Solver-2 from time `t_prev_list[-1]` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
model_prev_list: A list of pytorch tensors. The previously computed model values.
t_prev_list: A list of pytorch tensors. The previous times, each with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
if solver_type not in ['dpmsolver', 'taylor']:
raise ValueError(
f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}"
)
ns = self.noise_schedule
model_prev_1, model_prev_0 = model_prev_list[-2], model_prev_list[-1]
t_prev_1, t_prev_0 = t_prev_list[-2], t_prev_list[-1]
lambda_prev_1, lambda_prev_0, lambda_t = ns.marginal_lambda(t_prev_1), ns.marginal_lambda(
t_prev_0), ns.marginal_lambda(t)
log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)
sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)
alpha_t = torch.exp(log_alpha_t)
h_0 = lambda_prev_0 - lambda_prev_1
h = lambda_t - lambda_prev_0
r0 = h_0 / h
D1_0 = (1. / r0) * (model_prev_0 - model_prev_1)
if self.algorithm_type == "dpmsolver++":
phi_1 = torch.expm1(-h)
if solver_type == 'dpmsolver':
x_t = (
(sigma_t / sigma_prev_0) * x
- (alpha_t * phi_1) * model_prev_0
- 0.5 * (alpha_t * phi_1) * D1_0
)
elif solver_type == 'taylor':
x_t = (
(sigma_t / sigma_prev_0) * x
- (alpha_t * phi_1) * model_prev_0
+ (alpha_t * (phi_1 / h + 1.)) * D1_0
)
else:
phi_1 = torch.expm1(h)
if solver_type == 'dpmsolver':
x_t = (
(torch.exp(log_alpha_t - log_alpha_prev_0)) * x
- (sigma_t * phi_1) * model_prev_0
- 0.5 * (sigma_t * phi_1) * D1_0
)
elif solver_type == 'taylor':
x_t = (
(torch.exp(log_alpha_t - log_alpha_prev_0)) * x
- (sigma_t * phi_1) * model_prev_0
- (sigma_t * (phi_1 / h - 1.)) * D1_0
)
return x_t
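The second-order multistep update in the `dpmsolver++` branch can be sketched on scalars as follows. This is an illustration under assumptions: floats replace tensors, `vp_coeffs` is a hypothetical helper that derives (alpha, sigma) from a half-logSNR value for a variance-preserving schedule, and only the `'dpmsolver'` solver type is covered. It shows that the update is the first-order step plus a correction built from the difference of the two cached model outputs.

```python
import math


def vp_coeffs(lam):
    """Return (alpha, sigma) with alpha^2 + sigma^2 = 1 and log(alpha / sigma) = lam."""
    alpha = 1.0 / math.sqrt(1.0 + math.exp(-2.0 * lam))
    sigma = 1.0 / math.sqrt(1.0 + math.exp(2.0 * lam))
    return alpha, sigma


def multistep_second_update_pp(x, m_prev_1, m_prev_0, lam_prev_1, lam_prev_0, lam_t):
    """Return (second_order, first_order) estimates of x at half-logSNR lam_t."""
    _, sigma_prev_0 = vp_coeffs(lam_prev_0)
    alpha_t, sigma_t = vp_coeffs(lam_t)
    h_0 = lam_prev_0 - lam_prev_1  # size of the previous step
    h = lam_t - lam_prev_0         # size of the current step
    r0 = h_0 / h
    D1_0 = (1.0 / r0) * (m_prev_0 - m_prev_1)  # scaled backward difference
    phi_1 = math.expm1(-h)
    first_order = (sigma_t / sigma_prev_0) * x - alpha_t * phi_1 * m_prev_0
    second_order = first_order - 0.5 * alpha_t * phi_1 * D1_0
    return second_order, first_order


# Identical cached outputs: the correction term vanishes.
x_second, x_first = multistep_second_update_pp(0.5, 0.2, 0.2, -1.0, -0.5, 0.0)
# Different cached outputs: the correction shifts the estimate.
x_second_b, x_first_b = multistep_second_update_pp(0.5, 0.1, 0.2, -1.0, -0.5, 0.0)
```

Because the correction reuses model values from earlier steps, the multistep update needs no extra model evaluation per step.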
def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev_list, t, solver_type='dpmsolver'):
"""
Multistep solver DPM-Solver-3 from time `t_prev_list[-1]` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
model_prev_list: A list of pytorch tensors. The previously computed model values.
t_prev_list: A list of pytorch tensors. The previous times, each with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
ns = self.noise_schedule
model_prev_2, model_prev_1, model_prev_0 = model_prev_list
t_prev_2, t_prev_1, t_prev_0 = t_prev_list
lambda_prev_2, lambda_prev_1, lambda_prev_0, lambda_t = ns.marginal_lambda(t_prev_2), ns.marginal_lambda(
t_prev_1), ns.marginal_lambda(t_prev_0), ns.marginal_lambda(t)
log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)
sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)
alpha_t = torch.exp(log_alpha_t)
h_1 = lambda_prev_1 - lambda_prev_2
h_0 = lambda_prev_0 - lambda_prev_1
h = lambda_t - lambda_prev_0
r0, r1 = h_0 / h, h_1 / h
D1_0 = (1. / r0) * (model_prev_0 - model_prev_1)
D1_1 = (1. / r1) * (model_prev_1 - model_prev_2)
D1 = D1_0 + (r0 / (r0 + r1)) * (D1_0 - D1_1)
D2 = (1. / (r0 + r1)) * (D1_0 - D1_1)
if self.algorithm_type == "dpmsolver++":
phi_1 = torch.expm1(-h)
phi_2 = phi_1 / h + 1.
phi_3 = phi_2 / h - 0.5
return (
(sigma_t / sigma_prev_0) * x
- (alpha_t * phi_1) * model_prev_0
+ (alpha_t * phi_2) * D1
- (alpha_t * phi_3) * D2
)
else:
phi_1 = torch.expm1(h)
phi_2 = phi_1 / h - 1.
phi_3 = phi_2 / h - 0.5
return (
(torch.exp(log_alpha_t - log_alpha_prev_0)) * x
- (sigma_t * phi_1) * model_prev_0
- (sigma_t * phi_2) * D1
- (sigma_t * phi_3) * D2
)
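The coefficients `phi_1`, `phi_2` and `phi_3` used in the `dpmsolver++` branch above are defined recursively from `expm1(-h)`. The sketch below evaluates them on a scalar step size and compares each with its two leading Taylor terms. `phi_coeffs_pp` is a hypothetical helper name, and the step size `h = 0.1` is an arbitrary choice for illustration.

```python
import math


def phi_coeffs_pp(h):
    """Coefficients of the dpmsolver++ branch for a half-logSNR step h."""
    phi_1 = math.expm1(-h)
    phi_2 = phi_1 / h + 1.0
    phi_3 = phi_2 / h - 0.5
    return phi_1, phi_2, phi_3


h = 0.1
phi_1, phi_2, phi_3 = phi_coeffs_pp(h)

# Two leading Taylor terms of each coefficient around h = 0.
phi_1_taylor = -h + h * h / 2.0
phi_2_taylor = h / 2.0 - h * h / 6.0
phi_3_taylor = -h / 6.0 + h * h / 24.0
```

For very small `h` the divisions by `h` amplify rounding error, which is one reason to inspect these values when experimenting with step sizes.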
def singlestep_dpm_solver_update(self, x, s, t, current, cache_dic, order, return_intermediate=False, solver_type='dpmsolver', r1=None,
r2=None):
"""
Singlestep DPM-Solver with the order `order` from time `s` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `s`.
s: A pytorch tensor. The starting time, with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.
return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
r1: A `float`. The hyperparameter of the second-order or third-order solver.
r2: A `float`. The hyperparameter of the third-order solver.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
if order == 1:
return self.dpm_solver_first_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate)
elif order == 2:
return self.singlestep_dpm_solver_second_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate,
solver_type=solver_type, r1=r1)
elif order == 3:
return self.singlestep_dpm_solver_third_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate,
solver_type=solver_type, r1=r1, r2=r2)
else:
raise ValueError(f"Solver order must be 1 or 2 or 3, got {order}")
def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list, t, current, cache_dic, order, solver_type='dpmsolver'):
"""
Multistep DPM-Solver with the order `order` from time `t_prev_list[-1]` to time `t`.
Args:
x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
model_prev_list: A list of pytorch tensors. The previously computed model values.
t_prev_list: A list of pytorch tensors. The previous times, each with the shape (1,).
t: A pytorch tensor. The ending time, with the shape (1,).
order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_t: A pytorch tensor. The approximated solution at time `t`.
"""
if order == 1:
return self.dpm_solver_first_update(x, t_prev_list[-1], t, current, cache_dic, model_s=model_prev_list[-1])
elif order == 2:
return self.multistep_dpm_solver_second_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)
elif order == 3:
return self.multistep_dpm_solver_third_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)
else:
raise ValueError(f"Solver order must be 1 or 2 or 3, got {order}")
def dpm_solver_adaptive(self, x, order, t_T, t_0, h_init=0.05, atol=0.0078, rtol=0.05, theta=0.9, t_err=1e-5,
solver_type='dpmsolver'):
"""
The adaptive step size solver based on singlestep DPM-Solver.
Args:
x: A pytorch tensor. The initial value at time `t_T`.
order: A `int`. The (higher) order of the solver. We only support order == 2 or 3.
t_T: A `float`. The starting time of the sampling (default is T).
t_0: A `float`. The ending time of the sampling (default is epsilon).
h_init: A `float`. The initial step size (for logSNR).
atol: A `float`. The absolute tolerance of the solver. For image data, the default setting is 0.0078, following [1].
rtol: A `float`. The relative tolerance of the solver. The default setting is 0.05.
theta: A `float`. The safety hyperparameter for adapting the step size. The default setting is 0.9, following [1].
t_err: A `float`. The tolerance for the time. We solve the diffusion ODE until the absolute error between the
current time and `t_0` is less than `t_err`. The default setting is 1e-5.
solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
The type slightly impacts the performance. We recommend to use 'dpmsolver' type.
Returns:
x_0: A pytorch tensor. The approximated solution at time `t_0`.
[1] A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, "Gotta go fast when generating data with score-based models," arXiv preprint arXiv:2105.14080, 2021.
"""
ns = self.noise_schedule
s = t_T * torch.ones((1,)).to(x)
lambda_s = ns.marginal_lambda(s)
lambda_0 = ns.marginal_lambda(t_0 * torch.ones_like(s).to(x))
h = h_init * torch.ones_like(s).to(x)
x_prev = x
nfe = 0
if order == 2:
r1 = 0.5
lower_update = lambda x, s, t: self.dpm_solver_first_update(x, s, t, return_intermediate=True)
higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_second_update(x, s, t, r1=r1,
solver_type=solver_type,
**kwargs)
elif order == 3:
r1, r2 = 1. / 3., 2. / 3.
lower_update = lambda x, s, t: self.singlestep_dpm_solver_second_update(x, s, t, r1=r1,
return_intermediate=True,
solver_type=solver_type)
higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_third_update(x, s, t, r1=r1, r2=r2,
solver_type=solver_type,
**kwargs)
else:
raise ValueError(
f"For adaptive step size solver, order must be 2 or 3, got {order}"
)
while torch.abs((s - t_0)).mean() > t_err:
t = ns.inverse_lambda(lambda_s + h)
x_lower, lower_noise_kwargs = lower_update(x, s, t)
x_higher = higher_update(x, s, t, **lower_noise_kwargs)
delta = torch.max(torch.ones_like(x).to(x) * atol, rtol * torch.max(torch.abs(x_lower), torch.abs(x_prev)))
norm_fn = lambda v: torch.sqrt(torch.square(v.reshape((v.shape[0], -1))).mean(dim=-1, keepdim=True))
E = norm_fn((x_higher - x_lower) / delta).max()
if torch.all(E <= 1.):
x = x_higher
s = t
x_prev = x_lower
lambda_s = ns.marginal_lambda(s)
h = torch.min(theta * h * torch.float_power(E, -1. / order).float(), lambda_0 - lambda_s)
nfe += order
print('adaptive solver nfe', nfe)
return x
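The step-size control in the adaptive solver above can be sketched for a single sample with plain Python lists. This is a simplified illustration: the source computes the norm per batch element and takes the maximum over the batch, while `error_ratio` and `propose_step` below are hypothetical helpers that handle one flattened sample.

```python
import math


def error_ratio(x_lower, x_higher, x_prev, atol=0.0078, rtol=0.05):
    """RMS of the tolerance-scaled gap between the lower- and higher-order estimates."""
    scaled_sq = []
    for lo, hi, prev in zip(x_lower, x_higher, x_prev):
        # Mixed absolute/relative tolerance, evaluated per element.
        delta = max(atol, rtol * max(abs(lo), abs(prev)))
        scaled_sq.append(((hi - lo) / delta) ** 2)
    return math.sqrt(sum(scaled_sq) / len(scaled_sq))


def propose_step(h, E, order, remaining, theta=0.9):
    """Scale the step by E^(-1/order), capped by the remaining logSNR distance."""
    return min(theta * h * E ** (-1.0 / order), remaining)


x_prev = [1.0, -2.0]
x_lower = [1.0, -2.0]
x_higher = [1.01, -2.02]

E = error_ratio(x_lower, x_higher, x_prev)
accepted = E <= 1.0  # the step is accepted when the scaled error is at most 1
h_next = propose_step(h=0.05, E=E, order=2, remaining=10.0)
h_capped = propose_step(h=0.05, E=E, order=2, remaining=0.01)
```

A small error ratio grows the next step, and the cap keeps the solver from stepping past the target logSNR.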
def add_noise(self, x, t, noise=None):
"""
Compute the noised input xt = alpha_t * x + sigma_t * noise.
Args:
x: A `torch.Tensor` with shape `(batch_size, *shape)`.
t: A `torch.Tensor` with shape `(t_size,)`.
Returns:
xt with shape `(t_size, batch_size, *shape)`.
"""
alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)
if noise is None:
noise = torch.randn((t.shape[0], *x.shape), device=x.device)
x = x.reshape((-1, *x.shape))
xt = expand_dims(alpha_t, x.dim()) * x + expand_dims(sigma_t, x.dim()) * noise
return xt.squeeze(0) if t.shape[0] == 1 else xt
def inverse(self, x, steps=20, t_start=None, t_end=None, order=2, skip_type='time_uniform',
method='multistep', lower_order_final=True, denoise_to_zero=False, solver_type='dpmsolver',
atol=0.0078, rtol=0.05, return_intermediate=False,
):
"""
Invert the sample `x` from time `t_start` to `t_end` by DPM-Solver.
For discrete-time DPMs, we use `t_start=1/N`, where `N` is the total time steps during training.
"""
t_0 = 1. / self.noise_schedule.total_N if t_start is None else t_start
t_T = self.noise_schedule.T if t_end is None else t_end
assert t_0 > 0 and t_T > 0, "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
return self.sample(x, steps=steps, t_start=t_0, t_end=t_T, order=order, skip_type=skip_type,
method=method, lower_order_final=lower_order_final, denoise_to_zero=denoise_to_zero,
solver_type=solver_type,
atol=atol, rtol=rtol, return_intermediate=return_intermediate)
def sample(self, x, steps=20, t_start=None, t_end=None, order=2, skip_type='time_uniform',
method='multistep', lower_order_final=True, denoise_to_zero=False, solver_type='dpmsolver',
atol=0.0078, rtol=0.05, return_intermediate=False, model_kwargs = {}, rank = None,
):
"""
Compute the sample at time `t_end` by DPM-Solver, given the initial `x` at time `t_start`.
=====================================================
We support the following algorithms for both noise prediction model and data prediction model:
- 'singlestep':
Singlestep DPM-Solver (i.e. "DPM-Solver-fast" in the paper), which combines different orders of singlestep DPM-Solver.
We combine all the singlestep solvers with order <= `order` to use up all the function evaluations (steps).
The total number of function evaluations (NFE) == `steps`.
Given a fixed NFE == `steps`, the sampling procedure is:
- If `order` == 1:
- Denote K = steps. We use K steps of DPM-Solver-1 (i.e. DDIM).
- If `order` == 2:
- Denote K = (steps // 2) + (steps % 2). We take K intermediate time steps for sampling.
- If steps % 2 == 0, we use K steps of singlestep DPM-Solver-2.
- If steps % 2 == 1, we use (K - 1) steps of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.
- If `order` == 3:
- Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.
- If steps % 3 == 0, we use (K - 2) steps of singlestep DPM-Solver-3, and 1 step of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.
- If steps % 3 == 1, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of DPM-Solver-1.
- If steps % 3 == 2, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of singlestep DPM-Solver-2.
- 'multistep':
Multistep DPM-Solver with the order of `order`. The total number of function evaluations (NFE) == `steps`.
We initialize the first `order` values by lower order multistep solvers.
Given a fixed NFE == `steps`, the sampling procedure is:
Denote K = steps.
- If `order` == 1:
- We use K steps of DPM-Solver-1 (i.e. DDIM).
- If `order` == 2:
- We first use 1 step of DPM-Solver-1, then use (K - 1) steps of multistep DPM-Solver-2.
- If `order` == 3:
- We first use 1 step of DPM-Solver-1, then 1 step of multistep DPM-Solver-2, then (K - 2) steps of multistep DPM-Solver-3.
- 'singlestep_fixed':
Fixed order singlestep DPM-Solver (i.e. DPM-Solver-1 or singlestep DPM-Solver-2 or singlestep DPM-Solver-3).
We use singlestep DPM-Solver-`order` for `order`=1 or 2 or 3, with total [`steps` // `order`] * `order` NFE.
- 'adaptive':
Adaptive step size DPM-Solver (i.e. "DPM-Solver-12" and "DPM-Solver-23" in the paper).
We ignore `steps` and use adaptive step size DPM-Solver with a higher order of `order`.
You can adjust the absolute tolerance `atol` and the relative tolerance `rtol` to balance the computational cost
(NFE) and the sample quality.
- If `order` == 2, we use DPM-Solver-12 which combines DPM-Solver-1 and singlestep DPM-Solver-2.
- If `order` == 3, we use DPM-Solver-23 which combines singlestep DPM-Solver-2 and singlestep DPM-Solver-3.
=====================================================
Some advice on choosing the algorithm:
- For **unconditional sampling** or **guided sampling with small guidance scale** by DPMs:
Use singlestep DPM-Solver or DPM-Solver++ ("DPM-Solver-fast" in the paper) with `order = 3`.
e.g., DPM-Solver:
>>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver")
>>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,
skip_type='time_uniform', method='singlestep')
e.g., DPM-Solver++:
>>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver++")
>>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,
skip_type='time_uniform', method='singlestep')
- For **guided sampling with large guidance scale** by DPMs:
Use multistep DPM-Solver with `algorithm_type="dpmsolver++"` and `order = 2`.
e.g.
>>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver++")
>>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=2,
skip_type='time_uniform', method='multistep')
We support three types of `skip_type`:
- 'logSNR': uniform logSNR for the time steps. **Recommended for low-resolution images**.
- 'time_uniform': uniform time for the time steps. **Recommended for high-resolution images**.
- 'time_quadratic': quadratic time for the time steps.
=====================================================
Args:
x: A pytorch tensor. The initial value at time `t_start`
e.g. if `t_start` == T, then `x` is a sample from the standard normal distribution.
steps: A `int`. The total number of function evaluations (NFE).
t_start: A `float`. The starting time of the sampling.
If `t_start` is None, we use self.noise_schedule.T (default is 1.0).
t_end: A `float`. The ending time of the sampling.
If `t_end` is None, we use 1. / self.noise_schedule.total_N.
e.g. if total_N == 1000, we have `t_end` == 1e-3.
For discrete-time DPMs:
- We recommend `t_end` == 1. / self.noise_schedule.total_N.
For continuous-time DPMs:
- We recommend `t_end` == 1e-3 when `steps` <= 15; and `t_end` == 1e-4 when `steps` > 15.
order: A `int`. The order of DPM-Solver.
skip_type: A `str`. The type for the spacing of the time steps. 'time_uniform' or 'logSNR' or 'time_quadratic'.
method: A `str`. The method for sampling. 'singlestep' or 'multistep' or 'singlestep_fixed' or 'adaptive'.
denoise_to_zero: A `bool`. Whether to denoise to time 0 at the final step.
Default is `False`. If `denoise_to_zero` is `True`, the total NFE is (`steps` + 1).
This trick was first proposed by DDPM (https://arxiv.org/abs/2006.11239) and
score_sde (https://arxiv.org/abs/2011.13456). It can improve the FID
when sampling with diffusion SDEs for low-resolution images
(such as CIFAR-10). However, we observed that it does not matter for
high-resolution images. As it needs an additional NFE, we do not recommend
it for high-resolution images.
lower_order_final: A `bool`. Whether to use lower order solvers at the final steps.
Only valid for `method=multistep` and `steps < 10`. We empirically find that
this trick is key to stabilizing the sampling by DPM-Solver with very few steps
(especially for steps <= 10), so we recommend setting it to `True`.
solver_type: A `str`. The taylor expansion type for the solver. `dpmsolver` or `taylor`. We recommend `dpmsolver`.
atol: A `float`. The absolute tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.
rtol: A `float`. The relative tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.
return_intermediate: A `bool`. Whether to save the xt at each step.
When set to `True`, method returns a tuple (x0, intermediates); when set to False, method returns only x0.
Returns:
x_end: A pytorch tensor. The approximated solution at time `t_end`.
"""
t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
t_T = self.noise_schedule.T if t_start is None else t_start
assert t_0 > 0 and t_T > 0, "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
if return_intermediate:
assert method in ['multistep', 'singlestep',
'singlestep_fixed'], "Cannot use adaptive solver when saving intermediate values"
if self.correcting_xt_fn is not None:
assert method in ['multistep', 'singlestep',
'singlestep_fixed'], "Cannot use adaptive solver when correcting_xt_fn is not None"
device = x.device
intermediates = []
cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=steps)
with torch.no_grad():
if method == 'adaptive':
x = self.dpm_solver_adaptive(x, order=order, t_T=t_T, t_0=t_0, atol=atol, rtol=rtol,
solver_type=solver_type)
elif method == 'multistep':
assert steps >= order
timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, device=device)
assert timesteps.shape[0] - 1 == steps
# Init the initial values.
step = 0
current['step'] = step
t = timesteps[step]
t_prev_list = [t]
model_prev_list = [self.model_fn(x, t, current, cache_dic)]
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
# Init the first `order` values by lower order multistep DPM-Solver.
for step in range(1, order):
current['step'] = step
t = timesteps[step]
x = self.multistep_dpm_solver_update(x, model_prev_list, t_prev_list, t, current, cache_dic, step,
solver_type=solver_type)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
t_prev_list.append(t)
model_prev_list.append(self.model_fn(x, t, current, cache_dic))
# Compute the remaining values by `order`-th order multistep DPM-Solver.
pbar = tqdm(range(order, steps + 1), leave=False) if (rank == 0) or (rank is None) else range(order, steps + 1)
for step in pbar:
current['step'] = step
t = timesteps[step]
# We only use lower order for steps < 10
if lower_order_final and steps < 10:
step_order = min(order, steps + 1 - step)
else:
step_order = order
x = self.multistep_dpm_solver_update(x, model_prev_list, t_prev_list, t, current, cache_dic, step_order,
solver_type=solver_type)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
for i in range(order - 1):
t_prev_list[i] = t_prev_list[i + 1]
model_prev_list[i] = model_prev_list[i + 1]
t_prev_list[-1] = t
# We do not need to evaluate the final model value.
if step < steps:
model_prev_list[-1] = self.model_fn(x, t, current, cache_dic)
elif method in ['singlestep', 'singlestep_fixed']:
if method == 'singlestep':
timesteps_outer, orders = self.get_orders_and_timesteps_for_singlestep_solver(steps=steps,
order=order,
skip_type=skip_type,
t_T=t_T, t_0=t_0,
device=device)
elif method == 'singlestep_fixed':
K = steps // order
orders = [order, ] * K
timesteps_outer = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=K, device=device)
for step, order in enumerate(orders):
s, t = timesteps_outer[step], timesteps_outer[step + 1]
timesteps_inner = self.get_time_steps(skip_type=skip_type, t_T=s.item(), t_0=t.item(), N=order,
device=device)
lambda_inner = self.noise_schedule.marginal_lambda(timesteps_inner)
h = lambda_inner[-1] - lambda_inner[0]
r1 = None if order <= 1 else (lambda_inner[1] - lambda_inner[0]) / h
r2 = None if order <= 2 else (lambda_inner[2] - lambda_inner[0]) / h
x = self.singlestep_dpm_solver_update(x, s, t, current, cache_dic, order, solver_type=solver_type, r1=r1, r2=r2)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
else:
raise ValueError(f"Got wrong method {method}")
if denoise_to_zero:
t = torch.ones((1,)).to(device) * t_0
x = self.denoise_to_zero_fn(x, t)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step + 1)
if return_intermediate:
intermediates.append(x)
return (x, intermediates) if return_intermediate else x
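The order schedule that the `sample` docstring describes for `method='singlestep'` can be sketched in plain Python. `singlestep_orders` is a hypothetical standalone helper written from that docstring, not the class method itself. The cumulative sums of the orders give the indices of the outer time steps within a grid of `steps + 1` points, which mirrors the `torch.cumsum` indexing used when the outer time steps are built.

```python
from itertools import accumulate


def singlestep_orders(steps, order):
    """Per-step solver orders whose sum equals the NFE budget `steps`."""
    if order == 3:
        K = steps // 3 + 1
        if steps % 3 == 0:
            return [3] * (K - 2) + [2, 1]
        if steps % 3 == 1:
            return [3] * (K - 1) + [1]
        return [3] * (K - 1) + [2]
    if order == 2:
        if steps % 2 == 0:
            return [2] * (steps // 2)
        return [2] * (steps // 2) + [1]
    if order == 1:
        return [1] * steps
    raise ValueError(f"order must be 1, 2 or 3, got {order}")


orders = singlestep_orders(steps=20, order=3)
# Indices of the outer time steps inside a grid of steps + 1 points.
outer_indices = list(accumulate([0] + orders))
```

Each singlestep update of order `k` spends `k` model evaluations, so the orders always add up to the requested budget.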
#############################################################
# other utility functions
#############################################################
def interpolate_fn(x, xp, yp):
"""
A piecewise linear function y = f(x), using xp and yp as keypoints.
We implement f(x) in a differentiable way (i.e. applicable for autograd).
The function f(x) is defined for every x. (For x beyond the bounds of xp, we use the outermost points of xp to define the linear function.)
Args:
x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).
xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.
yp: PyTorch tensor with shape [C, K].
Returns:
The function values f(x), with shape [N, C].
"""
N, K = x.shape[0], xp.shape[1]
all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)
sorted_all_x, x_indices = torch.sort(all_x, dim=2)
x_idx = torch.argmin(x_indices, dim=2)
cand_start_idx = x_idx - 1
start_idx = torch.where(
torch.eq(x_idx, 0),
torch.tensor(1, device=x.device),
torch.where(
torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,
),
)
end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)
start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)
end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)
start_idx2 = torch.where(
torch.eq(x_idx, 0),
torch.tensor(0, device=x.device),
torch.where(
torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,
),
)
y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)
start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)
end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)
return start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)
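For intuition, the same piecewise-linear rule can be written for a single scalar query. This sketch is not differentiable and is not batched. It assumes `xp` is strictly increasing with at least two keypoints, and `interpolate_scalar` is a hypothetical helper name. Queries outside the keypoint range are extrapolated along the first or last segment, as the docstring above describes.

```python
import bisect


def interpolate_scalar(x, xp, yp):
    """Piecewise-linear f(x) through (xp[i], yp[i]) with linear extrapolation."""
    # Index of the segment start; clamp so out-of-range x uses an end segment.
    i = bisect.bisect_right(xp, x) - 1
    i = min(max(i, 0), len(xp) - 2)
    x0, x1 = xp[i], xp[i + 1]
    y0, y1 = yp[i], yp[i + 1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


xp = [0.0, 1.0, 2.0]
yp = [0.0, 2.0, 3.0]

inside = interpolate_scalar(0.5, xp, yp)   # on the first segment
right = interpolate_scalar(3.0, xp, yp)    # extrapolated along the last segment
left = interpolate_scalar(-1.0, xp, yp)    # extrapolated along the first segment
```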
def expand_dims(v, dims):
"""
Expand the tensor `v` to the dim `dims`.
Args:
`v`: a PyTorch tensor with shape [N].
`dims`: an `int`.
Returns:
a PyTorch tensor with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.
"""
return v[(...,) + (None,) * (dims - 1)]
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/edm_sample.py
================================================
import random
import numpy as np
import torch
from tqdm import tqdm
from diffusion.model.utils import *
# ----------------------------------------------------------------------------
# Proposed EDM sampler (Algorithm 2).
def edm_sampler(
net, latents, class_labels=None, cfg_scale=None, randn_like=torch.randn_like,
num_steps=18, sigma_min=0.002, sigma_max=80, rho=7,
S_churn=0, S_min=0, S_max=float('inf'), S_noise=1, **kwargs
):
# Adjust noise levels based on what's supported by the network.
sigma_min = max(sigma_min, net.sigma_min)
sigma_max = min(sigma_max, net.sigma_max)
# Time step discretization.
step_indices = torch.arange(num_steps, dtype=torch.float64, device=latents.device)
t_steps = (sigma_max ** (1 / rho) + step_indices / (num_steps - 1) * (
sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
t_steps = torch.cat([net.round_sigma(t_steps), torch.zeros_like(t_steps[:1])]) # t_N = 0
# Main sampling loop.
x_next = latents.to(torch.float64) * t_steps[0]
for i, (t_cur, t_next) in tqdm(list(enumerate(zip(t_steps[:-1], t_steps[1:])))): # 0, ..., N-1
x_cur = x_next
# Increase noise temporarily.
gamma = min(S_churn / num_steps, np.sqrt(2) - 1) if S_min <= t_cur <= S_max else 0
t_hat = net.round_sigma(t_cur + gamma * t_cur)
x_hat = x_cur + (t_hat ** 2 - t_cur ** 2).sqrt() * S_noise * randn_like(x_cur)
# Euler step.
denoised = net(x_hat.float(), t_hat, class_labels, cfg_scale, **kwargs)['x'].to(torch.float64)
d_cur = (x_hat - denoised) / t_hat
x_next = x_hat + (t_next - t_hat) * d_cur
# Apply 2nd order correction.
if i < num_steps - 1:
denoised = net(x_next.float(), t_next, class_labels, cfg_scale, **kwargs)['x'].to(torch.float64)
d_prime = (x_next - denoised) / t_next
x_next = x_hat + (t_next - t_hat) * (0.5 * d_cur + 0.5 * d_prime)
return x_next
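# Illustrative sketch (not part of the original file): the rho-based time step discretization
# used by edm_sampler above, written with plain Python floats so the schedule can be inspected
# without a network. The name `edm_sigma_steps_sketch` is hypothetical, and this version
# deliberately skips net.round_sigma and the clamping to net.sigma_min / net.sigma_max.
def edm_sigma_steps_sketch(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    # Interpolate linearly in sigma ** (1 / rho), then map back with the power rho.
    steps = [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]
    # Append the final noise level t_N = 0, as the sampler does.
    return steps + [0.0]


# Usage: edm_sigma_steps_sketch() returns num_steps + 1 noise levels that start at sigma_max,
# decrease monotonically to sigma_min, and end with 0.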
# ----------------------------------------------------------------------------
# Generalized ablation sampler, representing the superset of all sampling
# methods discussed in the paper.
def ablation_sampler(
net, latents, class_labels=None, cfg_scale=None, feat=None, randn_like=torch.randn_like,
num_steps=18, sigma_min=None, sigma_max=None, rho=7,
solver='heun', discretization='edm', schedule='linear', scaling='none',
epsilon_s=1e-3, C_1=0.001, C_2=0.008, M=1000, alpha=1,
S_churn=0, S_min=0, S_max=float('inf'), S_noise=1,
):
assert solver in ['euler', 'heun']
assert discretization in ['vp', 've', 'iddpm', 'edm']
assert schedule in ['vp', 've', 'linear']
assert scaling in ['vp', 'none']
# Helper functions for VP & VE noise level schedules.
vp_sigma = lambda beta_d, beta_min: lambda t: (np.e ** (0.5 * beta_d * (t ** 2) + beta_min * t) - 1) ** 0.5
vp_sigma_deriv = lambda beta_d, beta_min: lambda t: 0.5 * (beta_min + beta_d * t) * (sigma(t) + 1 / sigma(t))
vp_sigma_inv = lambda beta_d, beta_min: lambda sigma: ((beta_min ** 2 + 2 * beta_d * (
sigma ** 2 + 1).log()).sqrt() - beta_min) / beta_d
ve_sigma = lambda t: t.sqrt()
ve_sigma_deriv = lambda t: 0.5 / t.sqrt()
ve_sigma_inv = lambda sigma: sigma ** 2
# Select default noise level range based on the specified time step discretization.
if sigma_min is None:
vp_def = vp_sigma(beta_d=19.1, beta_min=0.1)(t=epsilon_s)
sigma_min = {'vp': vp_def, 've': 0.02, 'iddpm': 0.002, 'edm': 0.002}[discretization]
if sigma_max is None:
vp_def = vp_sigma(beta_d=19.1, beta_min=0.1)(t=1)
sigma_max = {'vp': vp_def, 've': 100, 'iddpm': 81, 'edm': 80}[discretization]
# Adjust noise levels based on what's supported by the network.
sigma_min = max(sigma_min, net.sigma_min)
sigma_max = min(sigma_max, net.sigma_max)
# Compute corresponding betas for VP.
vp_beta_d = 2 * (np.log(sigma_min ** 2 + 1) / epsilon_s - np.log(sigma_max ** 2 + 1)) / (epsilon_s - 1)
vp_beta_min = np.log(sigma_max ** 2 + 1) - 0.5 * vp_beta_d
# Define time steps in terms of noise level.
step_indices = torch.arange(num_steps, dtype=torch.float64, device=latents.device)
if discretization == 'vp':
orig_t_steps = 1 + step_indices / (num_steps - 1) * (epsilon_s - 1)
sigma_steps = vp_sigma(vp_beta_d, vp_beta_min)(orig_t_steps)
elif discretization == 've':
orig_t_steps = (sigma_max ** 2) * ((sigma_min ** 2 / sigma_max ** 2) ** (step_indices / (num_steps - 1)))
sigma_steps = ve_sigma(orig_t_steps)
elif discretization == 'iddpm':
u = torch.zeros(M + 1, dtype=torch.float64, device=latents.device)
alpha_bar = lambda j: (0.5 * np.pi * j / M / (C_2 + 1)).sin() ** 2
for j in torch.arange(M, 0, -1, device=latents.device): # M, ..., 1
u[j - 1] = ((u[j] ** 2 + 1) / (alpha_bar(j - 1) / alpha_bar(j)).clip(min=C_1) - 1).sqrt()
u_filtered = u[torch.logical_and(u >= sigma_min, u <= sigma_max)]
sigma_steps = u_filtered[((len(u_filtered) - 1) / (num_steps - 1) * step_indices).round().to(torch.int64)]
else:
assert discretization == 'edm'
sigma_steps = (sigma_max ** (1 / rho) + step_indices / (num_steps - 1) * (
sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
# Define noise level schedule.
if schedule == 'vp':
sigma = vp_sigma(vp_beta_d, vp_beta_min)
sigma_deriv = vp_sigma_deriv(vp_beta_d, vp_beta_min)
sigma_inv = vp_sigma_inv(vp_beta_d, vp_beta_min)
elif schedule == 've':
sigma = ve_sigma
sigma_deriv = ve_sigma_deriv
sigma_inv = ve_sigma_inv
else:
assert schedule == 'linear'
sigma = lambda t: t
sigma_deriv = lambda t: 1
sigma_inv = lambda sigma: sigma
# Define scaling schedule.
if scaling == 'vp':
s = lambda t: 1 / (1 + sigma(t) ** 2).sqrt()
s_deriv = lambda t: -sigma(t) * sigma_deriv(t) * (s(t) ** 3)
else:
assert scaling == 'none'
s = lambda t: 1
s_deriv = lambda t: 0
# Compute final time steps based on the corresponding noise levels.
t_steps = sigma_inv(net.round_sigma(sigma_steps))
t_steps = torch.cat([t_steps, torch.zeros_like(t_steps[:1])]) # t_N = 0
# Main sampling loop.
t_next = t_steps[0]
x_next = latents.to(torch.float64) * (sigma(t_next) * s(t_next))
for i, (t_cur, t_next) in enumerate(zip(t_steps[:-1], t_steps[1:])): # 0, ..., N-1
x_cur = x_next
# Increase noise temporarily.
gamma = min(S_churn / num_steps, np.sqrt(2) - 1) if S_min <= sigma(t_cur) <= S_max else 0
t_hat = sigma_inv(net.round_sigma(sigma(t_cur) + gamma * sigma(t_cur)))
x_hat = s(t_hat) / s(t_cur) * x_cur + (sigma(t_hat) ** 2 - sigma(t_cur) ** 2).clip(min=0).sqrt() * s(
t_hat) * S_noise * randn_like(x_cur)
# Euler step.
h = t_next - t_hat
denoised = net(x_hat.float() / s(t_hat), sigma(t_hat), class_labels, cfg_scale, feat=feat)['x'].to(
torch.float64)
d_cur = (sigma_deriv(t_hat) / sigma(t_hat) + s_deriv(t_hat) / s(t_hat)) * x_hat - sigma_deriv(t_hat) * s(
t_hat) / sigma(t_hat) * denoised
x_prime = x_hat + alpha * h * d_cur
t_prime = t_hat + alpha * h
# Apply 2nd order correction.
if solver == 'euler' or i == num_steps - 1:
x_next = x_hat + h * d_cur
else:
assert solver == 'heun'
denoised = net(x_prime.float() / s(t_prime), sigma(t_prime), class_labels, cfg_scale, feat=feat)['x'].to(
torch.float64)
d_prime = (sigma_deriv(t_prime) / sigma(t_prime) + s_deriv(t_prime) / s(t_prime)) * x_prime - sigma_deriv(
t_prime) * s(t_prime) / sigma(t_prime) * denoised
x_next = x_hat + h * ((1 - 1 / (2 * alpha)) * d_cur + 1 / (2 * alpha) * d_prime)
return x_next
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/gaussian_diffusion.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
import enum
import math
import numpy as np
import torch as th
import torch.nn.functional as F
from .diffusion_utils import discretized_gaussian_log_likelihood, normal_kl
from .cache_functions import cache_init
def mean_flat(tensor):
"""
Take the mean over all non-batch dimensions.
"""
return tensor.mean(dim=list(range(1, len(tensor.shape))))
class ModelMeanType(enum.Enum):
"""
Which type of output the model predicts.
"""
PREVIOUS_X = enum.auto() # the model predicts x_{t-1}
START_X = enum.auto() # the model predicts x_0
EPSILON = enum.auto() # the model predicts epsilon
class ModelVarType(enum.Enum):
"""
What is used as the model's output variance.
The LEARNED_RANGE option has been added to allow the model to predict
values between FIXED_SMALL and FIXED_LARGE, making its job easier.
"""
LEARNED = enum.auto()
FIXED_SMALL = enum.auto()
FIXED_LARGE = enum.auto()
LEARNED_RANGE = enum.auto()
class LossType(enum.Enum):
MSE = enum.auto() # use raw MSE loss (and KL when learning variances)
RESCALED_MSE = (
enum.auto()
) # use raw MSE loss (with RESCALED_KL when learning variances)
KL = enum.auto() # use the variational lower-bound
RESCALED_KL = enum.auto() # like KL, but rescale to estimate the full VLB
def is_vb(self):
return self in [LossType.KL, LossType.RESCALED_KL]
def _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, warmup_frac):
betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)
warmup_time = int(num_diffusion_timesteps * warmup_frac)
betas[:warmup_time] = np.linspace(beta_start, beta_end, warmup_time, dtype=np.float64)
return betas
def get_beta_schedule(beta_schedule, *, beta_start, beta_end, num_diffusion_timesteps):
"""
This is the deprecated API for creating beta schedules.
See get_named_beta_schedule() for the new library of schedules.
"""
if beta_schedule == "quad":
betas = (
np.linspace(
beta_start ** 0.5,
beta_end ** 0.5,
num_diffusion_timesteps,
dtype=np.float64,
)
** 2
)
elif beta_schedule == "linear":
betas = np.linspace(beta_start, beta_end, num_diffusion_timesteps, dtype=np.float64)
elif beta_schedule == "warmup10":
betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.1)
elif beta_schedule == "warmup50":
betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.5)
elif beta_schedule == "const":
betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)
elif beta_schedule == "jsd": # 1/T, 1/(T-1), 1/(T-2), ..., 1
betas = 1.0 / np.linspace(
num_diffusion_timesteps, 1, num_diffusion_timesteps, dtype=np.float64
)
else:
raise NotImplementedError(beta_schedule)
assert betas.shape == (num_diffusion_timesteps,)
return betas
def get_named_beta_schedule(schedule_name, num_diffusion_timesteps):
"""
Get a pre-defined beta schedule for the given name.
The beta schedule library consists of beta schedules which remain similar
in the limit of num_diffusion_timesteps.
Beta schedules may be added, but should not be removed or changed once
they are committed to maintain backwards compatibility.
"""
if schedule_name == "linear":
# Linear schedule from Ho et al, extended to work for any number of
# diffusion steps.
scale = 1000 / num_diffusion_timesteps
return get_beta_schedule(
"linear",
beta_start=scale * 0.0001,
beta_end=scale * 0.02,
num_diffusion_timesteps=num_diffusion_timesteps,
)
elif schedule_name == "squaredcos_cap_v2":
return betas_for_alpha_bar(
num_diffusion_timesteps,
lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,
)
else:
raise NotImplementedError(f"unknown beta schedule: {schedule_name}")
def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):
"""
Create a beta schedule that discretizes the given alpha_t_bar function,
which defines the cumulative product of (1-beta) over time from t = [0,1].
:param num_diffusion_timesteps: the number of betas to produce.
:param alpha_bar: a lambda that takes an argument t from 0 to 1 and
produces the cumulative product of (1-beta) up to that
part of the diffusion process.
:param max_beta: the maximum beta to use; use values lower than 1 to
prevent singularities.
"""
betas = []
for i in range(num_diffusion_timesteps):
t1 = i / num_diffusion_timesteps
t2 = (i + 1) / num_diffusion_timesteps
betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
return np.array(betas)
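# Illustrative sketch (not part of the original file): a list-based copy of the rule in
# betas_for_alpha_bar above, applied to the "squaredcos_cap_v2" alpha_bar. The names
# `cosine_alpha_bar_sketch` and `cosine_betas_sketch` are hypothetical. The point is that,
# as long as no beta is clipped by max_beta, the running product of (1 - beta) telescopes to
# alpha_bar(t) / alpha_bar(0).
import math


def cosine_alpha_bar_sketch(t):
    # Same expression as the lambda passed in get_named_beta_schedule.
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def cosine_betas_sketch(num_diffusion_timesteps, max_beta=0.999):
    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        # beta_i = 1 - alpha_bar(t2) / alpha_bar(t1), capped to avoid a singularity at t = 1.
        betas.append(min(1 - cosine_alpha_bar_sketch(t2) / cosine_alpha_bar_sketch(t1), max_beta))
    return betas


# Usage: betas = cosine_betas_sketch(1000); multiplying (1 - b) over betas[:500] should
# closely match cosine_alpha_bar_sketch(0.5) / cosine_alpha_bar_sketch(0.0).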
class GaussianDiffusion:
"""
Utilities for training and sampling diffusion models.
Originally ported from this codebase:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42
:param betas: a 1-D numpy array of betas for each diffusion timestep,
starting at T and going to 1.
"""
def __init__(
self,
*,
betas,
model_mean_type,
model_var_type,
loss_type,
snr=False,
return_startx=False,
):
self.model_mean_type = model_mean_type
self.model_var_type = model_var_type
self.loss_type = loss_type
self.snr = snr
self.return_startx = return_startx
# Use float64 for accuracy.
betas = np.array(betas, dtype=np.float64)
self.betas = betas
assert len(betas.shape) == 1, "betas must be 1-D"
assert (betas > 0).all() and (betas <= 1).all()
self.num_timesteps = int(betas.shape[0])
alphas = 1.0 - betas
self.alphas_cumprod = np.cumprod(alphas, axis=0)
self.alphas_cumprod_prev = np.append(1.0, self.alphas_cumprod[:-1])
self.alphas_cumprod_next = np.append(self.alphas_cumprod[1:], 0.0)
assert self.alphas_cumprod_prev.shape == (self.num_timesteps,)
# calculations for diffusion q(x_t | x_{t-1}) and others
self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)
self.sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - self.alphas_cumprod)
self.log_one_minus_alphas_cumprod = np.log(1.0 - self.alphas_cumprod)
self.sqrt_recip_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod)
self.sqrt_recipm1_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod - 1)
# calculations for posterior q(x_{t-1} | x_t, x_0)
self.posterior_variance = (
betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
)
# below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain
self.posterior_log_variance_clipped = np.log(
np.append(self.posterior_variance[1], self.posterior_variance[1:])
) if len(self.posterior_variance) > 1 else np.array([])
self.posterior_mean_coef1 = (
betas * np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
)
self.posterior_mean_coef2 = (
(1.0 - self.alphas_cumprod_prev) * np.sqrt(alphas) / (1.0 - self.alphas_cumprod)
)
def q_mean_variance(self, x_start, t):
"""
Get the distribution q(x_t | x_0).
:param x_start: the [N x C x ...] tensor of noiseless inputs.
:param t: the number of diffusion steps (minus 1). Here, 0 means one step.
:return: A tuple (mean, variance, log_variance), all of x_start's shape.
"""
mean = _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)
log_variance = _extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)
return mean, variance, log_variance
def q_sample(self, x_start, t, noise=None):
"""
Diffuse the data for a given number of diffusion steps.
In other words, sample from q(x_t | x_0).
:param x_start: the initial data batch.
:param t: the number of diffusion steps (minus 1). Here, 0 means one step.
:param noise: if specified, the split-out normal noise.
:return: A noisy version of x_start.
"""
if noise is None:
noise = th.randn_like(x_start)
assert noise.shape == x_start.shape
return (
_extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
+ _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise
)
def q_posterior_mean_variance(self, x_start, x_t, t):
"""
Compute the mean and variance of the diffusion posterior:
q(x_{t-1} | x_t, x_0)
"""
assert x_start.shape == x_t.shape
posterior_mean = (
_extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start
+ _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t
)
posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)
posterior_log_variance_clipped = _extract_into_tensor(
self.posterior_log_variance_clipped, t, x_t.shape
)
assert (
posterior_mean.shape[0]
== posterior_variance.shape[0]
== posterior_log_variance_clipped.shape[0]
== x_start.shape[0]
)
return posterior_mean, posterior_variance, posterior_log_variance_clipped
def p_mean_variance(self, model, x, t, current=None, cache_dic=None, clip_denoised=True, denoised_fn=None, model_kwargs=None):
"""
Apply the model to get p(x_{t-1} | x_t), as well as a prediction of
the initial x, x_0.
:param model: the model, which takes a signal and a batch of timesteps
as input.
:param x: the [N x C x ...] tensor at time t.
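:param t: a 1-D Tensor of timesteps.
:param current: the per-step state dict returned by cache_init(), forwarded
to the model unchanged.
:param cache_dic: the cache dict returned by cache_init(), forwarded to the
model unchanged.
:param clip_denoised: if True, clip the denoised signal into [-1, 1].
:param denoised_fn: if not None, a function which applies to the
x_start prediction before it is used to sample. Applies before
clip_denoised.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:return: a dict with the following keys:
- 'mean': the model mean output.
- 'variance': the model variance output.
- 'log_variance': the log of 'variance'.
- 'pred_xstart': the prediction for x_0.
- 'extra': the second element of the model output if the model
returns a tuple, otherwise None.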
"""
if model_kwargs is None:
model_kwargs = {}
B, C = x.shape[:2]
assert t.shape == (B,)
model_output = model(x, t, current=current, cache_dic=cache_dic, **model_kwargs)
if isinstance(model_output, tuple):
model_output, extra = model_output
else:
extra = None
if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:
assert model_output.shape == (B, C * 2, *x.shape[2:])
model_output, model_var_values = th.split(model_output, C, dim=1)
min_log = _extract_into_tensor(self.posterior_log_variance_clipped, t, x.shape)
max_log = _extract_into_tensor(np.log(self.betas), t, x.shape)
# model_var_values lies in [-1, 1] and is mapped linearly (in log space) onto [min_var, max_var].
frac = (model_var_values + 1) / 2
model_log_variance = frac * max_log + (1 - frac) * min_log
model_variance = th.exp(model_log_variance)
elif self.model_var_type in [ModelVarType.FIXED_LARGE, ModelVarType.FIXED_SMALL]:
model_variance, model_log_variance = {
# for fixedlarge, we set the initial (log-)variance like so
# to get a better decoder log likelihood.
ModelVarType.FIXED_LARGE: (
np.append(self.posterior_variance[1], self.betas[1:]),
np.log(np.append(self.posterior_variance[1], self.betas[1:])),
),
ModelVarType.FIXED_SMALL: (
self.posterior_variance,
self.posterior_log_variance_clipped,
),
}[self.model_var_type]
model_variance = _extract_into_tensor(model_variance, t, x.shape)
model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)
else:
model_variance = th.zeros_like(model_output)
model_log_variance = th.zeros_like(model_output)
def process_xstart(x):
if denoised_fn is not None:
x = denoised_fn(x)
return x.clamp(-1, 1) if clip_denoised else x
if self.model_mean_type == ModelMeanType.START_X:
pred_xstart = process_xstart(model_output)
else:
pred_xstart = process_xstart(
self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output)
)
model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)
assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape
return {
"mean": model_mean,
"variance": model_variance,
"log_variance": model_log_variance,
"pred_xstart": pred_xstart,
"extra": extra,
}
def _predict_xstart_from_eps(self, x_t, t, eps):
assert x_t.shape == eps.shape
return (
_extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t
- _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * eps
)
def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
return (
_extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t - pred_xstart
) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)
def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
"""
Compute the mean for the previous step, given a function cond_fn that
computes the gradient of a conditional log probability with respect to
x. In particular, cond_fn computes grad(log(p(y|x))), and we want to
condition on y.
This uses the conditioning strategy from Sohl-Dickstein et al. (2015).
"""
gradient = cond_fn(x, t, **model_kwargs)
return p_mean_var["mean"].float() + p_mean_var["variance"] * gradient.float()
def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
"""
Compute what the p_mean_variance output would have been, should the
model's score function be conditioned by cond_fn.
See condition_mean() for details on cond_fn.
Unlike condition_mean(), this instead uses the conditioning strategy
from Song et al (2020).
"""
alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)
eps = self._predict_eps_from_xstart(x, t, p_mean_var["pred_xstart"])
eps = eps - (1 - alpha_bar).sqrt() * cond_fn(x, t, **model_kwargs)
out = p_mean_var.copy()
out["pred_xstart"] = self._predict_xstart_from_eps(x, t, eps)
out["mean"], _, _ = self.q_posterior_mean_variance(x_start=out["pred_xstart"], x_t=x, t=t)
return out
def p_sample(
self,
model,
x,
t,
current=None,
cache_dic=None,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
):
"""
Sample x_{t-1} from the model at the given timestep.
:param model: the model to sample from.
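:param x: the current tensor x_t.
:param t: the value of t, starting at 0 for the first diffusion step.
:param current: the per-step state dict returned by cache_init(), passed on
to p_mean_variance().
:param cache_dic: the cache dict returned by cache_init(), passed on to
p_mean_variance().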
:param clip_denoised: if True, clip the x_start prediction to [-1, 1].
:param denoised_fn: if not None, a function which applies to the
x_start prediction before it is used to sample.
:param cond_fn: if not None, this is a gradient function that acts
similarly to the model.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:return: a dict containing the following keys:
- 'sample': a random sample from the model.
- 'pred_xstart': a prediction of x_0.
"""
out = self.p_mean_variance(
model,
x,
t,
current=current,
cache_dic=cache_dic,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
model_kwargs=model_kwargs,
)
noise = th.randn_like(x)
nonzero_mask = (
(t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
) # no noise when t == 0
if cond_fn is not None:
out["mean"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)
sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
return {"sample": sample, "pred_xstart": out["pred_xstart"]}
def p_sample_loop(
self,
model,
shape,
noise=None,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
device=None,
progress=False
):
"""
Generate samples from the model.
:param model: the model module.
:param shape: the shape of the samples, (N, C, H, W).
:param noise: if specified, the noise from the encoder to sample.
Should be of the same shape as `shape`.
:param clip_denoised: if True, clip x_start predictions to [-1, 1].
:param denoised_fn: if not None, a function which applies to the
x_start prediction before it is used to sample.
:param cond_fn: if not None, this is a gradient function that acts
similarly to the model.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:param device: if specified, the device to create the samples on.
If not specified, use a model parameter's device.
:param progress: if True, show a tqdm progress bar.
:return: a non-differentiable batch of samples.
"""
final = None
for sample in self.p_sample_loop_progressive(
model,
shape,
noise=noise,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
cond_fn=cond_fn,
model_kwargs=model_kwargs,
device=device,
progress=progress
):
final = sample
return final["sample"]
def p_sample_loop_progressive(
self,
model,
shape,
noise=None,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
device=None,
progress=False
):
"""
Generate samples from the model and yield intermediate samples from
each timestep of diffusion.
Arguments are the same as p_sample_loop().
Returns a generator over dicts, where each dict is the return value of
p_sample().
"""
if device is None:
device = next(model.parameters()).device
assert isinstance(shape, (tuple, list))
img = noise if noise is not None else th.randn(*shape, device=device)
indices = list(range(self.num_timesteps))[::-1]
if progress:
# Lazy import so that we don't depend on tqdm.
from tqdm.auto import tqdm
indices = tqdm(indices)
cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=self.num_timesteps)
for i in indices:
t = th.tensor([i] * shape[0], device=device)
with th.no_grad():
current['step'] = i
out = self.p_sample(
model,
img,
t,
current=current,
cache_dic=cache_dic,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
cond_fn=cond_fn,
model_kwargs=model_kwargs,
)
yield out
img = out["sample"]
def ddim_sample(
self,
model,
x,
t,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
eta=0.0,
):
"""
Sample x_{t-1} from the model using DDIM.
Same usage as p_sample().
"""
out = self.p_mean_variance(
model,
x,
t,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
model_kwargs=model_kwargs,
)
if cond_fn is not None:
out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)
# Usually our model outputs epsilon, but we re-derive it
# in case we used x_start or x_prev prediction.
eps = self._predict_eps_from_xstart(x, t, out["pred_xstart"])
alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)
alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)
sigma = (
eta
* th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))
* th.sqrt(1 - alpha_bar / alpha_bar_prev)
)
# Equation 12 of the DDIM paper.
noise = th.randn_like(x)
mean_pred = (
out["pred_xstart"] * th.sqrt(alpha_bar_prev)
+ th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
)
nonzero_mask = (
(t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
) # no noise when t == 0
sample = mean_pred + nonzero_mask * sigma * noise
return {"sample": sample, "pred_xstart": out["pred_xstart"]}
def ddim_reverse_sample(
self,
model,
x,
t,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
eta=0.0,
):
"""
Sample x_{t+1} from the model using DDIM reverse ODE.
"""
assert eta == 0.0, "Reverse ODE only for deterministic path"
out = self.p_mean_variance(
model,
x,
t,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
model_kwargs=model_kwargs,
)
if cond_fn is not None:
out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)
# Usually our model outputs epsilon, but we re-derive it
# in case we used x_start or x_prev prediction.
eps = (
_extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x
- out["pred_xstart"]
) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)
alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)
# Equation 12 of the DDIM paper, reversed.
mean_pred = out["pred_xstart"] * th.sqrt(alpha_bar_next) + th.sqrt(1 - alpha_bar_next) * eps
return {"sample": mean_pred, "pred_xstart": out["pred_xstart"]}
def ddim_sample_loop(
self,
model,
shape,
noise=None,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
device=None,
progress=False,
eta=0.0,
):
"""
Generate samples from the model using DDIM.
Same usage as p_sample_loop().
"""
final = None
for sample in self.ddim_sample_loop_progressive(
model,
shape,
noise=noise,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
cond_fn=cond_fn,
model_kwargs=model_kwargs,
device=device,
progress=progress,
eta=eta,
):
final = sample
return final["sample"]
def ddim_sample_loop_progressive(
self,
model,
shape,
noise=None,
clip_denoised=True,
denoised_fn=None,
cond_fn=None,
model_kwargs=None,
device=None,
progress=False,
eta=0.0,
):
"""
Use DDIM to sample from the model and yield intermediate samples from
each timestep of DDIM.
Same usage as p_sample_loop_progressive().
"""
if device is None:
device = next(model.parameters()).device
assert isinstance(shape, (tuple, list))
img = noise if noise is not None else th.randn(*shape, device=device)
indices = list(range(self.num_timesteps))[::-1]
if progress:
# Lazy import so that we don't depend on tqdm.
from tqdm.auto import tqdm
indices = tqdm(indices)
for i in indices:
t = th.tensor([i] * shape[0], device=device)
with th.no_grad():
out = self.ddim_sample(
model,
img,
t,
clip_denoised=clip_denoised,
denoised_fn=denoised_fn,
cond_fn=cond_fn,
model_kwargs=model_kwargs,
eta=eta,
)
yield out
img = out["sample"]
def _vb_terms_bpd(
self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None
):
"""
Get a term for the variational lower-bound.
The resulting units are bits (rather than nats, as one might expect).
This allows for comparison to other papers.
:return: a dict with the following keys:
- 'output': a shape [N] tensor of NLLs or KLs.
- 'pred_xstart': the x_0 predictions.
"""
true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(
x_start=x_start, x_t=x_t, t=t
)
out = self.p_mean_variance(
model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs
)
kl = normal_kl(
true_mean, true_log_variance_clipped, out["mean"], out["log_variance"]
)
kl = mean_flat(kl) / np.log(2.0)
decoder_nll = -discretized_gaussian_log_likelihood(
x_start, means=out["mean"], log_scales=0.5 * out["log_variance"]
)
assert decoder_nll.shape == x_start.shape
decoder_nll = mean_flat(decoder_nll) / np.log(2.0)
# At the first timestep return the decoder NLL,
# otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
output = th.where((t == 0), decoder_nll, kl)
return {"output": output, "pred_xstart": out["pred_xstart"]}
def training_losses(self, model, x_start, timestep, model_kwargs=None, noise=None, skip_noise=False):
"""
Compute training losses for a single timestep.
:param model: the model to evaluate loss on.
:param x_start: the [N x C x ...] tensor of inputs.
:param timestep: a batch of timestep indices.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:param noise: if specified, the specific Gaussian noise to try to remove.
:param skip_noise: if True, use x_start directly as x_t instead of sampling
from q(x_t | x_0).
:return: a dict with the key "loss" containing a tensor of shape [N].
Some mean or variance settings may also have other keys.
"""
t = timestep
if model_kwargs is None:
model_kwargs = {}
if skip_noise:
x_t = x_start
else:
if noise is None:
noise = th.randn_like(x_start)
x_t = self.q_sample(x_start, t, noise=noise)
terms = {}
if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:
terms["loss"] = self._vb_terms_bpd(
model=model,
x_start=x_start,
x_t=x_t,
t=t,
clip_denoised=False,
model_kwargs=model_kwargs,
)["output"]
if self.loss_type == LossType.RESCALED_KL:
terms["loss"] *= self.num_timesteps
elif self.loss_type in [LossType.MSE, LossType.RESCALED_MSE]:
model_output = model(x_t, t, **model_kwargs)
if isinstance(model_output, dict) and model_output.get('x', None) is not None:
output = model_output['x']
else:
output = model_output
if self.return_startx and self.model_mean_type == ModelMeanType.EPSILON:
return self._extracted_from_training_losses_diffusers(x_t, output, t)
# e.g. self.model_var_type == ModelVarType.LEARNED_RANGE (enum value 4)
if self.model_var_type in [
ModelVarType.LEARNED,
ModelVarType.LEARNED_RANGE,
]:
B, C = x_t.shape[:2]
assert output.shape == (B, C * 2, *x_t.shape[2:])
output, model_var_values = th.split(output, C, dim=1)
# Learn the variance using the variational bound, but don't let it affect our mean prediction.
frozen_out = th.cat([output.detach(), model_var_values], dim=1)
# vb variational bound
terms["vb"] = self._vb_terms_bpd(
model=lambda *args, r=frozen_out, **kwargs: r,
x_start=x_start,
x_t=x_t,
t=t,
clip_denoised=False,
)["output"]
if self.loss_type == LossType.RESCALED_MSE:
# Divide by 1000 for equivalence with initial implementation.
# Without a factor of 1/1000, the VB term hurts the MSE term.
terms["vb"] *= self.num_timesteps / 1000.0
target = {
ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(
x_start=x_start, x_t=x_t, t=t
)[0],
ModelMeanType.START_X: x_start,
ModelMeanType.EPSILON: noise,
}[self.model_mean_type]
assert output.shape == target.shape == x_start.shape
if self.snr:
if self.model_mean_type == ModelMeanType.START_X:
pred_noise = self._predict_eps_from_xstart(x_t=x_t, t=t, pred_xstart=output)
pred_startx = output
elif self.model_mean_type == ModelMeanType.EPSILON:
pred_noise = output
pred_startx = self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output)
# terms["mse_eps"] = mean_flat((noise - pred_noise) ** 2)
# terms["mse_x0"] = mean_flat((x_start - pred_startx) ** 2)
t = t[:, None, None, None].expand(pred_startx.shape) # [128, 4, 32, 32]
# best
target = th.where(t > 249, noise, x_start)
output = th.where(t > 249, pred_noise, pred_startx)
loss = (target - output) ** 2
if model_kwargs.get('mask_ratio', False) and model_kwargs['mask_ratio'] > 0:
assert 'mask' in model_output
loss = F.avg_pool2d(loss.mean(dim=1), model.model.module.patch_size).flatten(1)
mask = model_output['mask']
unmask = 1 - mask
terms['mse'] = mean_flat(loss * unmask) * unmask.shape[1]/unmask.sum(1)
if model_kwargs['mask_loss_coef'] > 0:
terms['mae'] = model_kwargs['mask_loss_coef'] * mean_flat(loss * mask) * mask.shape[1]/mask.sum(1)
else:
terms["mse"] = mean_flat(loss)
terms["loss"] = terms["mse"] + terms["vb"] if "vb" in terms else terms["mse"]
if "mae" in terms:
terms["loss"] = terms["loss"] + terms["mae"]
else:
raise NotImplementedError(self.loss_type)
return terms
def training_losses_diffusers(self, model, x_start, timestep, model_kwargs=None, noise=None, skip_noise=False):
"""
Compute training losses for a single timestep.
:param model: the model to evaluate loss on.
:param x_start: the [N x C x ...] tensor of inputs.
:param timestep: a batch of timestep indices.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:param noise: if specified, the specific Gaussian noise to try to remove.
:param skip_noise: if True, use x_start directly as x_t instead of calling q_sample.
:return: a dict with the key "loss" containing a tensor of shape [N].
Some mean or variance settings may also have other keys.
"""
t = timestep
if model_kwargs is None:
model_kwargs = {}
if skip_noise:
x_t = x_start
else:
if noise is None:
noise = th.randn_like(x_start)
x_t = self.q_sample(x_start, t, noise=noise)
terms = {}
if self.loss_type in [LossType.KL, LossType.RESCALED_KL]:
terms["loss"] = self._vb_terms_bpd(
model=model,
x_start=x_start,
x_t=x_t,
t=t,
clip_denoised=False,
model_kwargs=model_kwargs,
)["output"]
if self.loss_type == LossType.RESCALED_KL:
terms["loss"] *= self.num_timesteps
elif self.loss_type in [LossType.MSE, LossType.RESCALED_MSE]:
output = model(x_t, timestep=t, **model_kwargs, return_dict=False)[0]
if self.return_startx and self.model_mean_type == ModelMeanType.EPSILON:
return self._extracted_from_training_losses_diffusers(x_t, output, t)
if self.model_var_type in [
ModelVarType.LEARNED,
ModelVarType.LEARNED_RANGE,
]:
B, C = x_t.shape[:2]
assert output.shape == (B, C * 2, *x_t.shape[2:])
output, model_var_values = th.split(output, C, dim=1)
# Learn the variance using the variational bound, but don't let it affect our mean prediction.
frozen_out = th.cat([output.detach(), model_var_values], dim=1)
terms["vb"] = self._vb_terms_bpd(
model=lambda *args, r=frozen_out, **kwargs: r,
x_start=x_start,
x_t=x_t,
t=t,
clip_denoised=False,
)["output"]
if self.loss_type == LossType.RESCALED_MSE:
# Divide by 1000 for equivalence with initial implementation.
# Without a factor of 1/1000, the VB term hurts the MSE term.
terms["vb"] *= self.num_timesteps / 1000.0
target = {
ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(
x_start=x_start, x_t=x_t, t=t
)[0],
ModelMeanType.START_X: x_start,
ModelMeanType.EPSILON: noise,
}[self.model_mean_type]
assert output.shape == target.shape == x_start.shape
if self.snr:
if self.model_mean_type == ModelMeanType.START_X:
pred_noise = self._predict_eps_from_xstart(x_t=x_t, t=t, pred_xstart=output)
pred_startx = output
elif self.model_mean_type == ModelMeanType.EPSILON:
pred_noise = output
pred_startx = self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output)
# terms["mse_eps"] = mean_flat((noise - pred_noise) ** 2)
# terms["mse_x0"] = mean_flat((x_start - pred_startx) ** 2)
t = t[:, None, None, None].expand(pred_startx.shape) # [128, 4, 32, 32]
# best
target = th.where(t > 249, noise, x_start)
output = th.where(t > 249, pred_noise, pred_startx)
loss = (target - output) ** 2
terms["mse"] = mean_flat(loss)
terms["loss"] = terms["mse"] + terms["vb"] if "vb" in terms else terms["mse"]
if "mae" in terms:
terms["loss"] = terms["loss"] + terms["mae"]
else:
raise NotImplementedError(self.loss_type)
return terms
def _extracted_from_training_losses_diffusers(self, x_t, output, t):
B, C = x_t.shape[:2]
assert output.shape == (B, C * 2, *x_t.shape[2:])
output = th.split(output, C, dim=1)[0]
return output, self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output), x_t
def _prior_bpd(self, x_start):
"""
Get the prior KL term for the variational lower-bound, measured in
bits-per-dim.
This term can't be optimized, as it only depends on the encoder.
:param x_start: the [N x C x ...] tensor of inputs.
:return: a batch of [N] KL values (in bits), one per batch element.
"""
batch_size = x_start.shape[0]
t = th.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)
qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)
kl_prior = normal_kl(
mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0
)
return mean_flat(kl_prior) / np.log(2.0)
def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):
"""
Compute the entire variational lower-bound, measured in bits-per-dim,
as well as other related quantities.
:param model: the model to evaluate loss on.
:param x_start: the [N x C x ...] tensor of inputs.
:param clip_denoised: if True, clip denoised samples.
:param model_kwargs: if not None, a dict of extra keyword arguments to
pass to the model. This can be used for conditioning.
:return: a dict containing the following keys:
- total_bpd: the total variational lower-bound, per batch element.
- prior_bpd: the prior term in the lower-bound.
- vb: an [N x T] tensor of terms in the lower-bound.
- xstart_mse: an [N x T] tensor of x_0 MSEs for each timestep.
- mse: an [N x T] tensor of epsilon MSEs for each timestep.
"""
device = x_start.device
batch_size = x_start.shape[0]
vb = []
xstart_mse = []
mse = []
for t in list(range(self.num_timesteps))[::-1]:
t_batch = th.tensor([t] * batch_size, device=device)
noise = th.randn_like(x_start)
x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)
# Calculate VLB term at the current timestep
with th.no_grad():
out = self._vb_terms_bpd(
model,
x_start=x_start,
x_t=x_t,
t=t_batch,
clip_denoised=clip_denoised,
model_kwargs=model_kwargs,
)
vb.append(out["output"])
xstart_mse.append(mean_flat((out["pred_xstart"] - x_start) ** 2))
eps = self._predict_eps_from_xstart(x_t, t_batch, out["pred_xstart"])
mse.append(mean_flat((eps - noise) ** 2))
vb = th.stack(vb, dim=1)
xstart_mse = th.stack(xstart_mse, dim=1)
mse = th.stack(mse, dim=1)
prior_bpd = self._prior_bpd(x_start)
total_bpd = vb.sum(dim=1) + prior_bpd
return {
"total_bpd": total_bpd,
"prior_bpd": prior_bpd,
"vb": vb,
"xstart_mse": xstart_mse,
"mse": mse,
}
def _extract_into_tensor(arr, timesteps, broadcast_shape):
"""
Extract values from a 1-D numpy array for a batch of indices.
:param arr: the 1-D numpy array.
:param timesteps: a tensor of indices into the array to extract.
:param broadcast_shape: a larger shape of K dimensions with the batch
dimension equal to the length of timesteps.
:return: a tensor of shape [batch_size, 1, ...] where the shape has K dims.
"""
res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
while len(res.shape) < len(broadcast_shape):
res = res[..., None]
return res + th.zeros(broadcast_shape, device=timesteps.device)
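# Illustrative sketch (not part of the original module): the masked branch of
# `training_losses` above computes `mean_flat(loss * unmask) * unmask.shape[1] / unmask.sum(1)`.
# For one sample this is the mean over the kept patches only. The pure-Python
# version below shows the arithmetic without torch; `masked_mean`, `patch_losses`
# and the numbers are hypothetical and chosen only for the demonstration.

```python
def masked_mean(losses, keep):
    """Average `losses` over the positions where `keep` is 1 (single sample)."""
    n = len(losses)
    # Same as mean_flat(loss * keep): zero out dropped positions, average over all n.
    mean_over_all = sum(l * k for l, k in zip(losses, keep)) / n
    # Rescale by n / (number of kept positions) to get the mean over kept positions.
    return mean_over_all * n / sum(keep)


patch_losses = [4.0, 2.0, 6.0, 10.0]  # hypothetical per-patch squared errors
mask = [0, 1, 0, 1]                   # 1 = patch was masked out
unmask = [1 - m for m in mask]        # 1 = patch stayed visible

visible_loss = masked_mean(patch_losses, unmask)  # mean of 4.0 and 6.0
masked_loss = masked_mean(patch_losses, mask)     # mean of 2.0 and 10.0
print(visible_loss, masked_loss)
```

# As in the original expression, the sketch divides by the number of kept
# positions, so a sample with no kept positions would divide by zero.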
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/hed.py
================================================
# This is an improved version and model of HED edge detection with Apache License, Version 2.0.
# Please use this implementation in your products
# This implementation may produce slightly different results from Saining Xie's official implementations,
# but it generates smoother edges and is more suitable for ControlNet as well as other image-to-image translations.
# Different from official models and other implementations, this is an RGB-input model (rather than BGR)
# and in this way it works better for gradio's RGB protocol
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent.parent))
from torch import nn
import torch
import numpy as np
from torchvision import transforms as T
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader
import json
from PIL import Image
import torchvision.transforms.functional as TF
from accelerate import Accelerator
from diffusers.models import AutoencoderKL
import os
image_resize = 1024
class DoubleConvBlock(nn.Module):
def __init__(self, input_channel, output_channel, layer_number):
super().__init__()
self.convs = torch.nn.Sequential()
self.convs.append(torch.nn.Conv2d(in_channels=input_channel, out_channels=output_channel, kernel_size=(3, 3), stride=(1, 1), padding=1))
for _ in range(1, layer_number):
self.convs.append(torch.nn.Conv2d(in_channels=output_channel, out_channels=output_channel, kernel_size=(3, 3), stride=(1, 1), padding=1))
self.projection = torch.nn.Conv2d(in_channels=output_channel, out_channels=1, kernel_size=(1, 1), stride=(1, 1), padding=0)
def forward(self, x, down_sampling=False):
h = x
if down_sampling:
h = torch.nn.functional.max_pool2d(h, kernel_size=(2, 2), stride=(2, 2))
for conv in self.convs:
h = conv(h)
h = torch.nn.functional.relu(h)
return h, self.projection(h)
class ControlNetHED_Apache2(nn.Module):
def __init__(self):
super().__init__()
self.norm = torch.nn.Parameter(torch.zeros(size=(1, 3, 1, 1)))
self.block1 = DoubleConvBlock(input_channel=3, output_channel=64, layer_number=2)
self.block2 = DoubleConvBlock(input_channel=64, output_channel=128, layer_number=2)
self.block3 = DoubleConvBlock(input_channel=128, output_channel=256, layer_number=3)
self.block4 = DoubleConvBlock(input_channel=256, output_channel=512, layer_number=3)
self.block5 = DoubleConvBlock(input_channel=512, output_channel=512, layer_number=3)
def forward(self, x):
h = x - self.norm
h, projection1 = self.block1(h)
h, projection2 = self.block2(h, down_sampling=True)
h, projection3 = self.block3(h, down_sampling=True)
h, projection4 = self.block4(h, down_sampling=True)
h, projection5 = self.block5(h, down_sampling=True)
return projection1, projection2, projection3, projection4, projection5
class InternData(Dataset):
def __init__(self):
####
with open('data/InternData/partition/data_info.json', 'r') as f:
self.j = json.load(f)
self.transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(image_resize), # Image.BICUBIC
T.CenterCrop(image_resize),
T.ToTensor(),
])
def __len__(self):
return len(self.j)
def getdata(self, idx):
path = self.j[idx]['path']
image = Image.open("data/InternImgs/" + path)
image = self.transform(image)
return image, path
def __getitem__(self, idx):
for i in range(20):
try:
data = self.getdata(idx)
return data
except Exception as e:
print(f"Error details: {str(e)}")
idx = np.random.randint(len(self))
raise RuntimeError('Too many bad data.')
class HEDdetector(nn.Module):
def __init__(self, feature=True, vae=None):
super().__init__()
self.model = ControlNetHED_Apache2()
self.model.load_state_dict(torch.load('output/pretrained_models/ControlNetHED.pth', map_location='cpu'))
self.model.eval()
self.model.requires_grad_(False)
if feature:
if vae is None:
self.vae = AutoencoderKL.from_pretrained("output/pretrained_models/sd-vae-ft-ema")
else:
self.vae = vae
self.vae.eval()
self.vae.requires_grad_(False)
else:
self.vae = None
def forward(self, input_image):
B, C, H, W = input_image.shape
with torch.inference_mode():
edges = self.model(input_image * 255.)
edges = torch.cat([TF.resize(e, [H, W]) for e in edges], dim=1)
edge = 1 / (1 + torch.exp(-torch.mean(edges, dim=1, keepdim=True)))
edge.clip_(0, 1)
if self.vae is not None:
edge = TF.normalize(edge, [.5], [.5])
edge = edge.repeat(1, 3, 1, 1)
posterior = self.vae.encode(edge).latent_dist
edge = torch.cat([posterior.mean, posterior.std], dim=1).cpu().numpy()
return edge
def main():
dataset = InternData()
dataloader = DataLoader(dataset, batch_size=10, shuffle=False, num_workers=8, pin_memory=True)
hed = HEDdetector()
accelerator = Accelerator()
hed, dataloader = accelerator.prepare(hed, dataloader)
for img, path in tqdm(dataloader):
out = hed(img.cuda())
for p, o in zip(path, out):
save = f'data/InternalData/hed_feature_{image_resize}/' + p.replace('.png', '.npz')
if os.path.exists(save):
continue
os.makedirs(os.path.dirname(save), exist_ok=True)
np.savez_compressed(save, o)
if __name__ == "__main__":
main()
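# Illustrative sketch (not part of the original file): `HEDdetector.forward` above
# averages the five resized side outputs, applies a sigmoid, and, when a VAE is
# used, normalizes the edge map with mean 0.5 and std 0.5. The pure-Python version
# below does the same arithmetic for a single pixel; `fuse_side_outputs`,
# `to_vae_range` and the logits are hypothetical names and values.

```python
import math


def fuse_side_outputs(side_logits):
    """Average the per-scale logits for one pixel and squash them with a sigmoid."""
    mean_logit = sum(side_logits) / len(side_logits)
    return 1.0 / (1.0 + math.exp(-mean_logit))


def to_vae_range(edge):
    """Map an edge probability in [0, 1] to [-1, 1], i.e. (edge - 0.5) / 0.5."""
    return (edge - 0.5) / 0.5


logits = [2.0, -1.0, 0.5, -0.5, -1.0]  # hypothetical outputs of the five blocks
edge = fuse_side_outputs(logits)
print(edge, to_vae_range(edge))
```

# Because the sigmoid already maps into (0, 1), the `edge.clip_(0, 1)` call in the
# original acts only as a safeguard.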
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/__init__.py
================================================
from diffusion.model.llava.llava_mpt import LlavaMPTForCausalLM, LlavaMPTConfig
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/llava_mpt.py
================================================
# Copyright 2023 Haotian Liu
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Optional, Tuple, Union
import warnings
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss
import math
from transformers import AutoConfig, AutoModelForCausalLM, CLIPVisionModel, CLIPImageProcessor
from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
from diffusion.model.llava.mpt.modeling_mpt import MPTConfig, MPTForCausalLM, MPTModel
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"
class LlavaMPTConfig(MPTConfig):
model_type = "llava_mpt"
class LlavaMPTModel(MPTModel):
config_class = LlavaMPTConfig
def __init__(self, config: MPTConfig, mm_vision_tower=None, mm_hidden_size=None):
super(LlavaMPTModel, self).__init__(config)
if hasattr(config, "mm_vision_tower"):
# HACK: for FSDP
self.vision_tower = [CLIPVisionModel.from_pretrained(config.mm_vision_tower)]
# self.vision_tower = CLIPVisionModel.from_pretrained(config.mm_vision_tower)
if hasattr(config, "use_mm_proj"):
self.mm_projector = nn.Linear(config.mm_hidden_size, config.d_model)
def initialize_vision_modules(self, vision_tower, mm_vision_select_layer,
pretrain_mm_mlp_adapter=None, tune_mm_mlp_adapter=False):
self.config.mm_vision_tower = vision_tower
image_processor = CLIPImageProcessor.from_pretrained(vision_tower)
if not hasattr(self, 'vision_tower'):
vision_tower = CLIPVisionModel.from_pretrained(vision_tower)
else:
vision_tower = self.vision_tower[0]
vision_tower.requires_grad_(False)
vision_tower = vision_tower.to(torch.float16)
self.vision_tower = [vision_tower]
vision_config = vision_tower.config
num_patches = (vision_config.image_size // vision_config.patch_size) ** 2
self.config.use_mm_proj = True
self.config.mm_hidden_size = vision_config.hidden_size
self.config.mm_vision_select_layer = mm_vision_select_layer
if not hasattr(self, 'mm_projector'):
self.mm_projector = nn.Linear(vision_config.hidden_size, self.config.d_model)
if pretrain_mm_mlp_adapter is not None:
mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location='cpu')
self.mm_projector.load_state_dict({k.split('.')[-1]: v for k, v in mm_projector_weights.items() if 'mm_projector' in k})
return dict(
image_processor=image_processor,
image_token_len=num_patches,
vision_config=vision_config
)
def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None, images=None):
# HACK: replace back original embeddings for LLaVA pretraining
orig_embeds_params = getattr(self, 'orig_embeds_params', None)
# if orig_embeds_params is not None:
# orig_embeds_params = orig_embeds_params[0]
# with torch.no_grad():
# self.get_input_embeddings().weight.data[:-2] = orig_embeds_params[:-2].data
inputs_embeds = self.wte(input_ids)
vision_tower = getattr(self, 'vision_tower', None)
if vision_tower is not None and (input_ids.shape[1] != 1 or self.training) and images is not None:
# TODO: this is a modified multimodal LLM -- Haotian Liu
vision_tower = vision_tower[0] # HACK: for FSDP
with torch.no_grad():
if type(images) is list:
# variable length images
image_features = []
for image in images:
image_forward_out = vision_tower(image.unsqueeze(0), output_hidden_states=True)
select_hidden_state_layer = getattr(self.config, "mm_vision_select_layer", -1)
select_hidden_state = image_forward_out.hidden_states[select_hidden_state_layer]
image_feature = select_hidden_state[:, 1:]
image_features.append(image_feature)
else:
image_forward_outs = vision_tower(images, output_hidden_states=True)
select_hidden_state_layer = getattr(self.config, "mm_vision_select_layer", -1)
select_hidden_state = image_forward_outs.hidden_states[select_hidden_state_layer]
image_features = select_hidden_state[:, 1:]
if type(images) is list:
image_features = [self.mm_projector(image_feature)[0] for image_feature in image_features]
else:
image_features = self.mm_projector(image_features)
dummy_image_features = torch.zeros(256, 1024, device=inputs_embeds.device, dtype=inputs_embeds.dtype)
dummy_image_features = self.mm_projector(dummy_image_features)
new_input_embeds = []
cur_image_idx = 0
for cur_input_ids, cur_input_embeds in zip(input_ids, inputs_embeds):
if (cur_input_ids == vision_tower.config.im_patch_token).sum() == 0:
# multimodal LLM, but the current sample is not multimodal
cur_input_embeds = cur_input_embeds + (0. * dummy_image_features).sum()
new_input_embeds.append(cur_input_embeds)
continue
cur_image_features = image_features[cur_image_idx]
num_patches = cur_image_features.shape[0]
if vision_tower.config.use_im_start_end:
if (cur_input_ids == vision_tower.config.im_start_token).sum() != (cur_input_ids == vision_tower.config.im_end_token).sum():
raise ValueError("The number of image start tokens and image end tokens should be the same.")
image_start_tokens = torch.where(cur_input_ids == vision_tower.config.im_start_token)[0]
for image_start_token_pos in image_start_tokens:
cur_image_features = image_features[cur_image_idx].to(device=cur_input_embeds.device)
num_patches = cur_image_features.shape[0]
if cur_input_ids[image_start_token_pos + num_patches + 1] != vision_tower.config.im_end_token:
raise ValueError("The image end token should follow the image start token.")
if orig_embeds_params is not None:
cur_new_input_embeds = torch.cat((cur_input_embeds[:image_start_token_pos].detach(), cur_input_embeds[image_start_token_pos:image_start_token_pos+1], cur_image_features, cur_input_embeds[image_start_token_pos + num_patches + 1:image_start_token_pos + num_patches + 2], cur_input_embeds[image_start_token_pos + num_patches + 2:].detach()), dim=0)
else:
cur_new_input_embeds = torch.cat((cur_input_embeds[:image_start_token_pos+1], cur_image_features, cur_input_embeds[image_start_token_pos + num_patches + 1:]), dim=0)
cur_image_idx += 1
else:
if (cur_input_ids == vision_tower.config.im_patch_token).sum() != num_patches:
raise ValueError("The number of image patch tokens should be the same as the number of image patches.")
masked_indices = torch.where(cur_input_ids == vision_tower.config.im_patch_token)[0]
mask_index_start = masked_indices[0]
if (masked_indices != torch.arange(mask_index_start, mask_index_start+num_patches, device=masked_indices.device, dtype=masked_indices.dtype)).any():
raise ValueError("The image patch tokens should be consecutive.")
if orig_embeds_params is not None:
cur_new_input_embeds = torch.cat((cur_input_embeds[:mask_index_start].detach(), cur_image_features, cur_input_embeds[mask_index_start+num_patches:].detach()), dim=0)
else:
cur_new_input_embeds = torch.cat((cur_input_embeds[:mask_index_start], cur_image_features, cur_input_embeds[mask_index_start+num_patches:]), dim=0)
new_input_embeds.append(cur_new_input_embeds)
inputs_embeds = torch.stack(new_input_embeds, dim=0)
return super(LlavaMPTModel, self).forward(input_ids=None, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache, tok_emb=inputs_embeds)
class LlavaMPTForCausalLM(MPTForCausalLM):
config_class = LlavaMPTConfig
supports_gradient_checkpointing = True
def __init__(self, config):
super(MPTForCausalLM, self).__init__(config)
if not config.tie_word_embeddings:
raise ValueError('MPTForCausalLM only supports tied word embeddings')
self.transformer = LlavaMPTModel(config)
self.logit_scale = None
if config.logit_scale is not None:
logit_scale = config.logit_scale
if isinstance(logit_scale, str):
if logit_scale == 'inv_sqrt_d_model':
logit_scale = 1 / math.sqrt(config.d_model)
else:
raise ValueError(f"logit_scale={logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.")
self.logit_scale = logit_scale
def get_model(self):
return self.transformer
def _set_gradient_checkpointing(self, module, value=False):
if isinstance(module, LlavaMPTModel):
module.gradient_checkpointing = value
def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, labels: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None, images=None):
return_dict = return_dict if return_dict is not None else self.config.return_dict
use_cache = use_cache if use_cache is not None else self.config.use_cache
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache, images=images)
logits = F.linear(outputs.last_hidden_state, self.transformer.wte.weight)
if self.logit_scale is not None:
if self.logit_scale == 0:
warnings.warn(f'Multiplying logits by self.logit_scale={self.logit_scale!r}. This will produce uniform (uninformative) outputs.')
logits *= self.logit_scale
loss = None
if labels is not None:
labels = torch.roll(labels, shifts=-1)
labels[:, -1] = -100
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.to(logits.device).view(-1))
return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=outputs.past_key_values, hidden_states=outputs.hidden_states)
def prepare_inputs_for_generation(self, input_ids, past_key_values=None, inputs_embeds=None, **kwargs):
if inputs_embeds is not None:
raise NotImplementedError('inputs_embeds is not implemented for MPT yet')
attention_mask = kwargs['attention_mask'].bool()
if attention_mask[:, -1].sum() != attention_mask.shape[0]:
raise NotImplementedError('MPT does not support generation with right padding.')
if self.transformer.attn_uses_sequence_id and self.training:
sequence_id = torch.zeros_like(input_ids[:1])
else:
sequence_id = None
if past_key_values is not None:
input_ids = input_ids[:, -1].unsqueeze(-1)
if self.transformer.prefix_lm:
prefix_mask = torch.ones_like(attention_mask)
if kwargs.get('use_cache') == False:
raise NotImplementedError('MPT with prefix_lm=True does not support use_cache=False.')
else:
prefix_mask = None
return {'input_ids': input_ids, 'attention_mask': attention_mask, 'prefix_mask': prefix_mask, 'sequence_id': sequence_id, 'past_key_values': past_key_values, 'use_cache': kwargs.get('use_cache', True), "images": kwargs.get("images", None)}
def initialize_vision_tokenizer(self, mm_use_im_start_end, tokenizer, device,
tune_mm_mlp_adapter=False, pretrain_mm_mlp_adapter=None):
vision_config = self.get_model().vision_tower[0].config
vision_config.use_im_start_end = mm_use_im_start_end
tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
self.resize_token_embeddings(len(tokenizer))
if mm_use_im_start_end:
num_new_tokens = tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
self.resize_token_embeddings(len(tokenizer))
vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert_tokens_to_ids([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])
if num_new_tokens > 0:
input_embeddings = (
self._extracted_from_initialize_vision_tokenizer_14(
num_new_tokens
)
)
if tune_mm_mlp_adapter:
self.get_model().orig_embeds_params = [self.get_input_embeddings().weight.data.clone().to(device=device)]
for p in self.get_input_embeddings().parameters():
p.requires_grad = True
for p in self.get_output_embeddings().parameters():
p.requires_grad = False
if pretrain_mm_mlp_adapter:
mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location='cpu')
embed_tokens_weight = mm_projector_weights['transformer.wte.weight']
assert num_new_tokens == 2
if input_embeddings.shape == embed_tokens_weight.shape:
input_embeddings[-num_new_tokens:] = embed_tokens_weight[-num_new_tokens:]
elif embed_tokens_weight.shape[0] == num_new_tokens:
input_embeddings[-num_new_tokens:] = embed_tokens_weight
else:
raise ValueError(f"Unexpected embed_tokens_weight shape. Pretrained: {embed_tokens_weight.shape}. Current: {input_embeddings.shape}. Number of new tokens: {num_new_tokens}.")
vision_config.im_patch_token = tokenizer.convert_tokens_to_ids([DEFAULT_IMAGE_PATCH_TOKEN])[0]
# TODO Rename this here and in `initialize_vision_tokenizer`
def _extracted_from_initialize_vision_tokenizer_14(self, num_new_tokens):
result = self.get_input_embeddings().weight.data
output_embeddings = self.get_output_embeddings().weight.data
input_embeddings_avg = result[:-num_new_tokens].mean(dim=0, keepdim=True)
output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(
dim=0, keepdim=True)
result[-num_new_tokens:] = input_embeddings_avg
output_embeddings[-num_new_tokens:] = output_embeddings_avg
return result
AutoConfig.register("llava_mpt", LlavaMPTConfig)
AutoModelForCausalLM.register(LlavaMPTConfig, LlavaMPTForCausalLM)
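# Illustrative sketch (not part of the original file): in the branch of
# `LlavaMPTModel.forward` without start/end tokens, the embeddings at the
# image-patch token positions are replaced by the projected image features,
# after checking that the patch tokens are as many as the features and are
# consecutive. The list-based version below mirrors that logic without torch;
# `splice_image_features` and the example values are hypothetical.

```python
def splice_image_features(token_ids, token_embeds, image_feats, patch_token_id):
    """Replace the embeddings at patch-token positions with `image_feats`."""
    positions = [i for i, tok in enumerate(token_ids) if tok == patch_token_id]
    if not positions:
        # Sample has no image placeholder: leave its embeddings unchanged.
        return list(token_embeds)
    if len(positions) != len(image_feats):
        raise ValueError("The number of image patch tokens should match the number of image patches.")
    start = positions[0]
    if positions != list(range(start, start + len(image_feats))):
        raise ValueError("The image patch tokens should be consecutive.")
    return token_embeds[:start] + list(image_feats) + token_embeds[start + len(image_feats):]


token_ids = [5, 9, 9, 9, 7]                  # 9 plays the role of im_patch_token
token_embeds = ["e5", "p", "p", "p", "e7"]   # stand-ins for embedding vectors
image_feats = ["img0", "img1", "img2"]       # stand-ins for projected features
spliced = splice_image_features(token_ids, token_embeds, image_feats, patch_token_id=9)
print(spliced)
```

# The sequence length is unchanged by the splice, which is why the original can
# stack the per-sample results back into one batch tensor.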
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/attention.py
================================================
"""Attention layers."""
import math
import warnings
from typing import Optional
import torch
import torch.nn as nn
from einops import rearrange
from .norm import LPLayerNorm
def _reset_is_causal(num_query_tokens: int, num_key_tokens: int, original_is_causal: bool):
if original_is_causal and num_query_tokens != num_key_tokens:
if num_query_tokens != 1:
raise NotImplementedError('MPT does not support query and key with different number of tokens, unless number of query tokens is 1.')
else:
return False
return original_is_causal
def scaled_multihead_dot_product_attention(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):
q = rearrange(query, 'b s (h d) -> b h s d', h=n_heads)
k = rearrange(key, 'b s (h d) -> b h d s', h=1 if multiquery else n_heads)
v = rearrange(value, 'b s (h d) -> b h s d', h=1 if multiquery else n_heads)
min_val = torch.finfo(q.dtype).min
(b, _, s_q, d) = q.shape
s_k = k.size(-1)
if softmax_scale is None:
softmax_scale = 1 / math.sqrt(d)
attn_weight = q.matmul(k) * softmax_scale
if attn_bias is not None:
if attn_bias.size(-1) not in [1, s_k] or attn_bias.size(-2) not in [
1,
s_q,
]:
raise RuntimeError(f'attn_bias (shape: {attn_bias.shape}) is expected to broadcast to shape: {attn_weight.shape}.')
attn_weight = attn_weight + attn_bias
if key_padding_mask is not None:
if attn_bias is not None:
warnings.warn('Propagating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unnecessary computation/memory usage. Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')
attn_weight = attn_weight.masked_fill(~key_padding_mask.view((b, 1, 1, s_k)), min_val)
if is_causal:
s = max(s_q, s_k)
causal_mask = attn_weight.new_ones(s, s, dtype=torch.float16)
causal_mask = causal_mask.tril()
causal_mask = causal_mask.to(torch.bool)
causal_mask = ~causal_mask
causal_mask = causal_mask[-s_q:, -s_k:]
attn_weight = attn_weight.masked_fill(causal_mask.view(1, 1, s_q, s_k), min_val)
attn_weight = torch.softmax(attn_weight, dim=-1)
if dropout_p:
attn_weight = torch.nn.functional.dropout(attn_weight, p=dropout_p, training=training, inplace=True)
out = attn_weight.matmul(v)
out = rearrange(out, 'b h s d -> b s (h d)')
return (out, attn_weight) if needs_weights else (out, None)
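# Illustrative sketch (not part of the original file): the causal branch above
# builds a square lower-triangular mask of size max(s_q, s_k), inverts it, and
# keeps the last s_q rows and last s_k columns. The pure-Python version below
# reproduces that slicing so the result can be inspected for small sizes;
# `causal_block_mask` is a hypothetical name.

```python
def causal_block_mask(s_q, s_k):
    """Return an s_q x s_k grid where True marks a key the query may NOT attend to."""
    s = max(s_q, s_k)
    # Inverted lower-triangular matrix: a column later than the row is blocked.
    blocked = [[col > row for col in range(s)] for row in range(s)]
    # Equivalent to causal_mask[-s_q:, -s_k:] in the original.
    return [row[s - s_k:] for row in blocked[s - s_q:]]


print(causal_block_mask(3, 3))  # full self-attention over three tokens
print(causal_block_mask(1, 4))  # one new query attending to four cached keys
```

# With a single query and a longer key sequence, the slice keeps only the last
# row of the triangle, so no key is blocked. This matches incremental decoding
# with cached keys.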
def check_valid_inputs(*tensors, valid_dtypes=None):
if valid_dtypes is None:
valid_dtypes = [torch.float16, torch.bfloat16]
for tensor in tensors:
if tensor.dtype not in valid_dtypes:
raise TypeError(f'tensor.dtype={tensor.dtype!r} must be in valid_dtypes={valid_dtypes!r}.')
if not tensor.is_cuda:
raise TypeError(f'Inputs must be cuda tensors (tensor.is_cuda={tensor.is_cuda!r}).')
def flash_attn_fn(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):
try:
from flash_attn import bert_padding, flash_attn_interface
except Exception as e:
raise RuntimeError('Please install flash-attn==1.0.3.post0') from e
check_valid_inputs(query, key, value)
if attn_bias is not None:
raise NotImplementedError('attn_bias not implemented for flash attn.')
(batch_size, seqlen) = query.shape[:2]
if key_padding_mask is None:
key_padding_mask = torch.ones_like(key[:, :, 0], dtype=torch.bool)
query_padding_mask = key_padding_mask[:, -query.size(1):]
(query_unpad, indices_q, cu_seqlens_q, max_seqlen_q) = bert_padding.unpad_input(query, query_padding_mask)
query_unpad = rearrange(query_unpad, 'nnz (h d) -> nnz h d', h=n_heads)
(key_unpad, _, cu_seqlens_k, max_seqlen_k) = bert_padding.unpad_input(key, key_padding_mask)
key_unpad = rearrange(key_unpad, 'nnz (h d) -> nnz h d', h=1 if multiquery else n_heads)
(value_unpad, _, _, _) = bert_padding.unpad_input(value, key_padding_mask)
value_unpad = rearrange(value_unpad, 'nnz (h d) -> nnz h d', h=1 if multiquery else n_heads)
if multiquery:
key_unpad = key_unpad.expand(key_unpad.size(0), n_heads, key_unpad.size(-1))
value_unpad = value_unpad.expand(value_unpad.size(0), n_heads, value_unpad.size(-1))
dropout_p = dropout_p if training else 0.0
reset_is_causal = _reset_is_causal(query.size(1), key.size(1), is_causal)
output_unpad = flash_attn_interface.flash_attn_unpadded_func(query_unpad, key_unpad, value_unpad, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, dropout_p, softmax_scale=softmax_scale, causal=reset_is_causal, return_attn_probs=needs_weights)
output = bert_padding.pad_input(rearrange(output_unpad, 'nnz h d -> nnz (h d)'), indices_q, batch_size, seqlen)
return (output, None)
def triton_flash_attn_fn(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):
try:
from flash_attn import flash_attn_triton
except Exception as e:
raise RuntimeError('Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202') from e
check_valid_inputs(query, key, value)
if dropout_p:
raise NotImplementedError('Dropout not implemented for attn_impl: triton.')
if needs_weights:
raise NotImplementedError('attn_impl: triton cannot return attn weights.')
if key_padding_mask is not None:
warnings.warn('Propagating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unnecessary computation/memory usage. Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')
(b_size, s_k) = key_padding_mask.shape[:2]
if attn_bias is None:
attn_bias = query.new_zeros(b_size, 1, 1, s_k)
attn_bias = attn_bias.masked_fill(~key_padding_mask.view((b_size, 1, 1, s_k)), torch.finfo(query.dtype).min)
query = rearrange(query, 'b s (h d) -> b s h d', h=n_heads)
key = rearrange(key, 'b s (h d) -> b s h d', h=1 if multiquery else n_heads)
value = rearrange(value, 'b s (h d) -> b s h d', h=1 if multiquery else n_heads)
if multiquery:
key = key.expand(*key.shape[:2], n_heads, key.size(-1))
value = value.expand(*value.shape[:2], n_heads, value.size(-1))
reset_is_causal = _reset_is_causal(query.size(1), key.size(1), is_causal)
attn_output = flash_attn_triton.flash_attn_func(query, key, value, attn_bias, reset_is_causal, softmax_scale)
output = attn_output.view(*attn_output.shape[:2], -1)
return (output, None)
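# A minimal sketch, assuming float32 scores and a single row of attention logits,
# of why `triton_flash_attn_fn` above can fold `key_padding_mask` into `attn_bias`:
# padded keys get the most negative representable value as an additive bias, so
# their softmax weight underflows to zero. `masked_softmax_sketch` and
# `FLOAT32_MIN` are hypothetical names used only for this illustration.

```python
import math

# Stand-in for torch.finfo(torch.float32).min (assumption: float32 scores).
FLOAT32_MIN = -3.4028234663852886e38


def masked_softmax_sketch(scores, keep_mask):
    """Softmax over one row of scores after adding a padding bias."""
    # Bias is 0 for real tokens and the most negative float for padded ones.
    bias = [0.0 if keep else FLOAT32_MIN for keep in keep_mask]
    biased = [s + b for s, b in zip(scores, bias)]
    # Subtract the row maximum before exponentiating for numerical stability.
    row_max = max(biased)
    exps = [math.exp(v - row_max) for v in biased]
    total = sum(exps)
    return [e / total for e in exps]


# The third key is padding, so it should receive no attention weight.
weights = masked_softmax_sketch([1.0, 2.0, 3.0], [True, True, False])
```

# The padded position would otherwise have had the largest score; after the bias
# is added, its weight is exactly zero and the rest still sum to one.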
class MultiheadAttention(nn.Module):
"""Multi-head self attention.
Using the torch or triton attention implementation also enables the user to
apply an additive bias.
"""
def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton', clip_qkv: Optional[float]=None, qk_ln: bool=False, softmax_scale: Optional[float]=None, attn_pdrop: float=0.0, low_precision_layernorm: bool=False, device: Optional[str]=None):
super().__init__()
self.attn_impl = attn_impl
self.clip_qkv = clip_qkv
self.qk_ln = qk_ln
self.d_model = d_model
self.n_heads = n_heads
self.softmax_scale = softmax_scale
if self.softmax_scale is None:
self.softmax_scale = 1 / math.sqrt(self.d_model / self.n_heads)
self.attn_dropout_p = attn_pdrop
self.Wqkv = nn.Linear(self.d_model, 3 * self.d_model, device=device)
fuse_splits = (d_model, 2 * d_model)
self.Wqkv._fused = (0, fuse_splits)
if self.qk_ln:
layernorm_class = LPLayerNorm if low_precision_layernorm else nn.LayerNorm
self.q_ln = layernorm_class(self.d_model, device=device)
self.k_ln = layernorm_class(self.d_model, device=device)
if self.attn_impl == 'flash':
self.attn_fn = flash_attn_fn
elif self.attn_impl == 'triton':
self.attn_fn = triton_flash_attn_fn
warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')
elif self.attn_impl == 'torch':
self.attn_fn = scaled_multihead_dot_product_attention
if torch.cuda.is_available():
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
else:
raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')
self.out_proj = nn.Linear(self.d_model, self.d_model, device=device)
self.out_proj._is_residual = True
def forward(self, x, past_key_value=None, attn_bias=None, attention_mask=None, is_causal=True, needs_weights=False):
qkv = self.Wqkv(x)
if self.clip_qkv:
qkv.clamp_(min=-self.clip_qkv, max=self.clip_qkv)
(query, key, value) = qkv.chunk(3, dim=2)
key_padding_mask = attention_mask
if self.qk_ln:
dtype = query.dtype
query = self.q_ln(query).to(dtype)
key = self.k_ln(key).to(dtype)
if past_key_value is not None:
if len(past_key_value) != 0:
key = torch.cat([past_key_value[0], key], dim=1)
value = torch.cat([past_key_value[1], value], dim=1)
past_key_value = (key, value)
if attn_bias is not None:
attn_bias = attn_bias[:, :, -query.size(1):, -key.size(1):]
(context, attn_weights) = self.attn_fn(query, key, value, self.n_heads, softmax_scale=self.softmax_scale, attn_bias=attn_bias, key_padding_mask=key_padding_mask, is_causal=is_causal, dropout_p=self.attn_dropout_p, training=self.training, needs_weights=needs_weights)
return (self.out_proj(context), attn_weights, past_key_value)
class MultiQueryAttention(nn.Module):
"""Multi-Query self attention.
Using the torch or triton attention implementation also enables the user to
apply an additive bias.
"""
def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton', clip_qkv: Optional[float]=None, qk_ln: bool=False, softmax_scale: Optional[float]=None, attn_pdrop: float=0.0, low_precision_layernorm: bool=False, device: Optional[str]=None):
super().__init__()
self.attn_impl = attn_impl
self.clip_qkv = clip_qkv
self.qk_ln = qk_ln
self.d_model = d_model
self.n_heads = n_heads
self.head_dim = d_model // n_heads
self.softmax_scale = softmax_scale
if self.softmax_scale is None:
self.softmax_scale = 1 / math.sqrt(self.head_dim)
self.attn_dropout_p = attn_pdrop
self.Wqkv = nn.Linear(d_model, d_model + 2 * self.head_dim, device=device)
fuse_splits = (d_model, d_model + self.head_dim)
self.Wqkv._fused = (0, fuse_splits)
if self.qk_ln:
layernorm_class = LPLayerNorm if low_precision_layernorm else nn.LayerNorm
self.q_ln = layernorm_class(d_model, device=device)
self.k_ln = layernorm_class(self.head_dim, device=device)
if self.attn_impl == 'flash':
self.attn_fn = flash_attn_fn
elif self.attn_impl == 'triton':
self.attn_fn = triton_flash_attn_fn
warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')
elif self.attn_impl == 'torch':
self.attn_fn = scaled_multihead_dot_product_attention
if torch.cuda.is_available():
warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
else:
raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')
self.out_proj = nn.Linear(self.d_model, self.d_model, device=device)
self.out_proj._is_residual = True
def forward(self, x, past_key_value=None, attn_bias=None, attention_mask=None, is_causal=True, needs_weights=False):
qkv = self.Wqkv(x)
if self.clip_qkv:
qkv.clamp_(min=-self.clip_qkv, max=self.clip_qkv)
(query, key, value) = qkv.split([self.d_model, self.head_dim, self.head_dim], dim=2)
key_padding_mask = attention_mask
if self.qk_ln:
dtype = query.dtype
query = self.q_ln(query).to(dtype)
key = self.k_ln(key).to(dtype)
if past_key_value is not None:
if len(past_key_value) != 0:
key = torch.cat([past_key_value[0], key], dim=1)
value = torch.cat([past_key_value[1], value], dim=1)
past_key_value = (key, value)
if attn_bias is not None:
attn_bias = attn_bias[:, :, -query.size(1):, -key.size(1):]
(context, attn_weights) = self.attn_fn(query, key, value, self.n_heads, softmax_scale=self.softmax_scale, attn_bias=attn_bias, key_padding_mask=key_padding_mask, is_causal=is_causal, dropout_p=self.attn_dropout_p, training=self.training, needs_weights=needs_weights, multiquery=True)
return (self.out_proj(context), attn_weights, past_key_value)
def attn_bias_shape(attn_impl, n_heads, seq_len, alibi, prefix_lm, causal, use_sequence_id):
if attn_impl == 'flash':
return None
elif attn_impl in ['torch', 'triton']:
if alibi:
if (prefix_lm or not causal) or use_sequence_id:
return (1, n_heads, seq_len, seq_len)
return (1, n_heads, 1, seq_len)
elif prefix_lm or use_sequence_id:
return (1, 1, seq_len, seq_len)
return None
else:
raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')
def build_attn_bias(attn_impl, attn_bias, n_heads, seq_len, causal=False, alibi=False, alibi_bias_max=8):
if attn_impl == 'flash':
return None
elif attn_impl in ['torch', 'triton']:
if alibi:
(device, dtype) = (attn_bias.device, attn_bias.dtype)
attn_bias = attn_bias.add(build_alibi_bias(n_heads, seq_len, full=not causal, alibi_bias_max=alibi_bias_max, device=device, dtype=dtype))
return attn_bias
else:
raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')
def gen_slopes(n_heads, alibi_bias_max=8, device=None):
_n_heads = 2 ** math.ceil(math.log2(n_heads))
m = torch.arange(1, _n_heads + 1, dtype=torch.float32, device=device)
m = m.mul(alibi_bias_max / _n_heads)
slopes = 1.0 / torch.pow(2, m)
if _n_heads != n_heads:
slopes = torch.concat([slopes[1::2], slopes[::2]])[:n_heads]
return slopes.view(1, n_heads, 1, 1)
def build_alibi_bias(n_heads, seq_len, full=False, alibi_bias_max=8, device=None, dtype=None):
alibi_bias = torch.arange(1 - seq_len, 1, dtype=torch.int32, device=device).view(1, 1, 1, seq_len)
if full:
alibi_bias = alibi_bias - torch.arange(1 - seq_len, 1, dtype=torch.int32, device=device).view(1, 1, seq_len, 1)
alibi_bias = alibi_bias.abs().mul(-1)
slopes = gen_slopes(n_heads, alibi_bias_max, device=device)
alibi_bias = alibi_bias * slopes
return alibi_bias.to(dtype=dtype)
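# A pure-Python sketch of what `gen_slopes` and `build_alibi_bias` above compute,
# written without torch so the numbers are easy to inspect. It assumes a single
# head when building the bias; `alibi_slopes_sketch` and `alibi_bias_sketch` are
# hypothetical names for this illustration, not part of the module.

```python
import math


def alibi_slopes_sketch(n_heads, alibi_bias_max=8):
    """Per-head slopes 2^(-i * alibi_bias_max / padded) for i = 1..padded."""
    # Round the head count up to the next power of two, as gen_slopes does.
    padded = 2 ** math.ceil(math.log2(n_heads))
    slopes = [1.0 / 2 ** (i * alibi_bias_max / padded) for i in range(1, padded + 1)]
    if padded != n_heads:
        # Interleave odd- and even-indexed slopes, then keep the first n_heads.
        slopes = (slopes[1::2] + slopes[::2])[:n_heads]
    return slopes


def alibi_bias_sketch(seq_len, slope, full=False):
    """Bias for one head: a single row when causal, a full matrix otherwise."""
    if not full:
        # Distance from each key position j to the last position, times the slope.
        return [[slope * d for d in range(1 - seq_len, 1)]]
    # Non-causal case: penalty grows with the absolute query/key distance.
    return [[-abs(j - i) * slope for j in range(seq_len)] for i in range(seq_len)]


slopes_8 = alibi_slopes_sketch(8)
slopes_6 = alibi_slopes_sketch(6)
causal_row = alibi_bias_sketch(4, slopes_8[0])
full_bias = alibi_bias_sketch(3, slopes_8[0], full=True)
```

# With 8 heads and the default `alibi_bias_max=8`, the slopes run from 1/2 down
# to 1/256; a head count that is not a power of two reuses a subset of the
# slopes computed for the next power of two.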
ATTN_CLASS_REGISTRY = {'multihead_attention': MultiheadAttention, 'multiquery_attention': MultiQueryAttention}
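# A small arithmetic sketch of how the two registered attention classes differ
# in the width of their fused `Wqkv` output and in the per-token key/value cache.
# The helper names are hypothetical; the formulas are read off the `nn.Linear`
# sizes and the `past_key_value` tuples in the classes above.

```python
def fused_qkv_width_sketch(d_model, n_heads, multiquery):
    """Output features of the fused query/key/value projection."""
    head_dim = d_model // n_heads
    if multiquery:
        # One full-width query plus a single shared key head and value head.
        return d_model + 2 * head_dim
    # Multi-head attention: query, key and value are all d_model wide.
    return 3 * d_model


def kv_cache_width_sketch(d_model, n_heads, multiquery):
    """Features stored per token in past_key_value (key plus value)."""
    head_dim = d_model // n_heads
    return 2 * head_dim if multiquery else 2 * d_model


# Values for the MPTConfig defaults d_model=2048, n_heads=16.
mha_width = fused_qkv_width_sketch(2048, 16, multiquery=False)
mqa_width = fused_qkv_width_sketch(2048, 16, multiquery=True)
mha_cache = kv_cache_width_sketch(2048, 16, multiquery=False)
mqa_cache = kv_cache_width_sketch(2048, 16, multiquery=True)
```

# For these sizes the multi-query cache is n_heads times smaller, because the
# single key/value head is only expanded across heads inside the attention function.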
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/blocks.py
================================================
"""GPT Blocks used for the GPT Model."""
from typing import Dict, Optional, Tuple
import torch
import torch.nn as nn
from .attention import ATTN_CLASS_REGISTRY
from .norm import NORM_CLASS_REGISTRY
class MPTMLP(nn.Module):
def __init__(self, d_model: int, expansion_ratio: int, device: Optional[str]=None):
super().__init__()
self.up_proj = nn.Linear(d_model, expansion_ratio * d_model, device=device)
self.act = nn.GELU(approximate='none')
self.down_proj = nn.Linear(expansion_ratio * d_model, d_model, device=device)
self.down_proj._is_residual = True
def forward(self, x):
return self.down_proj(self.act(self.up_proj(x)))
class MPTBlock(nn.Module):
def __init__(self, d_model: int, n_heads: int, expansion_ratio: int, attn_config: Optional[Dict]=None, resid_pdrop: float=0.0, norm_type: str='low_precision_layernorm', device: Optional[str]=None, **kwargs):
if attn_config is None:
attn_config = {
'attn_type': 'multihead_attention',
'attn_pdrop': 0.0,
'attn_impl': 'triton',
'qk_ln': False,
'clip_qkv': None,
'softmax_scale': None,
'prefix_lm': False,
'attn_uses_sequence_id': False,
'alibi': False,
'alibi_bias_max': 8,
}
del kwargs
super().__init__()
norm_class = NORM_CLASS_REGISTRY[norm_type.lower()]
attn_class = ATTN_CLASS_REGISTRY[attn_config['attn_type']]
self.norm_1 = norm_class(d_model, device=device)
self.attn = attn_class(attn_impl=attn_config['attn_impl'], clip_qkv=attn_config['clip_qkv'], qk_ln=attn_config['qk_ln'], softmax_scale=attn_config['softmax_scale'], attn_pdrop=attn_config['attn_pdrop'], d_model=d_model, n_heads=n_heads, device=device)
self.norm_2 = norm_class(d_model, device=device)
self.ffn = MPTMLP(d_model=d_model, expansion_ratio=expansion_ratio, device=device)
self.resid_attn_dropout = nn.Dropout(resid_pdrop)
self.resid_ffn_dropout = nn.Dropout(resid_pdrop)
def forward(self, x: torch.Tensor, past_key_value: Optional[Tuple[torch.Tensor]]=None, attn_bias: Optional[torch.Tensor]=None, attention_mask: Optional[torch.ByteTensor]=None, is_causal: bool=True) -> Tuple[torch.Tensor, Optional[Tuple[torch.Tensor]]]:
a = self.norm_1(x)
(b, _, past_key_value) = self.attn(a, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=is_causal)
x = x + self.resid_attn_dropout(b)
m = self.norm_2(x)
n = self.ffn(m)
x = x + self.resid_ffn_dropout(n)
return (x, past_key_value)
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/configuration_mpt.py
================================================
"""A HuggingFace-style model configuration."""
from typing import Dict, Optional, Union
from transformers import PretrainedConfig
attn_config_defaults: Dict = {'attn_type': 'multihead_attention', 'attn_pdrop': 0.0, 'attn_impl': 'triton', 'qk_ln': False, 'clip_qkv': None, 'softmax_scale': None, 'prefix_lm': False, 'attn_uses_sequence_id': False, 'alibi': False, 'alibi_bias_max': 8}
init_config_defaults: Dict = {'name': 'kaiming_normal_', 'fan_mode': 'fan_in', 'init_nonlinearity': 'relu'}
class MPTConfig(PretrainedConfig):
model_type = 'mpt'
def __init__(self, d_model: int=2048, n_heads: int=16, n_layers: int=24, expansion_ratio: int=4, max_seq_len: int=2048, vocab_size: int=50368, resid_pdrop: float=0.0, emb_pdrop: float=0.0, learned_pos_emb: bool=True, attn_config: Dict=attn_config_defaults, init_device: str='cpu', logit_scale: Optional[Union[float, str]]=None, no_bias: bool=False, verbose: int=0, embedding_fraction: float=1.0, norm_type: str='low_precision_layernorm', use_cache: bool=False, init_config: Dict=init_config_defaults, **kwargs):
"""The MPT configuration class.
Args:
d_model (int): The size of the embedding dimension of the model.
n_heads (int): The number of attention heads.
n_layers (int): The number of layers in the model.
expansion_ratio (int): The ratio of the up/down scale in the MLP.
max_seq_len (int): The maximum sequence length of the model.
vocab_size (int): The size of the vocabulary.
resid_pdrop (float): The dropout probability applied to the attention output before combining with residual.
emb_pdrop (float): The dropout probability for the embedding layer.
learned_pos_emb (bool): Whether to use learned positional embeddings
attn_config (Dict): A dictionary used to configure the model's attention module:
attn_type (str): type of attention to use. Options: multihead_attention, multiquery_attention
attn_pdrop (float): The dropout probability for the attention layers.
attn_impl (str): The attention implementation to use. One of 'torch', 'flash', or 'triton'.
qk_ln (bool): Whether to apply layer normalization to the queries and keys in the attention layer.
clip_qkv (Optional[float]): If not None, clip the queries, keys, and values in the attention layer to
this value.
softmax_scale (Optional[float]): If not None, scale the softmax in the attention layer by this value. If None,
use the default scale of ``1/sqrt(d_keys)``.
prefix_lm (Optional[bool]): Whether the model should operate as a Prefix LM. This requires passing an
extra `prefix_mask` argument which indicates which tokens belong to the prefix. Tokens in the prefix
can attend to one another bi-directionally. Tokens outside the prefix use causal attention.
attn_uses_sequence_id (Optional[bool]): Whether to restrict attention to tokens that have the same sequence_id.
When the model is in `train` mode, this requires passing an extra `sequence_id` argument which indicates
which sub-sequence each token belongs to.
Defaults to ``False`` meaning any provided `sequence_id` will be ignored.
alibi (bool): Whether to use the alibi bias instead of position embeddings.
alibi_bias_max (int): The maximum value of the alibi bias.
init_device (str): The device to use for parameter initialization.
logit_scale (Optional[Union[float, str]]): If not None, scale the logits by this value.
no_bias (bool): Whether to remove the bias terms from all layers.
verbose (int): The verbosity level. 0 is silent.
embedding_fraction (float): The fraction to scale the gradients of the embedding layer by.
norm_type (str): choose type of norm to use
multiquery_attention (bool): Whether to use multiquery attention implementation.
use_cache (bool): Whether or not the model should return the last key/values attentions
init_config (Dict): A dictionary used to configure the model initialization:
init_config.name: The parameter initialization scheme to use. Options: 'default_', 'baseline_',
'kaiming_uniform_', 'kaiming_normal_', 'neox_init_', 'small_init_', 'xavier_uniform_', or
'xavier_normal_'. These mimic the parameter initialization methods in PyTorch.
init_div_is_residual (Union[int, float, str, bool]): Value to divide initial weights by if ``module._is_residual`` is True.
emb_init_std (Optional[float]): The standard deviation of the normal distribution used to initialize the embedding layer.
emb_init_uniform_lim (Optional[Union[Tuple[float, float], float]]): The lower and upper limits of the uniform distribution
used to initialize the embedding layer. Mutually exclusive with ``emb_init_std``.
init_std (float): The standard deviation of the normal distribution used to initialize the model,
if using the baseline_ parameter initialization scheme.
init_gain (float): The gain to use for parameter initialization with kaiming or xavier initialization schemes.
fan_mode (str): The fan mode to use for parameter initialization with kaiming initialization schemes.
init_nonlinearity (str): The nonlinearity to use for parameter initialization with kaiming initialization schemes.
---
See llmfoundry.models.utils.param_init_fns.py for info on other param init config options
"""
self.d_model = d_model
self.n_heads = n_heads
self.n_layers = n_layers
self.expansion_ratio = expansion_ratio
self.max_seq_len = max_seq_len
self.vocab_size = vocab_size
self.resid_pdrop = resid_pdrop
self.emb_pdrop = emb_pdrop
self.learned_pos_emb = learned_pos_emb
self.attn_config = attn_config
self.init_device = init_device
self.logit_scale = logit_scale
self.no_bias = no_bias
self.verbose = verbose
self.embedding_fraction = embedding_fraction
self.norm_type = norm_type
self.use_cache = use_cache
self.init_config = init_config
if 'name' in kwargs:
del kwargs['name']
if 'loss_fn' in kwargs:
del kwargs['loss_fn']
super().__init__(**kwargs)
self._validate_config()
def _set_config_defaults(self, config, config_defaults):
for (k, v) in config_defaults.items():
if k not in config:
config[k] = v
return config
def _validate_config(self):
self.attn_config = self._set_config_defaults(self.attn_config, attn_config_defaults)
self.init_config = self._set_config_defaults(self.init_config, init_config_defaults)
if self.d_model % self.n_heads != 0:
raise ValueError('d_model must be divisible by n_heads')
if any((prob < 0 or prob > 1 for prob in [self.attn_config['attn_pdrop'], self.resid_pdrop, self.emb_pdrop])):
raise ValueError("self.attn_config['attn_pdrop'], resid_pdrop, emb_pdrop are probabilities and must be between 0 and 1")
if self.attn_config['attn_impl'] not in ['torch', 'flash', 'triton']:
raise ValueError(f"Unknown attn_impl={self.attn_config['attn_impl']}")
if self.attn_config['prefix_lm'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:
raise NotImplementedError('prefix_lm only implemented with torch and triton attention.')
if self.attn_config['alibi'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:
raise NotImplementedError('alibi only implemented with torch and triton attention.')
if self.attn_config['attn_uses_sequence_id'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:
raise NotImplementedError('attn_uses_sequence_id only implemented with torch and triton attention.')
if self.embedding_fraction > 1 or self.embedding_fraction <= 0:
raise ValueError('model.embedding_fraction must be between 0 (exclusive) and 1 (inclusive)!')
if isinstance(self.logit_scale, str) and self.logit_scale != 'inv_sqrt_d_model':
raise ValueError(f"self.logit_scale={self.logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.")
if self.init_config.get('name', None) is None:
raise ValueError(f"self.init_config={self.init_config!r} 'name' needs to be set.")
if not self.learned_pos_emb and (not self.attn_config['alibi']):
raise ValueError(
'Positional information must be provided to the model using either learned_pos_emb or alibi.'
)
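# A minimal sketch of the default-filling behaviour of `_set_config_defaults`
# above, using a shortened defaults dictionary. `set_config_defaults_sketch`
# and the sample dictionaries are assumptions made for this illustration.

```python
def set_config_defaults_sketch(config, config_defaults):
    """Copy every default whose key is missing from config, then return config."""
    for key, value in config_defaults.items():
        if key not in config:
            config[key] = value
    return config


defaults = {'attn_impl': 'triton', 'alibi': False, 'alibi_bias_max': 8}
user = {'attn_impl': 'torch', 'alibi': True}

# User-supplied keys win; only the missing 'alibi_bias_max' is filled in.
merged = set_config_defaults_sketch(user, defaults)
```

# The merge happens in place and returns the same dictionary object, so a caller
# that wants to keep its original partial config untouched would pass a copy.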
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/modeling_mpt.py
================================================
"""A simple, flexible implementation of a GPT model.
Inspired by https://github.com/karpathy/minGPT/blob/master/mingpt/model.py
"""
import math
import warnings
from typing import List, Optional, Tuple, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast
from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
from .attention import attn_bias_shape, build_attn_bias
from .blocks import MPTBlock
from .norm import NORM_CLASS_REGISTRY
from .configuration_mpt import MPTConfig
from .param_init_fns import MODEL_INIT_REGISTRY, generic_param_init_fn_
Tokenizer = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]
from transformers.utils import logging
logger = logging.get_logger(__name__)
class MPTPreTrainedModel(PreTrainedModel):
config_class = MPTConfig
base_model_prefix = 'model'
class MPTModel(MPTPreTrainedModel):
def __init__(self, config: MPTConfig):
config._validate_config()
super().__init__(config)
self.attn_impl = config.attn_config['attn_impl']
self.prefix_lm = config.attn_config['prefix_lm']
self.attn_uses_sequence_id = config.attn_config['attn_uses_sequence_id']
self.alibi = config.attn_config['alibi']
self.alibi_bias_max = config.attn_config['alibi_bias_max']
if config.norm_type.lower() not in NORM_CLASS_REGISTRY.keys():
norm_options = ' | '.join(NORM_CLASS_REGISTRY.keys())
raise NotImplementedError(f'Requested norm type ({config.norm_type}) is not implemented within this repo (Options: {norm_options}).')
norm_class = NORM_CLASS_REGISTRY[config.norm_type.lower()]
self.embedding_fraction = config.embedding_fraction
self.wte = nn.Embedding(config.vocab_size, config.d_model, device=config.init_device)
if not self.alibi:
self.wpe = nn.Embedding(config.max_seq_len, config.d_model, device=config.init_device)
self.emb_drop = nn.Dropout(config.emb_pdrop)
self.blocks = nn.ModuleList([MPTBlock(device=config.init_device, **config.to_dict()) for _ in range(config.n_layers)])
self.norm_f = norm_class(config.d_model, device=config.init_device)
if config.init_device != 'meta':
self.apply(self.param_init_fn)
self.is_causal = not self.prefix_lm
self._attn_bias_initialized = False
self.attn_bias = None
self.attn_bias_shape = attn_bias_shape(self.attn_impl, config.n_heads, config.max_seq_len, self.alibi, prefix_lm=self.prefix_lm, causal=self.is_causal, use_sequence_id=self.attn_uses_sequence_id)
if config.no_bias:
for module in self.modules():
if hasattr(module, 'bias') and isinstance(module.bias, nn.Parameter):
if config.verbose:
warnings.warn(f'Removing bias ({module.bias}) from {module}.')
module.register_parameter('bias', None)
if config.verbose and config.verbose > 2:
print(self)
if 'verbose' not in self.config.init_config:
self.config.init_config['verbose'] = self.config.verbose
if self.config.init_config['verbose'] > 1:
init_fn_name = self.config.init_config['name']
warnings.warn(f'Using {init_fn_name} initialization.')
self.gradient_checkpointing = False
def get_input_embeddings(self):
return self.wte
def set_input_embeddings(self, value):
self.wte = value
@torch.no_grad()
def _attn_bias(self, device, dtype, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None):
if not self._attn_bias_initialized:
if self.attn_bias_shape:
self.attn_bias = torch.zeros(self.attn_bias_shape, device=device, dtype=dtype)
self.attn_bias = build_attn_bias(self.attn_impl, self.attn_bias, self.config.n_heads, self.config.max_seq_len, causal=self.is_causal, alibi=self.alibi, alibi_bias_max=self.alibi_bias_max)
self._attn_bias_initialized = True
if self.attn_impl == 'flash':
return (self.attn_bias, attention_mask)
if self.attn_bias is not None:
self.attn_bias = self.attn_bias.to(dtype=dtype, device=device)
attn_bias = self.attn_bias
if self.prefix_lm:
assert isinstance(attn_bias, torch.Tensor)
assert isinstance(prefix_mask, torch.Tensor)
attn_bias = self._apply_prefix_mask(attn_bias, prefix_mask)
if self.attn_uses_sequence_id and sequence_id is not None:
assert isinstance(attn_bias, torch.Tensor)
attn_bias = self._apply_sequence_id(attn_bias, sequence_id)
if attention_mask is not None:
s_k = attention_mask.shape[-1]
if attn_bias is None:
attn_bias = torch.zeros((1, 1, 1, s_k), device=device, dtype=dtype)
else:
attn_bias = attn_bias[:, :, :, -s_k:]
if prefix_mask is not None and attention_mask.shape != prefix_mask.shape:
raise ValueError(f'attention_mask shape={attention_mask.shape} ' + f'and prefix_mask shape={prefix_mask.shape} are not equal.')
min_val = torch.finfo(attn_bias.dtype).min
attn_bias = attn_bias.masked_fill(~attention_mask.view(-1, 1, 1, s_k), min_val)
return (attn_bias, None)
def _apply_prefix_mask(self, attn_bias: torch.Tensor, prefix_mask: torch.Tensor):
(s_k, s_q) = attn_bias.shape[-2:]
if s_k != self.config.max_seq_len or s_q != self.config.max_seq_len:
raise ValueError(
f'attn_bias does not match the expected shape. The last two dimensions should both be {self.config.max_seq_len} '
+ f'but are {s_k} and {s_q}.'
)
seq_len = prefix_mask.shape[-1]
if seq_len > self.config.max_seq_len:
raise ValueError(f'prefix_mask sequence length cannot exceed max_seq_len={self.config.max_seq_len}')
attn_bias = attn_bias[..., :seq_len, :seq_len]
causal = torch.tril(torch.ones((seq_len, seq_len), dtype=torch.bool, device=prefix_mask.device)).view(1, 1, seq_len, seq_len)
prefix = prefix_mask.view(-1, 1, 1, seq_len)
cannot_attend = ~torch.logical_or(causal, prefix.bool())
return self._mask_cannot_attend(attn_bias, cannot_attend)
def _apply_sequence_id(self, attn_bias: torch.Tensor, sequence_id: torch.LongTensor):
seq_len = sequence_id.shape[-1]
if seq_len > self.config.max_seq_len:
raise ValueError(f'sequence_id sequence length cannot exceed max_seq_len={self.config.max_seq_len}')
attn_bias = attn_bias[..., :seq_len, :seq_len]
cannot_attend = torch.logical_not(torch.eq(sequence_id.view(-1, seq_len, 1), sequence_id.view(-1, 1, seq_len))).unsqueeze(1)
return self._mask_cannot_attend(attn_bias, cannot_attend)
def _mask_cannot_attend(self, attn_bias, cannot_attend):
# Shared by `_apply_prefix_mask` and `_apply_sequence_id`: fill disallowed
# positions with the most negative value representable in the bias dtype.
min_val = torch.finfo(attn_bias.dtype).min
attn_bias = attn_bias.masked_fill(cannot_attend, min_val)
return attn_bias
def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None, tok_emb: Optional[torch.FloatTensor]=None):
return_dict = return_dict if return_dict is not None else self.config.return_dict
use_cache = use_cache if use_cache is not None else self.config.use_cache
if self.gradient_checkpointing and self.training and use_cache:
logger.warning_once(
"`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
)
use_cache = False
if attention_mask is not None:
attention_mask = attention_mask.bool()
if prefix_mask is not None:
prefix_mask = prefix_mask.bool()
if not return_dict:
raise NotImplementedError('return_dict False is not implemented yet for MPT')
if output_attentions:
raise NotImplementedError('output_attentions is not implemented yet for MPT')
if attention_mask is not None and attention_mask[:, 0].sum() != attention_mask.shape[0] and self.training:
raise NotImplementedError('MPT does not support training with left padding.')
if self.prefix_lm and prefix_mask is None:
raise ValueError('prefix_mask is a required argument when MPT is configured with prefix_lm=True.')
if self.training:
if self.attn_uses_sequence_id and sequence_id is None:
raise ValueError('sequence_id is a required argument when MPT is configured with attn_uses_sequence_id=True ' + 'and the model is in train mode.')
elif self.attn_uses_sequence_id is False and sequence_id is not None:
warnings.warn('MPT received non-None input for `sequence_id` but is configured with attn_uses_sequence_id=False. ' + 'This input will be ignored. If you want the model to use `sequence_id`, set attn_uses_sequence_id to True.')
if input_ids is not None:
S = input_ids.size(1)
assert S <= self.config.max_seq_len, f'Cannot forward input with seq_len={S}, this model only supports seq_len<={self.config.max_seq_len}'
tok_emb = self.wte(input_ids)
else:
assert tok_emb is not None
S = tok_emb.size(1)
if self.alibi:
x = tok_emb
else:
past_position = 0
if past_key_values is not None:
if len(past_key_values) != self.config.n_layers:
raise ValueError(
f'past_key_values must provide a past_key_value for each attention layer in the network (len(past_key_values)={len(past_key_values)!r}; self.config.n_layers={self.config.n_layers!r}).'
)
past_position = past_key_values[0][0].size(1)
if S + past_position > self.config.max_seq_len:
raise ValueError(f'Cannot forward input with past sequence length {past_position} and current sequence length {S}, this model only supports total sequence length <= {self.config.max_seq_len}.')
# Use tok_emb.device: input_ids may be None when tok_emb is passed in directly.
pos = torch.arange(past_position, S + past_position, dtype=torch.long, device=tok_emb.device).unsqueeze(0)
if attention_mask is not None:
pos = torch.clamp(pos - torch.cumsum((~attention_mask).to(torch.int32), dim=1)[:, past_position:], min=0)
pos_emb = self.wpe(pos)
x = tok_emb + pos_emb
if self.embedding_fraction == 1:
x = self.emb_drop(x)
else:
x_shrunk = x * self.embedding_fraction + x.detach() * (1 - self.embedding_fraction)
assert isinstance(self.emb_drop, nn.Module)
x = self.emb_drop(x_shrunk)
(attn_bias, attention_mask) = self._attn_bias(device=x.device, dtype=x.dtype, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id)
if use_cache and past_key_values is None:
past_key_values = [() for _ in range(self.config.n_layers)]
all_hidden_states = () if output_hidden_states else None
for (b_idx, block) in enumerate(self.blocks):
if output_hidden_states:
assert all_hidden_states is not None
all_hidden_states = all_hidden_states + (x,)
past_key_value = past_key_values[b_idx] if past_key_values is not None else None
if self.gradient_checkpointing and self.training:
(x, past_key_value) = torch.utils.checkpoint.checkpoint(
block,
x, past_key_value, attn_bias, attention_mask, self.is_causal
)
else:
(x, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)
if past_key_values is not None:
past_key_values[b_idx] = past_key_value
x = self.norm_f(x)
return BaseModelOutputWithPast(last_hidden_state=x, past_key_values=past_key_values, hidden_states=all_hidden_states)
def param_init_fn(self, module):
init_fn_name = self.config.init_config['name']
MODEL_INIT_REGISTRY[init_fn_name](module=module, n_layers=self.config.n_layers, d_model=self.config.d_model, **self.config.init_config)
def fsdp_wrap_fn(self, module):
return isinstance(module, MPTBlock)
def activation_checkpointing_fn(self, module):
return isinstance(module, MPTBlock)
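# A pure-Python sketch, for a single sequence, of the position-id adjustment in
# `MPTModel.forward` above: each position is reduced by the number of padded
# tokens seen so far and clamped at zero, so real tokens after left padding
# still start at position 0. `position_ids_sketch` is a hypothetical helper.

```python
def position_ids_sketch(attention_mask, past_position=0):
    """Position ids for the tokens after past_position in one sequence.

    attention_mask covers the whole sequence (cached tokens included);
    True marks a real token and False marks padding.
    """
    # Running count of padded tokens, mirroring cumsum(~attention_mask).
    pad_count, running = [], 0
    for keep in attention_mask:
        running += 0 if keep else 1
        pad_count.append(running)
    positions = range(past_position, len(attention_mask))
    # Subtract the padding seen so far and clamp at zero.
    return [max(p - c, 0) for p, c in zip(positions, pad_count[past_position:])]


left_padded = position_ids_sketch([False, False, True, True, True])
unpadded = position_ids_sketch([True, True, True])
with_cache = position_ids_sketch([False, False, True, True, True], past_position=3)
```

# In the cached case only the last two tokens are new, and they continue from
# the position the first real token would have reached.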
class MPTForCausalLM(MPTPreTrainedModel):
def __init__(self, config: MPTConfig):
super().__init__(config)
if not config.tie_word_embeddings:
raise ValueError('MPTForCausalLM only supports tied word embeddings')
self.transformer = MPTModel(config)
self.logit_scale = None
if config.logit_scale is not None:
logit_scale = config.logit_scale
if isinstance(logit_scale, str):
if logit_scale == 'inv_sqrt_d_model':
logit_scale = 1 / math.sqrt(config.d_model)
else:
raise ValueError(f"logit_scale={logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.")
self.logit_scale = logit_scale
def get_input_embeddings(self):
return self.transformer.wte
def set_input_embeddings(self, value):
self.transformer.wte = value
def get_output_embeddings(self):
return self.transformer.wte
def set_output_embeddings(self, new_embeddings):
self.transformer.wte = new_embeddings
def set_decoder(self, decoder):
self.transformer = decoder
def get_decoder(self):
return self.transformer
def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, labels: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None):
return_dict = return_dict if return_dict is not None else self.config.return_dict
use_cache = use_cache if use_cache is not None else self.config.use_cache
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
logits = F.linear(outputs.last_hidden_state, self.transformer.wte.weight)
if self.logit_scale is not None:
if self.logit_scale == 0:
warnings.warn(f'Multiplying logits by self.logit_scale={self.logit_scale!r}. This will produce uniform (uninformative) outputs.')
logits *= self.logit_scale
loss = None
if labels is not None:
labels = torch.roll(labels, shifts=-1)
labels[:, -1] = -100
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.to(logits.device).view(-1))
return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=outputs.past_key_values, hidden_states=outputs.hidden_states)
def param_init_fn(self, module):
init_fn_name = self.config.init_config['name']
MODEL_INIT_REGISTRY[init_fn_name](module=module, n_layers=self.config.n_layers, d_model=self.config.d_model, **self.config.init_config)
def fsdp_wrap_fn(self, module):
return isinstance(module, MPTBlock)
def activation_checkpointing_fn(self, module):
return isinstance(module, MPTBlock)
def prepare_inputs_for_generation(self, input_ids, past_key_values=None, inputs_embeds=None, **kwargs):
if inputs_embeds is not None:
raise NotImplementedError('inputs_embeds is not implemented for MPT yet')
attention_mask = kwargs['attention_mask'].bool()
if attention_mask[:, -1].sum() != attention_mask.shape[0]:
raise NotImplementedError('MPT does not support generation with right padding.')
if self.transformer.attn_uses_sequence_id and self.training:
sequence_id = torch.zeros_like(input_ids[:1])
else:
sequence_id = None
if past_key_values is not None:
input_ids = input_ids[:, -1].unsqueeze(-1)
if self.transformer.prefix_lm:
prefix_mask = torch.ones_like(attention_mask)
if kwargs.get('use_cache') == False:
raise NotImplementedError('MPT with prefix_lm=True does not support use_cache=False.')
else:
prefix_mask = None
return {'input_ids': input_ids, 'attention_mask': attention_mask, 'prefix_mask': prefix_mask, 'sequence_id': sequence_id, 'past_key_values': past_key_values, 'use_cache': kwargs.get('use_cache', True)}
@staticmethod
def _reorder_cache(past_key_values, beam_idx):
"""Used by HuggingFace generate when using beam search with kv-caching.
See https://github.com/huggingface/transformers/blob/3ec7a47664ebe40c40f4b722f6bb1cd30c3821ec/src/transformers/models/gpt2/modeling_gpt2.py#L1122-L1133
for an example in transformers.
"""
return [
tuple(
(past_state.index_select(0, beam_idx) for past_state in layer_past)
)
for layer_past in past_key_values
]
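# Illustrative sketch, not part of this repo: MPTForCausalLM.forward shifts the
# labels with torch.roll(labels, shifts=-1) and then sets the last position of
# every row to -100 so the loss ignores it. The stdlib-only helpers below
# (hypothetical names) mirror that on nested lists, assuming torch.roll without
# `dims` rolls the flattened tensor, and compare it with a plain per-row shift.

```python
IGNORE_INDEX = -100  # label value skipped by cross-entropy by default


def shift_labels(labels):
    """Shift every row left by one token and mask the final position."""
    return [row[1:] + [IGNORE_INDEX] for row in labels]


def roll_then_mask(labels):
    """Roll the flattened labels by -1, restore the rows, then mask the last column."""
    width = len(labels[0])
    flat = [token for row in labels for token in row]
    rolled = flat[1:] + flat[:1]
    rows = [rolled[i:i + width] for i in range(0, len(rolled), width)]
    for row in rows:
        # The wrapped-around token always lands in the last column, so masking hides it.
        row[-1] = IGNORE_INDEX
    return rows


batch = [[1, 2, 3], [4, 5, 6]]
shifted = shift_labels(batch)
rolled = roll_then_mask(batch)
```

# Both helpers give the same targets here, which is why the flattened roll is
# harmless once the last column is masked.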
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/norm.py
================================================
import torch
def _cast_if_autocast_enabled(tensor):
if torch.is_autocast_enabled():
if tensor.device.type == 'cuda':
dtype = torch.get_autocast_gpu_dtype()
elif tensor.device.type == 'cpu':
dtype = torch.get_autocast_cpu_dtype()
else:
raise NotImplementedError()
return tensor.to(dtype=dtype)
return tensor
class LPLayerNorm(torch.nn.LayerNorm):
def __init__(self, normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None):
super().__init__(normalized_shape=normalized_shape, eps=eps, elementwise_affine=elementwise_affine, device=device, dtype=dtype)
def forward(self, x):
module_device = x.device
downcast_x = _cast_if_autocast_enabled(x)
downcast_weight = _cast_if_autocast_enabled(self.weight) if self.weight is not None else self.weight
downcast_bias = _cast_if_autocast_enabled(self.bias) if self.bias is not None else self.bias
with torch.autocast(enabled=False, device_type=module_device.type):
return torch.nn.functional.layer_norm(downcast_x, self.normalized_shape, downcast_weight, downcast_bias, self.eps)
def rms_norm(x, weight=None, eps=1e-05):
output = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
return output * weight if weight is not None else output
class RMSNorm(torch.nn.Module):
def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=None, device=None):
super().__init__()
self.eps = eps
if weight:
self.weight = torch.nn.Parameter(torch.ones(normalized_shape, dtype=dtype, device=device))
else:
self.register_parameter('weight', None)
def forward(self, x):
return rms_norm(x.float(), self.weight, self.eps).to(dtype=x.dtype)
class LPRMSNorm(RMSNorm):
def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=None, device=None):
super().__init__(normalized_shape=normalized_shape, eps=eps, weight=weight, dtype=dtype, device=device)
def forward(self, x):
downcast_x = _cast_if_autocast_enabled(x)
downcast_weight = _cast_if_autocast_enabled(self.weight) if self.weight is not None else self.weight
with torch.autocast(enabled=False, device_type=x.device.type):
return rms_norm(downcast_x, downcast_weight, self.eps).to(dtype=x.dtype)
NORM_CLASS_REGISTRY = {'layernorm': torch.nn.LayerNorm, 'low_precision_layernorm': LPLayerNorm, 'rmsnorm': RMSNorm, 'low_precision_rmsnorm': LPRMSNorm}
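# Illustrative sketch, not part of this repo: a stdlib-only reference for the
# standard RMSNorm definition, x * 1 / sqrt(mean(x^2) + eps), optionally scaled
# by a per-feature weight. The helper names are hypothetical; the sketch works on
# a single vector and is only meant as a sanity check on the formula.

```python
import math


def rms_norm_reference(values, weight=None, eps=1e-5):
    """Normalize one vector by its root mean square."""
    mean_square = sum(v * v for v in values) / len(values)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    normed = [v * inv_rms for v in values]
    if weight is not None:
        normed = [n * w for n, w in zip(normed, weight)]
    return normed


def root_mean_square(values):
    """Root mean square of a vector, used to check the normalized output."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# With eps=0 the normalized vector has a root mean square of exactly one.
normalized = rms_norm_reference([3.0, 4.0, 0.0, 0.0], eps=0.0)
```

# A correctly normalized vector has a root mean square close to 1, which is a
# quick way to catch a multiply/divide mix-up in the scaling factor.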
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/llava/mpt/param_init_fns.py
================================================
import math
import warnings
from collections.abc import Sequence
from functools import partial
from typing import Optional, Tuple, Union
import torch
from torch import nn
from .norm import NORM_CLASS_REGISTRY
def torch_default_param_init_fn_(module: nn.Module, verbose: int=0, **kwargs):
del kwargs
if verbose > 1:
warnings.warn("Initializing network using module's reset_parameters attribute")
if hasattr(module, 'reset_parameters'):
module.reset_parameters()
def fused_init_helper_(module: nn.Module, init_fn_):
_fused = getattr(module, '_fused', None)
if _fused is None:
raise RuntimeError('Internal logic error')
(dim, splits) = _fused
splits = (0, *splits, module.weight.size(dim))
for (s, e) in zip(splits[:-1], splits[1:]):
slice_indices = [slice(None)] * module.weight.ndim
slice_indices[dim] = slice(s, e)
init_fn_(module.weight[slice_indices])
def generic_param_init_fn_(module: nn.Module, init_fn_, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):
del kwargs
if verbose > 1:
warnings.warn('If model has bias parameters they are initialized to 0.')
if init_div_is_residual is False:
div_is_residual = 1.0
elif init_div_is_residual is True:
div_is_residual = math.sqrt(2 * n_layers)
elif isinstance(init_div_is_residual, (float, int)):
div_is_residual = init_div_is_residual
elif isinstance(init_div_is_residual, str) and init_div_is_residual.isnumeric():
div_is_residual = float(init_div_is_residual)
else:
raise ValueError(f'Expected init_div_is_residual to be boolean or numeric, got {init_div_is_residual}')
if init_div_is_residual is not False and verbose > 1:
warnings.warn(
f'Initializing _is_residual layers then dividing them by {div_is_residual:.3f}. Set `init_div_is_residual: false` in init config to disable this.'
)
if isinstance(module, nn.Linear):
if hasattr(module, '_fused'):
fused_init_helper_(module, init_fn_)
else:
init_fn_(module.weight)
if module.bias is not None:
torch.nn.init.zeros_(module.bias)
if init_div_is_residual is not False and getattr(module, '_is_residual', False):
with torch.no_grad():
module.weight.div_(div_is_residual)
elif isinstance(module, nn.Embedding):
if emb_init_std is not None:
std = emb_init_std
if std == 0:
warnings.warn('Embedding layer initialized to 0.')
emb_init_fn_ = partial(torch.nn.init.normal_, mean=0.0, std=std)
if verbose > 1:
warnings.warn(f'Embedding layer initialized using normal distribution with mean=0 and std={std!r}.')
elif emb_init_uniform_lim is not None:
lim = emb_init_uniform_lim
if isinstance(lim, Sequence):
if len(lim) != 2:
raise ValueError(f'Uniform init requires a min and a max limit. User input: {lim}.')
if lim[0] == lim[1]:
warnings.warn(f'Embedding layer initialized to {lim[0]}.')
else:
if lim == 0:
warnings.warn('Embedding layer initialized to 0.')
lim = [-lim, lim]
(a, b) = lim
emb_init_fn_ = partial(torch.nn.init.uniform_, a=a, b=b)
if verbose > 1:
warnings.warn(f'Embedding layer initialized using uniform distribution in range {lim}.')
else:
emb_init_fn_ = init_fn_
emb_init_fn_(module.weight)
elif isinstance(module, tuple(set(NORM_CLASS_REGISTRY.values()))):
if verbose > 1:
warnings.warn(
'Norm weights are set to 1. If norm layer has a bias it is initialized to 0.'
)
if hasattr(module, 'weight') and module.weight is not None:
torch.nn.init.ones_(module.weight)
if hasattr(module, 'bias') and module.bias is not None:
torch.nn.init.zeros_(module.bias)
elif isinstance(module, nn.MultiheadAttention):
if module._qkv_same_embed_dim:
_extracted_from_generic_param_init_fn__69(module, d_model, init_fn_)
else:
assert module.q_proj_weight is not None and module.k_proj_weight is not None and (module.v_proj_weight is not None)
assert module.in_proj_weight is None
init_fn_(module.q_proj_weight)
init_fn_(module.k_proj_weight)
init_fn_(module.v_proj_weight)
if module.in_proj_bias is not None:
torch.nn.init.zeros_(module.in_proj_bias)
if module.bias_k is not None:
torch.nn.init.zeros_(module.bias_k)
if module.bias_v is not None:
torch.nn.init.zeros_(module.bias_v)
init_fn_(module.out_proj.weight)
if init_div_is_residual is not False and getattr(module.out_proj, '_is_residual', False):
with torch.no_grad():
module.out_proj.weight.div_(div_is_residual)
if module.out_proj.bias is not None:
torch.nn.init.zeros_(module.out_proj.bias)
else:
for _ in module.parameters(recurse=False):
raise NotImplementedError(f'{module.__class__.__name__} parameters are not initialized by param_init_fn.')
# TODO Rename this here and in `generic_param_init_fn_`
def _extracted_from_generic_param_init_fn__69(module, d_model, init_fn_):
assert module.in_proj_weight is not None
assert module.q_proj_weight is None and module.k_proj_weight is None and (module.v_proj_weight is None)
assert d_model is not None
_d = d_model
splits = (0, _d, 2 * _d, 3 * _d)
for (s, e) in zip(splits[:-1], splits[1:]):
init_fn_(module.in_proj_weight[s:e])
def _normal_init_(std, mean=0.0):
return partial(torch.nn.init.normal_, mean=mean, std=std)
def _normal_param_init_fn_(module: nn.Module, std: float, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):
del kwargs
init_fn_ = _normal_init_(std=std)
if verbose > 1:
warnings.warn(f'Using torch.nn.init.normal_ init fn mean=0.0, std={std}')
generic_param_init_fn_(module=module, init_fn_=init_fn_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def baseline_param_init_fn_(module: nn.Module, init_std: float, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):
del kwargs
if init_std is None:
raise ValueError("You must set model.init_config['init_std'] to a float value to use the default initialization scheme.")
_normal_param_init_fn_(module=module, std=init_std, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def small_param_init_fn_(module: nn.Module, n_layers: int, d_model: int, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):
del kwargs
std = math.sqrt(2 / (5 * d_model))
_normal_param_init_fn_(module=module, std=std, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def neox_param_init_fn_(module: nn.Module, n_layers: int, d_model: int, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):
"""From section 2.3.1 of GPT-NeoX-20B:
An Open-Source Autoregressive Language Model, Black et al. (2022)
see https://github.com/EleutherAI/gpt-neox/blob/9610391ab319403cef079b438edd016a2443af54/megatron/model/init_functions.py#L151
and https://github.com/EleutherAI/gpt-neox/blob/main/megatron/model/transformer.py
"""
del kwargs
residual_div = n_layers / math.sqrt(10)
if verbose > 1:
warnings.warn(f'setting init_div_is_residual to {residual_div}')
small_param_init_fn_(module=module, d_model=d_model, n_layers=n_layers, init_div_is_residual=residual_div, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def kaiming_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, fan_mode: str='fan_in', init_nonlinearity: str='leaky_relu', verbose: int=0, **kwargs):
del kwargs
if verbose > 1:
warnings.warn(
f'Using nn.init.kaiming_uniform_ init fn with parameters: a={init_gain}, mode={fan_mode}, nonlinearity={init_nonlinearity}'
)
kaiming_uniform_ = partial(nn.init.kaiming_uniform_, a=init_gain, mode=fan_mode, nonlinearity=init_nonlinearity)
generic_param_init_fn_(module=module, init_fn_=kaiming_uniform_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def kaiming_normal_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, fan_mode: str='fan_in', init_nonlinearity: str='leaky_relu', verbose: int=0, **kwargs):
del kwargs
if verbose > 1:
warnings.warn(
f'Using nn.init.kaiming_normal_ init fn with parameters: a={init_gain}, mode={fan_mode}, nonlinearity={init_nonlinearity}'
)
kaiming_normal_ = partial(torch.nn.init.kaiming_normal_, a=init_gain, mode=fan_mode, nonlinearity=init_nonlinearity)
generic_param_init_fn_(module=module, init_fn_=kaiming_normal_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def xavier_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, verbose: int=0, **kwargs):
del kwargs
xavier_uniform_ = partial(torch.nn.init.xavier_uniform_, gain=init_gain)
if verbose > 1:
warnings.warn(
f'Using torch.nn.init.xavier_uniform_ init fn with parameters: gain={init_gain}'
)
generic_param_init_fn_(module=module, init_fn_=xavier_uniform_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
def xavier_normal_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, verbose: int=0, **kwargs):
xavier_normal_ = partial(torch.nn.init.xavier_normal_, gain=init_gain)
if verbose > 1:
warnings.warn(
f'Using torch.nn.init.xavier_normal_ init fn with parameters: gain={init_gain}'
)
generic_param_init_fn_(module=module, init_fn_=xavier_normal_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)
MODEL_INIT_REGISTRY = {'default_': torch_default_param_init_fn_, 'baseline_': baseline_param_init_fn_, 'kaiming_uniform_': kaiming_uniform_param_init_fn_, 'kaiming_normal_': kaiming_normal_param_init_fn_, 'neox_init_': neox_param_init_fn_, 'small_init_': small_param_init_fn_, 'xavier_uniform_': xavier_uniform_param_init_fn_, 'xavier_normal_': xavier_normal_param_init_fn_}
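# Illustrative sketch, not part of this repo: how generic_param_init_fn_ turns
# `init_div_is_residual` into the divisor applied to layers flagged with
# `_is_residual`. The function name is hypothetical and the result is always
# returned as a float for simplicity. Booleans are checked first because `bool`
# is a subclass of `int` in Python.

```python
import math


def resolve_residual_divisor(init_div_is_residual, n_layers):
    """Map the config value to the divisor used for residual projections."""
    if init_div_is_residual is False:
        return 1.0  # scaling disabled
    if init_div_is_residual is True:
        return math.sqrt(2 * n_layers)  # default depth-dependent scaling
    if isinstance(init_div_is_residual, (int, float)):
        return float(init_div_is_residual)
    if isinstance(init_div_is_residual, str) and init_div_is_residual.isnumeric():
        return float(init_div_is_residual)
    raise ValueError(
        f'Expected init_div_is_residual to be boolean or numeric, got {init_div_is_residual}'
    )


default_divisor = resolve_residual_divisor(True, n_layers=8)
# neox_param_init_fn_ passes an explicit divisor of n_layers / sqrt(10).
neox_divisor = resolve_residual_divisor(8 / math.sqrt(10), n_layers=8)
```

# Note that `str.isnumeric()` rejects strings such as '2.5', so a fractional
# divisor has to be given as a number rather than as a string.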
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/nets/PixArt.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# --------------------------------------------------------
# References:
# GLIDE: https://github.com/openai/glide-text2im
# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py
# --------------------------------------------------------
import math
import torch
import torch.nn as nn
import os
import numpy as np
from timm.models.layers import DropPath
from timm.models.vision_transformer import PatchEmbed, Mlp
from diffusion.model.builder import MODELS
from diffusion.model.utils import auto_grad_checkpoint, to_2tuple
from diffusion.model.nets.PixArt_blocks import t2i_modulate, CaptionEmbedder, WindowAttention, MultiHeadCrossAttention, T2IFinalLayer, TimestepEmbedder, LabelEmbedder, FinalLayer
from diffusion.utils.logger import get_root_logger
from diffusion.model.cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init
import json
class PixArtBlock(nn.Module):
"""
A PixArt block with adaptive layer norm (adaLN-single) conditioning.
"""
def __init__(self, hidden_size, num_heads, mlp_ratio=4.0, drop_path=0., window_size=0, input_size=None, use_rel_pos=False, **block_kwargs):
super().__init__()
self.hidden_size = hidden_size
self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.attn = WindowAttention(hidden_size, num_heads=num_heads, qkv_bias=True,
input_size=input_size if window_size == 0 else (window_size, window_size),
use_rel_pos=use_rel_pos, **block_kwargs)
self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads, **block_kwargs)
self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
# for compatibility with older PyTorch versions
approx_gelu = lambda: nn.GELU(approximate="tanh")
self.mlp = Mlp(in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0)
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
self.window_size = window_size
self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size ** 0.5)
def forward(self, x, y, t, current, cache_dic, mask=None, **kwargs):
B, N, C = x.shape
shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None] + t.reshape(B, 6, -1)).chunk(6, dim=1)
is_force_fresh = global_force_fresh(cache_dic, current)
current['is_force_fresh'] = is_force_fresh
if is_force_fresh: # Compute all tokens, and save them to cache
current['module'] = 'attn'
cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['attn_map'][-1][current['layer']] = self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa))#.reshape(B, N, C)
force_init(cache_dic, current, x)
x = x + self.drop_path(gate_msa * cache_dic['cache'][-1][current['layer']][current['module']])
current['module'] = 'cross-attn'
cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['cross_attn_map'][-1][current['layer']] = self.cross_attn(x, y, mask)
force_init(cache_dic, current, x)
x = x + cache_dic['cache'][-1][current['layer']][current['module']]
current['module'] = 'mlp'
cache_dic['cache'][-1][current['layer']][current['module']] = self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp))
force_init(cache_dic, current, x)
x = x + self.drop_path(gate_mlp * cache_dic['cache'][-1][current['layer']][current['module']])
else:
current['module'] = 'attn'
# Self-attention is not partially recomputed here; the cached output is reused as-is. The commented-out lines below are a starting point for experimenting with partial recomputation.
#fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)
#fresh_tokens, fresh_attn_map = self.attn(t2i_modulate(self.norm1(fresh_tokens), shift_msa, scale_msa))#.reshape(B, N, C)
#update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_attn_map)
#cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['attn_map'][-1][current['layer']] = self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa))#.reshape(B, N, C)
x = x + self.drop_path(gate_msa * cache_dic['cache'][-1][current['layer']][current['module']])
current['module'] = 'cross-attn'
fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)
fresh_tokens, fresh_cross_attn_map = self.cross_attn(fresh_tokens, y, mask)
update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map)
x = x + cache_dic['cache'][-1][current['layer']][current['module']]
current['module'] = 'mlp'
fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)
fresh_tokens = self.mlp(t2i_modulate(self.norm2(fresh_tokens), shift_mlp, scale_mlp))
update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current)
x = x + self.drop_path(gate_mlp * cache_dic['cache'][-1][current['layer']][current['module']])
return x
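# Illustrative sketch, not part of this repo: the non-fresh branch of
# PixArtBlock.forward recomputes cross-attention and the MLP only for a subset of
# "fresh" tokens and reuses cached outputs for the rest. The real selection and
# bookkeeping live in diffusion.model.cache_functions (cache_cutfresh,
# update_cache), which is not shown here. The helper below is a hypothetical,
# stdlib-only stand-in that treats each token as a single number.

```python
def partial_refresh(cache, x, fresh_indices, module_fn):
    """Recompute module_fn for the selected tokens, then add the cache as a residual.

    cache: per-token outputs from an earlier step (updated in place)
    x: current per-token inputs
    fresh_indices: positions to recompute; the selection policy is an assumption
    module_fn: per-token function standing in for the MLP or cross-attention
    """
    fresh_tokens = [x[i] for i in fresh_indices]
    fresh_outputs = [module_fn(token) for token in fresh_tokens]
    for index, output in zip(fresh_indices, fresh_outputs):
        cache[index] = output  # overwrite stale entries with fresh results
    # The residual uses the full cache, mixing fresh and reused outputs.
    return [token + cached for token, cached in zip(x, cache)]


cache = [10, 20, 30, 40]
result = partial_refresh(cache, [1, 2, 3, 4], fresh_indices=[1, 3], module_fn=lambda t: t * 100)
```

# Only positions 1 and 3 are recomputed; positions 0 and 2 keep their cached
# outputs. This works per token because the stand-in, like an MLP, does not mix
# information across tokens.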
#################################################################################
# Core PixArt Model #
#################################################################################
@MODELS.register_module()
class PixArt(nn.Module):
"""
Diffusion model with a Transformer backbone.
"""
def __init__(self, input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, pred_sigma=True, drop_path: float = 0., window_size=0, window_block_indexes=None, use_rel_pos=False, caption_channels=4096, lewei_scale=1.0, config=None, model_max_length=120, **kwargs):
if window_block_indexes is None:
window_block_indexes = []
super().__init__()
self.pred_sigma = pred_sigma
self.in_channels = in_channels
self.out_channels = in_channels * 2 if pred_sigma else in_channels
self.patch_size = patch_size
self.num_heads = num_heads
self.lewei_scale = lewei_scale
self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size, bias=True)
self.t_embedder = TimestepEmbedder(hidden_size)
num_patches = self.x_embedder.num_patches
self.base_size = input_size // self.patch_size
# Will use fixed sin-cos embedding:
self.register_buffer("pos_embed", torch.zeros(1, num_patches, hidden_size))
approx_gelu = lambda: nn.GELU(approximate="tanh")
self.t_block = nn.Sequential(
nn.SiLU(),
nn.Linear(hidden_size, 6 * hidden_size, bias=True)
)
self.y_embedder = CaptionEmbedder(in_channels=caption_channels, hidden_size=hidden_size, uncond_prob=class_dropout_prob, act_layer=approx_gelu, token_num=model_max_length)
drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)] # stochastic depth decay rule
self.blocks = nn.ModuleList([
PixArtBlock(hidden_size, num_heads, mlp_ratio=mlp_ratio, drop_path=drop_path[i],
input_size=(input_size // patch_size, input_size // patch_size),
window_size=window_size if i in window_block_indexes else 0,
use_rel_pos=use_rel_pos if i in window_block_indexes else False)
for i in range(depth)
])
self.final_layer = T2IFinalLayer(hidden_size, patch_size, self.out_channels)
self.initialize_weights()
if config:
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
logger.warning(f"lewei scale: {self.lewei_scale}, base size: {self.base_size}")
else:
print(f'Warning: lewei scale: {self.lewei_scale}, base size: {self.base_size}')
def forward(self, x, timestep, current, cache_dic, y, mask=None, data_info=None, **kwargs):
"""
Forward pass of PixArt.
x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)
timestep: (N,) tensor of diffusion timesteps
y: (N, 1, L, C) tensor of caption embeddings (L = model_max_length, 120 by default)
"""
x = x.to(self.dtype)
timestep = timestep.to(self.dtype)
y = y.to(self.dtype)
pos_embed = self.pos_embed.to(self.dtype)
self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size
x = self.x_embedder(x) + pos_embed # (N, T, D), where T = H * W / patch_size ** 2
t = self.t_embedder(timestep.to(x.dtype)) # (N, D)
t0 = self.t_block(t)
y = self.y_embedder(y, self.training) # (N, 1, L, D)
if mask is not None:
if mask.shape[0] != y.shape[0]:
mask = mask.repeat(y.shape[0] // mask.shape[0], 1)
mask = mask.squeeze(1).squeeze(1)
y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])
y_lens = mask.sum(dim=1).tolist()
else:
y_lens = [y.shape[2]] * y.shape[0]
y = y.squeeze(1).view(1, -1, x.shape[-1])
for i, block in enumerate(self.blocks):
current['layer'] = i
x = auto_grad_checkpoint(block, x, y, t0, current, cache_dic, y_lens) # (N, T, D) #support grad checkpoint
x = self.final_layer(x, t) # (N, T, patch_size ** 2 * out_channels)
x = self.unpatchify(x) # (N, out_channels, H, W)
return x
def forward_with_dpmsolver(self, x, timestep, current, cache_dic, y, mask=None, **kwargs):
"""
DPM-Solver does not need the variance prediction, so only the first half of the output channels is returned.
"""
# https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
model_out = self.forward(x, timestep, current, cache_dic, y, mask)
return model_out.chunk(2, dim=1)[0]
def forward_with_cfg(self, x, timestep, current, cache_dic, y, cfg_scale, mask=None, **kwargs):
"""
Forward pass of PixArt, but also batches the unconditional forward pass for classifier-free guidance.
"""
# https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
half = x[: len(x) // 2]
combined = torch.cat([half, half], dim=0)
model_out = self.forward(combined, timestep, current, cache_dic, y, mask, **kwargs)
model_out = model_out['x'] if isinstance(model_out, dict) else model_out
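# Guidance is applied to the first three channels only; the remaining channels pass through unchanged.
eps, rest = model_out[:, :3], model_out[:, 3:]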
cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)
half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)
eps = torch.cat([half_eps, half_eps], dim=0)
return torch.cat([eps, rest], dim=1)
def unpatchify(self, x):
"""
x: (N, T, patch_size**2 * C)
imgs: (N, C, H, W)
"""
c = self.out_channels
p = self.x_embedder.patch_size[0]
h = w = int(x.shape[1] ** 0.5)
assert h * w == x.shape[1]
x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
x = torch.einsum('nhwpqc->nchpwq', x)
return x.reshape(shape=(x.shape[0], c, h * p, h * p))
def initialize_weights(self):
# Initialize transformer layers:
def _basic_init(module):
if isinstance(module, nn.Linear):
torch.nn.init.xavier_uniform_(module.weight)
if module.bias is not None:
nn.init.constant_(module.bias, 0)
self.apply(_basic_init)
# Initialize (and freeze) pos_embed by sin-cos embedding:
pos_embed = get_2d_sincos_pos_embed(self.pos_embed.shape[-1], int(self.x_embedder.num_patches ** 0.5), lewei_scale=self.lewei_scale, base_size=self.base_size)
self.pos_embed.data.copy_(torch.from_numpy(pos_embed).float().unsqueeze(0))
# Initialize patch_embed like nn.Linear (instead of nn.Conv2d):
w = self.x_embedder.proj.weight.data
nn.init.xavier_uniform_(w.view([w.shape[0], -1]))
# Initialize timestep embedding MLP:
nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)
nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)
nn.init.normal_(self.t_block[1].weight, std=0.02)
# Initialize caption embedding MLP:
nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)
# Zero-out the cross-attention output projection in PixArt blocks:
for block in self.blocks:
nn.init.constant_(block.cross_attn.proj.weight, 0)
nn.init.constant_(block.cross_attn.proj.bias, 0)
# Zero-out output layers:
nn.init.constant_(self.final_layer.linear.weight, 0)
nn.init.constant_(self.final_layer.linear.bias, 0)
@property
def dtype(self):
return next(self.parameters()).dtype
def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, extra_tokens=0, lewei_scale=1.0, base_size=16):
"""
grid_size: int of the grid height and width
return:
pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)
"""
if isinstance(grid_size, int):
grid_size = to_2tuple(grid_size)
grid_h = np.arange(grid_size[0], dtype=np.float32) / (grid_size[0]/base_size) / lewei_scale
grid_w = np.arange(grid_size[1], dtype=np.float32) / (grid_size[1]/base_size) / lewei_scale
grid = np.meshgrid(grid_w, grid_h) # here w goes first
grid = np.stack(grid, axis=0)
grid = grid.reshape([2, 1, grid_size[1], grid_size[0]])
pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)
if cls_token and extra_tokens > 0:
pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)
return pos_embed
def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
assert embed_dim % 2 == 0
# use half of dimensions to encode grid_h
emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0]) # (H*W, D/2)
emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1]) # (H*W, D/2)
return np.concatenate([emb_h, emb_w], axis=1)
def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
"""
embed_dim: output dimension for each position
pos: a list of positions to be encoded: size (M,)
out: (M, D)
"""
assert embed_dim % 2 == 0
omega = np.arange(embed_dim // 2, dtype=np.float64)
omega /= embed_dim / 2.
omega = 1. / 10000 ** omega # (D/2,)
pos = pos.reshape(-1) # (M,)
out = np.einsum('m,d->md', pos, omega) # (M, D/2), outer product
emb_sin = np.sin(out) # (M, D/2)
emb_cos = np.cos(out) # (M, D/2)
return np.concatenate([emb_sin, emb_cos], axis=1)
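# Illustrative sketch, not part of this repo: a stdlib-only version of the 1D
# sin-cos embedding above, useful for checking values by hand. It follows the
# same recipe (frequencies 1 / 10000^(i / (D/2)), sines first, then cosines);
# the function name is hypothetical.

```python
import math


def sincos_1d(embed_dim, positions):
    """Return one [sin..., cos...] row of length embed_dim per position."""
    assert embed_dim % 2 == 0
    half = embed_dim // 2
    # Frequencies fall geometrically from 1 towards 1/10000.
    omega = [1.0 / 10000 ** (i / half) for i in range(half)]
    rows = []
    for pos in positions:
        angles = [pos * w for w in omega]
        rows.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return rows


embedding = sincos_1d(4, [0.0, 1.0])
```

# Position 0 maps to all-zero sines and all-one cosines, which makes it an easy
# first value to verify.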
#################################################################################
# PixArt Configs #
#################################################################################
@MODELS.register_module()
def PixArt_XL_2(**kwargs):
return PixArt(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)
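# Illustrative sketch, not part of this repo: the guidance step used in
# PixArt.forward_with_cfg, uncond + cfg_scale * (cond - uncond), written for
# plain lists of numbers. The function name is hypothetical; the model applies
# the same formula to tensors.

```python
def apply_cfg(cond_eps, uncond_eps, cfg_scale):
    """Extrapolate from the unconditional towards the conditional prediction."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_eps, uncond_eps)]


cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = apply_cfg(cond, uncond, cfg_scale=3.0)
```

# A scale of 1 returns the conditional prediction, a scale of 0 returns the
# unconditional one, and larger scales push further along the difference.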
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/nets/PixArtMS.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# --------------------------------------------------------
# References:
# GLIDE: https://github.com/openai/glide-text2im
# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py
# --------------------------------------------------------
import torch
import torch.nn as nn
from timm.models.layers import DropPath
from timm.models.vision_transformer import Mlp
from diffusion.model.builder import MODELS
from diffusion.model.utils import auto_grad_checkpoint, to_2tuple
from diffusion.model.nets.PixArt_blocks import t2i_modulate, CaptionEmbedder, WindowAttention, MultiHeadCrossAttention, T2IFinalLayer, TimestepEmbedder, SizeEmbedder
from diffusion.model.nets.PixArt import PixArt, get_2d_sincos_pos_embed
class PatchEmbed(nn.Module):
""" 2D Image to Patch Embedding
"""
def __init__(
self,
patch_size=16,
in_chans=3,
embed_dim=768,
norm_layer=None,
flatten=True,
bias=True,
):
super().__init__()
patch_size = to_2tuple(patch_size)
self.patch_size = patch_size
self.flatten = flatten
self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, bias=bias)
self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()
def forward(self, x):
x = self.proj(x)
if self.flatten:
x = x.flatten(2).transpose(1, 2) # BCHW -> BNC
x = self.norm(x)
return x
class PixArtMSBlock(nn.Module):
"""
A PixArt block with adaptive layer norm zero (adaLN-Zero) conditioning.
"""
def __init__(self, hidden_size, num_heads, mlp_ratio=4.0, drop_path=0., window_size=0, input_size=None, use_rel_pos=False, **block_kwargs):
super().__init__()
self.hidden_size = hidden_size
self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.attn = WindowAttention(hidden_size, num_heads=num_heads, qkv_bias=True,
input_size=input_size if window_size == 0 else (window_size, window_size),
use_rel_pos=use_rel_pos, **block_kwargs)
self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads, **block_kwargs)
self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
# to be compatible with lower version pytorch
approx_gelu = lambda: nn.GELU(approximate="tanh")
self.mlp = Mlp(in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0)
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
self.window_size = window_size
self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size ** 0.5)
def forward(self, x, y, t, mask=None, **kwargs):
B, N, C = x.shape
shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None] + t.reshape(B, 6, -1)).chunk(6, dim=1)
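# WindowAttention and MultiHeadCrossAttention (PixArt_blocks.py) both return (output, attn_map);
# only the output tensor is used for the residual update here.
attn_out, _ = self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa))
x = x + self.drop_path(gate_msa * attn_out)
cross_out, _ = self.cross_attn(x, y, mask)
x = x + cross_out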
x = x + self.drop_path(gate_mlp * self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)))
return x
#################################################################################
# Core PixArt Model #
#################################################################################
@MODELS.register_module()
class PixArtMS(PixArt):
"""
Diffusion model with a Transformer backbone.
"""
def __init__(self, input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, learn_sigma=True, pred_sigma=True, drop_path: float = 0., window_size=0, window_block_indexes=None, use_rel_pos=False, caption_channels=4096, lewei_scale=1., config=None, model_max_length=120, **kwargs):
if window_block_indexes is None:
window_block_indexes = []
super().__init__(
input_size=input_size,
patch_size=patch_size,
in_channels=in_channels,
hidden_size=hidden_size,
depth=depth,
num_heads=num_heads,
mlp_ratio=mlp_ratio,
class_dropout_prob=class_dropout_prob,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
drop_path=drop_path,
window_size=window_size,
window_block_indexes=window_block_indexes,
use_rel_pos=use_rel_pos,
lewei_scale=lewei_scale,
config=config,
model_max_length=model_max_length,
**kwargs,
)
self.h = self.w = 0
approx_gelu = lambda: nn.GELU(approximate="tanh")
self.t_block = nn.Sequential(
nn.SiLU(),
nn.Linear(hidden_size, 6 * hidden_size, bias=True)
)
self.x_embedder = PatchEmbed(patch_size, in_channels, hidden_size, bias=True)
self.y_embedder = CaptionEmbedder(in_channels=caption_channels, hidden_size=hidden_size, uncond_prob=class_dropout_prob, act_layer=approx_gelu, token_num=model_max_length)
self.csize_embedder = SizeEmbedder(hidden_size//3) # c_size embed
self.ar_embedder = SizeEmbedder(hidden_size//3) # aspect ratio embed
drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)] # stochastic depth decay rule
self.blocks = nn.ModuleList([
PixArtMSBlock(hidden_size, num_heads, mlp_ratio=mlp_ratio, drop_path=drop_path[i],
input_size=(input_size // patch_size, input_size // patch_size),
window_size=window_size if i in window_block_indexes else 0,
use_rel_pos=use_rel_pos if i in window_block_indexes else False)
for i in range(depth)
])
self.final_layer = T2IFinalLayer(hidden_size, patch_size, self.out_channels)
self.initialize()
def forward(self, x, timestep, y, mask=None, data_info=None, **kwargs):
"""
Forward pass of PixArt.
x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)
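timestep: (N,) tensor of diffusion timesteps
y: (N, 1, 120, C) tensor of caption (text) embeddings
data_info: dict with 'img_hw' and 'aspect_ratio' tensors used for size conditioning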
"""
bs = x.shape[0]
x = x.to(self.dtype)
timestep = timestep.to(self.dtype)
y = y.to(self.dtype)
c_size, ar = data_info['img_hw'].to(self.dtype), data_info['aspect_ratio'].to(self.dtype)
self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size
pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(x.device).to(self.dtype)
x = self.x_embedder(x) + pos_embed # (N, T, D), where T = H * W / patch_size ** 2
t = self.t_embedder(timestep) # (N, D)
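csize = self.csize_embedder(c_size, bs) # (N, k * (D // 3)) for k size scalars per sample
ar = self.ar_embedder(ar, bs) # (N, D // 3) for one aspect-ratio scalar per sample
t = t + torch.cat([csize, ar], dim=1) # concatenated widths must sum to D
t0 = self.t_block(t) # (N, 6 * D)
y = self.y_embedder(y, self.training) # (N, 1, L, D)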
if mask is not None:
if mask.shape[0] != y.shape[0]:
mask = mask.repeat(y.shape[0] // mask.shape[0], 1)
mask = mask.squeeze(1).squeeze(1)
y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])
y_lens = mask.sum(dim=1).tolist()
else:
y_lens = [y.shape[2]] * y.shape[0]
y = y.squeeze(1).view(1, -1, x.shape[-1])
for block in self.blocks:
x = auto_grad_checkpoint(block, x, y, t0, y_lens, **kwargs) # (N, T, D) #support grad checkpoint
x = self.final_layer(x, t) # (N, T, patch_size ** 2 * out_channels)
x = self.unpatchify(x) # (N, out_channels, H, W)
return x
def forward_with_dpmsolver(self, x, timestep, y, data_info, **kwargs):
"""
DPM-Solver does not need the variance prediction, so only the first half of the output channels is returned.
"""
# https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
model_out = self.forward(x, timestep, y, data_info=data_info, **kwargs)
return model_out.chunk(2, dim=1)[0]
def forward_with_cfg(self, x, timestep, y, cfg_scale, data_info, **kwargs):
"""
Forward pass of PixArt, but also batches the unconditional forward pass for classifier-free guidance.
"""
# https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
half = x[: len(x) // 2]
combined = torch.cat([half, half], dim=0)
model_out = self.forward(combined, timestep, y, data_info=data_info)
eps, rest = model_out[:, :3], model_out[:, 3:] # guidance is applied to the first three channels only
cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)
half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)
eps = torch.cat([half_eps, half_eps], dim=0)
return torch.cat([eps, rest], dim=1)
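# Illustrative sketch (not part of the original module): the classifier-free guidance
# arithmetic used in forward_with_cfg above, written on plain Python lists. The helper
# name `cfg_combine` is an assumption for this example; the repository code applies the
# same formula to tensors.

```python
def cfg_combine(cond_eps, uncond_eps, cfg_scale):
    """Blend conditional and unconditional noise predictions element-wise."""
    # uncond + scale * (cond - uncond): scale 0 gives the unconditional output,
    # scale 1 gives the conditional output, larger scales extrapolate past it.
    return [u + cfg_scale * (c - u) for c, u in zip(cond_eps, uncond_eps)]


cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = cfg_combine(cond, uncond, 4.0)
print(guided)  # [4.0, 5.0]
```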
def unpatchify(self, x):
"""
x: (N, T, patch_size**2 * C)
imgs: (N, C, H * patch_size, W * patch_size), where H and W are the patch-grid height and width
"""
c = self.out_channels
p = self.x_embedder.patch_size[0]
assert self.h * self.w == x.shape[1]
x = x.reshape(shape=(x.shape[0], self.h, self.w, p, p, c))
x = torch.einsum('nhwpqc->nchpwq', x)
return x.reshape(shape=(x.shape[0], c, self.h * p, self.w * p))
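# Illustrative sketch (not part of the original module): the index mapping performed by
# unpatchify above (reshape, einsum 'nhwpqc->nchpwq', reshape) for a single sample,
# written with nested Python lists. The function name and argument names are
# assumptions made for this example.

```python
def unpatchify_single(tokens, h, w, p, c):
    """tokens: h*w rows of p*p*c values -> image as c planes of (h*p) x (w*p)."""
    assert len(tokens) == h * w
    img = [[[0.0] * (w * p) for _ in range(h * p)] for _ in range(c)]
    for hi in range(h):
        for wi in range(w):
            patch = tokens[hi * w + wi]  # tokens are stored row-major over the grid
            for pi in range(p):
                for qi in range(p):
                    for ci in range(c):
                        # within a token the layout is (patch row, patch col, channel)
                        img[ci][hi * p + pi][wi * p + qi] = patch[(pi * p + qi) * c + ci]
    return img


tokens = [[0, 1, 2, 3], [4, 5, 6, 7]]  # two 2x2 patches, one channel
image = unpatchify_single(tokens, h=1, w=2, p=2, c=1)
print(image[0])  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```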
def initialize(self):
# Initialize transformer layers:
def _basic_init(module):
if isinstance(module, nn.Linear):
torch.nn.init.xavier_uniform_(module.weight)
if module.bias is not None:
nn.init.constant_(module.bias, 0)
self.apply(_basic_init)
# Initialize patch_embed like nn.Linear (instead of nn.Conv2d):
w = self.x_embedder.proj.weight.data
nn.init.xavier_uniform_(w.view([w.shape[0], -1]))
# Initialize timestep embedding MLP:
nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)
nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)
nn.init.normal_(self.t_block[1].weight, std=0.02)
nn.init.normal_(self.csize_embedder.mlp[0].weight, std=0.02)
nn.init.normal_(self.csize_embedder.mlp[2].weight, std=0.02)
nn.init.normal_(self.ar_embedder.mlp[0].weight, std=0.02)
nn.init.normal_(self.ar_embedder.mlp[2].weight, std=0.02)
# Initialize caption embedding MLP:
nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)
# Zero-out the cross-attention output projection in PixArt blocks:
for block in self.blocks:
nn.init.constant_(block.cross_attn.proj.weight, 0)
nn.init.constant_(block.cross_attn.proj.bias, 0)
# Zero-out output layers:
nn.init.constant_(self.final_layer.linear.weight, 0)
nn.init.constant_(self.final_layer.linear.bias, 0)
#################################################################################
# PixArt Configs #
#################################################################################
@MODELS.register_module()
def PixArtMS_XL_2(**kwargs):
return PixArtMS(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/nets/PixArt_blocks.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# --------------------------------------------------------
# References:
# GLIDE: https://github.com/openai/glide-text2im
# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py
# --------------------------------------------------------
import math
import torch
import torch.nn as nn
from timm.models.vision_transformer import Mlp, Attention as Attention_
from einops import rearrange, repeat
import xformers.ops
from diffusion.model.utils import add_decomposed_rel_pos
from diffusion.model.cache_functions import cached_attention_forward
def modulate(x, shift, scale):
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
def t2i_modulate(x, shift, scale):
return x * (1 + scale) + shift
class MultiHeadCrossAttention(nn.Module):
def __init__(self, d_model, num_heads, attn_drop=0., proj_drop=0., **block_kwargs):
super(MultiHeadCrossAttention, self).__init__()
assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
self.d_model = d_model
self.num_heads = num_heads
self.head_dim = d_model // num_heads
self.q_linear = nn.Linear(d_model, d_model)
self.kv_linear = nn.Linear(d_model, d_model*2)
self.attn_drop = nn.Dropout(attn_drop)
self.proj = nn.Linear(d_model, d_model)
self.proj_drop = nn.Dropout(proj_drop)
def forward(self, x, cond, mask=None):
# query: image tokens; key/value: condition tokens; mask: per-sample key/value lengths (list of ints) passed to BlockDiagonalMask
B, N, C = x.shape
q = self.q_linear(x).view(1, -1, self.num_heads, self.head_dim)
kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)
k, v = kv.unbind(2)
attn_bias = None
if mask is not None:
attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)
# x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
# The cross-attention map has to be saved here, so a custom attention function is used instead of
# xformers.ops.memory_efficient_attention, which does not return the attention map.
x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
x = x.view(B, -1, C)
attn_map = attn_map.view(B, -1, attn_map.shape[-1])
x = self.proj(x)
x = self.proj_drop(x)
#q = self.q_linear(x).reshape(B, -1, self.num_heads, self.head_dim)
#kv = self.kv_linear(cond).reshape(B, -1, 2, self.num_heads, self.head_dim)
#k, v = kv.unbind(2)
#attn_bias = None
#if mask is not None:
# attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)
# attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float('-inf'))
##x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
#x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
#x = x.contiguous().reshape(B, -1, C)
#x = self.proj(x)
#x = self.proj_drop(x)
return x, attn_map
class WindowAttention(Attention_):
"""Multi-head Attention block with relative position embeddings."""
def __init__(
self,
dim,
num_heads=8,
qkv_bias=True,
use_rel_pos=False,
rel_pos_zero_init=True,
input_size=None,
**block_kwargs,
):
"""
Args:
dim (int): Number of input channels.
num_heads (int): Number of attention heads.
qkv_bias (bool): If True, add a learnable bias to query, key, value.
use_rel_pos (bool): If True, add relative positional embeddings to the attention map.
rel_pos_zero_init (bool): If True, zero initialize relative positional parameters.
input_size (tuple(int, int) or None): Input resolution (height, width) for calculating the
relative positional parameter size.
"""
super().__init__(dim, num_heads=num_heads, qkv_bias=qkv_bias, **block_kwargs)
self.use_rel_pos = use_rel_pos
if self.use_rel_pos:
# initialize relative positional embeddings
self.rel_pos_h = nn.Parameter(torch.zeros(2 * input_size[0] - 1, self.head_dim))
self.rel_pos_w = nn.Parameter(torch.zeros(2 * input_size[1] - 1, self.head_dim))
if not rel_pos_zero_init:
nn.init.trunc_normal_(self.rel_pos_h, std=0.02)
nn.init.trunc_normal_(self.rel_pos_w, std=0.02)
def forward(self, x, mask=None):
B, N, C = x.shape
qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
q, k, v = qkv.unbind(2)
if use_fp32_attention := getattr(self, 'fp32_attention', False):
q, k, v = q.float(), k.float(), v.float()
attn_bias = None
if mask is not None:
attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)
attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float('-inf'))
# x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
# attn_map = None
# The self-attention map is saved here, so a custom attention function is used instead of
# xformers.ops.memory_efficient_attention, which does not return the attention map.
# Only the cross-attention map is used in the final version, so if the self-attention score (s1)
# is not needed, xformers.ops.memory_efficient_attention can be used here for faster self-attention
# while the custom function is kept for cross-attention.
x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)
x = x.view(B, N, C)
x = self.proj(x)
x = self.proj_drop(x)
return x, attn_map
#################################################################################
# AMP attention with fp32 softmax to fix loss NaN problem during training #
#################################################################################
class Attention(Attention_):
def forward(self, x):
B, N, C = x.shape
qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
q, k, v = qkv.unbind(0) # make torchscript happy (cannot use tensor as tuple)
use_fp32_attention = getattr(self, 'fp32_attention', False)
if use_fp32_attention:
q, k = q.float(), k.float()
with torch.cuda.amp.autocast(enabled=not use_fp32_attention):
attn = (q @ k.transpose(-2, -1)) * self.scale
attn = attn.softmax(dim=-1)
attn = self.attn_drop(attn)
x = (attn @ v).transpose(1, 2).reshape(B, N, C)
x = self.proj(x)
x = self.proj_drop(x)
return x
class FinalLayer(nn.Module):
"""
The final layer of PixArt.
"""
def __init__(self, hidden_size, patch_size, out_channels):
super().__init__()
self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
self.adaLN_modulation = nn.Sequential(
nn.SiLU(),
nn.Linear(hidden_size, 2 * hidden_size, bias=True)
)
def forward(self, x, c):
shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)
x = modulate(self.norm_final(x), shift, scale)
x = self.linear(x)
return x
class T2IFinalLayer(nn.Module):
"""
The final layer of PixArt.
"""
def __init__(self, hidden_size, patch_size, out_channels):
super().__init__()
self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
self.scale_shift_table = nn.Parameter(torch.randn(2, hidden_size) / hidden_size ** 0.5)
self.out_channels = out_channels
def forward(self, x, t):
shift, scale = (self.scale_shift_table[None] + t[:, None]).chunk(2, dim=1)
x = t2i_modulate(self.norm_final(x), shift, scale)
x = self.linear(x)
return x
class MaskFinalLayer(nn.Module):
"""
The final layer of PixArt, with adaLN modulation computed from a conditioning embedding of size c_emb_size.
"""
def __init__(self, final_hidden_size, c_emb_size, patch_size, out_channels):
super().__init__()
self.norm_final = nn.LayerNorm(final_hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(final_hidden_size, patch_size * patch_size * out_channels, bias=True)
self.adaLN_modulation = nn.Sequential(
nn.SiLU(),
nn.Linear(c_emb_size, 2 * final_hidden_size, bias=True)
)
def forward(self, x, t):
shift, scale = self.adaLN_modulation(t).chunk(2, dim=1)
x = modulate(self.norm_final(x), shift, scale)
x = self.linear(x)
return x
class DecoderLayer(nn.Module):
"""
An adaLN-modulated linear projection from hidden_size to decoder_hidden_size.
"""
def __init__(self, hidden_size, decoder_hidden_size):
super().__init__()
self.norm_decoder = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(hidden_size, decoder_hidden_size, bias=True)
self.adaLN_modulation = nn.Sequential(
nn.SiLU(),
nn.Linear(hidden_size, 2 * hidden_size, bias=True)
)
def forward(self, x, t):
shift, scale = self.adaLN_modulation(t).chunk(2, dim=1)
x = modulate(self.norm_decoder(x), shift, scale)
x = self.linear(x)
return x
#################################################################################
# Embedding Layers for Timesteps and Class Labels #
#################################################################################
class TimestepEmbedder(nn.Module):
"""
Embeds scalar timesteps into vector representations.
"""
def __init__(self, hidden_size, frequency_embedding_size=256):
super().__init__()
self.mlp = nn.Sequential(
nn.Linear(frequency_embedding_size, hidden_size, bias=True),
nn.SiLU(),
nn.Linear(hidden_size, hidden_size, bias=True),
)
self.frequency_embedding_size = frequency_embedding_size
@staticmethod
def timestep_embedding(t, dim, max_period=10000):
"""
Create sinusoidal timestep embeddings.
:param t: a 1-D Tensor of N indices, one per batch element.
These may be fractional.
:param dim: the dimension of the output.
:param max_period: controls the minimum frequency of the embeddings.
:return: an (N, D) Tensor of positional embeddings.
"""
# https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py
half = dim // 2
freqs = torch.exp(
-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32, device=t.device) / half)
args = t[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
return embedding
def forward(self, t):
t_freq = self.timestep_embedding(t, self.frequency_embedding_size).to(self.dtype)
return self.mlp(t_freq)
@property
def dtype(self):
# Return the data type of the model parameters
return next(self.parameters()).dtype
class SizeEmbedder(TimestepEmbedder):
"""
Embeds scalar size conditions (e.g. image height/width or aspect ratio) into vector representations.
"""
def __init__(self, hidden_size, frequency_embedding_size=256):
super().__init__(hidden_size=hidden_size, frequency_embedding_size=frequency_embedding_size)
self.mlp = nn.Sequential(
nn.Linear(frequency_embedding_size, hidden_size, bias=True),
nn.SiLU(),
nn.Linear(hidden_size, hidden_size, bias=True),
)
self.frequency_embedding_size = frequency_embedding_size
self.outdim = hidden_size
def forward(self, s, bs):
if s.ndim == 1:
s = s[:, None]
assert s.ndim == 2
if s.shape[0] != bs:
s = s.repeat(bs//s.shape[0], 1)
assert s.shape[0] == bs
b, dims = s.shape[0], s.shape[1]
s = rearrange(s, "b d -> (b d)")
s_freq = self.timestep_embedding(s, self.frequency_embedding_size).to(self.dtype)
s_emb = self.mlp(s_freq)
s_emb = rearrange(s_emb, "(b d) d2 -> b (d d2)", b=b, d=dims, d2=self.outdim)
return s_emb
@property
def dtype(self):
# Return the data type of the model parameters
return next(self.parameters()).dtype
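# Illustrative sketch (not part of the original module): the width bookkeeping behind
# SizeEmbedder above. Each scalar is embedded to `outdim` features and the per-sample
# embeddings are concatenated, so a model that builds SizeEmbedder(hidden_size // 3)
# for both conditions gets a total width of hidden_size only if the size condition has
# two scalars and the aspect ratio has one. Those counts (2 and 1) are assumptions of
# this example, as is the helper name.

```python
def size_condition_widths(hidden_size, num_size_scalars=2, num_ar_scalars=1):
    """Return (size width, aspect-ratio width, total) for hypothetical scalar counts."""
    outdim = hidden_size // 3  # features produced per embedded scalar
    csize_width = num_size_scalars * outdim
    ar_width = num_ar_scalars * outdim
    return csize_width, ar_width, csize_width + ar_width


widths = size_condition_widths(1152)
print(widths)  # (768, 384, 1152)
```

# With hidden_size = 1152 the concatenated condition matches the timestep embedding
# width, which is what allows the two to be added together.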
class LabelEmbedder(nn.Module):
"""
Embeds class labels into vector representations. Also handles label dropout for classifier-free guidance.
"""
def __init__(self, num_classes, hidden_size, dropout_prob):
super().__init__()
use_cfg_embedding = dropout_prob > 0
self.embedding_table = nn.Embedding(num_classes + use_cfg_embedding, hidden_size)
self.num_classes = num_classes
self.dropout_prob = dropout_prob
def token_drop(self, labels, force_drop_ids=None):
"""
Drops labels to enable classifier-free guidance.
"""
if force_drop_ids is None:
drop_ids = torch.rand(labels.shape[0], device=labels.device) < self.dropout_prob
else:
drop_ids = force_drop_ids == 1
labels = torch.where(drop_ids, self.num_classes, labels)
return labels
def forward(self, labels, train, force_drop_ids=None):
use_dropout = self.dropout_prob > 0
if (train and use_dropout) or (force_drop_ids is not None):
labels = self.token_drop(labels, force_drop_ids)
return self.embedding_table(labels)
class CaptionEmbedder(nn.Module):
"""
Embeds caption features into vector representations. Also handles caption dropout for classifier-free guidance.
"""
def __init__(self, in_channels, hidden_size, uncond_prob, act_layer=nn.GELU(approximate='tanh'), token_num=120):
super().__init__()
self.y_proj = Mlp(in_features=in_channels, hidden_features=hidden_size, out_features=hidden_size, act_layer=act_layer, drop=0)
self.register_buffer("y_embedding", nn.Parameter(torch.randn(token_num, in_channels) / in_channels ** 0.5))
self.uncond_prob = uncond_prob
def token_drop(self, caption, force_drop_ids=None):
"""
Drops captions (replacing them with the unconditional embedding) to enable classifier-free guidance.
"""
if force_drop_ids is None:
drop_ids = torch.rand(caption.shape[0], device=caption.device) < self.uncond_prob
else:
drop_ids = force_drop_ids == 1
caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)
return caption
def forward(self, caption, train, force_drop_ids=None):
if train:
assert caption.shape[2:] == self.y_embedding.shape
use_dropout = self.uncond_prob > 0
if (train and use_dropout) or (force_drop_ids is not None):
caption = self.token_drop(caption, force_drop_ids)
caption = self.y_proj(caption)
return caption
class CaptionEmbedderDoubleBr(nn.Module):
"""
Embeds caption features into a global vector and per-token representations. Also handles caption dropout for classifier-free guidance.
"""
def __init__(self, in_channels, hidden_size, uncond_prob, act_layer=nn.GELU(approximate='tanh'), token_num=120):
super().__init__()
self.proj = Mlp(in_features=in_channels, hidden_features=hidden_size, out_features=hidden_size, act_layer=act_layer, drop=0)
self.embedding = nn.Parameter(torch.randn(1, in_channels) / 10 ** 0.5)
self.y_embedding = nn.Parameter(torch.randn(token_num, in_channels) / 10 ** 0.5)
self.uncond_prob = uncond_prob
def token_drop(self, global_caption, caption, force_drop_ids=None):
"""
Drops captions (replacing them with the unconditional embeddings) to enable classifier-free guidance.
"""
if force_drop_ids is None:
drop_ids = torch.rand(global_caption.shape[0], device=global_caption.device) < self.uncond_prob
else:
drop_ids = force_drop_ids == 1
global_caption = torch.where(drop_ids[:, None], self.embedding, global_caption)
caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)
return global_caption, caption
def forward(self, caption, train, force_drop_ids=None):
assert caption.shape[2: ] == self.y_embedding.shape
global_caption = caption.mean(dim=2).squeeze()
use_dropout = self.uncond_prob > 0
if (train and use_dropout) or (force_drop_ids is not None):
global_caption, caption = self.token_drop(global_caption, caption, force_drop_ids)
y_embed = self.proj(global_caption)
return y_embed, caption
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/nets/__init__.py
================================================
from .PixArt import PixArt, PixArt_XL_2
from .PixArtMS import PixArtMS, PixArtMS_XL_2, PixArtMSBlock
from .pixart_controlnet import ControlPixArtHalf, ControlPixArtMSHalf
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/nets/pixart_controlnet.py
================================================
import re
import torch
import torch.nn as nn
from copy import deepcopy
from torch import Tensor
from torch.nn import Module, Linear, init
from typing import Any, Mapping
from diffusion.model.nets import PixArtMSBlock, PixArtMS, PixArt
from diffusion.model.nets.PixArt import get_2d_sincos_pos_embed
from diffusion.model.utils import auto_grad_checkpoint
# The implementation of the ControlNet-Half architecture
# https://github.com/lllyasviel/ControlNet/discussions/188
class ControlT2IDitBlockHalf(Module):
def __init__(self, base_block: PixArtMSBlock, block_index: int = 0) -> None:
super().__init__()
self.copied_block = deepcopy(base_block)
self.block_index = block_index
for p in self.copied_block.parameters():
p.requires_grad_(True)
self.copied_block.load_state_dict(base_block.state_dict())
self.copied_block.train()
self.hidden_size = hidden_size = base_block.hidden_size
if self.block_index == 0:
self.before_proj = Linear(hidden_size, hidden_size)
init.zeros_(self.before_proj.weight)
init.zeros_(self.before_proj.bias)
self.after_proj = Linear(hidden_size, hidden_size)
init.zeros_(self.after_proj.weight)
init.zeros_(self.after_proj.bias)
def forward(self, x, y, t, mask=None, c=None):
if self.block_index == 0:
# the first block
c = self.before_proj(c)
c = self.copied_block(x + c, y, t, mask)
c_skip = self.after_proj(c)
else:
# load from previous c and produce the c for skip connection
c = self.copied_block(c, y, t, mask)
c_skip = self.after_proj(c)
return c, c_skip
# The implementation of ControlPixArtHalf net
class ControlPixArtHalf(Module):
# Only supports the single-resolution model
def __init__(self, base_model: PixArt, copy_blocks_num: int = 13) -> None:
super().__init__()
self.base_model = base_model.eval()
self.controlnet = []
self.copy_blocks_num = copy_blocks_num
self.total_blocks_num = len(base_model.blocks)
for p in self.base_model.parameters():
p.requires_grad_(False)
# Copy first copy_blocks_num block
for i in range(copy_blocks_num):
self.controlnet.append(ControlT2IDitBlockHalf(base_model.blocks[i], i))
self.controlnet = nn.ModuleList(self.controlnet)
def __getattr__(self, name: str) -> Tensor or Module:
if name in ['forward', 'forward_with_dpmsolver', 'forward_with_cfg', 'forward_c', 'load_state_dict']:
return self.__dict__[name]
elif name in ['base_model', 'controlnet']:
return super().__getattr__(name)
else:
return getattr(self.base_model, name)
def forward_c(self, c):
self.h, self.w = c.shape[-2]//self.patch_size, c.shape[-1]//self.patch_size
pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(c.device).to(self.dtype)
return self.x_embedder(c) + pos_embed if c is not None else c
# def forward(self, x, t, c, **kwargs):
# return self.base_model(x, t, c=self.forward_c(c), **kwargs)
def forward(self, x, timestep, y, mask=None, data_info=None, c=None, **kwargs):
# modify the original PixArtMS forward function
"""
Forward pass of PixArt with an optional control input.
x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)
timestep: (N,) tensor of diffusion timesteps
y: (N, 1, 120, C) tensor of caption (text) embeddings
c: optional control input in the same spatial layout as x
"""
if c is not None:
c = c.to(self.dtype)
c = self.forward_c(c)
x = x.to(self.dtype)
timestep = timestep.to(self.dtype)
y = y.to(self.dtype)
pos_embed = self.pos_embed.to(self.dtype)
self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size
x = self.x_embedder(x) + pos_embed # (N, T, D), where T = H * W / patch_size ** 2
t = self.t_embedder(timestep.to(x.dtype)) # (N, D)
t0 = self.t_block(t)
y = self.y_embedder(y, self.training) # (N, 1, L, D)
if mask is not None:
if mask.shape[0] != y.shape[0]:
mask = mask.repeat(y.shape[0] // mask.shape[0], 1)
mask = mask.squeeze(1).squeeze(1)
y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])
y_lens = mask.sum(dim=1).tolist()
else:
y_lens = [y.shape[2]] * y.shape[0]
y = y.squeeze(1).view(1, -1, x.shape[-1])
# define the first layer
x = auto_grad_checkpoint(self.base_model.blocks[0], x, y, t0, y_lens, **kwargs) # (N, T, D) #support grad checkpoint
if c is not None:
# update c
for index in range(1, self.copy_blocks_num + 1):
c, c_skip = auto_grad_checkpoint(self.controlnet[index - 1], x, y, t0, y_lens, c, **kwargs)
x = auto_grad_checkpoint(self.base_model.blocks[index], x + c_skip, y, t0, y_lens, **kwargs)
# update x
for index in range(self.copy_blocks_num + 1, self.total_blocks_num):
x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)
else:
for index in range(1, self.total_blocks_num):
x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)
x = self.final_layer(x, t) # (N, T, patch_size ** 2 * out_channels)
x = self.unpatchify(x) # (N, out_channels, H, W)
return x
def forward_with_dpmsolver(self, x, t, y, data_info, c, **kwargs):
model_out = self.forward(x, t, y, data_info=data_info, c=c, **kwargs)
return model_out.chunk(2, dim=1)[0]
# def forward_with_dpmsolver(self, x, t, y, data_info, c, **kwargs):
# return self.base_model.forward_with_dpmsolver(x, t, y, data_info=data_info, c=self.forward_c(c), **kwargs)
def forward_with_cfg(self, x, t, y, cfg_scale, data_info, c, **kwargs):
return self.base_model.forward_with_cfg(x, t, y, cfg_scale, data_info, c=self.forward_c(c), **kwargs)
def load_state_dict(self, state_dict: Mapping[str, Any], strict: bool = True):
if all((k.startswith('base_model') or k.startswith('controlnet')) for k in state_dict.keys()):
return super().load_state_dict(state_dict, strict)
else:
new_key = {}
for k in state_dict.keys():
new_key[k] = re.sub(r"(blocks\.\d+)(.*)", r"\1.base_block\2", k)
for k, v in new_key.items():
if k != v:
print(f"replace {k} to {v}")
state_dict[v] = state_dict.pop(k)
return self.base_model.load_state_dict(state_dict, strict)
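# Illustrative sketch (not part of the original module): what the re.sub call in
# load_state_dict above does to individual checkpoint keys. The helper name `remap_key`
# and the sample key strings are assumptions made for this example.

```python
import re


def remap_key(key):
    """Insert '.base_block' right after a 'blocks.<index>' prefix, if one is present."""
    return re.sub(r"(blocks\.\d+)(.*)", r"\1.base_block\2", key)


renamed = remap_key("blocks.3.attn.qkv.weight")
untouched = remap_key("x_embedder.proj.weight")
print(renamed)    # blocks.3.base_block.attn.qkv.weight
print(untouched)  # x_embedder.proj.weight
```

# Keys without a 'blocks.<index>' component pass through unchanged, which is why the
# method only prints and renames entries where the old and new keys differ.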
def unpatchify(self, x):
"""
x: (N, T, patch_size**2 * C)
imgs: (N, C, H * patch_size, W * patch_size), where H and W are the patch-grid height and width
"""
c = self.out_channels
p = self.x_embedder.patch_size[0]
assert self.h * self.w == x.shape[1]
x = x.reshape(shape=(x.shape[0], self.h, self.w, p, p, c))
x = torch.einsum('nhwpqc->nchpwq', x)
imgs = x.reshape(shape=(x.shape[0], c, self.h * p, self.w * p))
return imgs
@property
def dtype(self):
# Return the data type of the model parameters
return next(self.parameters()).dtype
# The implementation for PixArtMS_Half + 1024 resolution
class ControlPixArtMSHalf(ControlPixArtHalf):
# Supports the multi-scale resolution model (which can also be used for single-resolution training and inference)
def __init__(self, base_model: PixArtMS, copy_blocks_num: int = 13) -> None:
super().__init__(base_model=base_model, copy_blocks_num=copy_blocks_num)
def forward(self, x, timestep, y, mask=None, data_info=None, c=None, **kwargs):
# modify the original PixArtMS forward function
"""
Forward pass of PixArt.
x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)
timestep: (N,) tensor of diffusion timesteps
y: (N, 1, 120, C) tensor of caption (text) embeddings
c: optional control input in the same spatial layout as x
"""
if c is not None:
c = c.to(self.dtype)
c = self.forward_c(c)
bs = x.shape[0]
x = x.to(self.dtype)
timestep = timestep.to(self.dtype)
y = y.to(self.dtype)
c_size, ar = data_info['img_hw'].to(self.dtype), data_info['aspect_ratio'].to(self.dtype)
self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size
pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(x.device).to(self.dtype)
x = self.x_embedder(x) + pos_embed # (N, T, D), where T = H * W / patch_size ** 2
t = self.t_embedder(timestep) # (N, D)
csize = self.csize_embedder(c_size, bs) # (N, D)
ar = self.ar_embedder(ar, bs) # (N, D)
t = t + torch.cat([csize, ar], dim=1)
t0 = self.t_block(t)
y = self.y_embedder(y, self.training) # (N, D)
if mask is not None:
if mask.shape[0] != y.shape[0]:
mask = mask.repeat(y.shape[0] // mask.shape[0], 1)
mask = mask.squeeze(1).squeeze(1)
y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])
y_lens = mask.sum(dim=1).tolist()
else:
y_lens = [y.shape[2]] * y.shape[0]
y = y.squeeze(1).view(1, -1, x.shape[-1])
# define the first layer
x = auto_grad_checkpoint(self.base_model.blocks[0], x, y, t0, y_lens, **kwargs) # (N, T, D) #support grad checkpoint
if c is not None:
# update c
for index in range(1, self.copy_blocks_num + 1):
c, c_skip = auto_grad_checkpoint(self.controlnet[index - 1], x, y, t0, y_lens, c, **kwargs)
x = auto_grad_checkpoint(self.base_model.blocks[index], x + c_skip, y, t0, y_lens, **kwargs)
# update x
for index in range(self.copy_blocks_num + 1, self.total_blocks_num):
x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)
else:
for index in range(1, self.total_blocks_num):
x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)
x = self.final_layer(x, t) # (N, T, patch_size ** 2 * out_channels)
x = self.unpatchify(x) # (N, out_channels, H, W)
return x
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/respace.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
import numpy as np
import torch as th
from .gaussian_diffusion import GaussianDiffusion
def space_timesteps(num_timesteps, section_counts):
"""
Create a list of timesteps to use from an original diffusion process,
given the number of timesteps we want to take from equally-sized portions
of the original process.
For example, if there's 300 timesteps and the section counts are [10,15,20]
then the first 100 timesteps are strided to be 10 timesteps, the second 100
are strided to be 15 timesteps, and the final 100 are strided to be 20.
If the stride is a string starting with "ddim", then the fixed striding
from the DDIM paper is used, and only one section is allowed.
:param num_timesteps: the number of diffusion steps in the original
process to divide up.
:param section_counts: either a list of numbers, or a string containing
comma-separated numbers, indicating the step count
per section. As a special case, use "ddimN" where N
is a number of steps to use the striding from the
DDIM paper.
:return: a set of diffusion steps from the original process to use.
"""
if isinstance(section_counts, str):
if section_counts.startswith("ddim"):
desired_count = int(section_counts[len("ddim") :])
for i in range(1, num_timesteps):
if len(range(0, num_timesteps, i)) == desired_count:
return set(range(0, num_timesteps, i))
raise ValueError(
f"cannot create exactly {desired_count} steps with an integer stride"
)
section_counts = [int(x) for x in section_counts.split(",")]
size_per = num_timesteps // len(section_counts)
extra = num_timesteps % len(section_counts)
start_idx = 0
all_steps = []
for i, section_count in enumerate(section_counts):
size = size_per + (1 if i < extra else 0)
if size < section_count:
raise ValueError(
f"cannot divide section of {size} steps into {section_count}"
)
frac_stride = 1 if section_count <= 1 else (size - 1) / (section_count - 1)
cur_idx = 0.0
taken_steps = []
for _ in range(section_count):
taken_steps.append(start_idx + round(cur_idx))
cur_idx += frac_stride
all_steps += taken_steps
start_idx += size
return set(all_steps)
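# A small sketch of the per-section striding performed above, using the docstring's example
# of 300 timesteps split into sections of 100. `strided_section` is a hypothetical helper
# name introduced only for this illustration; it mirrors the inner loop of `space_timesteps`.

```python
def strided_section(start_idx, size, section_count):
    """Pick `section_count` indices spread evenly over one section of `size` steps."""
    frac_stride = 1 if section_count <= 1 else (size - 1) / (section_count - 1)
    cur_idx = 0.0
    taken = []
    for _ in range(section_count):
        # Fractional positions are rounded to the nearest original timestep.
        taken.append(start_idx + round(cur_idx))
        cur_idx += frac_stride
    return taken


first = strided_section(0, 100, 10)     # first 100 steps reduced to 10
second = strided_section(100, 100, 15)  # second 100 steps reduced to 15
print(first)
print(len(second), second[0], second[-1])
```

# Each section always keeps its first and last timestep when section_count > 1.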
class SpacedDiffusion(GaussianDiffusion):
"""
A diffusion process which can skip steps in a base diffusion process.
:param use_timesteps: a collection (sequence or set) of timesteps from the
original diffusion process to retain.
:param kwargs: the kwargs to create the base diffusion process.
"""
def __init__(self, use_timesteps, **kwargs):
self.use_timesteps = set(use_timesteps)
self.timestep_map = []
self.original_num_steps = len(kwargs["betas"])
base_diffusion = GaussianDiffusion(**kwargs) # pylint: disable=missing-kwoa
last_alpha_cumprod = 1.0
new_betas = []
for i, alpha_cumprod in enumerate(base_diffusion.alphas_cumprod):
if i in self.use_timesteps:
new_betas.append(1 - alpha_cumprod / last_alpha_cumprod)
last_alpha_cumprod = alpha_cumprod
self.timestep_map.append(i)
kwargs["betas"] = np.array(new_betas)
super().__init__(**kwargs)
def p_mean_variance(
self, model, *args, **kwargs
): # pylint: disable=signature-differs
return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
def training_losses(
self, model, *args, **kwargs
): # pylint: disable=signature-differs
return super().training_losses(self._wrap_model(model), *args, **kwargs)
def training_losses_diffusers(
self, model, *args, **kwargs
): # pylint: disable=signature-differs
return super().training_losses_diffusers(self._wrap_model(model), *args, **kwargs)
def condition_mean(self, cond_fn, *args, **kwargs):
return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)
def condition_score(self, cond_fn, *args, **kwargs):
return super().condition_score(self._wrap_model(cond_fn), *args, **kwargs)
def _wrap_model(self, model):
if isinstance(model, _WrappedModel):
return model
return _WrappedModel(
model, self.timestep_map, self.original_num_steps
)
def _scale_timesteps(self, t):
# Scaling is done by the wrapped model.
return t
class _WrappedModel:
def __init__(self, model, timestep_map, original_num_steps):
self.model = model
self.timestep_map = timestep_map
# self.rescale_timesteps = rescale_timesteps
self.original_num_steps = original_num_steps
def __call__(self, x, timestep, **kwargs):
map_tensor = th.tensor(self.timestep_map, device=timestep.device, dtype=timestep.dtype)
new_ts = map_tensor[timestep]
# if self.rescale_timesteps:
# new_ts = new_ts.float() * (1000.0 / self.original_num_steps)
return self.model(x, timestep=new_ts, **kwargs)
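# A minimal NumPy sketch of the beta re-derivation in `SpacedDiffusion.__init__`, assuming a
# linear beta schedule chosen only for illustration. `respace_betas` is a hypothetical
# standalone helper. It checks that the respaced process reproduces the original cumulative
# alphas at every retained timestep.

```python
import numpy as np


def respace_betas(betas, use_timesteps):
    """Return new betas and the map from respaced index to original timestep."""
    alphas_cumprod = np.cumprod(1.0 - betas)
    last_alpha_cumprod = 1.0
    new_betas, timestep_map = [], []
    for i, alpha_cumprod in enumerate(alphas_cumprod):
        if i in use_timesteps:
            # Choose beta so that (1 - beta) bridges the gap between retained steps.
            new_betas.append(1 - alpha_cumprod / last_alpha_cumprod)
            last_alpha_cumprod = alpha_cumprod
            timestep_map.append(i)
    return np.array(new_betas), timestep_map


betas = np.linspace(1e-4, 0.02, 1000)  # illustrative schedule, an assumption
kept = set(range(0, 1000, 50))         # keep every 50th timestep
new_betas, timestep_map = respace_betas(betas, kept)

original = np.cumprod(1.0 - betas)
respaced = np.cumprod(1.0 - new_betas)
print(np.allclose(respaced, original[timestep_map]))
```

# `_WrappedModel` then uses `timestep_map` to translate a respaced index back to the
# original timestep before calling the network.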
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/sa_solver.py
================================================
import torch
import torch.nn.functional as F
import math
from tqdm import tqdm
class NoiseScheduleVP:
def __init__(
self,
schedule='discrete',
betas=None,
alphas_cumprod=None,
continuous_beta_0=0.1,
continuous_beta_1=20.,
dtype=torch.float32,
):
"""Thanks to DPM-Solver for their code base"""
"""Create a wrapper class for the forward SDE (VP type).
***
Update: We support discrete-time diffusion models by implementing a piecewise linear interpolation for log_alpha_t.
We recommend using schedule='discrete' for discrete-time diffusion models, especially for high-resolution images.
***
The forward SDE ensures that the condition distribution q_{t|0}(x_t | x_0) = N ( alpha_t * x_0, sigma_t^2 * I ).
We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper).
Therefore, we implement the functions for computing alpha_t, sigma_t and lambda_t. For t in [0, T], we have:
log_alpha_t = self.marginal_log_mean_coeff(t)
sigma_t = self.marginal_std(t)
lambda_t = self.marginal_lambda(t)
Moreover, as lambda(t) is an invertible function, we also support its inverse function:
t = self.inverse_lambda(lambda_t)
===============================================================
We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).
1. For discrete-time DPMs:
For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by:
t_i = (i + 1) / N
e.g. for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1.
We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.
Args:
betas: A `torch.Tensor`. The beta array for the discrete-time DPM. (See the original DDPM paper for details)
alphas_cumprod: A `torch.Tensor`. The cumprod alphas for the discrete-time DPM. (See the original DDPM paper for details)
Note that we always have alphas_cumprod = cumprod(1 - betas). Therefore, we only need to set one of `betas` and `alphas_cumprod`.
**Important**: Please pay special attention to the `alphas_cumprod` argument:
The `alphas_cumprod` is the \hat{alpha_n} arrays in the notations of DDPM. Specifically, DDPMs assume that
q_{t_n | 0}(x_{t_n} | x_0) = N ( \sqrt{\hat{alpha_n}} * x_0, (1 - \hat{alpha_n}) * I ).
Therefore, the notation \hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, we have
alpha_{t_n} = \sqrt{\hat{alpha_n}},
and
log(alpha_{t_n}) = 0.5 * log(\hat{alpha_n}).
2. For continuous-time DPMs:
We support two types of VPSDEs: linear (DDPM) and cosine (improved-DDPM). The hyperparameters for the noise
schedule are the default settings in DDPM and improved-DDPM:
Args:
beta_min: A `float` number. The smallest beta for the linear schedule.
beta_max: A `float` number. The largest beta for the linear schedule.
cosine_s: A `float` number. The hyperparameter in the cosine schedule.
cosine_beta_max: A `float` number. The hyperparameter in the cosine schedule.
T: A `float` number. The ending time of the forward process.
===============================================================
Args:
schedule: A `str`. The noise schedule of the forward SDE. 'discrete' for discrete-time DPMs,
'linear' or 'cosine' for continuous-time DPMs.
Returns:
A wrapper object of the forward SDE (VP type).
===============================================================
Example:
# For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):
>>> ns = NoiseScheduleVP('discrete', betas=betas)
# For discrete-time DPMs, given alphas_cumprod (the \hat{alpha_n} array for n = 0, 1, ..., N - 1):
>>> ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)
# For continuous-time DPMs (VPSDE), linear schedule:
>>> ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)
"""
if schedule not in ['discrete', 'linear', 'cosine']:
raise ValueError(
f"Unsupported noise schedule {schedule}. The schedule needs to be 'discrete' or 'linear' or 'cosine'"
)
self.schedule = schedule
if schedule == 'discrete':
if betas is not None:
log_alphas = 0.5 * torch.log(1 - betas).cumsum(dim=0)
else:
assert alphas_cumprod is not None
log_alphas = 0.5 * torch.log(alphas_cumprod)
self.total_N = len(log_alphas)
self.T = 1.
self.t_array = torch.linspace(0., 1., self.total_N + 1)[1:].reshape((1, -1)).to(dtype=dtype)
self.log_alpha_array = log_alphas.reshape((1, -1,)).to(dtype=dtype)
else:
self.total_N = 1000
self.beta_0 = continuous_beta_0
self.beta_1 = continuous_beta_1
self.cosine_s = 0.008
self.cosine_beta_max = 999.
self.cosine_t_max = math.atan(self.cosine_beta_max * (1. + self.cosine_s) / math.pi) * 2. * (
1. + self.cosine_s) / math.pi - self.cosine_s
self.cosine_log_alpha_0 = math.log(math.cos(self.cosine_s / (1. + self.cosine_s) * math.pi / 2.))
self.schedule = schedule
self.T = 0.9946 if schedule == 'cosine' else 1.
def marginal_log_mean_coeff(self, t):
"""
Compute log(alpha_t) of a given continuous-time label t in [0, T].
"""
if self.schedule == 'discrete':
return interpolate_fn(t.reshape((-1, 1)), self.t_array.to(t.device),
self.log_alpha_array.to(t.device)).reshape((-1))
elif self.schedule == 'linear':
return -0.25 * t ** 2 * (self.beta_1 - self.beta_0) - 0.5 * t * self.beta_0
elif self.schedule == 'cosine':
log_alpha_fn = lambda s: torch.log(torch.cos((s + self.cosine_s) / (1. + self.cosine_s) * math.pi / 2.))
return log_alpha_fn(t) - self.cosine_log_alpha_0
def marginal_alpha(self, t):
"""
Compute alpha_t of a given continuous-time label t in [0, T].
"""
return torch.exp(self.marginal_log_mean_coeff(t))
def marginal_std(self, t):
"""
Compute sigma_t of a given continuous-time label t in [0, T].
"""
return torch.sqrt(1. - torch.exp(2. * self.marginal_log_mean_coeff(t)))
def marginal_lambda(self, t):
"""
Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].
"""
log_mean_coeff = self.marginal_log_mean_coeff(t)
log_std = 0.5 * torch.log(1. - torch.exp(2. * log_mean_coeff))
return log_mean_coeff - log_std
def inverse_lambda(self, lamb):
"""
Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.
"""
if self.schedule == 'linear':
tmp = 2. * (self.beta_1 - self.beta_0) * torch.logaddexp(-2. * lamb, torch.zeros((1,)).to(lamb))
Delta = self.beta_0 ** 2 + tmp
return tmp / (torch.sqrt(Delta) + self.beta_0) / (self.beta_1 - self.beta_0)
elif self.schedule == 'discrete':
log_alpha = -0.5 * torch.logaddexp(torch.zeros((1,)).to(lamb.device), -2. * lamb)
t = interpolate_fn(log_alpha.reshape((-1, 1)), torch.flip(self.log_alpha_array.to(lamb.device), [1]),
torch.flip(self.t_array.to(lamb.device), [1]))
return t.reshape((-1,))
else:
log_alpha = -0.5 * torch.logaddexp(-2. * lamb, torch.zeros((1,)).to(lamb))
t_fn = lambda log_alpha_t: torch.arccos(torch.exp(log_alpha_t + self.cosine_log_alpha_0)) * 2. * (
1. + self.cosine_s) / math.pi - self.cosine_s
return t_fn(log_alpha)
def edm_sigma(self, t):
return self.marginal_std(t) / self.marginal_alpha(t)
def edm_inverse_sigma(self, edmsigma):
alpha = 1 / (edmsigma ** 2 + 1).sqrt()
sigma = alpha * edmsigma
lambda_t = torch.log(alpha / sigma)
return self.inverse_lambda(lambda_t)
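# A scalar sketch of the quantities `NoiseScheduleVP` works with, written with the standard
# library only. The helper names are hypothetical. It illustrates how the DDPM cumulative
# alpha relates to alpha_t, sigma_t, the half-logSNR lambda_t, and the EDM-style sigma.

```python
import math


def half_log_snr(alpha_bar):
    """lambda = log(alpha) - log(sigma), with alpha = sqrt(alpha_bar), sigma^2 = 1 - alpha^2."""
    log_alpha = 0.5 * math.log(alpha_bar)
    log_sigma = 0.5 * math.log(1.0 - math.exp(2.0 * log_alpha))
    return log_alpha - log_sigma


def log_alpha_from_lambda(lam):
    """Invert the map above: log(alpha) = -0.5 * log(1 + exp(-2 * lambda))."""
    return -0.5 * math.log1p(math.exp(-2.0 * lam))


alpha_bar = 0.9
lam = half_log_snr(alpha_bar)
recovered_alpha_bar = math.exp(2.0 * log_alpha_from_lambda(lam))

# The EDM-style noise level sigma / alpha equals exp(-lambda).
edm_sigma = math.sqrt(1.0 - alpha_bar) / math.sqrt(alpha_bar)
print(lam, recovered_alpha_bar, edm_sigma, math.exp(-lam))
```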
def model_wrapper(
model,
noise_schedule,
model_type="noise",
model_kwargs={},
guidance_type="uncond",
condition=None,
unconditional_condition=None,
guidance_scale=1.,
classifier_fn=None,
classifier_kwargs={},
):
"""Thanks to DPM-Solver for their code base"""
"""Create a wrapper function for the noise prediction model.
SA-Solver needs to solve the continuous-time diffusion SDEs. For DPMs trained on discrete-time labels, we need to
firstly wrap the model function to a noise prediction model that accepts the continuous time as the input.
We support four types of the diffusion model by setting `model_type`:
1. "noise": noise prediction model. (Trained by predicting noise).
2. "x_start": data prediction model. (Trained by predicting the data x_0 at time 0).
3. "v": velocity prediction model. (Trained by predicting the velocity).
The derivation of the "v" prediction is detailed in Appendix D of [1], and it is used in Imagen-Video [2].
[1] Salimans, Tim, and Jonathan Ho. "Progressive distillation for fast sampling of diffusion models."
arXiv preprint arXiv:2202.00512 (2022).
[2] Ho, Jonathan, et al. "Imagen Video: High Definition Video Generation with Diffusion Models."
arXiv preprint arXiv:2210.02303 (2022).
4. "score": marginal score function. (Trained by denoising score matching).
Note that the score function and the noise prediction model follow a simple relationship:
```
noise(x_t, t) = -sigma_t * score(x_t, t)
```
We support three types of guided sampling by DPMs by setting `guidance_type`:
1. "uncond": unconditional sampling by DPMs.
The input `model` has the following format:
``
model(x, t_input, **model_kwargs) -> noise | x_start | v | score
``
2. "classifier": classifier guidance sampling [3] by DPMs and another classifier.
The input `model` has the following format:
``
model(x, t_input, **model_kwargs) -> noise | x_start | v | score
``
The input `classifier_fn` has the following format:
``
classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)
``
[3] P. Dhariwal and A. Q. Nichol, "Diffusion models beat GANs on image synthesis,"
in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.
3. "classifier-free": classifier-free guidance sampling by conditional DPMs.
The input `model` has the following format:
``
model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score
``
And if cond == `unconditional_condition`, the model output is the unconditional DPM output.
[4] Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance."
arXiv preprint arXiv:2207.12598 (2022).
The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)
or continuous-time labels (i.e. epsilon to T).
We wrap the model function to accept only `x` and `t_continuous` as inputs and to output the predicted noise:
``
def model_fn(x, t_continuous) -> noise:
t_input = get_model_input_time(t_continuous)
return noise_pred(model, x, t_input, **model_kwargs)
``
where `t_continuous` is the continuous time labels (i.e. epsilon to T). And we use `model_fn` for SA-Solver.
===============================================================
Args:
model: A diffusion model with the corresponding format described above.
noise_schedule: A noise schedule object, such as NoiseScheduleVP.
model_type: A `str`. The parameterization type of the diffusion model.
"noise" or "x_start" or "v" or "score".
model_kwargs: A `dict`. A dict for the other inputs of the model function.
guidance_type: A `str`. The type of the guidance for sampling.
"uncond" or "classifier" or "classifier-free".
condition: A pytorch tensor. The condition for the guided sampling.
Only used for "classifier" or "classifier-free" guidance type.
unconditional_condition: A pytorch tensor. The condition for the unconditional sampling.
Only used for "classifier-free" guidance type.
guidance_scale: A `float`. The scale for the guided sampling.
classifier_fn: A classifier function. Only used for the classifier guidance.
classifier_kwargs: A `dict`. A dict for the other inputs of the classifier function.
Returns:
A noise prediction model that accepts the noised data and the continuous time as the inputs.
"""
def get_model_input_time(t_continuous):
"""
Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.
For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].
For continuous-time DPMs, we just use `t_continuous`.
"""
if noise_schedule.schedule == 'discrete':
return (t_continuous - 1. / noise_schedule.total_N) * 1000.
else:
return t_continuous
def noise_pred_fn(x, t_continuous, cond=None):
t_input = get_model_input_time(t_continuous)
if cond is None:
output = model(x, t_input, **model_kwargs)
else:
output = model(x, t_input, cond, **model_kwargs)
if model_type == "noise":
return output
elif model_type == "x_start":
alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
return (x - alpha_t[0] * output) / sigma_t[0]
elif model_type == "v":
alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
return alpha_t[0] * output + sigma_t[0] * x
elif model_type == "score":
sigma_t = noise_schedule.marginal_std(t_continuous)
return -sigma_t[0] * output
def cond_grad_fn(x, t_input):
"""
Compute the gradient of the classifier, i.e. nabla_{x} log p_t(cond | x_t).
"""
with torch.enable_grad():
x_in = x.detach().requires_grad_(True)
log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)
return torch.autograd.grad(log_prob.sum(), x_in)[0]
def model_fn(x, t_continuous):
"""
The noise prediction model function that is used for SA-Solver.
"""
if guidance_type == "uncond":
return noise_pred_fn(x, t_continuous)
elif guidance_type == "classifier":
assert classifier_fn is not None
t_input = get_model_input_time(t_continuous)
cond_grad = cond_grad_fn(x, t_input)
sigma_t = noise_schedule.marginal_std(t_continuous)
noise = noise_pred_fn(x, t_continuous)
return noise - guidance_scale * sigma_t * cond_grad
elif guidance_type == "classifier-free":
if guidance_scale == 1. or unconditional_condition is None:
return noise_pred_fn(x, t_continuous, cond=condition)
x_in = torch.cat([x] * 2)
t_in = torch.cat([t_continuous] * 2)
c_in = torch.cat([unconditional_condition, condition])
noise_uncond, noise = noise_pred_fn(x_in, t_in, cond=c_in).chunk(2)
return noise_uncond + guidance_scale * (noise - noise_uncond)
assert model_type in ["noise", "x_start", "v", "score"]
assert guidance_type in ["uncond", "classifier", "classifier-free"]
return model_fn
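# A plain-Python sketch of two pieces of `model_wrapper`: the continuous-to-discrete time
# conversion and the classifier-free guidance combination. Both helper names are
# hypothetical, and lists of floats stand in for tensors.

```python
def discrete_time_label(t_continuous, total_n):
    """Map t in [1 / N, 1] to the model's time input in [0, 1000 * (N - 1) / N]."""
    return (t_continuous - 1.0 / total_n) * 1000.0


def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Extrapolate from the unconditional prediction towards the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


print(discrete_time_label(1.0, 1000))    # close to 999, the last training step
print(discrete_time_label(0.001, 1000))  # close to 0, the first training step
print(classifier_free_guidance([0.0, 1.0], [1.0, 3.0], 4.5))
```

# With guidance_scale == 1 the combination reduces to the conditional prediction, which is
# why `model_fn` skips the doubled batch in that case.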
class SASolver:
def __init__(
self,
model_fn,
noise_schedule,
algorithm_type="data_prediction",
correcting_x0_fn=None,
correcting_xt_fn=None,
thresholding_max_val=1.,
dynamic_thresholding_ratio=0.995
):
"""
Construct a SA-Solver
The default value for algorithm_type is "data_prediction" and we recommend not to change it to
"noise_prediction". For details, please see Appendix A.2.4 in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
"""
self.model = lambda x, t: model_fn(x, t.expand((x.shape[0])))
self.noise_schedule = noise_schedule
assert algorithm_type in ["data_prediction", "noise_prediction"]
if correcting_x0_fn == "dynamic_thresholding":
self.correcting_x0_fn = self.dynamic_thresholding_fn
else:
self.correcting_x0_fn = correcting_x0_fn
self.correcting_xt_fn = correcting_xt_fn
self.dynamic_thresholding_ratio = dynamic_thresholding_ratio
self.thresholding_max_val = thresholding_max_val
self.predict_x0 = algorithm_type == "data_prediction"
self.sigma_min = float(self.noise_schedule.edm_sigma(torch.tensor([1e-3])))
self.sigma_max = float(self.noise_schedule.edm_sigma(torch.tensor([1])))
def dynamic_thresholding_fn(self, x0, t=None):
"""
The dynamic thresholding method.
"""
dims = x0.dim()
p = self.dynamic_thresholding_ratio
s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)
s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)
x0 = torch.clamp(x0, -s, s) / s
return x0
def noise_prediction_fn(self, x, t):
"""
Return the noise prediction model.
"""
return self.model(x, t)
def data_prediction_fn(self, x, t):
"""
Return the data prediction model (with corrector).
"""
noise = self.noise_prediction_fn(x, t)
alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)
x0 = (x - sigma_t * noise) / alpha_t
if self.correcting_x0_fn is not None:
x0 = self.correcting_x0_fn(x0)
return x0
def model_fn(self, x, t):
"""
Convert the model to the noise prediction model or the data prediction model.
"""
if self.predict_x0:
return self.data_prediction_fn(x, t)
else:
return self.noise_prediction_fn(x, t)
def get_time_steps(self, skip_type, t_T, t_0, N, order, device):
"""Compute the intermediate time steps for sampling.
"""
if skip_type == 'logSNR':
lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))
lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))
logSNR_steps = lambda_T + torch.linspace(torch.tensor(0.).cpu().item(),
(lambda_0 - lambda_T).cpu().item() ** (1. / order), N + 1).pow(
order).to(device)
return self.noise_schedule.inverse_lambda(logSNR_steps)
elif skip_type == 'time':
t = torch.linspace(t_T ** (1. / order), t_0 ** (1. / order), N + 1).pow(order).to(device)
return t
elif skip_type == 'karras':
sigma_min = max(0.002, self.sigma_min)
sigma_max = min(80, self.sigma_max)
sigma_steps = torch.linspace(sigma_max ** (1. / 7), sigma_min ** (1. / 7), N + 1).pow(7).to(device)
return self.noise_schedule.edm_inverse_sigma(sigma_steps)
else:
raise ValueError(
f"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time' or 'karras'"
)
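# A plain-Python sketch of the 'karras' spacing used above: noise levels are spaced linearly
# in sigma ** (1 / 7) and then raised back to the 7th power. The function name and the `rho`
# parameter are assumptions for illustration; the method above hard-codes the exponent 7.

```python
import math


def karras_sigmas(sigma_min, sigma_max, n_steps, rho=7.0):
    """Return n_steps + 1 noise levels decreasing from sigma_max to sigma_min."""
    high = sigma_max ** (1.0 / rho)
    low = sigma_min ** (1.0 / rho)
    return [(high + (low - high) * i / n_steps) ** rho for i in range(n_steps + 1)]


sigmas = karras_sigmas(0.002, 80.0, 10)
print(sigmas[0], sigmas[-1])
```

# The spacing concentrates steps at small noise levels; `edm_inverse_sigma` then converts
# each sigma back to a continuous time label.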
def denoise_to_zero_fn(self, x, s):
"""
Denoise at the final step, which is equivalent to solving the ODE from lambda_s to infty by first-order discretization.
"""
return self.data_prediction_fn(x, s)
def get_coefficients_exponential_negative(self, order, interval_start, interval_end):
"""
Calculate the integral of exp(-x) * x^order dx from interval_start to interval_end
For calculating the coefficient of gradient terms after the lagrange interpolation,
see Eq.(15) and Eq.(18) in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
For noise_prediction formula.
"""
assert order in [0, 1, 2, 3], "order is only supported for 0, 1, 2 and 3"
if order == 0:
return torch.exp(-interval_end) * (torch.exp(interval_end - interval_start) - 1)
elif order == 1:
return torch.exp(-interval_end) * (
(interval_start + 1) * torch.exp(interval_end - interval_start) - (interval_end + 1))
elif order == 2:
return torch.exp(-interval_end) * (
(interval_start ** 2 + 2 * interval_start + 2) * torch.exp(interval_end - interval_start) - (
interval_end ** 2 + 2 * interval_end + 2))
elif order == 3:
return torch.exp(-interval_end) * (
(interval_start ** 3 + 3 * interval_start ** 2 + 6 * interval_start + 6) * torch.exp(
interval_end - interval_start) - (interval_end ** 3 + 3 * interval_end ** 2 + 6 * interval_end + 6))
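# A scalar sanity check of the closed forms above for orders 0 and 1, comparing them with a
# midpoint-rule approximation of the integral of exp(-x) * x^order. The helper names and the
# interval endpoints are illustrative assumptions.

```python
import math


def closed_form(order, a, b):
    """Closed-form integral of exp(-x) * x**order over [a, b] for order 0 or 1."""
    if order == 0:
        return math.exp(-b) * (math.exp(b - a) - 1)
    if order == 1:
        return math.exp(-b) * ((a + 1) * math.exp(b - a) - (b + 1))
    raise ValueError("this sketch only covers order 0 and 1")


def midpoint_integral(order, a, b, n=20000):
    """Midpoint-rule approximation of the same integral."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += math.exp(-x) * x ** order
    return total * dx


for order in (0, 1):
    print(order, closed_form(order, 0.3, 1.2), midpoint_integral(order, 0.3, 1.2))
```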
def get_coefficients_exponential_positive(self, order, interval_start, interval_end, tau):
"""
Calculate the integral of exp(x(1+tau^2)) * x^order dx from interval_start to interval_end
For calculating the coefficient of gradient terms after the lagrange interpolation,
see Eq.(15) and Eq.(18) in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
For data_prediction formula.
"""
assert order in [0, 1, 2, 3], "order is only supported for 0, 1, 2 and 3"
# after change of variable(cov)
interval_end_cov = (1 + tau ** 2) * interval_end
interval_start_cov = (1 + tau ** 2) * interval_start
if order == 0:
return torch.exp(interval_end_cov) * (1 - torch.exp(-(interval_end_cov - interval_start_cov))) / (
(1 + tau ** 2))
elif order == 1:
return torch.exp(interval_end_cov) * ((interval_end_cov - 1) - (interval_start_cov - 1) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 2)
elif order == 2:
return torch.exp(interval_end_cov) * ((interval_end_cov ** 2 - 2 * interval_end_cov + 2) - (
interval_start_cov ** 2 - 2 * interval_start_cov + 2) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 3)
elif order == 3:
return torch.exp(interval_end_cov) * (
(interval_end_cov ** 3 - 3 * interval_end_cov ** 2 + 6 * interval_end_cov - 6) - (
interval_start_cov ** 3 - 3 * interval_start_cov ** 2 + 6 * interval_start_cov - 6) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 4)
def lagrange_polynomial_coefficient(self, order, lambda_list):
"""
Calculate the coefficients of the Lagrange basis polynomials, highest power first.
Used for Lagrange interpolation.
"""
assert order in [0, 1, 2, 3]
assert order == len(lambda_list) - 1
if order == 0:
return [[1]]
elif order == 1:
return [[1 / (lambda_list[0] - lambda_list[1]), -lambda_list[1] / (lambda_list[0] - lambda_list[1])],
[1 / (lambda_list[1] - lambda_list[0]), -lambda_list[0] / (lambda_list[1] - lambda_list[0])]]
elif order == 2:
denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2])
denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2])
denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1])
return [[1 / denominator1,
(-lambda_list[1] - lambda_list[2]) / denominator1,
lambda_list[1] * lambda_list[2] / denominator1],
[1 / denominator2,
(-lambda_list[0] - lambda_list[2]) / denominator2,
lambda_list[0] * lambda_list[2] / denominator2],
[1 / denominator3,
(-lambda_list[0] - lambda_list[1]) / denominator3,
lambda_list[0] * lambda_list[1] / denominator3]
]
elif order == 3:
denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2]) * (
lambda_list[0] - lambda_list[3])
denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2]) * (
lambda_list[1] - lambda_list[3])
denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1]) * (
lambda_list[2] - lambda_list[3])
denominator4 = (lambda_list[3] - lambda_list[0]) * (lambda_list[3] - lambda_list[1]) * (
lambda_list[3] - lambda_list[2])
return [[1 / denominator1,
(-lambda_list[1] - lambda_list[2] - lambda_list[3]) / denominator1,
(lambda_list[1] * lambda_list[2] + lambda_list[1] * lambda_list[3] + lambda_list[2] * lambda_list[
3]) / denominator1,
(-lambda_list[1] * lambda_list[2] * lambda_list[3]) / denominator1],
[1 / denominator2,
(-lambda_list[0] - lambda_list[2] - lambda_list[3]) / denominator2,
(lambda_list[0] * lambda_list[2] + lambda_list[0] * lambda_list[3] + lambda_list[2] * lambda_list[
3]) / denominator2,
(-lambda_list[0] * lambda_list[2] * lambda_list[3]) / denominator2],
[1 / denominator3,
(-lambda_list[0] - lambda_list[1] - lambda_list[3]) / denominator3,
(lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[3] + lambda_list[1] * lambda_list[
3]) / denominator3,
(-lambda_list[0] * lambda_list[1] * lambda_list[3]) / denominator3],
[1 / denominator4,
(-lambda_list[0] - lambda_list[1] - lambda_list[2]) / denominator4,
(lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[2] + lambda_list[1] * lambda_list[
2]) / denominator4,
(-lambda_list[0] * lambda_list[1] * lambda_list[2]) / denominator4]
]
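# A scalar sketch of the order-2 case above. Each row holds the coefficients of one Lagrange
# basis polynomial, highest power first, so evaluating row i at node j should give 1 when
# i == j and 0 otherwise. The helper names and node values are illustrative assumptions.

```python
def quadratic_lagrange_coefficients(nodes):
    """Coefficients (x^2, x^1, x^0) of the three Lagrange basis polynomials."""
    l0, l1, l2 = nodes
    d0 = (l0 - l1) * (l0 - l2)
    d1 = (l1 - l0) * (l1 - l2)
    d2 = (l2 - l0) * (l2 - l1)
    return [
        [1 / d0, (-l1 - l2) / d0, l1 * l2 / d0],
        [1 / d1, (-l0 - l2) / d1, l0 * l2 / d1],
        [1 / d2, (-l0 - l1) / d2, l0 * l1 / d2],
    ]


def evaluate(coefficients, x):
    """Evaluate a polynomial given highest-power-first coefficients (Horner's rule)."""
    result = 0.0
    for coefficient in coefficients:
        result = result * x + coefficient
    return result


nodes = [0.5, 1.0, 2.0]
rows = quadratic_lagrange_coefficients(nodes)
table = [[evaluate(row, node) for node in nodes] for row in rows]
print(table)
```

# `get_coefficients_fn` pairs entry j of each row with the exponential integral of
# x^(order - 1 - j), which matches the highest-power-first ordering.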
def get_coefficients_fn(self, order, interval_start, interval_end, lambda_list, tau):
"""
Calculate the coefficient of gradients.
"""
assert order in [1, 2, 3, 4]
assert order == len(lambda_list), 'the length of lambda list must be equal to the order'
coefficients = []
lagrange_coefficient = self.lagrange_polynomial_coefficient(order - 1, lambda_list)
for i in range(order):
coefficient = sum(
lagrange_coefficient[i][j]
* self.get_coefficients_exponential_positive(
order - 1 - j, interval_start, interval_end, tau
)
if self.predict_x0
else lagrange_coefficient[i][j]
* self.get_coefficients_exponential_negative(
order - 1 - j, interval_start, interval_end
)
for j in range(order)
)
coefficients.append(coefficient)
assert len(coefficients) == order, 'the length of coefficients does not match the order'
return coefficients
def adams_bashforth_update(self, order, x, tau, model_prev_list, t_prev_list, noise, t):
"""
SA-Predictor, without the "rescaling" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
"""
assert order in [1, 2, 3, 4], "order of stochastic adams bashforth method is only supported for 1, 2, 3 and 4"
# get noise schedule
ns = self.noise_schedule
alpha_t = ns.marginal_alpha(t)
sigma_t = ns.marginal_std(t)
lambda_t = ns.marginal_lambda(t)
alpha_prev = ns.marginal_alpha(t_prev_list[-1])
sigma_prev = ns.marginal_std(t_prev_list[-1])
gradient_part = torch.zeros_like(x)
h = lambda_t - ns.marginal_lambda(t_prev_list[-1])
lambda_list = [ns.marginal_lambda(t_prev_list[-(i + 1)]) for i in range(order)]
gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,
lambda_list, tau)
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_prev_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part
return x_t
def adams_moulton_update(self, order, x, tau, model_prev_list, t_prev_list, noise, t):
"""
SA-Corrector, without the "rescaling" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
"""
assert order in [1, 2, 3, 4], "order of stochastic adams moulton method is only supported for 1, 2, 3 and 4"
# get noise schedule
ns = self.noise_schedule
alpha_t = ns.marginal_alpha(t)
sigma_t = ns.marginal_std(t)
lambda_t = ns.marginal_lambda(t)
alpha_prev = ns.marginal_alpha(t_prev_list[-1])
sigma_prev = ns.marginal_std(t_prev_list[-1])
gradient_part = torch.zeros_like(x)
h = lambda_t - ns.marginal_lambda(t_prev_list[-1])
t_list = t_prev_list + [t]
lambda_list = [ns.marginal_lambda(t_list[-(i + 1)]) for i in range(order)]
gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,
lambda_list, tau)
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_prev_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part
return x_t
def adams_bashforth_update_few_steps(self, order, x, tau, model_prev_list, t_prev_list, noise, t):
"""
SA-Predictor, with the "rescaling" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
"""
assert order in [1, 2, 3, 4], "order of stochastic adams bashforth method is only supported for 1, 2, 3 and 4"
# get noise schedule
ns = self.noise_schedule
alpha_t = ns.marginal_alpha(t)
sigma_t = ns.marginal_std(t)
lambda_t = ns.marginal_lambda(t)
alpha_prev = ns.marginal_alpha(t_prev_list[-1])
sigma_prev = ns.marginal_std(t_prev_list[-1])
gradient_part = torch.zeros_like(x)
h = lambda_t - ns.marginal_lambda(t_prev_list[-1])
lambda_list = [ns.marginal_lambda(t_prev_list[-(i + 1)]) for i in range(order)]
gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,
lambda_list, tau)
if self.predict_x0:
if order == 2: ## if order = 2 we do a modification that does not influence the convergence order similar to unipc. Note: This is used only for few steps sampling.
# The added term is O(h^3). Empirically we find it will slightly improve the image quality.
# ODE case
# gradient_coefficients[0] += 1.0 * torch.exp(lambda_t) * (h ** 2 / 2 - (h - 1 + torch.exp(-h))) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(t_prev_list[-2]))
# gradient_coefficients[1] -= 1.0 * torch.exp(lambda_t) * (h ** 2 / 2 - (h - 1 + torch.exp(-h))) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(t_prev_list[-2]))
gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2)) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(
t_prev_list[-2]))
gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2)) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(
t_prev_list[-2]))
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_prev_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part
return x_t
def adams_moulton_update_few_steps(self, order, x, tau, model_prev_list, t_prev_list, noise, t):
"""
SA-Corrector, with the "rescaling" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
"""
assert order in [1, 2, 3, 4], "order of stochastic adams moulton method is only supported for 1, 2, 3 and 4"
# get noise schedule
ns = self.noise_schedule
alpha_t = ns.marginal_alpha(t)
sigma_t = ns.marginal_std(t)
lambda_t = ns.marginal_lambda(t)
alpha_prev = ns.marginal_alpha(t_prev_list[-1])
sigma_prev = ns.marginal_std(t_prev_list[-1])
gradient_part = torch.zeros_like(x)
h = lambda_t - ns.marginal_lambda(t_prev_list[-1])
t_list = t_prev_list + [t]
lambda_list = [ns.marginal_lambda(t_list[-(i + 1)]) for i in range(order)]
gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,
lambda_list, tau)
if self.predict_x0:
if order == 2: ## if order = 2 we do a modification that does not influence the convergence order similar to UniPC. Note: This is used only for few steps sampling.
# The added term is O(h^3). Empirically we find it will slightly improve the image quality.
# ODE case
# gradient_coefficients[0] += 1.0 * torch.exp(lambda_t) * (h / 2 - (h - 1 + torch.exp(-h)) / h)
# gradient_coefficients[1] -= 1.0 * torch.exp(lambda_t) * (h / 2 - (h - 1 + torch.exp(-h)) / h)
gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2 * h))
gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2 * h))
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_prev_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part
return x_t
def sample_few_steps(self, x, tau, steps=5, t_start=None, t_end=None, skip_type='time', skip_order=1,
predictor_order=3, corrector_order=4, pc_mode='PEC', return_intermediate=False
):
"""
For the PC-mode, please refer to the wiki page
https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode
'PEC' needs one model evaluation per step, while 'PECE' needs two.
We recommend pc_mode='PEC' when the NFE budget is limited; 'PECE' mode is only for testing with sufficient NFEs.
"""
skip_first_step = False
skip_final_step = True
lower_order_final = True
denoise_to_zero = False
assert pc_mode in ['PEC', 'PECE'], 'Predictor-corrector mode only supports PEC and PECE'
t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
t_T = self.noise_schedule.T if t_start is None else t_start
assert t_0 > 0 and t_T > 0, "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
device = x.device
intermediates = []
with torch.no_grad():
assert steps >= max(predictor_order, corrector_order - 1)
timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, order=skip_order,
device=device)
assert timesteps.shape[0] - 1 == steps
# Init the initial values.
step = 0
t = timesteps[step]
noise = torch.randn_like(x)
t_prev_list = [t]
# do not evaluate if skip_first_step
if skip_first_step:
if self.predict_x0:
alpha_t = self.noise_schedule.marginal_alpha(t)
sigma_t = self.noise_schedule.marginal_std(t)
model_prev_list = [(1 - sigma_t) / alpha_t * x]
else:
model_prev_list = [x]
else:
model_prev_list = [self.model_fn(x, t)]
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
# determine the first several values
for step in tqdm(range(1, max(predictor_order, corrector_order - 1))):
t = timesteps[step]
predictor_order_used = min(predictor_order, step)
corrector_order_used = min(corrector_order, step + 1)
noise = torch.randn_like(x)
# predictor step
x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list,
noise=noise, t=t)
# evaluation step
model_x = self.model_fn(x_p, t)
# update model_list
model_prev_list.append(model_x)
# corrector step
if corrector_order > 0:
x = self.adams_moulton_update_few_steps(order=corrector_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list,
noise=noise, t=t)
else:
x = x_p
# evaluation step if correction and mode = pece
if corrector_order > 0 and pc_mode == 'PECE':
model_x = self.model_fn(x, t)
del model_prev_list[-1]
model_prev_list.append(model_x)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
t_prev_list.append(t)
for step in tqdm(range(max(predictor_order, corrector_order - 1), steps + 1)):
if lower_order_final:
predictor_order_used = min(predictor_order, steps - step + 1)
corrector_order_used = min(corrector_order, steps - step + 2)
else:
predictor_order_used = predictor_order
corrector_order_used = corrector_order
t = timesteps[step]
noise = torch.randn_like(x)
# predictor step
if skip_final_step and step == steps and not denoise_to_zero:
x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=0,
model_prev_list=model_prev_list,
t_prev_list=t_prev_list, noise=noise, t=t)
else:
x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list,
t_prev_list=t_prev_list, noise=noise, t=t)
# evaluation step
# do not evaluate if skip_final_step and step = steps
if not skip_final_step or step < steps:
model_x = self.model_fn(x_p, t)
# update model_list
# do not update if skip_final_step and step = steps
if not skip_final_step or step < steps:
model_prev_list.append(model_x)
# corrector step
# do not correct if skip_final_step and step = steps
if corrector_order > 0 and (not skip_final_step or step < steps):
x = self.adams_moulton_update_few_steps(order=corrector_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list,
t_prev_list=t_prev_list, noise=noise, t=t)
else:
x = x_p
# evaluation step if mode = pece and step != steps
if corrector_order > 0 and (pc_mode == 'PECE' and step < steps):
model_x = self.model_fn(x, t)
del model_prev_list[-1]
model_prev_list.append(model_x)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
t_prev_list.append(t)
del model_prev_list[0]
if denoise_to_zero:
t = torch.ones((1,)).to(device) * t_0
x = self.denoise_to_zero_fn(x, t)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step + 1)
if return_intermediate:
intermediates.append(x)
return (x, intermediates) if return_intermediate else x
def sample_more_steps(self, x, tau, steps=20, t_start=None, t_end=None, skip_type='time', skip_order=1,
predictor_order=3, corrector_order=4, pc_mode='PEC', return_intermediate=False
):
"""
For the PC-mode, please refer to the wiki page
https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode
'PEC' needs one model evaluation per step, while 'PECE' needs two.
We recommend pc_mode='PEC' when the NFE budget is limited; 'PECE' mode is only for testing with sufficient NFEs.
"""
skip_first_step = False
skip_final_step = False
lower_order_final = True
denoise_to_zero = True
assert pc_mode in ['PEC', 'PECE'], 'Predictor-corrector mode only supports PEC and PECE'
t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
t_T = self.noise_schedule.T if t_start is None else t_start
assert t_0 > 0 and t_T > 0, "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
device = x.device
intermediates = []
with torch.no_grad():
assert steps >= max(predictor_order, corrector_order - 1)
timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, order=skip_order,
device=device)
assert timesteps.shape[0] - 1 == steps
# Init the initial values.
step = 0
t = timesteps[step]
noise = torch.randn_like(x)
t_prev_list = [t]
# do not evaluate if skip_first_step
if skip_first_step:
if self.predict_x0:
alpha_t = self.noise_schedule.marginal_alpha(t)
sigma_t = self.noise_schedule.marginal_std(t)
model_prev_list = [(1 - sigma_t) / alpha_t * x]
else:
model_prev_list = [x]
else:
model_prev_list = [self.model_fn(x, t)]
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
# determine the first several values
for step in tqdm(range(1, max(predictor_order, corrector_order - 1))):
t = timesteps[step]
predictor_order_used = min(predictor_order, step)
corrector_order_used = min(corrector_order, step + 1)
noise = torch.randn_like(x)
# predictor step
x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list, noise=noise,
t=t)
# evaluation step
model_x = self.model_fn(x_p, t)
# update model_list
model_prev_list.append(model_x)
# corrector step
if corrector_order > 0:
x = self.adams_moulton_update(order=corrector_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list, noise=noise,
t=t)
else:
x = x_p
# evaluation step if mode = pece
if corrector_order > 0 and pc_mode == 'PECE':
model_x = self.model_fn(x, t)
del model_prev_list[-1]
model_prev_list.append(model_x)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
t_prev_list.append(t)
for step in tqdm(range(max(predictor_order, corrector_order - 1), steps + 1)):
if lower_order_final:
predictor_order_used = min(predictor_order, steps - step + 1)
corrector_order_used = min(corrector_order, steps - step + 2)
else:
predictor_order_used = predictor_order
corrector_order_used = corrector_order
t = timesteps[step]
noise = torch.randn_like(x)
# predictor step
if skip_final_step and step == steps and not denoise_to_zero:
x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=0,
model_prev_list=model_prev_list, t_prev_list=t_prev_list,
noise=noise, t=t)
else:
x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list,
noise=noise, t=t)
# evaluation step
# do not evaluate if skip_final_step and step = steps
if not skip_final_step or step < steps:
model_x = self.model_fn(x_p, t)
# update model_list
# do not update if skip_final_step and step = steps
if not skip_final_step or step < steps:
model_prev_list.append(model_x)
# corrector step
# do not correct if skip_final_step and step = steps
if corrector_order > 0:
if not skip_final_step or step < steps:
x = self.adams_moulton_update(order=corrector_order_used, x=x, tau=tau(t),
model_prev_list=model_prev_list, t_prev_list=t_prev_list,
noise=noise, t=t)
else:
x = x_p
else:
x = x_p
# evaluation step if mode = pece and step != steps
if corrector_order > 0 and (pc_mode == 'PECE' and step < steps):
model_x = self.model_fn(x, t)
del model_prev_list[-1]
model_prev_list.append(model_x)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step)
if return_intermediate:
intermediates.append(x)
t_prev_list.append(t)
del model_prev_list[0]
if denoise_to_zero:
t = torch.ones((1,)).to(device) * t_0
x = self.denoise_to_zero_fn(x, t)
if self.correcting_xt_fn is not None:
x = self.correcting_xt_fn(x, t, step + 1)
if return_intermediate:
intermediates.append(x)
if return_intermediate:
return x, intermediates
else:
return x
def sample(self, mode, x, tau, steps, t_start=None, t_end=None, skip_type='time', skip_order=1, predictor_order=3,
corrector_order=4, pc_mode='PEC', return_intermediate=False
):
"""
For the PC-mode, please refer to the wiki page
https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode
'PEC' needs one model evaluation per step, while 'PECE' needs two.
We recommend pc_mode='PEC' when the NFE budget is limited; 'PECE' mode is only for testing with sufficient NFEs.
'few_steps' mode is recommended. The differences between 'few_steps' and 'more_steps' are as follows:
1) 'few_steps' does not correct at the final step and does not denoise to zero, while 'more_steps' does both.
Thus the NFEs for 'few_steps' = steps, and the NFEs for 'more_steps' = steps + 2.
For most experiments and tasks, we find that these two operations do little to improve sample quality.
2) 'few_steps' uses the rescaling trick from Appendix D of the SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf
We find it slightly improves sample quality, especially with few steps.
"""
assert mode in ['few_steps', 'more_steps'], "mode must be either 'few_steps' or 'more_steps'"
if mode == 'few_steps':
return self.sample_few_steps(x=x, tau=tau, steps=steps, t_start=t_start, t_end=t_end, skip_type=skip_type,
skip_order=skip_order, predictor_order=predictor_order,
corrector_order=corrector_order, pc_mode=pc_mode,
return_intermediate=return_intermediate)
else:
return self.sample_more_steps(x=x, tau=tau, steps=steps, t_start=t_start, t_end=t_end, skip_type=skip_type,
skip_order=skip_order, predictor_order=predictor_order,
corrector_order=corrector_order, pc_mode=pc_mode,
return_intermediate=return_intermediate)
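# Illustrative sketch (not part of the solver): the helper below reproduces, in plain Python,
# how sample_few_steps / sample_more_steps pick the predictor and corrector order at each step:
# orders ramp up during the first steps and, with lower_order_final=True, ramp down again at the end.
# The name `order_schedule` is hypothetical and is used here only for illustration.

```python
def order_schedule(steps, predictor_order=3, corrector_order=4, lower_order_final=True):
    """Return a list of (step, predictor_order_used, corrector_order_used)."""
    warmup_end = max(predictor_order, corrector_order - 1)
    # Same precondition as the assert inside the sampling loops.
    assert steps >= warmup_end
    schedule = []
    # Warm-up: not enough history yet, so the usable order grows with the step index.
    for step in range(1, warmup_end):
        schedule.append((step, min(predictor_order, step), min(corrector_order, step + 1)))
    # Main loop: optionally lower the order again as the final step approaches.
    for step in range(warmup_end, steps + 1):
        if lower_order_final:
            p = min(predictor_order, steps - step + 1)
            c = min(corrector_order, steps - step + 2)
        else:
            p, c = predictor_order, corrector_order
        schedule.append((step, p, c))
    return schedule


if __name__ == "__main__":
    for row in order_schedule(steps=5):
        print(row)
```

# With the default orders and steps=5, the full orders (3, 4) are reached only at step 3.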
#############################################################
# other utility functions
#############################################################
def interpolate_fn(x, xp, yp):
"""
A piecewise linear function y = f(x), using xp and yp as keypoints.
We implement f(x) in a differentiable way (i.e. applicable for autograd).
The function f(x) is well-defined on the whole x-axis. (For x beyond the bounds of xp, we use the outermost points of xp to define the linear function.)
Args:
x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).
xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.
yp: PyTorch tensor with shape [C, K].
Returns:
The function values f(x), with shape [N, C].
"""
N, K = x.shape[0], xp.shape[1]
all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)
sorted_all_x, x_indices = torch.sort(all_x, dim=2)
x_idx = torch.argmin(x_indices, dim=2)
cand_start_idx = x_idx - 1
start_idx = torch.where(
torch.eq(x_idx, 0),
torch.tensor(1, device=x.device),
torch.where(
torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,
),
)
end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)
start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)
end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)
start_idx2 = torch.where(
torch.eq(x_idx, 0),
torch.tensor(0, device=x.device),
torch.where(
torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,
),
)
y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)
start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)
end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)
cand = start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)
return cand
def expand_dims(v, dims):
"""
Expand the tensor `v` to the dim `dims`.
Args:
`v`: a PyTorch tensor with shape [N].
`dims`: an `int`.
Returns:
a PyTorch tensor with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.
"""
return v[(...,) + (None,) * (dims - 1)]
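# Illustrative sketch: a scalar, plain-Python reference for the behaviour that the interpolate_fn
# docstring describes (piecewise-linear between keypoints, linear extrapolation from the outermost
# segment beyond the bounds). It is not the batched, autograd-friendly implementation above; the
# name `piecewise_linear` is hypothetical, and xp is assumed to be strictly increasing.

```python
from bisect import bisect_left


def piecewise_linear(x, xp, yp):
    """Evaluate the piecewise-linear function through (xp[i], yp[i]) at scalar x."""
    k = len(xp)
    assert k >= 2 and len(yp) == k
    # Index of the right end of the segment; clamping makes out-of-range x
    # reuse the first or last segment, i.e. linear extrapolation.
    right = min(max(bisect_left(xp, x), 1), k - 1)
    x0, x1 = xp[right - 1], xp[right]
    y0, y1 = yp[right - 1], yp[right]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    xp, yp = [0.0, 1.0, 2.0], [0.0, 10.0, 40.0]
    for x in (0.5, 1.5, -1.0, 3.0):
        print(x, piecewise_linear(x, xp, yp))
```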
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/t5.py
================================================
# -*- coding: utf-8 -*-
import os
import re
import html
import urllib.parse as ul
import ftfy
import torch
from bs4 import BeautifulSoup
from transformers import T5EncoderModel, AutoTokenizer
from huggingface_hub import hf_hub_download
class T5Embedder:
available_models = ['t5-v1_1-xxl']
bad_punct_regex = re.compile(r'['+'#®•©™&@·º½¾¿¡§~'+'\)'+'\('+'\]'+'\['+'\}'+'\{'+'\|'+'\\'+'\/'+'\*' + r']{1,}') # noqa
def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, local_cache=False, cache_dir=None, hf_token=None, use_text_preprocessing=True,
t5_model_kwargs=None, torch_dtype=None, use_offload_folder=None, model_max_length=120):
self.device = torch.device(device)
self.torch_dtype = torch_dtype or torch.bfloat16
if t5_model_kwargs is None:
t5_model_kwargs = {'low_cpu_mem_usage': True, 'torch_dtype': self.torch_dtype}
if use_offload_folder is not None:
t5_model_kwargs['offload_folder'] = use_offload_folder
t5_model_kwargs['device_map'] = {
'shared': self.device,
'encoder.embed_tokens': self.device,
'encoder.block.0': self.device,
'encoder.block.1': self.device,
'encoder.block.2': self.device,
'encoder.block.3': self.device,
'encoder.block.4': self.device,
'encoder.block.5': self.device,
'encoder.block.6': self.device,
'encoder.block.7': self.device,
'encoder.block.8': self.device,
'encoder.block.9': self.device,
'encoder.block.10': self.device,
'encoder.block.11': self.device,
'encoder.block.12': 'disk',
'encoder.block.13': 'disk',
'encoder.block.14': 'disk',
'encoder.block.15': 'disk',
'encoder.block.16': 'disk',
'encoder.block.17': 'disk',
'encoder.block.18': 'disk',
'encoder.block.19': 'disk',
'encoder.block.20': 'disk',
'encoder.block.21': 'disk',
'encoder.block.22': 'disk',
'encoder.block.23': 'disk',
'encoder.final_layer_norm': 'disk',
'encoder.dropout': 'disk',
}
else:
t5_model_kwargs['device_map'] = {'shared': self.device, 'encoder': self.device}
self.use_text_preprocessing = use_text_preprocessing
self.hf_token = hf_token
self.cache_dir = cache_dir or os.path.expanduser('~/.cache/IF_')
self.dir_or_name = dir_or_name
tokenizer_path, path = dir_or_name, dir_or_name
if local_cache:
cache_dir = os.path.join(self.cache_dir, dir_or_name)
tokenizer_path, path = cache_dir, cache_dir
elif dir_or_name in self.available_models:
cache_dir = os.path.join(self.cache_dir, dir_or_name)
for filename in [
'config.json', 'special_tokens_map.json', 'spiece.model', 'tokenizer_config.json',
'pytorch_model.bin.index.json', 'pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin'
]:
hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,
force_filename=filename, token=self.hf_token)
tokenizer_path, path = cache_dir, cache_dir
else:
cache_dir = os.path.join(self.cache_dir, 't5-v1_1-xxl')
for filename in [
'config.json', 'special_tokens_map.json', 'spiece.model', 'tokenizer_config.json',
]:
hf_hub_download(repo_id='DeepFloyd/t5-v1_1-xxl', filename=filename, cache_dir=cache_dir,
force_filename=filename, token=self.hf_token)
tokenizer_path = cache_dir
print(tokenizer_path)
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
self.model = T5EncoderModel.from_pretrained(path, **t5_model_kwargs).eval()
self.model_max_length = model_max_length
def get_text_embeddings(self, texts):
texts = [self.text_preprocessing(text) for text in texts]
text_tokens_and_mask = self.tokenizer(
texts,
max_length=self.model_max_length,
padding='max_length',
truncation=True,
return_attention_mask=True,
add_special_tokens=True,
return_tensors='pt'
)
text_tokens_and_mask['input_ids'] = text_tokens_and_mask['input_ids']
text_tokens_and_mask['attention_mask'] = text_tokens_and_mask['attention_mask']
with torch.no_grad():
text_encoder_embs = self.model(
input_ids=text_tokens_and_mask['input_ids'].to(self.device),
attention_mask=text_tokens_and_mask['attention_mask'].to(self.device),
)['last_hidden_state'].detach()
return text_encoder_embs, text_tokens_and_mask['attention_mask'].to(self.device)
def text_preprocessing(self, text):
if self.use_text_preprocessing:
# The exact text cleaning as was in the training stage:
text = self.clean_caption(text)
text = self.clean_caption(text)
return text
else:
return text.lower().strip()
@staticmethod
def basic_clean(text):
text = ftfy.fix_text(text)
text = html.unescape(html.unescape(text))
return text.strip()
def clean_caption(self, caption):
caption = str(caption)
caption = ul.unquote_plus(caption)
caption = caption.strip().lower()
caption = re.sub('<person>', 'person', caption)
# urls:
caption = re.sub(
r'\b((?:https?:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))', # noqa
'', caption) # regex for urls
caption = re.sub(
r'\b((?:www:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))', # noqa
'', caption) # regex for urls
# html:
caption = BeautifulSoup(caption, features='html.parser').text
# @
caption = re.sub(r'@[\w\d]+\b', '', caption)
# 31C0—31EF CJK Strokes
# 31F0—31FF Katakana Phonetic Extensions
# 3200—32FF Enclosed CJK Letters and Months
# 3300—33FF CJK Compatibility
# 3400—4DBF CJK Unified Ideographs Extension A
# 4DC0—4DFF Yijing Hexagram Symbols
# 4E00—9FFF CJK Unified Ideographs
caption = re.sub(r'[\u31c0-\u31ef]+', '', caption)
caption = re.sub(r'[\u31f0-\u31ff]+', '', caption)
caption = re.sub(r'[\u3200-\u32ff]+', '', caption)
caption = re.sub(r'[\u3300-\u33ff]+', '', caption)
caption = re.sub(r'[\u3400-\u4dbf]+', '', caption)
caption = re.sub(r'[\u4dc0-\u4dff]+', '', caption)
caption = re.sub(r'[\u4e00-\u9fff]+', '', caption)
#######################################################
# all types of dash --> "-"
caption = re.sub(
r'[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]+', # noqa
'-', caption)
# normalize quotes to one standard
caption = re.sub(r'[`´«»“”¨]', '"', caption)
caption = re.sub(r'[‘’]', "'", caption)
# &quot;
caption = re.sub(r'&quot;?', '', caption)
# &amp
caption = re.sub(r'&amp', '', caption)
# ip addresses:
caption = re.sub(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', ' ', caption)
# article ids:
caption = re.sub(r'\d:\d\d\s+$', '', caption)
# \n
caption = re.sub(r'\\n', ' ', caption)
# "#123"
caption = re.sub(r'#\d{1,3}\b', '', caption)
# "#12345.."
caption = re.sub(r'#\d{5,}\b', '', caption)
# "123456.."
caption = re.sub(r'\b\d{6,}\b', '', caption)
# filenames:
caption = re.sub(r'[\S]+\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)', '', caption)
#
caption = re.sub(r'[\"\']{2,}', r'"', caption) # """AUSVERKAUFT"""
caption = re.sub(r'[\.]{2,}', r' ', caption) # """AUSVERKAUFT"""
caption = re.sub(self.bad_punct_regex, r' ', caption) # ***AUSVERKAUFT***, #AUSVERKAUFT
caption = re.sub(r'\s+\.\s+', r' ', caption) # " . "
# this-is-my-cute-cat / this_is_my_cute_cat
regex2 = re.compile(r'(?:\-|\_)')
if len(re.findall(regex2, caption)) > 3:
caption = re.sub(regex2, ' ', caption)
caption = self.basic_clean(caption)
caption = re.sub(r'\b[a-zA-Z]{1,3}\d{3,15}\b', '', caption) # jc6640
caption = re.sub(r'\b[a-zA-Z]+\d+[a-zA-Z]+\b', '', caption) # jc6640vc
caption = re.sub(r'\b\d+[a-zA-Z]+\d+\b', '', caption) # 6640vc231
caption = re.sub(r'(worldwide\s+)?(free\s+)?shipping', '', caption)
caption = re.sub(r'(free\s)?download(\sfree)?', '', caption)
caption = re.sub(r'\bclick\b\s(?:for|on)\s\w+', '', caption)
caption = re.sub(r'\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\simage[s]?)?', '', caption)
caption = re.sub(r'\bpage\s+\d+\b', '', caption)
caption = re.sub(r'\b\d*[a-zA-Z]+\d+[a-zA-Z]+\d+[a-zA-Z\d]*\b', r' ', caption) # j2d1a2a...
caption = re.sub(r'\b\d+\.?\d*[xх×]\d+\.?\d*\b', '', caption)
caption = re.sub(r'\b\s+\:\s+', r': ', caption)
caption = re.sub(r'(\D[,\./])\b', r'\1 ', caption)
caption = re.sub(r'\s+', ' ', caption)
caption.strip()
caption = re.sub(r'^[\"\']([\w\W]+)[\"\']$', r'\1', caption)
caption = re.sub(r'^[\'\_,\-\:;]', r'', caption)
caption = re.sub(r'[\'\_,\-\:\-\+]$', r'', caption)
caption = re.sub(r'^\.\S+$', '', caption)
return caption.strip()
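# Illustrative sketch: a much-reduced version of the cleaning pipeline above, applying only four of
# its substitutions (handles, dashes, short "#123" tags, whitespace) so the effect of each step is
# easy to follow. `mini_clean` is a hypothetical helper, not a replacement for clean_caption.

```python
import re


def mini_clean(caption: str) -> str:
    caption = caption.strip().lower()
    caption = re.sub(r'@[\w\d]+\b', '', caption)         # drop @handles
    caption = re.sub(r'[\u2010-\u2015]+', '-', caption)  # unify Unicode dashes to "-"
    caption = re.sub(r'#\d{1,3}\b', '', caption)         # drop short numeric tags like "#12"
    caption = re.sub(r'\s+', ' ', caption)               # collapse runs of whitespace
    return caption.strip()


if __name__ == "__main__":
    print(mini_clean("  A Cat \u2014 by @Bob  #12  "))
```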
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/timestep_sampler.py
================================================
# Modified from OpenAI's diffusion repos
# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py
# ADM: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion
# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py
from abc import ABC, abstractmethod
import numpy as np
import torch as th
import torch.distributed as dist
def create_named_schedule_sampler(name, diffusion):
"""
Create a ScheduleSampler from a library of pre-defined samplers.
:param name: the name of the sampler.
:param diffusion: the diffusion object to sample for.
"""
if name == "uniform":
return UniformSampler(diffusion)
elif name == "loss-second-moment":
return LossSecondMomentResampler(diffusion)
else:
raise NotImplementedError(f"unknown schedule sampler: {name}")
class ScheduleSampler(ABC):
"""
A distribution over timesteps in the diffusion process, intended to reduce
variance of the objective.
By default, samplers perform unbiased importance sampling, in which the
objective's mean is unchanged.
However, subclasses may override sample() to change how the resampled
terms are reweighted, allowing for actual changes in the objective.
"""
@abstractmethod
def weights(self):
"""
Get a numpy array of weights, one per diffusion step.
The weights needn't be normalized, but must be positive.
"""
def sample(self, batch_size, device):
"""
Importance-sample timesteps for a batch.
:param batch_size: the number of timesteps.
:param device: the torch device to save to.
:return: a tuple (timesteps, weights):
- timesteps: a tensor of timestep indices.
- weights: a tensor of weights to scale the resulting losses.
"""
w = self.weights()
p = w / np.sum(w)
indices_np = np.random.choice(len(p), size=(batch_size,), p=p)
indices = th.from_numpy(indices_np).long().to(device)
weights_np = 1 / (len(p) * p[indices_np])
weights = th.from_numpy(weights_np).float().to(device)
return indices, weights
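# Illustrative sketch: ScheduleSampler.sample draws index i with probability p_i and returns the
# weight 1 / (N * p_i), so the weighted loss has the same expectation as a uniform average over
# timesteps. The plain-Python helpers below (hypothetical names, stdlib `random` instead of NumPy)
# mirror that computation and evaluate the expectation exactly.

```python
import random


def importance_sample(weights, batch_size, rng):
    """Draw indices proportionally to `weights` and return their loss scales."""
    total = sum(weights)
    p = [w / total for w in weights]
    indices = rng.choices(range(len(p)), weights=p, k=batch_size)
    scales = [1.0 / (len(p) * p[i]) for i in indices]
    return indices, scales, p


def expected_weighted_loss(p, losses):
    """E[scale_i * loss_i] under p; each term reduces to loss_i / N."""
    n = len(p)
    return sum(p_i * (1.0 / (n * p_i)) * loss for p_i, loss in zip(p, losses))


if __name__ == "__main__":
    idx, scales, p = importance_sample([1.0, 2.0, 5.0], batch_size=4, rng=random.Random(0))
    print(idx, scales)
    print(expected_weighted_loss(p, [3.0, 6.0, 9.0]))
```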
class UniformSampler(ScheduleSampler):
def __init__(self, diffusion):
self.diffusion = diffusion
self._weights = np.ones([diffusion.num_timesteps])
def weights(self):
return self._weights
class LossAwareSampler(ScheduleSampler):
def update_with_local_losses(self, local_ts, local_losses):
"""
Update the reweighting using losses from a model.
Call this method from each rank with a batch of timesteps and the
corresponding losses for each of those timesteps.
This method will perform synchronization to make sure all of the ranks
maintain the exact same reweighting.
:param local_ts: an integer Tensor of timesteps.
:param local_losses: a 1D Tensor of losses.
"""
batch_sizes = [
th.tensor([0], dtype=th.int32, device=local_ts.device)
for _ in range(dist.get_world_size())
]
dist.all_gather(
batch_sizes,
th.tensor([len(local_ts)], dtype=th.int32, device=local_ts.device),
)
# Pad all_gather batches to be the maximum batch size.
batch_sizes = [x.item() for x in batch_sizes]
max_bs = max(batch_sizes)
timestep_batches = [th.zeros(max_bs, device=local_ts.device) for _ in batch_sizes]
loss_batches = [th.zeros(max_bs, device=local_losses.device) for _ in batch_sizes]
dist.all_gather(timestep_batches, local_ts)
dist.all_gather(loss_batches, local_losses)
timesteps = [
x.item() for y, bs in zip(timestep_batches, batch_sizes) for x in y[:bs]
]
losses = [x.item() for y, bs in zip(loss_batches, batch_sizes) for x in y[:bs]]
self.update_with_all_losses(timesteps, losses)
@abstractmethod
def update_with_all_losses(self, ts, losses):
"""
Update the reweighting using losses from a model.
Sub-classes should override this method to update the reweighting
using losses from the model.
This method directly updates the reweighting without synchronizing
between workers. It is called by update_with_local_losses from all
ranks with identical arguments. Thus, it should have deterministic
behavior to maintain state across workers.
:param ts: a list of int timesteps.
:param losses: a list of float losses, one per timestep.
"""
class LossSecondMomentResampler(LossAwareSampler):
def __init__(self, diffusion, history_per_term=10, uniform_prob=0.001):
self.diffusion = diffusion
self.history_per_term = history_per_term
self.uniform_prob = uniform_prob
self._loss_history = np.zeros(
[diffusion.num_timesteps, history_per_term], dtype=np.float64
)
self._loss_counts = np.zeros([diffusion.num_timesteps], dtype=np.int64)
def weights(self):
if not self._warmed_up():
return np.ones([self.diffusion.num_timesteps], dtype=np.float64)
weights = np.sqrt(np.mean(self._loss_history ** 2, axis=-1))
weights /= np.sum(weights)
weights *= 1 - self.uniform_prob
weights += self.uniform_prob / len(weights)
return weights
def update_with_all_losses(self, ts, losses):
for t, loss in zip(ts, losses):
if self._loss_counts[t] == self.history_per_term:
# Shift out the oldest loss term.
self._loss_history[t, :-1] = self._loss_history[t, 1:]
self._loss_history[t, -1] = loss
else:
self._loss_history[t, self._loss_counts[t]] = loss
self._loss_counts[t] += 1
def _warmed_up(self):
return (self._loss_counts == self.history_per_term).all()
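The weighting rule in `LossSecondMomentResampler.weights` can be sketched in plain Python as follows. This is a minimal illustration, not the class itself: `second_moment_weights` is a hypothetical helper name, and it assumes the loss history is already full (the warmed-up case).

```python
import math


def second_moment_weights(loss_history, uniform_prob=0.001):
    """Hypothetical helper: per-timestep sampling weights from recent losses.

    loss_history is a list with one entry per timestep, each entry being the
    list of the most recent losses recorded for that timestep.
    """
    # Root mean square of the recent losses for each timestep.
    rms = [math.sqrt(sum(v * v for v in row) / len(row)) for row in loss_history]
    total = sum(rms)
    n = len(rms)
    # Normalize, then mix with a small uniform floor so every timestep stays reachable.
    return [(r / total) * (1 - uniform_prob) + uniform_prob / n for r in rms]


weights = second_moment_weights([[1.0, 1.0], [3.0, 3.0]], uniform_prob=0.0)
mixed = second_moment_weights([[1.0, 1.0], [3.0, 3.0]])
```

With the uniform floor disabled, a timestep whose recent losses are three times larger is sampled three times as often.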
================================================
FILE: PixArt-alpha-ToCa/diffusion/model/utils.py
================================================
import os
import sys
import torch.nn as nn
from torch.utils.checkpoint import checkpoint, checkpoint_sequential
import torch.nn.functional as F
import torch
import torch.distributed as dist
import re
import math
from collections.abc import Iterable
from itertools import repeat
from torchvision import transforms as T
import random
from PIL import Image
def _ntuple(n):
def parse(x):
if isinstance(x, Iterable) and not isinstance(x, str):
return x
return tuple(repeat(x, n))
return parse
to_1tuple = _ntuple(1)
to_2tuple = _ntuple(2)
def set_grad_checkpoint(model, use_fp32_attention=False, gc_step=1):
assert isinstance(model, nn.Module)
def set_attr(module):
module.grad_checkpointing = True
module.fp32_attention = use_fp32_attention
module.grad_checkpointing_step = gc_step
model.apply(set_attr)
def auto_grad_checkpoint(module, *args, **kwargs):
if getattr(module, 'grad_checkpointing', False):
if not isinstance(module, Iterable):
return checkpoint(module, *args, **kwargs)
gc_step = module[0].grad_checkpointing_step
return checkpoint_sequential(module, gc_step, *args, **kwargs)
return module(*args, **kwargs)
def checkpoint_sequential(functions, step, input, *args, **kwargs):
# Hack for keyword-only parameter in a python 2.7-compliant way
preserve = kwargs.pop('preserve_rng_state', True)
if kwargs:
raise ValueError("Unexpected keyword arguments: " + ",".join(kwargs))
def run_function(start, end, functions):
def forward(input):
for j in range(start, end + 1):
input = functions[j](input, *args)
return input
return forward
if isinstance(functions, torch.nn.Sequential):
functions = list(functions.children())
# the last chunk has to be non-volatile
end = -1
segment = len(functions) // step
for start in range(0, step * (segment - 1), step):
end = start + step - 1
input = checkpoint(run_function(start, end, functions), input, preserve_rng_state=preserve)
return run_function(end + 1, len(functions) - 1, functions)(input)
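To see which modules the local `checkpoint_sequential` wraps, the index arithmetic can be reproduced without torch. The sketch below copies only the loop bounds from the function above; `checkpoint_chunks` is a hypothetical name used for illustration.

```python
def checkpoint_chunks(num_functions, step):
    """Return (checkpointed index ranges, final range that runs without checkpointing)."""
    end = -1
    segment = num_functions // step
    chunks = []
    # Same bounds as the loop in checkpoint_sequential: the last chunk is excluded.
    for start in range(0, step * (segment - 1), step):
        end = start + step - 1
        chunks.append((start, end))
    return chunks, (end + 1, num_functions - 1)


chunks, tail = checkpoint_chunks(num_functions=7, step=2)
small_chunks, small_tail = checkpoint_chunks(num_functions=4, step=4)
```

Here `step` is the number of modules per checkpointed chunk. When the model has fewer than `2 * step` modules, nothing is checkpointed and the whole stack runs in the final, ordinary call.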
def window_partition(x, window_size):
"""
Partition into non-overlapping windows with padding if needed.
Args:
x (tensor): input tokens with [B, H, W, C].
window_size (int): window size.
Returns:
windows: windows after partition with [B * num_windows, window_size, window_size, C].
(Hp, Wp): padded height and width before partition
"""
B, H, W, C = x.shape
pad_h = (window_size - H % window_size) % window_size
pad_w = (window_size - W % window_size) % window_size
if pad_h > 0 or pad_w > 0:
x = F.pad(x, (0, 0, 0, pad_w, 0, pad_h))
Hp, Wp = H + pad_h, W + pad_w
x = x.view(B, Hp // window_size, window_size, Wp // window_size, window_size, C)
windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
return windows, (Hp, Wp)
def window_unpartition(windows, window_size, pad_hw, hw):
"""
Window unpartition into original sequences and removing padding.
Args:
windows (tensor): input tokens with [B * num_windows, window_size, window_size, C].
window_size (int): window size.
pad_hw (Tuple): padded height and width (Hp, Wp).
hw (Tuple): original height and width (H, W) before padding.
Returns:
x: unpartitioned sequences with [B, H, W, C].
"""
Hp, Wp = pad_hw
H, W = hw
B = windows.shape[0] // (Hp * Wp // window_size // window_size)
x = windows.view(B, Hp // window_size, Wp // window_size, window_size, window_size, -1)
x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, Hp, Wp, -1)
if Hp > H or Wp > W:
x = x[:, :H, :W, :].contiguous()
return x
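The padding arithmetic shared by `window_partition` and `window_unpartition` can be checked in isolation. This is a small sketch with a hypothetical helper name (`padded_window_grid`); it only mirrors the size computations, not the tensor reshapes.

```python
def padded_window_grid(height, width, window_size):
    """Return the padding, the padded size, and the number of windows per image."""
    # The outer modulo keeps the padding at 0 when the size is already a multiple.
    pad_h = (window_size - height % window_size) % window_size
    pad_w = (window_size - width % window_size) % window_size
    hp, wp = height + pad_h, width + pad_w
    num_windows = (hp // window_size) * (wp // window_size)
    return (pad_h, pad_w), (hp, wp), num_windows


pad, padded, count = padded_window_grid(14, 10, 4)
exact_pad, exact_padded, exact_count = padded_window_grid(16, 16, 4)
```

The window count per image is what `window_unpartition` divides by to recover the batch size.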
def get_rel_pos(q_size, k_size, rel_pos):
"""
Get relative positional embeddings according to the relative positions of
query and key sizes.
Args:
q_size (int): size of query q.
k_size (int): size of key k.
rel_pos (Tensor): relative position embeddings (L, C).
Returns:
Extracted positional embeddings according to relative positions.
"""
max_rel_dist = int(2 * max(q_size, k_size) - 1)
# Interpolate rel pos if needed.
if rel_pos.shape[0] != max_rel_dist:
# Interpolate rel pos.
rel_pos_resized = F.interpolate(
rel_pos.reshape(1, rel_pos.shape[0], -1).permute(0, 2, 1),
size=max_rel_dist,
mode="linear",
)
rel_pos_resized = rel_pos_resized.reshape(-1, max_rel_dist).permute(1, 0)
else:
rel_pos_resized = rel_pos
# Scale the coords with short length if shapes for q and k are different.
q_coords = torch.arange(q_size)[:, None] * max(k_size / q_size, 1.0)
k_coords = torch.arange(k_size)[None, :] * max(q_size / k_size, 1.0)
relative_coords = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)
return rel_pos_resized[relative_coords.long()]
def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):
"""
Calculate decomposed Relative Positional Embeddings from :paper:`mvitv2`.
https://github.com/facebookresearch/mvit/blob/19786631e330df9f3622e5402b4a419a263a2c80/mvit/models/attention.py # noqa B950
Args:
attn (Tensor): attention map.
q (Tensor): query q in the attention layer with shape (B, q_h * q_w, C).
rel_pos_h (Tensor): relative position embeddings (Lh, C) for height axis.
rel_pos_w (Tensor): relative position embeddings (Lw, C) for width axis.
q_size (Tuple): spatial sequence size of query q with (q_h, q_w).
k_size (Tuple): spatial sequence size of key k with (k_h, k_w).
Returns:
attn (Tensor): attention map with added relative positional embeddings.
"""
q_h, q_w = q_size
k_h, k_w = k_size
Rh = get_rel_pos(q_h, k_h, rel_pos_h)
Rw = get_rel_pos(q_w, k_w, rel_pos_w)
B, _, dim = q.shape
r_q = q.reshape(B, q_h, q_w, dim)
rel_h = torch.einsum("bhwc,hkc->bhwk", r_q, Rh)
rel_w = torch.einsum("bhwc,wkc->bhwk", r_q, Rw)
attn = (
attn.view(B, q_h, q_w, k_h, k_w) + rel_h[:, :, :, :, None] + rel_w[:, :, :, None, :]
).view(B, q_h * q_w, k_h * k_w)
return attn
def mean_flat(tensor):
return tensor.mean(dim=list(range(1, tensor.ndim)))
#################################################################################
# Token Masking and Unmasking #
#################################################################################
def get_mask(batch, length, mask_ratio, device, mask_type=None, data_info=None, extra_len=0):
"""
Get the binary mask for the input sequence.
Args:
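- batch: batch size
- length: sequence length
- mask_ratio: ratio of tokens to mask
- device: device on which the mask and indices are created
- mask_type: one of 'random', 'fft', 'laplacian', 'group'
- data_info: dictionary used by 'fft' and 'laplacian'; holds either a precomputed 'strength' or 'N' and 'ori_img'
- extra_len: number of tokens subtracted from the keep count
return:
mask_dict with the following keys:
- mask: binary mask, 0 is keep, 1 is remove
- ids_keep: indices of tokens to keep
- ids_restore: indices to restore the original order
- ids_removed: indices of the removed tokens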
"""
assert mask_type in ['random', 'fft', 'laplacian', 'group']
mask = torch.ones([batch, length], device=device)
len_keep = int(length * (1 - mask_ratio)) - extra_len
if mask_type in ['random', 'group']:
noise = torch.rand(batch, length, device=device) # noise in [0, 1]
ids_shuffle = torch.argsort(noise, dim=1) # ascend: small is keep, large is remove
ids_restore = torch.argsort(ids_shuffle, dim=1)
# keep the first subset
ids_keep = ids_shuffle[:, :len_keep]
ids_removed = ids_shuffle[:, len_keep:]
elif mask_type in ['fft', 'laplacian']:
if 'strength' in data_info:
strength = data_info['strength']
else:
N = data_info['N'][0]
img = data_info['ori_img']
# Get the size of the original image
_, C, H, W = img.shape
if mask_type == 'fft':
# Reshape the image into patches: (3, H/N, N, W/N, N)
reshaped_image = img.reshape((batch, -1, H // N, N, W // N, N))
fft_image = torch.fft.fftn(reshaped_image, dim=(3, 5))
# Take absolute values and sum them to get the frequency strength
strength = torch.sum(torch.abs(fft_image), dim=(1, 3, 5)).reshape((batch, -1,))
elif mask_type == 'laplacian':
laplacian_kernel = torch.tensor([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=torch.float32, device=img.device).reshape(1, 1, 3, 3)
laplacian_kernel = laplacian_kernel.repeat(C, 1, 1, 1)
# Reshape the image into patches of shape (B * H/N * W/N, C, N, N)
reshaped_image = img.reshape(-1, C, H // N, N, W // N, N).permute(0, 2, 4, 1, 3, 5).reshape(-1, C, N, N)
laplacian_response = F.conv2d(reshaped_image, laplacian_kernel, padding=1, groups=C)
strength = laplacian_response.sum(dim=[1, 2, 3]).reshape((batch, -1,))
# Normalize the frequency strength, then sample with torch.multinomial
probabilities = strength / (strength.max(dim=1)[0][:, None]+1e-5)
ids_shuffle = torch.multinomial(probabilities.clip(1e-5, 1), length, replacement=False)
ids_keep = ids_shuffle[:, :len_keep]
ids_restore = torch.argsort(ids_shuffle, dim=1)
ids_removed = ids_shuffle[:, len_keep:]
mask[:, :len_keep] = 0
mask = torch.gather(mask, dim=1, index=ids_restore)
return {'mask': mask,
'ids_keep': ids_keep,
'ids_restore': ids_restore,
'ids_removed': ids_removed}
def mask_out_token(x, ids_keep, ids_removed=None):
"""
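Select the tokens listed in ids_keep (the remaining tokens are masked out).
Args:
- x: input sequence, [N, L, D]
- ids_keep: indices of tokens to keep
- ids_removed: optional indices of the removed tokens
return:
- x_remain: kept tokens; when ids_removed is given, also x_masked, the removed tokens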
"""
N, L, D = x.shape # batch, length, dim
x_remain = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))
if ids_removed is not None:
x_masked = torch.gather(x, dim=1, index=ids_removed.unsqueeze(-1).repeat(1, 1, D))
return x_remain, x_masked
else:
return x_remain
def mask_tokens(x, mask_ratio):
"""
Perform per-sample random masking by per-sample shuffling.
Per-sample shuffling is done by argsort random noise.
x: [N, L, D], sequence
"""
N, L, D = x.shape # batch, length, dim
len_keep = int(L * (1 - mask_ratio))
noise = torch.rand(N, L, device=x.device) # noise in [0, 1]
# sort noise for each sample
ids_shuffle = torch.argsort(noise, dim=1) # ascend: small is keep, large is remove
ids_restore = torch.argsort(ids_shuffle, dim=1)
# keep the first subset
ids_keep = ids_shuffle[:, :len_keep]
x_masked = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))
# generate the binary mask: 0 is keep, 1 is remove
mask = torch.ones([N, L], device=x.device)
mask[:, :len_keep] = 0
mask = torch.gather(mask, dim=1, index=ids_restore)
return x_masked, mask, ids_restore
def unmask_tokens(x, ids_restore, mask_token):
# x: [N, T, D] if extras == 0 (i.e., no cls token) else x: [N, T+1, D]
mask_tokens = mask_token.repeat(x.shape[0], ids_restore.shape[1] - x.shape[1], 1)
x = torch.cat([x, mask_tokens], dim=1)
x = torch.gather(x, dim=1, index=ids_restore.unsqueeze(-1).repeat(1, 1, x.shape[2])) # unshuffle
return x
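The shuffle-and-restore bookkeeping used by `mask_tokens` and `unmask_tokens` can be illustrated for a single sequence with Python lists. This is a sketch under simplifying assumptions: one sample, string tokens instead of embeddings, and hypothetical helper names (`random_mask_indices`, `unmask`).

```python
import random


def random_mask_indices(length, mask_ratio, rng):
    """Return (ids_keep, ids_restore, mask) for one sequence; mask: 0 is keep, 1 is remove."""
    len_keep = int(length * (1 - mask_ratio))
    noise = [rng.random() for _ in range(length)]
    # argsort of the noise: small is keep, large is remove.
    ids_shuffle = sorted(range(length), key=lambda i: noise[i])
    # argsort of ids_shuffle: the position of each original index in the shuffled order.
    ids_restore = sorted(range(length), key=lambda i: ids_shuffle[i])
    ids_keep = ids_shuffle[:len_keep]
    shuffled_mask = [0] * len_keep + [1] * (length - len_keep)
    mask = [shuffled_mask[ids_restore[i]] for i in range(length)]
    return ids_keep, ids_restore, mask


def unmask(kept_tokens, ids_restore, mask_token):
    """Append mask tokens, then undo the shuffle with ids_restore."""
    padded = kept_tokens + [mask_token] * (len(ids_restore) - len(kept_tokens))
    return [padded[i] for i in ids_restore]


tokens = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
ids_keep, ids_restore, mask = random_mask_indices(len(tokens), 0.5, random.Random(0))
restored = unmask([tokens[i] for i in ids_keep], ids_restore, "MASK")
```

After unmasking, every kept token is back at its original position and every removed position holds the mask token.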
# Parse 'None' to None and others to float value
def parse_float_none(s):
assert isinstance(s, str)
return None if s == 'None' else float(s)
#----------------------------------------------------------------------------
# Parse a comma separated list of numbers or ranges and return a list of ints.
# Example: '1,2,5-10' returns [1, 2, 5, 6, 7, 8, 9, 10]
def parse_int_list(s):
if isinstance(s, list): return s
ranges = []
range_re = re.compile(r'^(\d+)-(\d+)$')
for p in s.split(','):
if m := range_re.match(p):
ranges.extend(range(int(m.group(1)), int(m.group(2))+1))
else:
ranges.append(int(p))
return ranges
def init_processes(fn, args):
""" Initialize the distributed environment. """
os.environ['MASTER_ADDR'] = args.master_address
os.environ['MASTER_PORT'] = str(random.randint(2000, 6000))
print(f'MASTER_ADDR = {os.environ["MASTER_ADDR"]}')
print(f'MASTER_PORT = {os.environ["MASTER_PORT"]}')
torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend='nccl', init_method='env://', rank=args.global_rank, world_size=args.global_size)
fn(args)
if args.global_size > 1:
cleanup()
def mprint(*args, **kwargs):
"""
Print only from rank 0.
"""
if dist.get_rank() == 0:
print(*args, **kwargs)
def cleanup():
"""
End DDP training.
"""
dist.barrier()
mprint("Done!")
dist.barrier()
dist.destroy_process_group()
#----------------------------------------------------------------------------
# logging info.
class Logger(object):
"""
Redirect stderr to stdout, optionally print stdout to a file,
and optionally force flushing on both stdout and the file.
"""
def __init__(self, file_name=None, file_mode="w", should_flush=True):
self.file = None
if file_name is not None:
self.file = open(file_name, file_mode)
self.should_flush = should_flush
self.stdout = sys.stdout
self.stderr = sys.stderr
sys.stdout = self
sys.stderr = self
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
self.close()
def write(self, text):
"""Write text to stdout (and a file) and optionally flush."""
if len(text) == 0: # workaround for a bug in VSCode debugger: sys.stdout.write(''); sys.stdout.flush() => crash
return
if self.file is not None:
self.file.write(text)
self.stdout.write(text)
if self.should_flush:
self.flush()
def flush(self):
"""Flush written text to both stdout and a file, if open."""
if self.file is not None:
self.file.flush()
self.stdout.flush()
def close(self):
"""Flush, close possible files, and remove stdout/stderr mirroring."""
self.flush()
# if using multiple loggers, prevent closing in wrong order
if sys.stdout is self:
sys.stdout = self.stdout
if sys.stderr is self:
sys.stderr = self.stderr
if self.file is not None:
self.file.close()
class StackedRandomGenerator:
def __init__(self, device, seeds):
super().__init__()
self.generators = [torch.Generator(device).manual_seed(int(seed) % (1 << 32)) for seed in seeds]
def randn(self, size, **kwargs):
assert size[0] == len(self.generators)
return torch.stack([torch.randn(size[1:], generator=gen, **kwargs) for gen in self.generators])
def randn_like(self, input):
return self.randn(input.shape, dtype=input.dtype, layout=input.layout, device=input.device)
def randint(self, *args, size, **kwargs):
assert size[0] == len(self.generators)
return torch.stack([torch.randint(*args, size=size[1:], generator=gen, **kwargs) for gen in self.generators])
def prepare_prompt_ar(prompt, ratios, device='cpu', show=True):
# get aspect_ratio or ar
aspect_ratios = re.findall(r"--aspect_ratio\s+(\d+:\d+)", prompt)
ars = re.findall(r"--ar\s+(\d+:\d+)", prompt)
custom_hw = re.findall(r"--hw\s+(\d+:\d+)", prompt)
if show:
print("aspect_ratios:", aspect_ratios, "ars:", ars, "hws:", custom_hw)
prompt_clean = prompt.split("--aspect_ratio")[0].split("--ar")[0].split("--hw")[0]
if len(aspect_ratios) + len(ars) + len(custom_hw) == 0 and show:
print("Wrong prompt format. Falling back to the default aspect ratio of 1. Append '--ar h:w' or '--hw h:w' to the prompt to set the size.")
if len(aspect_ratios) != 0:
ar = float(aspect_ratios[0].split(':')[0]) / float(aspect_ratios[0].split(':')[1])
elif len(ars) != 0:
ar = float(ars[0].split(':')[0]) / float(ars[0].split(':')[1])
else:
ar = 1.
closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
if len(custom_hw) != 0:
custom_hw = [float(custom_hw[0].split(':')[0]), float(custom_hw[0].split(':')[1])]
else:
custom_hw = ratios[closest_ratio]
default_hw = ratios[closest_ratio]
prompt_show = f'prompt: {prompt_clean.strip()}\nSize: --ar {closest_ratio}, --bin hw {ratios[closest_ratio]}, --custom hw {custom_hw}'
return prompt_clean, prompt_show, torch.tensor(default_hw, device=device)[None], torch.tensor([float(closest_ratio)], device=device)[None], torch.tensor(custom_hw, device=device)[None]
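The aspect-ratio lookup in `prepare_prompt_ar` amounts to parsing `--ar h:w` and picking the nearest key in the `ratios` table. A minimal sketch follows; `closest_bin` is a hypothetical name, and the three-entry `ratios` dictionary is made up for illustration rather than taken from the repository's aspect-ratio tables.

```python
import re


def closest_bin(prompt, ratios):
    """Return the ratio key nearest to the prompt's --ar value, and its (h, w) bin."""
    match = re.findall(r"--ar\s+(\d+:\d+)", prompt)
    if match:
        h, w = match[0].split(":")
        ar = float(h) / float(w)
    else:
        ar = 1.0  # default when the prompt carries no --ar flag
    key = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    return key, ratios[key]


# Hypothetical table: keys are height/width ratios stored as strings.
ratios = {"0.5": [704.0, 1408.0], "1.0": [1024.0, 1024.0], "2.0": [1408.0, 704.0]}
key, hw = closest_bin("a cat on a sofa --ar 16:9", ratios)
default_key, default_hw = closest_bin("a cat on a sofa", ratios)
```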
def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int):
orig_hw = torch.tensor([samples.shape[2], samples.shape[3]], dtype=torch.int)
custom_hw = torch.tensor([int(new_height), int(new_width)], dtype=torch.int)
if (orig_hw != custom_hw).any():  # resize when either dimension differs
ratio = max(custom_hw[0] / orig_hw[0], custom_hw[1] / orig_hw[1])
resized_width = int(orig_hw[1] * ratio)
resized_height = int(orig_hw[0] * ratio)
transform = T.Compose([
T.Resize((resized_height, resized_width)),
T.CenterCrop(custom_hw.tolist())
])
return transform(samples)
else:
return samples
def resize_and_crop_img(img: Image.Image, new_width, new_height):
orig_width, orig_height = img.size
ratio = max(new_width/orig_width, new_height/orig_height)
resized_width = int(orig_width * ratio)
resized_height = int(orig_height * ratio)
img = img.resize((resized_width, resized_height), Image.LANCZOS)
left = (resized_width - new_width)/2
top = (resized_height - new_height)/2
right = (resized_width + new_width)/2
bottom = (resized_height + new_height)/2
img = img.crop((left, top, right, bottom))
return img
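The geometry behind `resize_and_crop_img` (scale so the target fits inside the image, then center-crop) can be checked without PIL. The helper name `resize_and_crop_box` is hypothetical; the arithmetic is copied from the function above.

```python
def resize_and_crop_box(orig_width, orig_height, new_width, new_height):
    """Return the intermediate resized size and the center-crop box (left, top, right, bottom)."""
    # Scale by the larger ratio so both target dimensions are covered.
    ratio = max(new_width / orig_width, new_height / orig_height)
    resized_width = int(orig_width * ratio)
    resized_height = int(orig_height * ratio)
    left = (resized_width - new_width) / 2
    top = (resized_height - new_height) / 2
    right = (resized_width + new_width) / 2
    bottom = (resized_height + new_height) / 2
    return (resized_width, resized_height), (left, top, right, bottom)


resized, box = resize_and_crop_box(1024, 512, 256, 256)
```

For a 1024x512 image and a 256x256 target, the image is first halved to 512x256 and the crop then removes 128 pixels from each side.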
def mask_feature(emb, mask):
if emb.shape[0] == 1:
keep_index = mask.sum().item()
return emb[:, :, :keep_index, :], keep_index
else:
masked_feature = emb * mask[:, None, :, None]
return masked_feature, emb.shape[2]
================================================
FILE: PixArt-alpha-ToCa/diffusion/sa_sampler.py
================================================
"""SAMPLING ONLY."""
import torch
import numpy as np
from diffusion.model.sa_solver import NoiseScheduleVP, model_wrapper, SASolver
from .model import gaussian_diffusion as gd
class SASolverSampler(object):
def __init__(self, model,
noise_schedule="linear",
diffusion_steps=1000,
device='cpu',
):
super().__init__()
self.model = model
self.device = device
to_torch = lambda x: x.clone().detach().to(torch.float32).to(device)
betas = torch.tensor(gd.get_named_beta_schedule(noise_schedule, diffusion_steps))
alphas = 1.0 - betas
self.register_buffer('alphas_cumprod', to_torch(np.cumprod(alphas, axis=0)))
def register_buffer(self, name, attr):
if type(attr) == torch.Tensor and attr.device != torch.device("cuda"):
attr = attr.to(torch.device("cuda"))
setattr(self, name, attr)
@torch.no_grad()
def sample(self, S, batch_size, shape, conditioning=None, callback=None, normals_sequence=None, img_callback=None, quantize_x0=False, eta=0., mask=None, x0=None, temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None, verbose=True, x_T=None, log_every_t=100, unconditional_guidance_scale=1., unconditional_conditioning=None, model_kwargs=None, **kwargs):
if model_kwargs is None:
model_kwargs = {}
if conditioning is not None:
if isinstance(conditioning, dict):
cbs = conditioning[list(conditioning.keys())[0]].shape[0]
if cbs != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
elif conditioning.shape[0] != batch_size:
print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
# sampling
C, H, W = shape
size = (batch_size, C, H, W)
device = self.device
img = torch.randn(size, device=device) if x_T is None else x_T
ns = NoiseScheduleVP('discrete', alphas_cumprod=self.alphas_cumprod)
model_fn = model_wrapper(
self.model,
ns,
model_type="noise",
guidance_type="classifier-free",
condition=conditioning,
unconditional_condition=unconditional_conditioning,
guidance_scale=unconditional_guidance_scale,
model_kwargs=model_kwargs,
)
sasolver = SASolver(model_fn, ns, algorithm_type="data_prediction")
tau_t = lambda t: eta if 0.2 <= t <= 0.8 else 0
x = sasolver.sample(mode='few_steps', x=img, tau=tau_t, steps=S, skip_type='time', skip_order=1, predictor_order=2, corrector_order=2, pc_mode='PEC', return_intermediate=False)
return x.to(device), None
================================================
FILE: PixArt-alpha-ToCa/diffusion/sa_solver_diffusers.py
================================================
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# DISCLAIMER: check https://arxiv.org/abs/2309.05019
# The codebase is modified based on https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py
import math
from typing import List, Optional, Tuple, Union, Callable
import numpy as np
import torch
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.utils.torch_utils import randn_tensor
from diffusers.schedulers.scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, SchedulerOutput
# Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
def betas_for_alpha_bar(
num_diffusion_timesteps,
max_beta=0.999,
alpha_transform_type="cosine",
):
"""
Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
(1-beta) over time from t = [0,1].
Contains a function alpha_bar that takes an argument t and transforms it to the cumulative product of (1-beta) up
to that part of the diffusion process.
Args:
num_diffusion_timesteps (`int`): the number of betas to produce.
max_beta (`float`): the maximum beta to use; use values lower than 1 to
prevent singularities.
alpha_transform_type (`str`, *optional*, default to `cosine`): the type of noise schedule for alpha_bar.
Choose from `cosine` or `exp`
Returns:
betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
"""
if alpha_transform_type == "cosine":
def alpha_bar_fn(t):
return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
elif alpha_transform_type == "exp":
def alpha_bar_fn(t):
return math.exp(t * -12.0)
else:
raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
t1 = i / num_diffusion_timesteps
t2 = (i + 1) / num_diffusion_timesteps
betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
return torch.tensor(betas, dtype=torch.float32)
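A quick way to sanity-check the cosine branch of `betas_for_alpha_bar` is to confirm that the running product of `1 - beta` reproduces `alpha_bar(t) / alpha_bar(0)`. The sketch below is a pure-Python restatement that returns a list instead of a tensor; the names `cosine_alpha_bar` and `cosine_betas` are hypothetical.

```python
import math


def cosine_alpha_bar(t):
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def cosine_betas(num_steps, max_beta=0.999):
    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        # Each beta is chosen so that (1 - beta) equals the ratio of consecutive alpha_bar values.
        betas.append(min(1 - cosine_alpha_bar(t2) / cosine_alpha_bar(t1), max_beta))
    return betas


betas = cosine_betas(10)
running = 1.0
for beta in betas[:5]:
    running *= 1 - beta
expected = cosine_alpha_bar(0.5) / cosine_alpha_bar(0.0)
```

The product telescopes only while no beta is clipped; the final beta hits the `max_beta` cap because `alpha_bar` is essentially zero at `t = 1`.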
class SASolverScheduler(SchedulerMixin, ConfigMixin):
"""
`SASolverScheduler` is a fast dedicated high-order solver for diffusion SDEs.
This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
methods the library implements for all schedulers such as loading and saving.
Args:
num_train_timesteps (`int`, defaults to 1000):
The number of diffusion steps to train the model.
beta_start (`float`, defaults to 0.0001):
The starting `beta` value of inference.
beta_end (`float`, defaults to 0.02):
The final `beta` value.
beta_schedule (`str`, defaults to `"linear"`):
The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
`linear`, `scaled_linear`, or `squaredcos_cap_v2`.
trained_betas (`np.ndarray`, *optional*):
Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
predictor_order (`int`, defaults to 2):
The predictor order, which can be `1`, `2`, `3`, or `4`. It is recommended to use `predictor_order=2` for guided
sampling, and `predictor_order=3` for unconditional sampling.
corrector_order (`int`, defaults to 2):
The corrector order, which can be `1`, `2`, `3`, or `4`. It is recommended to use `corrector_order=2` for guided
sampling, and `corrector_order=3` for unconditional sampling.
predictor_corrector_mode (`str`, defaults to `PEC`):
The predictor-corrector mode, which can be `PEC` or `PECE`. It is recommended to use `PEC` mode for fast
sampling, and `PECE` for high-quality sampling (PECE needs roughly twice as many model evaluations as PEC).
prediction_type (`str`, defaults to `epsilon`, *optional*):
Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
`sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen
Video](https://imagen.research.google/video/paper.pdf) paper).
tau_func (`Callable`, defaults to `lambda t: 1 if t >= 200 and t <= 800 else 0`):
Stochasticity level as a function of the timestep. A value of `0` gives a deterministic (ODE) step; larger
values inject more noise.
thresholding (`bool`, defaults to `False`):
Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
as Stable Diffusion.
dynamic_thresholding_ratio (`float`, defaults to 0.995):
The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
sample_max_value (`float`, defaults to 1.0):
The threshold value for dynamic thresholding. Valid only when `thresholding=True` and
`algorithm_type="data_prediction"`.
algorithm_type (`str`, defaults to `data_prediction`):
Algorithm type for the solver; can be `data_prediction` or `noise_prediction`. It is recommended to use `data_prediction`
with `predictor_order=2` for guided sampling like in Stable Diffusion.
lower_order_final (`bool`, defaults to `True`):
Whether to use lower-order solvers in the final steps. Default = True.
use_karras_sigmas (`bool`, *optional*, defaults to `False`):
Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,
the sigmas are determined according to a sequence of noise levels {σi}.
lambda_min_clipped (`float`, defaults to `-inf`):
Clipping threshold for the minimum value of `lambda(t)` for numerical stability. This is critical for the
cosine (`squaredcos_cap_v2`) noise schedule.
variance_type (`str`, *optional*):
Set to "learned" or "learned_range" for diffusion models that predict variance. If set, the model's output
contains the predicted Gaussian variance.
timestep_spacing (`str`, defaults to `"linspace"`):
The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.
steps_offset (`int`, defaults to 0):
An offset added to the inference steps. You can use a combination of `offset=1` and
`set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
Diffusion.
"""
_compatibles = [e.name for e in KarrasDiffusionSchedulers]
order = 1
@register_to_config
def __init__(
self,
num_train_timesteps: int = 1000,
beta_start: float = 0.0001,
beta_end: float = 0.02,
beta_schedule: str = "linear",
trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
predictor_order: int = 2,
corrector_order: int = 2,
predictor_corrector_mode: str = 'PEC',
prediction_type: str = "epsilon",
tau_func: Callable = lambda t: 1 if t >= 200 and t <= 800 else 0,
thresholding: bool = False,
dynamic_thresholding_ratio: float = 0.995,
sample_max_value: float = 1.0,
algorithm_type: str = "data_prediction",
lower_order_final: bool = True,
use_karras_sigmas: Optional[bool] = False,
lambda_min_clipped: float = -float("inf"),
variance_type: Optional[str] = None,
timestep_spacing: str = "linspace",
steps_offset: int = 0,
):
if trained_betas is not None:
self.betas = torch.tensor(trained_betas, dtype=torch.float32)
elif beta_schedule == "linear":
self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
elif beta_schedule == "scaled_linear":
# this schedule is very specific to the latent diffusion model.
self.betas = (
torch.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=torch.float32) ** 2
)
elif beta_schedule == "squaredcos_cap_v2":
# Glide cosine schedule
self.betas = betas_for_alpha_bar(num_train_timesteps)
else:
raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")
self.alphas = 1.0 - self.betas
self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
# Currently we only support VP-type noise schedule
self.alpha_t = torch.sqrt(self.alphas_cumprod)
self.sigma_t = torch.sqrt(1 - self.alphas_cumprod)
self.lambda_t = torch.log(self.alpha_t) - torch.log(self.sigma_t)
# standard deviation of the initial noise distribution
self.init_noise_sigma = 1.0
if algorithm_type not in ["data_prediction", "noise_prediction"]:
raise NotImplementedError(f"{algorithm_type} is not implemented for {self.__class__}")
# setable values
self.num_inference_steps = None
timesteps = np.linspace(0, num_train_timesteps - 1, num_train_timesteps, dtype=np.float32)[::-1].copy()
self.timesteps = torch.from_numpy(timesteps)
self.timestep_list = [None] * max(predictor_order, corrector_order - 1)
self.model_outputs = [None] * max(predictor_order, corrector_order - 1)
self.tau_func = tau_func
self.predict_x0 = algorithm_type == "data_prediction"
self.lower_order_nums = 0
self.last_sample = None
def set_timesteps(self, num_inference_steps: int = None, device: Union[str, torch.device] = None):
"""
Sets the discrete timesteps used for the diffusion chain (to be run before inference).
Args:
num_inference_steps (`int`):
The number of diffusion steps used when generating samples with a pre-trained model.
device (`str` or `torch.device`, *optional*):
The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
"""
# Clipping the minimum of all lambda(t) for numerical stability.
# This is critical for cosine (squaredcos_cap_v2) noise schedule.
clipped_idx = torch.searchsorted(torch.flip(self.lambda_t, [0]), self.config.lambda_min_clipped)
last_timestep = ((self.config.num_train_timesteps - clipped_idx).numpy()).item()
# "linspace", "leading", "trailing" corresponds to annotation of Table 2. of https://arxiv.org/abs/2305.08891
if self.config.timestep_spacing == "linspace":
timesteps = (
np.linspace(0, last_timestep - 1, num_inference_steps + 1).round()[::-1][:-1].copy().astype(np.int64)
)
elif self.config.timestep_spacing == "leading":
step_ratio = last_timestep // (num_inference_steps + 1)
# creates integer timesteps by multiplying by ratio
# casting to int to avoid issues when num_inference_step is power of 3
timesteps = (np.arange(0, num_inference_steps + 1) * step_ratio).round()[::-1][:-1].copy().astype(np.int64)
timesteps += self.config.steps_offset
elif self.config.timestep_spacing == "trailing":
step_ratio = self.config.num_train_timesteps / num_inference_steps
# creates integer timesteps by multiplying by ratio
# casting to int to avoid issues when num_inference_step is power of 3
timesteps = np.arange(last_timestep, 0, -step_ratio).round().copy().astype(np.int64)
timesteps -= 1
else:
raise ValueError(
f"{self.config.timestep_spacing} is not supported. Please make sure to choose one of 'linspace', 'leading' or 'trailing'."
)
sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)
if self.config.use_karras_sigmas:
log_sigmas = np.log(sigmas)
sigmas = self._convert_to_karras(in_sigmas=sigmas, num_inference_steps=num_inference_steps)
timesteps = np.array([self._sigma_to_t(sigma, log_sigmas) for sigma in sigmas]).round()
timesteps = np.flip(timesteps).copy().astype(np.int64)
self.sigmas = torch.from_numpy(sigmas)
# when num_inference_steps == num_train_timesteps, we can end up with
# duplicates in timesteps.
_, unique_indices = np.unique(timesteps, return_index=True)
timesteps = timesteps[np.sort(unique_indices)]
self.timesteps = torch.from_numpy(timesteps).to(device)
self.num_inference_steps = len(timesteps)
self.model_outputs = [
None,
] * max(self.config.predictor_order, self.config.corrector_order - 1)
self.lower_order_nums = 0
self.last_sample = None
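The difference between the `linspace` and `trailing` spacings in `set_timesteps` is easiest to see on a small case. The sketch below re-derives both with plain Python lists instead of NumPy; the function names are hypothetical, and it assumes no lambda clipping, so `last_timestep` equals `num_train_timesteps`.

```python
import math


def linspace_timesteps(last_timestep, num_inference_steps):
    """Evenly spaced grid over [0, last_timestep - 1], reversed, with the final 0 dropped."""
    n = num_inference_steps
    grid = [i * (last_timestep - 1) / n for i in range(n + 1)]
    # round() rounds halves to even, the same convention as numpy's round.
    return [round(v) for v in grid][::-1][:-1]


def trailing_timesteps(last_timestep, num_train_timesteps, num_inference_steps):
    """Step down from last_timestep by a fixed ratio, then shift by one."""
    step_ratio = num_train_timesteps / num_inference_steps
    count = math.ceil(last_timestep / step_ratio)
    return [round(last_timestep - i * step_ratio) - 1 for i in range(count)]


lin = linspace_timesteps(1000, 4)
trail = trailing_timesteps(1000, 1000, 4)
```

Both schedules start at timestep 999 in this setting; they differ in where the later steps fall.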
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample
def _threshold_sample(self, sample: torch.FloatTensor) -> torch.FloatTensor:
"""
"Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the
prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by
s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing
pixels from saturation at each step. We find that dynamic thresholding results in significantly better
photorealism as well as better image-text alignment, especially when using very large guidance weights."
https://arxiv.org/abs/2205.11487
"""
dtype = sample.dtype
batch_size, channels, height, width = sample.shape
if dtype not in (torch.float32, torch.float64):
sample = sample.float() # upcast for quantile calculation, and clamp not implemented for cpu half
# Flatten sample for doing quantile calculation along each image
sample = sample.reshape(batch_size, channels * height * width)
abs_sample = sample.abs() # "a certain percentile absolute pixel value"
s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)
s = torch.clamp(
s, min=1, max=self.config.sample_max_value
) # When clamped to min=1, equivalent to standard clipping to [-1, 1]
s = s.unsqueeze(1) # (batch_size, 1) because clamp will broadcast along dim=0
sample = torch.clamp(sample, -s, s) / s # "we threshold xt0 to the range [-s, s] and then divide by s"
sample = sample.reshape(batch_size, channels, height, width)
sample = sample.to(dtype)
return sample
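# A minimal pure-Python sketch of the dynamic thresholding rule described in the docstring above, applied to a
# single flattened sample. The function name `dynamic_threshold` and the example values are hypothetical; the
# scheduler itself works on batched tensors and uses `torch.quantile`, whose default linear interpolation is
# what the quantile step here imitates.

```python
from typing import List


def dynamic_threshold(values: List[float], ratio: float = 0.995, max_value: float = 1.0) -> List[float]:
    """Clamp values to [-s, s] and divide by s, where s is a quantile of the absolute values."""
    abs_sorted = sorted(abs(v) for v in values)
    # Quantile with linear interpolation between the two nearest order statistics.
    pos = ratio * (len(abs_sorted) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(abs_sorted) - 1)
    s = abs_sorted[lo] + (pos - lo) * (abs_sorted[hi] - abs_sorted[lo])
    # s is never allowed below 1, so samples already inside [-1, 1] are only clipped, not rescaled.
    s = min(max(s, 1.0), max_value)
    return [min(max(v, -s), s) / s for v in values]


saturated = dynamic_threshold([-4.0, -1.0, 0.0, 1.0, 2.0], ratio=0.75, max_value=3.0)
in_range = dynamic_threshold([0.2, -0.3])
```

# With `ratio=0.75` the threshold for the first call is s = 2, so the outlier -4 is clamped to -2 before the
# division by s, and every returned value lies in [-1, 1].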
# Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._sigma_to_t
def _sigma_to_t(self, sigma, log_sigmas):
# get log sigma
log_sigma = np.log(sigma)
# get distribution
dists = log_sigma - log_sigmas[:, np.newaxis]
# get sigmas range
low_idx = np.cumsum((dists >= 0), axis=0).argmax(axis=0).clip(max=log_sigmas.shape[0] - 2)
high_idx = low_idx + 1
low = log_sigmas[low_idx]
high = log_sigmas[high_idx]
# interpolate sigmas
w = (low - log_sigma) / (low - high)
w = np.clip(w, 0, 1)
# transform interpolation to time range
t = (1 - w) * low_idx + w * high_idx
t = t.reshape(sigma.shape)
return t
# Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._convert_to_karras
def _convert_to_karras(self, in_sigmas: torch.FloatTensor, num_inference_steps) -> torch.FloatTensor:
"""Constructs the noise schedule of Karras et al. (2022)."""
sigma_min: float = in_sigmas[-1].item()
sigma_max: float = in_sigmas[0].item()
rho = 7.0 # 7.0 is the value used in the paper
ramp = np.linspace(0, 1, num_inference_steps)
min_inv_rho = sigma_min ** (1 / rho)
max_inv_rho = sigma_max ** (1 / rho)
return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
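# A standalone sketch of the same Karras et al. (2022) interpolation without NumPy, for illustration only. The
# function name `karras_sigmas` and the sigma values 0.03 and 14.6 are assumptions chosen for the example, not
# values taken from this scheduler's configuration.

```python
from typing import List


def karras_sigmas(sigma_min: float, sigma_max: float, num_steps: int, rho: float = 7.0) -> List[float]:
    """Interpolate linearly in sigma**(1/rho) space, from sigma_max down to sigma_min."""
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = []
    for i in range(num_steps):
        # ramp runs from 0 to 1 inclusive, like np.linspace(0, 1, num_steps)
        ramp = i / (num_steps - 1) if num_steps > 1 else 0.0
        sigmas.append((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho)
    return sigmas


schedule = karras_sigmas(sigma_min=0.03, sigma_max=14.6, num_steps=5)
```

# Because the interpolation happens in sigma**(1/rho) space with rho = 7, the steps are larger at high noise
# levels and smaller near sigma_min.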
def convert_model_output(
self, model_output: torch.FloatTensor, timestep: int, sample: torch.FloatTensor
) -> torch.FloatTensor:
"""
Convert the model output to the type the SA-Solver needs: a data (x0) prediction when `algorithm_type` is
"data_prediction", which discretizes an integral of the data prediction model, or a noise (epsilon) prediction
when it is "noise_prediction", which discretizes an integral of the noise prediction model.
The algorithm type and the model's prediction type are decoupled. Either algorithm type can be used with
`epsilon`, `sample`, or `v_prediction` models.
Args:
model_output (`torch.FloatTensor`):
The direct output from the learned diffusion model.
timestep (`int`):
The current discrete timestep in the diffusion chain.
sample (`torch.FloatTensor`):
A current instance of a sample created by the diffusion process.
Returns:
`torch.FloatTensor`:
The converted model output.
"""
# SA-Solver_data_prediction needs to solve an integral of the data prediction model.
if self.config.algorithm_type in ["data_prediction"]:
if self.config.prediction_type == "epsilon":
# SA-Solver only needs the "mean" output.
if self.config.variance_type in ["learned", "learned_range"]:
model_output = model_output[:, :3]
alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
x0_pred = (sample - sigma_t * model_output) / alpha_t
elif self.config.prediction_type == "sample":
x0_pred = model_output
elif self.config.prediction_type == "v_prediction":
alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
x0_pred = alpha_t * sample - sigma_t * model_output
else:
raise ValueError(
f"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or"
" `v_prediction` for the SASolverScheduler."
)
if self.config.thresholding:
x0_pred = self._threshold_sample(x0_pred)
return x0_pred
# SA-Solver_noise_prediction needs to solve an integral of the noise prediction model.
elif self.config.algorithm_type in ["noise_prediction"]:
if self.config.prediction_type == "epsilon":
# SA-Solver only needs the "mean" output.
if self.config.variance_type in ["learned", "learned_range"]:
epsilon = model_output[:, :3]
else:
epsilon = model_output
elif self.config.prediction_type == "sample":
alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
epsilon = (sample - alpha_t * model_output) / sigma_t
elif self.config.prediction_type == "v_prediction":
alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
epsilon = alpha_t * model_output + sigma_t * sample
else:
raise ValueError(
f"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or"
" `v_prediction` for the SASolverScheduler."
)
if self.config.thresholding:
alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
x0_pred = (sample - sigma_t * epsilon) / alpha_t
x0_pred = self._threshold_sample(x0_pred)
epsilon = (sample - alpha_t * x0_pred) / sigma_t
return epsilon
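# The conversions in `convert_model_output` can be checked on scalars. The sketch below assumes a
# variance-preserving schedule (alpha_t**2 + sigma_t**2 == 1), the forward relation
# x_t = alpha_t * x0 + sigma_t * eps, and the usual v-prediction target v = alpha_t * eps - sigma_t * x0.
# All helper names and numbers are hypothetical and exist only for this example.

```python
import math


def eps_to_x0(x_t, eps, alpha_t, sigma_t):
    return (x_t - sigma_t * eps) / alpha_t


def x0_to_eps(x_t, x0, alpha_t, sigma_t):
    return (x_t - alpha_t * x0) / sigma_t


def v_to_x0(x_t, v, alpha_t, sigma_t):
    return alpha_t * x_t - sigma_t * v


def v_to_eps(x_t, v, alpha_t, sigma_t):
    return alpha_t * v + sigma_t * x_t


alpha_t = 0.8
sigma_t = math.sqrt(1 - alpha_t ** 2)
x0_true, eps_true = 0.5, -1.2
x_t = alpha_t * x0_true + sigma_t * eps_true      # noised sample
v_true = alpha_t * eps_true - sigma_t * x0_true   # v-prediction target
```

# Each helper recovers the quantity it names from `x_t` and one model output, which is why the solver can run on
# either prediction regardless of how the network was trained.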
def get_coefficients_exponential_negative(self, order, interval_start, interval_end):
"""
Calculate the integral of exp(-x) * x^order dx from interval_start to interval_end
"""
assert order in [0, 1, 2, 3], "order is only supported for 0, 1, 2 and 3"
if order == 0:
return torch.exp(-interval_end) * (torch.exp(interval_end - interval_start) - 1)
elif order == 1:
return torch.exp(-interval_end) * (
(interval_start + 1) * torch.exp(interval_end - interval_start) - (interval_end + 1))
elif order == 2:
return torch.exp(-interval_end) * (
(interval_start ** 2 + 2 * interval_start + 2) * torch.exp(interval_end - interval_start) - (
interval_end ** 2 + 2 * interval_end + 2))
elif order == 3:
return torch.exp(-interval_end) * (
(interval_start ** 3 + 3 * interval_start ** 2 + 6 * interval_start + 6) * torch.exp(
interval_end - interval_start) - (interval_end ** 3 + 3 * interval_end ** 2 + 6 * interval_end + 6))
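# The closed forms in `get_coefficients_exponential_negative` follow from the antiderivative
# -exp(-x) * P(x), with P(x) = 1, x + 1, x^2 + 2x + 2, x^3 + 3x^2 + 6x + 6 for orders 0 to 3. The sketch below
# compares them with Simpson's rule on plain floats. The helper names and the interval [0.3, 1.7] are
# hypothetical choices for the check.

```python
import math


def closed_form(order, start, end):
    """Integral of exp(-x) * x**order over [start, end], from the antiderivative -exp(-x) * P(x)."""
    polys = {
        0: lambda x: 1.0,
        1: lambda x: x + 1.0,
        2: lambda x: x ** 2 + 2 * x + 2,
        3: lambda x: x ** 3 + 3 * x ** 2 + 6 * x + 6,
    }
    p = polys[order]
    return p(start) * math.exp(-start) - p(end) * math.exp(-end)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number of sub-intervals n."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


start, end = 0.3, 1.7
errors = [
    abs(closed_form(k, start, end) - simpson(lambda x, k=k: math.exp(-x) * x ** k, start, end))
    for k in range(4)
]
```

# `errors` holds the absolute difference between the closed form and the numerical integral for each order.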
def get_coefficients_exponential_positive(self, order, interval_start, interval_end, tau):
"""
Calculate the integral of exp(x(1+tau^2)) * x^order dx from interval_start to interval_end
"""
assert order in [0, 1, 2, 3], "order is only supported for 0, 1, 2 and 3"
# after change of variable(cov)
interval_end_cov = (1 + tau ** 2) * interval_end
interval_start_cov = (1 + tau ** 2) * interval_start
if order == 0:
return torch.exp(interval_end_cov) * (1 - torch.exp(-(interval_end_cov - interval_start_cov))) / (
(1 + tau ** 2))
elif order == 1:
return torch.exp(interval_end_cov) * ((interval_end_cov - 1) - (interval_start_cov - 1) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 2)
elif order == 2:
return torch.exp(interval_end_cov) * ((interval_end_cov ** 2 - 2 * interval_end_cov + 2) - (
interval_start_cov ** 2 - 2 * interval_start_cov + 2) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 3)
elif order == 3:
return torch.exp(interval_end_cov) * (
(interval_end_cov ** 3 - 3 * interval_end_cov ** 2 + 6 * interval_end_cov - 6) - (
interval_start_cov ** 3 - 3 * interval_start_cov ** 2 + 6 * interval_start_cov - 6) * torch.exp(
-(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 4)
def lagrange_polynomial_coefficient(self, order, lambda_list):
"""
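Calculate the coefficients of the Lagrange basis polynomials for the nodes in `lambda_list`.
Returns one row per node, with the coefficients ordered from the highest degree to the constant term.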
"""
assert order in [0, 1, 2, 3]
assert order == len(lambda_list) - 1
if order == 0:
return [[1]]
elif order == 1:
return [[1 / (lambda_list[0] - lambda_list[1]), -lambda_list[1] / (lambda_list[0] - lambda_list[1])],
[1 / (lambda_list[1] - lambda_list[0]), -lambda_list[0] / (lambda_list[1] - lambda_list[0])]]
elif order == 2:
denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2])
denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2])
denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1])
return [[1 / denominator1,
(-lambda_list[1] - lambda_list[2]) / denominator1,
lambda_list[1] * lambda_list[2] / denominator1],
[1 / denominator2,
(-lambda_list[0] - lambda_list[2]) / denominator2,
lambda_list[0] * lambda_list[2] / denominator2],
[1 / denominator3,
(-lambda_list[0] - lambda_list[1]) / denominator3,
lambda_list[0] * lambda_list[1] / denominator3]
]
elif order == 3:
denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2]) * (
lambda_list[0] - lambda_list[3])
denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2]) * (
lambda_list[1] - lambda_list[3])
denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1]) * (
lambda_list[2] - lambda_list[3])
denominator4 = (lambda_list[3] - lambda_list[0]) * (lambda_list[3] - lambda_list[1]) * (
lambda_list[3] - lambda_list[2])
return [[1 / denominator1,
(-lambda_list[1] - lambda_list[2] - lambda_list[3]) / denominator1,
(lambda_list[1] * lambda_list[2] + lambda_list[1] * lambda_list[3] + lambda_list[2] * lambda_list[
3]) / denominator1,
(-lambda_list[1] * lambda_list[2] * lambda_list[3]) / denominator1],
[1 / denominator2,
(-lambda_list[0] - lambda_list[2] - lambda_list[3]) / denominator2,
(lambda_list[0] * lambda_list[2] + lambda_list[0] * lambda_list[3] + lambda_list[2] * lambda_list[
3]) / denominator2,
(-lambda_list[0] * lambda_list[2] * lambda_list[3]) / denominator2],
[1 / denominator3,
(-lambda_list[0] - lambda_list[1] - lambda_list[3]) / denominator3,
(lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[3] + lambda_list[1] * lambda_list[
3]) / denominator3,
(-lambda_list[0] * lambda_list[1] * lambda_list[3]) / denominator3],
[1 / denominator4,
(-lambda_list[0] - lambda_list[1] - lambda_list[2]) / denominator4,
(lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[2] + lambda_list[1] * lambda_list[
2]) / denominator4,
(-lambda_list[0] * lambda_list[1] * lambda_list[2]) / denominator4]
]
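# The hand-expanded tables in `lagrange_polynomial_coefficient` can be cross-checked against a generic
# expansion. The sketch below builds each basis polynomial by repeated multiplication with (x - x_j) and returns
# coefficients from the highest degree down, the same ordering as the tables. The function names and the example
# nodes are hypothetical.

```python
from typing import List


def lagrange_basis_coefficients(nodes: List[float]) -> List[List[float]]:
    """For each node, return the coefficients (highest degree first) of its Lagrange basis polynomial."""
    rows = []
    for i, x_i in enumerate(nodes):
        coeffs = [1.0]  # start from the constant polynomial 1
        denom = 1.0
        for j, x_j in enumerate(nodes):
            if j == i:
                continue
            # Multiply the current polynomial by (x - x_j).
            coeffs = [a - x_j * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
            denom *= x_i - x_j
        rows.append([c / denom for c in coeffs])
    return rows


def evaluate(coeffs: List[float], x: float) -> float:
    """Evaluate a polynomial given highest-degree-first coefficients (Horner's scheme)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


nodes = [-1.0, 0.5, 2.0]
basis = lagrange_basis_coefficients(nodes)
```

# Each basis polynomial evaluates to 1 at its own node and 0 at the others, which is the property the predictor
# and corrector rely on when they interpolate past model outputs.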
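def get_coefficients_fn(self, order, interval_start, interval_end, lambda_list, tau):
"""
Calculate one integral coefficient per entry of `lambda_list`: the integral over [interval_start, interval_end]
of that node's Lagrange basis polynomial, weighted by exp(x(1+tau^2)) when `self.predict_x0` is set and by
exp(-x) otherwise.
"""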
assert order in [1, 2, 3, 4]
assert order == len(lambda_list), 'the length of lambda list must be equal to the order'
coefficients = []
lagrange_coefficient = self.lagrange_polynomial_coefficient(order - 1, lambda_list)
for i in range(order):
coefficient = sum(
lagrange_coefficient[i][j]
* self.get_coefficients_exponential_positive(
order - 1 - j, interval_start, interval_end, tau
)
if self.predict_x0
else lagrange_coefficient[i][j]
* self.get_coefficients_exponential_negative(
order - 1 - j, interval_start, interval_end
)
for j in range(order)
)
coefficients.append(coefficient)
assert len(coefficients) == order, 'the length of coefficients does not match the order'
return coefficients
def stochastic_adams_bashforth_update(
self,
model_output: torch.FloatTensor,
prev_timestep: int,
sample: torch.FloatTensor,
noise: torch.FloatTensor,
order: int,
tau: torch.FloatTensor,
) -> torch.FloatTensor:
"""
One step for the SA-Predictor.
Args:
model_output (`torch.FloatTensor`):
The direct output from the learned diffusion model at the current timestep.
prev_timestep (`int`):
The previous discrete timestep in the diffusion chain.
sample (`torch.FloatTensor`):
A current instance of a sample created by the diffusion process.
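noise (`torch.FloatTensor`):
Gaussian noise with the same shape as `sample`, injected by the stochastic update. Must not be `None`.
order (`int`):
The order of SA-Predictor at this timestep.
tau (`torch.FloatTensor`):
The stochasticity level for this step. With `tau = 0` the noise term vanishes.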
Returns:
`torch.FloatTensor`:
The sample tensor at the previous timestep.
"""
assert noise is not None
timestep_list = self.timestep_list
model_output_list = self.model_outputs
s0, t = self.timestep_list[-1], prev_timestep
lambda_t, lambda_s0 = self.lambda_t[t], self.lambda_t[s0]
alpha_t, alpha_s0 = self.alpha_t[t], self.alpha_t[s0]
sigma_t, sigma_s0 = self.sigma_t[t], self.sigma_t[s0]
gradient_part = torch.zeros_like(sample)
h = lambda_t - lambda_s0
lambda_list = [self.lambda_t[timestep_list[-(i + 1)]] for i in range(order)]
gradient_coefficients = self.get_coefficients_fn(order, lambda_s0, lambda_t, lambda_list, tau)
x = sample
if self.predict_x0 and order == 2:
gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2)) / (self.lambda_t[timestep_list[-1]] - self.lambda_t[
timestep_list[-2]])
gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2)) / (self.lambda_t[timestep_list[-1]] - self.lambda_t[
timestep_list[-2]])
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_output_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_output_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_s0) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_s0) * x + gradient_part + noise_part
x_t = x_t.to(x.dtype)
return x_t
def stochastic_adams_moulton_update(
self,
this_model_output: torch.FloatTensor,
this_timestep: int,
last_sample: torch.FloatTensor,
last_noise: torch.FloatTensor,
this_sample: torch.FloatTensor,
order: int,
tau: torch.FloatTensor,
) -> torch.FloatTensor:
"""
One step for the SA-Corrector.
Args:
this_model_output (`torch.FloatTensor`):
The model outputs at `x_t`.
this_timestep (`int`):
The current timestep `t`.
last_sample (`torch.FloatTensor`):
The generated sample before the last predictor `x_{t-1}`.
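last_noise (`torch.FloatTensor`):
The Gaussian noise that was drawn for the last predictor step. Must not be `None`.
this_sample (`torch.FloatTensor`):
The generated sample after the last predictor `x_{t}`.
order (`int`):
The order of SA-Corrector at this step.
tau (`torch.FloatTensor`):
The stochasticity level for this step. With `tau = 0` the noise term vanishes.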
Returns:
`torch.FloatTensor`:
The corrected sample tensor at the current timestep.
"""
assert last_noise is not None
timestep_list = self.timestep_list
model_output_list = self.model_outputs
s0, t = self.timestep_list[-1], this_timestep
lambda_t, lambda_s0 = self.lambda_t[t], self.lambda_t[s0]
alpha_t, alpha_s0 = self.alpha_t[t], self.alpha_t[s0]
sigma_t, sigma_s0 = self.sigma_t[t], self.sigma_t[s0]
gradient_part = torch.zeros_like(this_sample)
h = lambda_t - lambda_s0
t_list = timestep_list + [this_timestep]
lambda_list = [self.lambda_t[t_list[-(i + 1)]] for i in range(order)]
model_prev_list = model_output_list + [this_model_output]
gradient_coefficients = self.get_coefficients_fn(order, lambda_s0, lambda_t, lambda_list, tau)
x = last_sample
if self.predict_x0 and order == 2:
gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2 * h))
gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (
h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (
(1 + tau ** 2) ** 2 * h))
for i in range(order):
if self.predict_x0:
gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[
i] * model_prev_list[-(i + 1)]
else:
gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]
if self.predict_x0:
noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * last_noise
else:
noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * last_noise
if self.predict_x0:
x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_s0) * x + gradient_part + noise_part
else:
x_t = (alpha_t / alpha_s0) * x + gradient_part + noise_part
x_t = x_t.to(x.dtype)
return x_t
def step(
self,
model_output: torch.FloatTensor,
timestep: int,
sample: torch.FloatTensor,
generator=None,
return_dict: bool = True,
) -> Union[SchedulerOutput, Tuple]:
"""
Predict the sample from the previous timestep by reversing the SDE. This function propagates the sample with
the SA-Solver.
Args:
model_output (`torch.FloatTensor`):
The direct output from learned diffusion model.
timestep (`int`):
The current discrete timestep in the diffusion chain.
sample (`torch.FloatTensor`):
A current instance of a sample created by the diffusion process.
generator (`torch.Generator`, *optional*):
A random number generator.
return_dict (`bool`):
Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`.
Returns:
[`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:
If return_dict is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a
tuple is returned where the first element is the sample tensor.
"""
if self.num_inference_steps is None:
raise ValueError(
"Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
)
if isinstance(timestep, torch.Tensor):
timestep = timestep.to(self.timesteps.device)
step_index = (self.timesteps == timestep).nonzero()
if len(step_index) == 0:
step_index = len(self.timesteps) - 1
else:
step_index = step_index.item()
use_corrector = (
step_index > 0 and self.last_sample is not None
)
model_output_convert = self.convert_model_output(model_output, timestep, sample)
if use_corrector:
current_tau = self.tau_func(self.timestep_list[-1])
sample = self.stochastic_adams_moulton_update(
this_model_output=model_output_convert,
this_timestep=timestep,
last_sample=self.last_sample,
last_noise=self.last_noise,
this_sample=sample,
order=self.this_corrector_order,
tau=current_tau,
)
prev_timestep = 0 if step_index == len(self.timesteps) - 1 else self.timesteps[step_index + 1]
for i in range(max(self.config.predictor_order, self.config.corrector_order - 1) - 1):
self.model_outputs[i] = self.model_outputs[i + 1]
self.timestep_list[i] = self.timestep_list[i + 1]
self.model_outputs[-1] = model_output_convert
self.timestep_list[-1] = timestep
noise = randn_tensor(
model_output.shape, generator=generator, device=model_output.device, dtype=model_output.dtype
)
if self.config.lower_order_final:
this_predictor_order = min(self.config.predictor_order, len(self.timesteps) - step_index)
this_corrector_order = min(self.config.corrector_order, len(self.timesteps) - step_index + 1)
else:
this_predictor_order = self.config.predictor_order
this_corrector_order = self.config.corrector_order
self.this_predictor_order = min(this_predictor_order, self.lower_order_nums + 1) # warmup for multistep
self.this_corrector_order = min(this_corrector_order, self.lower_order_nums + 2) # warmup for multistep
assert self.this_predictor_order > 0
assert self.this_corrector_order > 0
self.last_sample = sample
self.last_noise = noise
current_tau = self.tau_func(self.timestep_list[-1])
prev_sample = self.stochastic_adams_bashforth_update(
model_output=model_output_convert,
prev_timestep=prev_timestep,
sample=sample,
noise=noise,
order=self.this_predictor_order,
tau=current_tau,
)
if self.lower_order_nums < max(self.config.predictor_order, self.config.corrector_order - 1):
self.lower_order_nums += 1
if not return_dict:
return (prev_sample,)
return SchedulerOutput(prev_sample=prev_sample)
def scale_model_input(self, sample: torch.FloatTensor, *args, **kwargs) -> torch.FloatTensor:
"""
Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
current timestep.
Args:
sample (`torch.FloatTensor`):
The input sample.
Returns:
`torch.FloatTensor`:
A scaled input sample.
"""
return sample
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise
def add_noise(
self,
original_samples: torch.FloatTensor,
noise: torch.FloatTensor,
timesteps: torch.IntTensor,
) -> torch.FloatTensor:
# Make sure alphas_cumprod and timestep have same device and dtype as original_samples
alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)
timesteps = timesteps.to(original_samples.device)
sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
sqrt_alpha_prod = sqrt_alpha_prod.flatten()
while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
def __len__(self):
return self.config.num_train_timesteps
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/__init__.py
================================================
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/checkpoint.py
================================================
import os
import re
import torch
from diffusion.utils.logger import get_root_logger
def save_checkpoint(work_dir,
epoch,
model,
model_ema=None,
optimizer=None,
lr_scheduler=None,
keep_last=False,
step=None,
):
os.makedirs(work_dir, exist_ok=True)
state_dict = dict(state_dict=model.state_dict())
if model_ema is not None:
state_dict['state_dict_ema'] = model_ema.state_dict()
if optimizer is not None:
state_dict['optimizer'] = optimizer.state_dict()
if lr_scheduler is not None:
state_dict['scheduler'] = lr_scheduler.state_dict()
if epoch is not None:
state_dict['epoch'] = epoch
file_path = os.path.join(work_dir, f"epoch_{epoch}.pth")
if step is not None:
file_path = file_path.split('.pth')[0] + f"_step_{step}.pth"
logger = get_root_logger()
torch.save(state_dict, file_path)
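logger.info(f'Saved checkpoint of epoch {epoch} to {file_path}.')
if keep_last:
# Remove the per-epoch checkpoints of earlier epochs. `file_path` is already fully formatted, so the earlier
# paths are rebuilt from the epoch index; files saved with a `_step_` suffix are not matched here.
for i in range(epoch):
previous_ckpt = os.path.join(work_dir, f"epoch_{i}.pth")
if os.path.exists(previous_ckpt):
os.remove(previous_ckpt)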
def load_checkpoint(checkpoint,
model,
model_ema=None,
optimizer=None,
lr_scheduler=None,
load_ema=False,
resume_optimizer=True,
resume_lr_scheduler=True
):
assert isinstance(checkpoint, str)
ckpt_file = checkpoint
checkpoint = torch.load(ckpt_file, map_location="cpu")
state_dict_keys = ['pos_embed', 'base_model.pos_embed', 'model.pos_embed']
for key in state_dict_keys:
if key in checkpoint['state_dict']:
del checkpoint['state_dict'][key]
if 'state_dict_ema' in checkpoint and key in checkpoint['state_dict_ema']:
del checkpoint['state_dict_ema'][key]
break
if load_ema:
state_dict = checkpoint['state_dict_ema']
else:
state_dict = checkpoint.get('state_dict', checkpoint) # to be compatible with the official checkpoint
# model.load_state_dict(state_dict)
missing, unexpect = model.load_state_dict(state_dict, strict=False)
if model_ema is not None:
model_ema.load_state_dict(checkpoint['state_dict_ema'], strict=False)
if optimizer is not None and resume_optimizer:
optimizer.load_state_dict(checkpoint['optimizer'])
if lr_scheduler is not None and resume_lr_scheduler:
lr_scheduler.load_state_dict(checkpoint['scheduler'])
logger = get_root_logger()
if optimizer is not None:
epoch = checkpoint['epoch'] if 'epoch' in checkpoint else int(re.match(r'.*epoch_(\d+).*\.pth', ckpt_file).group(1))
logger.info(f'Resume checkpoint of epoch {epoch} from {ckpt_file}. Load ema: {load_ema}, '
f'resume optimizer: {resume_optimizer}, resume lr scheduler: {resume_lr_scheduler}.')
return epoch, missing, unexpect
logger.info(f'Load checkpoint from {ckpt_file}. Load ema: {load_ema}.')
return missing, unexpect
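# `load_checkpoint` falls back to the epoch number encoded in the checkpoint file name when the checkpoint dict
# has no 'epoch' entry. The sketch below isolates that parsing step. The helper name `epoch_from_filename` and
# the example paths are hypothetical. Note that `match.group(1)` returns the captured digits, whereas
# `match.group()[0]` would return only the first character of the whole matched path.

```python
import re
from typing import Optional

_EPOCH_PATTERN = re.compile(r'.*epoch_(\d+).*\.pth')


def epoch_from_filename(path: str) -> Optional[int]:
    """Return the epoch number encoded in a checkpoint file name, or None if there is none."""
    match = _EPOCH_PATTERN.match(path)
    return int(match.group(1)) if match else None


plain = epoch_from_filename('output/epoch_12.pth')
with_step = epoch_from_filename('output/epoch_3_step_4500.pth')
missing = epoch_from_filename('output/model.pth')
```

# Returning None for a non-matching name lets the caller decide how to handle checkpoints that do not follow
# the `epoch_<n>` naming scheme instead of failing inside the regex call.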
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/data_sampler.py
================================================
# Copyright (c) OpenMMLab. All rights reserved.
import os
from typing import Sequence
from torch.utils.data import BatchSampler, Sampler, Dataset
from random import shuffle, choice
from copy import deepcopy
from diffusion.utils.logger import get_root_logger
class AspectRatioBatchSampler(BatchSampler):
"""A sampler wrapper for grouping images with similar aspect ratios into the same batch.
Args:
sampler (Sampler): Base sampler.
dataset (Dataset): Dataset providing data information.
batch_size (int): Size of mini-batch.
drop_last (bool): If ``True``, the sampler will drop the last batch if
its size would be less than ``batch_size``.
aspect_ratios (dict): The predefined aspect ratios.
"""
def __init__(self,
sampler: Sampler,
dataset: Dataset,
batch_size: int,
aspect_ratios: dict,
drop_last: bool = False,
config=None,
valid_num=0, # take as valid aspect-ratio when sample number >= valid_num
**kwargs) -> None:
if not isinstance(sampler, Sampler):
raise TypeError('sampler should be an instance of ``Sampler``, '
f'but got {sampler}')
if not isinstance(batch_size, int) or batch_size <= 0:
raise ValueError('batch_size should be a positive integer value, '
f'but got batch_size={batch_size}')
self.sampler = sampler
self.dataset = dataset
self.batch_size = batch_size
self.aspect_ratios = aspect_ratios
self.drop_last = drop_last
self.ratio_nums_gt = kwargs.get('ratio_nums', None)
self.config = config
assert self.ratio_nums_gt
# buckets for each aspect ratio
self._aspect_ratio_buckets = {ratio: [] for ratio in aspect_ratios}
self.current_available_bucket_keys = [str(k) for k, v in self.ratio_nums_gt.items() if v >= valid_num]
logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
logger.warning(f"Using valid_num={valid_num} in config file. Available {len(self.current_available_bucket_keys)} aspect_ratios: {self.current_available_bucket_keys}")
def __iter__(self) -> Sequence[int]:
for idx in self.sampler:
data_info = self.dataset.get_data_info(idx)
height, width = data_info['height'], data_info['width']
ratio = height / width
# find the closest aspect ratio
closest_ratio = min(self.aspect_ratios.keys(), key=lambda r: abs(float(r) - ratio))
if closest_ratio not in self.current_available_bucket_keys:
continue
bucket = self._aspect_ratio_buckets[closest_ratio]
bucket.append(idx)
# yield a batch of indices in the same aspect ratio group
if len(bucket) == self.batch_size:
yield bucket[:]
del bucket[:]
# yield the rest data and reset the buckets
for bucket in self._aspect_ratio_buckets.values():
while len(bucket) > 0:
if len(bucket) <= self.batch_size:
if not self.drop_last:
yield bucket[:]
# clear in place: rebinding `bucket = []` would leave the stored bucket full for the next epoch
del bucket[:]
else:
yield bucket[:self.batch_size]
del bucket[:self.batch_size]
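# A dependency-free sketch of the bucketing idea used by `AspectRatioBatchSampler.__iter__`: each sample goes to
# the bucket of its closest predefined height/width ratio, and a bucket is emitted as soon as it holds
# `batch_size` indices. The function name, the ratio list, and the image sizes are hypothetical example data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bucket_by_aspect_ratio(
    sizes: Sequence[Tuple[int, int]], ratios: Sequence[float], batch_size: int
) -> Tuple[List[List[int]], Dict[float, List[int]]]:
    """Group sample indices into batches that share the closest predefined aspect ratio."""
    buckets = defaultdict(list)
    batches = []
    for idx, (height, width) in enumerate(sizes):
        closest = min(ratios, key=lambda r: abs(r - height / width))
        buckets[closest].append(idx)
        if len(buckets[closest]) == batch_size:
            batches.append(buckets[closest][:])
            buckets[closest].clear()  # empty the stored list in place
    leftovers = {ratio: idxs for ratio, idxs in buckets.items() if idxs}
    return batches, leftovers


sizes = [(512, 512), (1024, 512), (500, 520), (768, 384), (256, 512)]
batches, leftovers = bucket_by_aspect_ratio(sizes, ratios=[0.5, 1.0, 2.0], batch_size=2)
```

# Indices 0 and 2 are both closest to ratio 1.0 and indices 1 and 3 to ratio 2.0, so they form the two full
# batches. Index 4 stays behind in the 0.5 bucket as a leftover.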
class BalancedAspectRatioBatchSampler(AspectRatioBatchSampler):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Assign samples to each bucket
self.ratio_nums_gt = kwargs.get('ratio_nums', None)
assert self.ratio_nums_gt
self._aspect_ratio_buckets = {float(ratio): [] for ratio in self.aspect_ratios.keys()}
self.original_buckets = {}
self.current_available_bucket_keys = [k for k, v in self.ratio_nums_gt.items() if v >= 3000]
self.all_available_keys = deepcopy(self.current_available_bucket_keys)
self.exhausted_bucket_keys = []
self.total_batches = len(self.sampler) // self.batch_size
self._aspect_ratio_count = {}
for k in self.all_available_keys:
self._aspect_ratio_count[float(k)] = 0
self.original_buckets[float(k)] = []
logger = get_root_logger(os.path.join(self.config.work_dir, 'train_log.log'))
logger.warning(f"Available {len(self.current_available_bucket_keys)} aspect_ratios: {self.current_available_bucket_keys}")
def __iter__(self) -> Sequence[int]:
i = 0
for idx in self.sampler:
data_info = self.dataset.get_data_info(idx)
height, width = data_info['height'], data_info['width']
ratio = height / width
closest_ratio = float(min(self.aspect_ratios.keys(), key=lambda r: abs(float(r) - ratio)))
if closest_ratio not in self.all_available_keys:
continue
if self._aspect_ratio_count[closest_ratio] < self.ratio_nums_gt[closest_ratio]:
self._aspect_ratio_count[closest_ratio] += 1
self._aspect_ratio_buckets[closest_ratio].append(idx)
self.original_buckets[closest_ratio].append(idx) # Save the original samples for each bucket
if not self.current_available_bucket_keys:
self.current_available_bucket_keys, self.exhausted_bucket_keys = self.exhausted_bucket_keys, []
if closest_ratio not in self.current_available_bucket_keys:
continue
key = closest_ratio
bucket = self._aspect_ratio_buckets[key]
if len(bucket) == self.batch_size:
yield bucket[:self.batch_size]
del bucket[:self.batch_size]
i += 1
self.exhausted_bucket_keys.append(key)
self.current_available_bucket_keys.remove(key)
for _ in range(self.total_batches - i):
key = choice(self.all_available_keys)
bucket = self._aspect_ratio_buckets[key]
if len(bucket) >= self.batch_size:
yield bucket[:self.batch_size]
del bucket[:self.batch_size]
# If a bucket is exhausted
if not bucket:
self._aspect_ratio_buckets[key] = deepcopy(self.original_buckets[key][:])
shuffle(self._aspect_ratio_buckets[key])
else:
self._aspect_ratio_buckets[key] = deepcopy(self.original_buckets[key][:])
shuffle(self._aspect_ratio_buckets[key])
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/dist_utils.py
================================================
"""
This file contains primitives for multi-gpu communication.
This is useful when doing distributed training.
"""
import os
import pickle
import shutil
import gc
import mmcv
import torch
import torch.distributed as dist
from mmcv.runner import get_dist_info
def is_distributed():
return get_world_size() > 1
def get_world_size():
if not dist.is_available():
return 1
return dist.get_world_size() if dist.is_initialized() else 1
def get_rank():
if not dist.is_available():
return 0
return dist.get_rank() if dist.is_initialized() else 0
def get_local_rank():
if not dist.is_available():
return 0
return int(os.getenv('LOCAL_RANK', 0)) if dist.is_initialized() else 0
def is_master():
return get_rank() == 0
def is_local_master():
return get_local_rank() == 0
def get_local_proc_group(group_size=8):
world_size = get_world_size()
if world_size <= group_size or group_size == 1:
return None
assert world_size % group_size == 0, f'world size ({world_size}) should be evenly divided by group size ({group_size}).'
process_groups = getattr(get_local_proc_group, 'process_groups', {})
if group_size not in process_groups:
num_groups = dist.get_world_size() // group_size
groups = [list(range(i * group_size, (i + 1) * group_size)) for i in range(num_groups)]
process_groups.update({group_size: [torch.distributed.new_group(group) for group in groups]})
get_local_proc_group.process_groups = process_groups
group_idx = get_rank() // group_size
return get_local_proc_group.process_groups.get(group_size)[group_idx]
def synchronize():
"""
Helper function to synchronize (barrier) among all processes when
using distributed training
"""
if not dist.is_available():
return
if not dist.is_initialized():
return
world_size = dist.get_world_size()
if world_size == 1:
return
dist.barrier()
def all_gather(data):
"""
Run all_gather on arbitrary picklable data (not necessarily tensors)
Args:
data: any picklable object
Returns:
list[data]: list of data gathered from each rank
"""
to_device = torch.device("cuda")
# to_device = torch.device("cpu")
world_size = get_world_size()
if world_size == 1:
return [data]
# serialized to a Tensor
buffer = pickle.dumps(data)
storage = torch.ByteStorage.from_buffer(buffer)
tensor = torch.ByteTensor(storage).to(to_device)
# obtain Tensor size of each rank
local_size = torch.LongTensor([tensor.numel()]).to(to_device)
size_list = [torch.LongTensor([0]).to(to_device) for _ in range(world_size)]
dist.all_gather(size_list, local_size)
size_list = [int(size.item()) for size in size_list]
max_size = max(size_list)
tensor_list = [
torch.ByteTensor(size=(max_size,)).to(to_device) for _ in size_list
]
if local_size != max_size:
padding = torch.ByteTensor(size=(max_size - local_size,)).to(to_device)
tensor = torch.cat((tensor, padding), dim=0)
dist.all_gather(tensor_list, tensor)
data_list = []
for size, tensor in zip(size_list, tensor_list):
buffer = tensor.cpu().numpy().tobytes()[:size]
data_list.append(pickle.loads(buffer))
return data_list
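# `all_gather` pads every rank's pickled buffer to the longest one because the collective needs equally sized
# tensors, and then truncates each buffer back to its true size before unpickling. The sketch below simulates
# that pad-and-truncate step on plain bytes in a single process, with no `torch.distributed` calls. The helper
# names and the per-rank payloads are hypothetical.

```python
import pickle
from typing import Any, List, Tuple


def pad_payloads(objects: List[Any]) -> Tuple[List[int], List[bytes]]:
    """Serialize each object and zero-pad all buffers to the length of the longest one."""
    buffers = [pickle.dumps(obj) for obj in objects]
    sizes = [len(buf) for buf in buffers]
    max_size = max(sizes)
    padded = [buf + b"\x00" * (max_size - len(buf)) for buf in buffers]
    return sizes, padded


def unpad_payloads(sizes: List[int], padded: List[bytes]) -> List[Any]:
    """Cut each buffer back to its recorded size and deserialize it."""
    return [pickle.loads(buf[:size]) for size, buf in zip(sizes, padded)]


per_rank = [{"loss": 0.5}, [1, 2, 3, 4, 5, 6], "ok"]
sizes, padded = pad_payloads(per_rank)
restored = unpad_payloads(sizes, padded)
```

# Exchanging the sizes first, as `all_gather` does with `size_list`, is what makes the truncation possible on
# the receiving side.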
def reduce_dict(input_dict, average=True):
"""
Reduce the values in the dictionary from all processes so that the process with rank 0 holds the reduced
(summed or averaged) results. Returns a dict with the same fields as input_dict, after reduction.
Args:
input_dict (dict): all the values will be reduced
average (bool): whether to average (True) or sum (False) the values
"""
world_size = get_world_size()
if world_size < 2:
return input_dict
with torch.no_grad():
reduced_dict = _extracted_from_reduce_dict_14(input_dict, average, world_size)
return reduced_dict
# TODO Rename this here and in `reduce_dict`
def _extracted_from_reduce_dict_14(input_dict, average, world_size):
names = []
values = []
# sort the keys so that they are consistent across processes
for k in sorted(input_dict.keys()):
names.append(k)
values.append(input_dict[k])
values = torch.stack(values, dim=0)
dist.reduce(values, dst=0)
if dist.get_rank() == 0 and average:
# only main process gets accumulated, so only divide by
# world_size in this case
values /= world_size
return dict(zip(names, values))
def broadcast(data, **kwargs):
if get_world_size() == 1:
return data
data = [data]
dist.broadcast_object_list(data, **kwargs)
return data[0]
def all_gather_cpu(result_part, tmpdir=None, collect_by_master=True):
rank, world_size = get_dist_info()
if tmpdir is None:
tmpdir = './tmp'
if rank == 0:
mmcv.mkdir_or_exist(tmpdir)
synchronize()
# dump the part result to the dir
mmcv.dump(result_part, os.path.join(tmpdir, f'part_{rank}.pkl'))
synchronize()
if collect_by_master and rank != 0:
return None
# load results of all parts from tmp dir
results = []
for i in range(world_size):
part_file = os.path.join(tmpdir, f'part_{i}.pkl')
results.append(mmcv.load(part_file))
if not collect_by_master:
synchronize()
# remove tmp dir
if rank == 0:
shutil.rmtree(tmpdir)
return results
def all_gather_tensor(tensor, group_size=None, group=None):
if group_size is None:
group_size = get_world_size()
if group_size == 1:
output = [tensor]
else:
output = [torch.zeros_like(tensor) for _ in range(group_size)]
dist.all_gather(output, tensor, group=group)
return output
def gather_difflen_tensor(feat, num_samples_list, concat=True, group=None, group_size=None):
world_size = get_world_size()
if world_size == 1:
return feat if concat else [feat]
num_samples, *feat_dim = feat.size()
# padding to max number of samples
feat_padding = feat.new_zeros((max(num_samples_list), *feat_dim))
feat_padding[:num_samples] = feat
# gather
feat_gather = all_gather_tensor(feat_padding, group=group, group_size=group_size)
for r, num in enumerate(num_samples_list):
feat_gather[r] = feat_gather[r][:num]
if concat:
feat_gather = torch.cat(feat_gather)
return feat_gather
class GatherLayer(torch.autograd.Function):
'''Gather tensors from all process, supporting backward propagation.
'''
@staticmethod
def forward(ctx, input):
ctx.save_for_backward(input)
num_samples = torch.tensor(input.size(0), dtype=torch.long, device=input.device)
ctx.num_samples_list = all_gather_tensor(num_samples)
output = gather_difflen_tensor(input, ctx.num_samples_list, concat=False)
return tuple(output)
@staticmethod
def backward(ctx, *grads): # tuple(output)'s grad
input, = ctx.saved_tensors
num_samples_list = ctx.num_samples_list
rank = get_rank()
start, end = sum(num_samples_list[:rank]), sum(num_samples_list[:rank + 1])
grads = torch.cat(grads)
if is_distributed():
dist.all_reduce(grads)
grad_out = torch.zeros_like(input)
grad_out[:] = grads[start:end]
return grad_out, None, None
class GatherLayerWithGroup(torch.autograd.Function):
'''Gather tensors from all process, supporting backward propagation.
'''
@staticmethod
def forward(ctx, input, group, group_size):
ctx.save_for_backward(input)
ctx.group_size = group_size
output = all_gather_tensor(input, group=group, group_size=group_size)
return tuple(output)
@staticmethod
def backward(ctx, *grads): # tuple(output)'s grad
input, = ctx.saved_tensors
grads = torch.stack(grads)
if is_distributed():
dist.all_reduce(grads)
grad_out = torch.zeros_like(input)
grad_out[:] = grads[get_rank() % ctx.group_size]
return grad_out, None, None
def gather_layer_with_group(data, group=None, group_size=None):
if group_size is None:
group_size = get_world_size()
return GatherLayerWithGroup.apply(data, group, group_size)
from typing import Union
import math
# from torch.distributed.fsdp.fully_sharded_data_parallel import TrainingState_, _calc_grad_norm
@torch.no_grad()
def clip_grad_norm_(
self, max_norm: Union[float, int], norm_type: Union[float, int] = 2.0
) -> torch.Tensor:
self._lazy_init()
self._wait_for_previous_optim_step()
assert self._is_root, "clip_grad_norm should only be called on the root (parent) instance"
self._assert_state(TrainingState_.IDLE)
max_norm = float(max_norm)
norm_type = float(norm_type)
# Compute the norm of this shard's gradients and synchronize it across workers
local_norm = _calc_grad_norm(self.params_with_grad, norm_type).cuda() # type: ignore[arg-type]
if norm_type == math.inf:
total_norm = local_norm
dist.all_reduce(total_norm, op=torch.distributed.ReduceOp.MAX, group=self.process_group)
else:
total_norm = local_norm ** norm_type
dist.all_reduce(total_norm, group=self.process_group)
total_norm = total_norm ** (1.0 / norm_type)
clip_coef = torch.tensor(max_norm, dtype=total_norm.dtype, device=total_norm.device) / (total_norm + 1e-6)
if clip_coef < 1:
# multiply by clip_coef, aka, (max_norm/total_norm).
for p in self.params_with_grad:
assert p.grad is not None
p.grad.detach().mul_(clip_coef.to(p.grad.device))
return total_norm
def flush():
gc.collect()
torch.cuda.empty_cache()
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/logger.py
================================================
import logging
import os
import torch.distributed as dist
from datetime import datetime
from .dist_utils import is_local_master
from mmcv.utils.logging import logger_initialized
def get_root_logger(log_file=None, log_level=logging.INFO, name='PixArt'):
"""Get root logger.
Args:
log_file (str, optional): File path of log. Defaults to None.
log_level (int, optional): The level of logger.
Defaults to logging.INFO.
name (str): logger name
Returns:
:obj:`logging.Logger`: The obtained logger
"""
if log_file is None:
log_file = '/dev/null'
return get_logger(name=name, log_file=log_file, log_level=log_level)
def get_logger(name, log_file=None, log_level=logging.INFO):
"""Initialize and get a logger by name.
If the logger has not been initialized, this method will initialize the
logger by adding one or two handlers, otherwise the initialized logger will
be directly returned. During initialization, a StreamHandler will always be
added. If `log_file` is specified and the process rank is 0, a FileHandler
will also be added.
Args:
name (str): Logger name.
log_file (str | None): The log filename. If specified, a FileHandler
will be added to the logger.
log_level (int): The logger level. Note that only the process of
rank 0 is affected, and other processes will set the level to
"Error" thus be silent most of the time.
Returns:
logging.Logger: The expected logger.
"""
logger = logging.getLogger(name)
logger.propagate = False # disable root logger to avoid duplicate logging
if name in logger_initialized:
return logger
# handle hierarchical names
# e.g., logger "a" is initialized, then logger "a.b" will skip the
# initialization since it is a child of "a".
for logger_name in logger_initialized:
if name.startswith(logger_name):
return logger
stream_handler = logging.StreamHandler()
handlers = [stream_handler]
rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
# only rank 0 will add a FileHandler
if rank == 0 and log_file is not None:
file_handler = logging.FileHandler(log_file, 'w')
handlers.append(file_handler)
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s')
for handler in handlers:
handler.setFormatter(formatter)
handler.setLevel(log_level)
logger.addHandler(handler)
# only rank0 for each node will print logs
log_level = log_level if is_local_master() else logging.ERROR
logger.setLevel(log_level)
logger_initialized[name] = True
return logger
def rename_file_with_creation_time(file_path):
# Get the file's creation time
creation_time = os.path.getctime(file_path)
creation_time_str = datetime.fromtimestamp(creation_time).strftime('%Y-%m-%d_%H-%M-%S')
# Build the new file name
dir_name, file_name = os.path.split(file_path)
name, ext = os.path.splitext(file_name)
new_file_name = f"{name}_{creation_time_str}{ext}"
new_file_path = os.path.join(dir_name, new_file_name)
# Rename the file
os.rename(file_path, new_file_path)
print(f"File renamed to: {new_file_path}")
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/lr_scheduler.py
================================================
from diffusers import get_cosine_schedule_with_warmup, get_constant_schedule_with_warmup
from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR
import math
from diffusion.utils.logger import get_root_logger
def build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio):
if not config.get('lr_schedule_args', None):
config.lr_schedule_args = {}
if config.get('lr_warmup_steps', None):
config['num_warmup_steps'] = config.get('lr_warmup_steps') # for compatibility with old version
logger = get_root_logger()
logger.info(
f'Lr schedule: {config.lr_schedule}, ' + ",".join(
[f"{key}:{value}" for key, value in config.lr_schedule_args.items()]) + '.')
if config.lr_schedule == 'cosine':
lr_scheduler = get_cosine_schedule_with_warmup(
optimizer=optimizer,
**config.lr_schedule_args,
num_training_steps=(len(train_dataloader) * config.num_epochs),
)
elif config.lr_schedule == 'constant':
lr_scheduler = get_constant_schedule_with_warmup(
optimizer=optimizer,
**config.lr_schedule_args,
)
elif config.lr_schedule == 'cosine_decay_to_constant':
assert lr_scale_ratio >= 1
lr_scheduler = get_cosine_decay_to_constant_with_warmup(
optimizer=optimizer,
**config.lr_schedule_args,
final_lr=1 / lr_scale_ratio,
num_training_steps=(len(train_dataloader) * config.num_epochs),
)
else:
raise RuntimeError(f'Unrecognized lr schedule {config.lr_schedule}.')
return lr_scheduler
def get_cosine_decay_to_constant_with_warmup(optimizer: Optimizer,
num_warmup_steps: int,
num_training_steps: int,
final_lr: float = 0.0,
num_decay: float = 0.667,
num_cycles: float = 0.5,
last_epoch: int = -1
):
"""
Create a schedule with a cosine annealing lr followed by a constant lr.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (`int`):
The number of steps for the warmup phase.
num_training_steps (`int`):
The number of total training steps.
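final_lr (`float`):
The constant lr multiplier held after the cosine decay finishes.
num_decay (`float`):
The fraction of `num_training_steps` after which the cosine decay ends and the multiplier stays at `final_lr`.
num_cycles (`float`, *optional*, defaults to 0.5):
The number of cosine waves over the decay phase; the default of 0.5 decays monotonically from the peak to `final_lr`.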
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
def lr_lambda(current_step):
if current_step < num_warmup_steps:
return float(current_step) / float(max(1, num_warmup_steps))
num_decay_steps = int(num_training_steps * num_decay)
if current_step > num_decay_steps:
return final_lr
progress = float(current_step - num_warmup_steps) / float(max(1, num_decay_steps - num_warmup_steps))
return (
max(
0.0,
0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)),
)
* (1 - final_lr)
) + final_lr
return LambdaLR(optimizer, lr_lambda, last_epoch)
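# To see the shape of this schedule without building an optimizer, the `lr_lambda` closure
# above can be mirrored as a plain function. This is a minimal sketch:
# `cosine_decay_to_constant_factor` is a hypothetical name, and the step counts and
# `final_lr=0.1` below are illustrative values, not settings taken from any config in this repo.
```python
import math


def cosine_decay_to_constant_factor(step, num_warmup_steps, num_training_steps,
                                    final_lr=0.0, num_decay=0.667, num_cycles=0.5):
    """Return the lr multiplier for `step`, following the same three phases as `lr_lambda`."""
    # Phase 1: linear warmup from 0 to 1.
    if step < num_warmup_steps:
        return float(step) / float(max(1, num_warmup_steps))
    # Phase 3: constant floor once the decay window has passed.
    num_decay_steps = int(num_training_steps * num_decay)
    if step > num_decay_steps:
        return final_lr
    # Phase 2: cosine decay from 1 down to final_lr.
    progress = float(step - num_warmup_steps) / float(max(1, num_decay_steps - num_warmup_steps))
    cosine = max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
    return cosine * (1 - final_lr) + final_lr


# 100 warmup steps, 1000 total steps, decay finished after half of training.
factors = [
    cosine_decay_to_constant_factor(s, 100, 1000, final_lr=0.1, num_decay=0.5)
    for s in (0, 50, 100, 500, 900)
]
```
# The returned value multiplies the optimizer's base lr, so `final_lr` is a fraction of the
# base lr rather than an absolute learning rate.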
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/misc.py
================================================
import collections
import datetime
import os
import random
import subprocess
import time
from multiprocessing import JoinableQueue, Process
import numpy as np
import torch
import torch.distributed as dist
from mmcv import Config
from mmcv.runner import get_dist_info
from diffusion.utils.logger import get_root_logger
os.environ["MOX_SILENT_MODE"] = "1" # mute moxing log
def read_config(file):
# solve config loading conflict when multi-processes
import time
while True:
config = Config.fromfile(file)
if len(config) == 0:
time.sleep(0.1)
continue
break
return config
def init_random_seed(seed=None, device='cuda'):
"""Initialize random seed.
If the seed is not set, the seed will be automatically randomized,
and then broadcast to all processes to prevent some potential bugs.
Args:
seed (int, Optional): The seed. Default to None.
device (str): The device where the seed will be put on.
Default to 'cuda'.
Returns:
int: Seed to be used.
"""
if seed is not None:
return seed
# Make sure all ranks share the same random seed to prevent
# some potential bugs. Please refer to
# https://github.com/open-mmlab/mmdetection/issues/6339
rank, world_size = get_dist_info()
seed = np.random.randint(2 ** 31)
if world_size == 1:
return seed
if rank == 0:
random_num = torch.tensor(seed, dtype=torch.int32, device=device)
else:
random_num = torch.tensor(0, dtype=torch.int32, device=device)
dist.broadcast(random_num, src=0)
return random_num.item()
def set_random_seed(seed, deterministic=False):
"""Set random seed.
Args:
seed (int): Seed to be used.
deterministic (bool): Whether to set the deterministic option for
CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`
to True and `torch.backends.cudnn.benchmark` to False.
Default: False.
"""
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
if deterministic:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
class SimpleTimer:
def __init__(self, num_tasks, log_interval=1, desc="Process"):
self.num_tasks = num_tasks
self.desc = desc
self.count = 0
self.log_interval = log_interval
self.start_time = time.time()
self.logger = get_root_logger()
def log(self):
self.count += 1
if (self.count % self.log_interval) == 0 or self.count == self.num_tasks:
time_elapsed = time.time() - self.start_time
avg_time = time_elapsed / self.count
eta_sec = avg_time * (self.num_tasks - self.count)
eta_str = str(datetime.timedelta(seconds=int(eta_sec)))
elapsed_str = str(datetime.timedelta(seconds=int(time_elapsed)))
log_info = f"{self.desc} [{self.count}/{self.num_tasks}], elapsed_time:{elapsed_str}," \
f" avg_time: {avg_time}, eta: {eta_str}."
self.logger.info(log_info)
class DebugUnderflowOverflow:
"""
This debug class helps detect and understand where the model starts getting very large or very small, and more
importantly `nan` or `inf` weight and activation elements.
There are 2 working modes:
1. Underflow/overflow detection (default)
2. Specific batch absolute min/max tracing without detection
Mode 1: Underflow/overflow detection
To activate the underflow/overflow detection, initialize the object with the model :
```python
debug_overflow = DebugUnderflowOverflow(model)
```
then run the training as normal and if `nan` or `inf` gets detected in at least one of the weight, input or
output elements this module will throw an exception and will print `max_frames_to_save` frames that lead to this
event, each frame reporting
1. the fully qualified module name plus the class name whose `forward` was run
2. the absolute min and max value of all elements for each module weights, and the inputs and output
For example, here is the header and the last few frames in detection report for `google/mt5-small` run in fp16 mixed precision :
```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
[...]
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
```
You can see here, that `T5DenseGatedGeluDense.forward` resulted in output activations, whose absolute max value
was around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout` which
renormalizes the weights, after it zeroed some of the elements, which pushes the absolute max value to more than
64K, and we get an overflow.
As you can see it's the previous frames that we need to look into when the numbers start going into very large for
fp16 numbers.
The tracking is done in a forward hook, which gets invoked immediately after `forward` has completed.
By default the last 21 frames are printed. You can change the default to adjust for your needs. For example :
```python
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
```
To validate that you have set up this debugging feature correctly, and you intend to use it in a training that may
take hours to complete, first run it with normal tracing enabled for one or a few batches as explained in the next
section.
Mode 2. Specific batch absolute min/max tracing without detection
The second work mode is per-batch tracing with the underflow/overflow detection feature turned off.
Let's say you want to watch the absolute min and max values for all the ingredients of each `forward` call of a
given batch, and only do that for batches 1 and 3. Then you instantiate this class as :
```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3])
```
And now full batches 1 and 3 will be traced using the same format as explained above. Batches are 0-indexed.
This is helpful if you know that the program starts misbehaving after a certain batch number, so you can
fast-forward right to that area.
Early stopping:
You can also specify the batch number after which to stop the training, with :
```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3], abort_after_batch_num=3)
```
This feature is mainly useful in the tracing mode, but you can use it for any mode.
**Performance**:
As this module measures absolute `min`/`max` of each weight of the model on every forward it'll slow the
training down. Therefore remember to turn it off once the debugging needs have been met.
Args:
model (`nn.Module`):
The model to debug.
max_frames_to_save (`int`, *optional*, defaults to 21):
How many frames back to record
trace_batch_nums(`List[int]`, *optional*, defaults to `[]`):
Which batch numbers to trace (turns detection off)
abort_after_batch_num (`int`, *optional*):
Whether to abort after a certain batch number has finished
"""
def __init__(self, model, max_frames_to_save=21, trace_batch_nums=None, abort_after_batch_num=None):
if trace_batch_nums is None:
trace_batch_nums = []
self.model = model
self.trace_batch_nums = trace_batch_nums
self.abort_after_batch_num = abort_after_batch_num
# keep a LIFO buffer of frames to dump as soon as inf/nan is encountered to give context to the problem emergence
self.frames = collections.deque([], max_frames_to_save)
self.frame = []
self.batch_number = 0
self.total_calls = 0
self.detected_overflow = False
self.prefix = " "
self.analyse_model()
self.register_forward_hook()
def save_frame(self, frame=None):
if frame is not None:
self.expand_frame(frame)
self.frames.append("\n".join(self.frame))
self.frame = [] # start a new frame
def expand_frame(self, line):
self.frame.append(line)
def trace_frames(self):
print("\n".join(self.frames))
self.frames.clear()  # keep the bounded deque instead of replacing it with an unbounded list
def reset_saved_frames(self):
self.frames.clear()  # keep the bounded deque instead of replacing it with an unbounded list
def dump_saved_frames(self):
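print(f"\nDetected inf/nan during batch_number={self.batch_number}")
print(f"Last {len(self.frames)} forward frames:")
print(f"{'abs min':8} {'abs max':8} metadata")
# join the saved frames outside the f-string so their contents are actually printed
print("\n".join(self.frames))
print("\n\n")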
self.frames = []
def analyse_model(self):
# extract the fully qualified module names, to be able to report at run time. e.g.:
# encoder.block.2.layer.0.SelfAttention.o
#
# for shared weights only the first shared module name will be registered
self.module_names = {m: name for name, m in self.model.named_modules()}
# self.longest_module_name = max(len(v) for v in self.module_names.values())
def analyse_variable(self, var, ctx):
if torch.is_tensor(var):
self.expand_frame(self.get_abs_min_max(var, ctx))
if self.detect_overflow(var, ctx):
self.detected_overflow = True
elif var is None:
self.expand_frame(f"{'None':>17} {ctx}")
else:
self.expand_frame(f"{'not a tensor':>17} {ctx}")
def batch_start_frame(self):
self.expand_frame(f"\n\n{self.prefix} *** Starting batch number={self.batch_number} ***")
self.expand_frame(f"{'abs min':8} {'abs max':8} metadata")
def batch_end_frame(self):
self.expand_frame(f"{self.prefix} *** Finished batch number={self.batch_number - 1} ***\n\n")
def create_frame(self, module, input, output):
self.expand_frame(f"{self.prefix} {self.module_names[module]} {module.__class__.__name__}")
# params
for name, p in module.named_parameters(recurse=False):
self.analyse_variable(p, name)
# inputs
if isinstance(input, tuple):
for i, x in enumerate(input):
self.analyse_variable(x, f"input[{i}]")
else:
self.analyse_variable(input, "input")
# outputs
if isinstance(output, tuple):
for i, x in enumerate(output):
# possibly a tuple of tuples
if isinstance(x, tuple):
for j, y in enumerate(x):
self.analyse_variable(y, f"output[{i}][{j}]")
else:
self.analyse_variable(x, f"output[{i}]")
else:
self.analyse_variable(output, "output")
self.save_frame()
def register_forward_hook(self):
self.model.apply(self._register_forward_hook)
def _register_forward_hook(self, module):
module.register_forward_hook(self.forward_hook)
def forward_hook(self, module, input, output):
# - input is a tuple of packed inputs (could be non-Tensors)
# - output could be a Tensor or a tuple of Tensors and non-Tensors
last_frame_of_batch = False
trace_mode = self.batch_number in self.trace_batch_nums
if trace_mode:
self.reset_saved_frames()
if self.total_calls == 0:
self.batch_start_frame()
self.total_calls += 1
# count batch numbers - the very first forward hook of the batch will be called when the
# batch completes - i.e. it gets called very last - we know this batch has finished
if module == self.model:
self.batch_number += 1
last_frame_of_batch = True
self.create_frame(module, input, output)
# if last_frame_of_batch:
# self.batch_end_frame()
if trace_mode:
self.trace_frames()
if last_frame_of_batch:
self.batch_start_frame()
if self.detected_overflow and not trace_mode:
self.dump_saved_frames()
# now we can abort, as it's pointless to continue running
raise ValueError(
"DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. "
"Please scroll up above this traceback to see the activation values prior to this event."
)
# abort after certain batch if requested to do so
if self.abort_after_batch_num is not None and self.batch_number > self.abort_after_batch_num:
raise ValueError(
f"DebugUnderflowOverflow: aborting after {self.batch_number} batches due to `abort_after_batch_num={self.abort_after_batch_num}` arg"
)
@staticmethod
def get_abs_min_max(var, ctx):
abs_var = var.abs()
return f"{abs_var.min():8.2e} {abs_var.max():8.2e} {ctx}"
@staticmethod
def detect_overflow(var, ctx):
"""
Report whether the tensor contains any `nan` or `inf` entries.
This is useful for detecting overflows/underflows and best to call right after the function that did some math that
modified the tensor in question.
This function contains a few other helper features that you can enable and tweak directly if you want to track
various other things.
Args:
var: the tensor variable to check
ctx: the message to print as a context
Return:
`True` if `inf` or `nan` was detected, `False` otherwise
"""
detected = False
if torch.isnan(var).any().item():
detected = True
print(f"{ctx} has nans")
if torch.isinf(var).any().item():
detected = True
print(f"{ctx} has infs")
if var.dtype == torch.float32 and torch.ge(var.abs(), 65535).any().item():
detected = True
print(f"{ctx} has overflow values {var.abs().max().item()}.")
return detected
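# The class docstring above notes that activations near 62.7K sit close to the fp16 limit.
# As a minimal standard-library sketch of that limit (`fits_in_fp16` is a hypothetical helper,
# not part of this module), `struct` can try to pack a value as an IEEE 754 half-precision
# float, whose largest finite value is 65504.
```python
import struct


def fits_in_fp16(value):
    """Return True if `value` can be packed as a finite half-precision float."""
    try:
        struct.pack('<e', value)  # 'e' is the half-precision format code
        return True
    except OverflowError:
        # struct refuses values that are too large for the 16-bit format
        return False


near_limit = fits_in_fp16(62700.0)    # the activation magnitude from the report above
largest = fits_in_fp16(65504.0)       # largest finite fp16 value
too_large = fits_in_fp16(70000.0)     # beyond the representable range
```
# This only checks range, not precision: values that fit may still be rounded heavily in fp16.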
================================================
FILE: PixArt-alpha-ToCa/diffusion/utils/optimizer.py
================================================
import math
from mmcv import Config
from mmcv.runner import build_optimizer as mm_build_optimizer, OPTIMIZER_BUILDERS, DefaultOptimizerConstructor, \
OPTIMIZERS
from mmcv.utils import _BatchNorm, _InstanceNorm
from torch.nn import GroupNorm, LayerNorm
from .logger import get_root_logger
from typing import Tuple, Optional, Callable
import torch
from torch.optim.optimizer import Optimizer
def auto_scale_lr(effective_bs, optimizer_cfg, rule='linear', base_batch_size=256):
assert rule in ['linear', 'sqrt']
logger = get_root_logger()
# scale by the ratio of the effective batch size to the base batch size
if rule == 'sqrt':
scale_ratio = math.sqrt(effective_bs / base_batch_size)
elif rule == 'linear':
scale_ratio = effective_bs / base_batch_size
optimizer_cfg['lr'] *= scale_ratio
logger.info(f'Automatically adapt lr to {optimizer_cfg["lr"]:.7f} (using {rule} scaling rule).')
return scale_ratio
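# The two scaling rules in `auto_scale_lr` can be compared with a few lines of arithmetic.
# This is a minimal sketch: `lr_scale_ratio` is a hypothetical stand-in that omits the logger
# and the in-place config update, and the batch size and base lr are illustrative numbers only.
```python
import math


def lr_scale_ratio(effective_bs, base_batch_size=256, rule='linear'):
    """Return the factor by which the base lr is multiplied for a given effective batch size."""
    if rule == 'sqrt':
        return math.sqrt(effective_bs / base_batch_size)
    if rule == 'linear':
        return effective_bs / base_batch_size
    raise ValueError(f'Unknown scaling rule: {rule}')


base_lr = 2e-5
effective_bs = 1024  # e.g. per-GPU batch size * number of GPUs * gradient accumulation steps

linear_lr = base_lr * lr_scale_ratio(effective_bs, rule='linear')  # ratio 4.0
sqrt_lr = base_lr * lr_scale_ratio(effective_bs, rule='sqrt')      # ratio 2.0
```
# The sqrt rule grows the lr more slowly than the linear rule as the batch size increases.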
@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(DefaultOptimizerConstructor):
def add_params(self, params, module, prefix='', is_dcn_module=None):
"""Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param
groups, with specific rules defined by paramwise_cfg.
Args:
params (list[dict]): A list of param groups, it will be modified
in place.
module (nn.Module): The module to be added.
prefix (str): The prefix of the module
"""
# get param-wise options
custom_keys = self.paramwise_cfg.get('custom_keys', {})
# first sort with alphabet order and then sort with reversed len of str
# sorted_keys = sorted(sorted(custom_keys.keys()), key=len, reverse=True)
bias_lr_mult = self.paramwise_cfg.get('bias_lr_mult', 1.)
bias_decay_mult = self.paramwise_cfg.get('bias_decay_mult', 1.)
norm_decay_mult = self.paramwise_cfg.get('norm_decay_mult', 1.)
bypass_duplicate = self.paramwise_cfg.get('bypass_duplicate', False)
# special rules for norm layers and depth-wise conv layers
is_norm = isinstance(module,
(_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm))
for name, param in module.named_parameters(recurse=False):
base_lr = self.base_lr
if name == 'bias' and not is_norm and not is_dcn_module:
base_lr *= bias_lr_mult
# apply weight decay policies
base_wd = self.base_wd
# norm decay
if is_norm:
if self.base_wd is not None:
base_wd *= norm_decay_mult
elif name == 'bias' and not is_dcn_module:
if self.base_wd is not None:
# TODO: the current bias_decay_mult will also affect DCN
base_wd *= bias_decay_mult
param_group = {'params': [param]}
if not param.requires_grad:
param_group['requires_grad'] = False
params.append(param_group)
continue
if bypass_duplicate and self._is_in(param_group, params):
logger = get_root_logger()
logger.warning(f'{prefix} is duplicate. It is skipped since '
f'bypass_duplicate={bypass_duplicate}')
continue
# if the parameter matches one of the custom keys, ignore other rules
is_custom = False
for key in custom_keys:
scope, key_name = key if isinstance(key, tuple) else (None, key)
if scope is not None and scope not in f'{prefix}':
continue
if key_name in f'{prefix}.{name}':
is_custom = True
if 'lr_mult' in custom_keys[key]:
# if 'base_classes' in f'{prefix}.{name}' or 'attn_base' in f'{prefix}.{name}':
# param_group['lr'] = self.base_lr
# else:
param_group['lr'] = self.base_lr * custom_keys[key]['lr_mult']
elif 'lr' not in param_group:
param_group['lr'] = base_lr
if self.base_wd is not None:
if 'decay_mult' in custom_keys[key]:
param_group['weight_decay'] = self.base_wd * custom_keys[key]['decay_mult']
elif 'weight_decay' not in param_group:
param_group['weight_decay'] = base_wd
if not is_custom:
# bias_lr_mult affects all bias parameters
# except for norm.bias dcn.conv_offset.bias
if base_lr != self.base_lr:
param_group['lr'] = base_lr
if base_wd != self.base_wd:
param_group['weight_decay'] = base_wd
params.append(param_group)
for child_name, child_mod in module.named_children():
child_prefix = f'{prefix}.{child_name}' if prefix else child_name
self.add_params(
params,
child_mod,
prefix=child_prefix,
is_dcn_module=is_dcn_module)
def build_optimizer(model, optimizer_cfg):
# default parameter-wise config
logger = get_root_logger()
if hasattr(model, 'module'):
model = model.module
# set optimizer constructor
optimizer_cfg.setdefault('constructor', 'MyOptimizerConstructor')
# parameter-wise setting: cancel weight decay for some specific modules
custom_keys = dict()
for name, module in model.named_modules():
if hasattr(module, 'zero_weight_decay'):
custom_keys |= {
(name, key): dict(decay_mult=0)
for key in module.zero_weight_decay
}
paramwise_cfg = Config(dict(cfg=dict(custom_keys=custom_keys)))
if given_cfg := optimizer_cfg.get('paramwise_cfg'):
paramwise_cfg.merge_from_dict(dict(cfg=given_cfg))
optimizer_cfg['paramwise_cfg'] = paramwise_cfg.cfg
# build optimizer
optimizer = mm_build_optimizer(model, optimizer_cfg)
weight_decay_groups = dict()
lr_groups = dict()
for group in optimizer.param_groups:
if not group.get('requires_grad', True): continue
lr_groups.setdefault(group['lr'], []).append(group)
weight_decay_groups.setdefault(group['weight_decay'], []).append(group)
learnable_count, fix_count = 0, 0
for p in model.parameters():
if p.requires_grad:
learnable_count += 1
else:
fix_count += 1
fix_info = f"{learnable_count} are learnable, {fix_count} are fix"
lr_info = "Lr group: " + ", ".join([f'{len(group)} params with lr {lr:.5f}' for lr, group in lr_groups.items()])
wd_info = "Weight decay group: " + ", ".join(
[f'{len(group)} params with weight decay {wd}' for wd, group in weight_decay_groups.items()])
opt_info = f"Optimizer: total {len(optimizer.param_groups)} param groups, {fix_info}. {lr_info}; {wd_info}."
logger.info(opt_info)
return optimizer
@OPTIMIZERS.register_module()
class Lion(Optimizer):
def __init__(
self,
params,
lr: float = 1e-4,
betas: Tuple[float, float] = (0.9, 0.99),
weight_decay: float = 0.0,
):
assert lr > 0.
assert all(0. <= beta <= 1. for beta in betas)
defaults = dict(lr=lr, betas=betas, weight_decay=weight_decay)
super().__init__(params, defaults)
@staticmethod
def update_fn(p, grad, exp_avg, lr, wd, beta1, beta2):
# decoupled weight decay step
p.data.mul_(1 - lr * wd)
# weight update
update = exp_avg.clone().lerp_(grad, 1 - beta1).sign_()
p.add_(update, alpha=-lr)
# decay the momentum running average coefficient
exp_avg.lerp_(grad, 1 - beta2)
@staticmethod
def exists(val):
return val is not None
@torch.no_grad()
def step(
self,
closure: Optional[Callable] = None
):
loss = None
if self.exists(closure):
with torch.enable_grad():
loss = closure()
for group in self.param_groups:
for p in filter(lambda p: self.exists(p.grad), group['params']):
grad, lr, wd, beta1, beta2, state = p.grad, group['lr'], group['weight_decay'], *group['betas'], \
self.state[p]
# init state - exponential moving average of gradient values
if len(state) == 0:
state['exp_avg'] = torch.zeros_like(p)
exp_avg = state['exp_avg']
self.update_fn(
p,
grad,
exp_avg,
lr,
wd,
beta1,
beta2
)
return loss
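# The arithmetic in `Lion.update_fn` can be followed on plain Python floats. This is a minimal
# scalar sketch: `lion_step` and `sign` are hypothetical helpers that return new values instead
# of updating tensors in place, and the inputs below are illustrative.
```python
def sign(x):
    """Return -1, 0 or 1, matching the behaviour of an elementwise sign."""
    return (x > 0) - (x < 0)


def lion_step(p, grad, exp_avg, lr=1e-4, wd=0.0, beta1=0.9, beta2=0.99):
    """Apply one Lion update to a scalar parameter and return (new_p, new_exp_avg)."""
    # decoupled weight decay
    p = p * (1 - lr * wd)
    # interpolate the momentum toward the gradient, then keep only the sign
    update = sign(exp_avg + (1 - beta1) * (grad - exp_avg))
    p = p - lr * update
    # update the momentum with the second interpolation coefficient
    exp_avg = exp_avg + (1 - beta2) * (grad - exp_avg)
    return p, exp_avg


small_p, small_m = lion_step(1.0, 0.5, 0.0, lr=0.1)
large_p, large_m = lion_step(1.0, 50.0, 0.0, lr=0.1)
```
# Because only the sign of the interpolated momentum is used, both calls move the parameter by
# exactly `lr`, regardless of the gradient magnitude; only the stored momentum differs.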
================================================
FILE: PixArt-alpha-ToCa/docker-compose.yml
================================================
version: "3.8"
services:
pixart:
container_name: pixart
image: pixart:latest
build:
context: .
ports:
- 12345:12345
environment:
- APP_CONTEXT=1024 #1024, 512, LCM
tmpfs:
- /tmp
volumes:
- ./docker/cache/gradio:/workspace/gradio_cached_examples/30:rw
- ./docker/cache/huggingface:/root/.cache/huggingface:rw
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
================================================
FILE: PixArt-alpha-ToCa/docker-entrypoint.sh
================================================
#!/usr/bin/env bash
set -Eeuo pipefail
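# Default APP_CONTEXT to empty so that `set -u` does not abort when it is unset
APP_CONTEXT="${APP_CONTEXT:-}"
# Check if APP_CONTEXT matches one of the specific values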
if [ "$APP_CONTEXT" = "1024" ]; then
echo "APP_CONTEXT is 1024"
/usr/bin/python /workspace/app/app.py "$@"
elif [ "$APP_CONTEXT" = "512" ]; then
echo "APP_CONTEXT is 512"
/usr/bin/python /workspace/app/app_512.py "$@"
elif [ "$APP_CONTEXT" = "LCM" ]; then
echo "APP_CONTEXT is LCM"
/usr/bin/python /workspace/app/app_lcm.py "$@"
else
echo "APP_CONTEXT is not set to 1024, 512, or LCM, defaulting to 1024"
/usr/bin/python /workspace/app/app.py "$@"
fi
================================================
FILE: PixArt-alpha-ToCa/docker-readme.md
================================================
================================================
FILE: PixArt-alpha-ToCa/environment-pixart.yml
================================================
name: pixart
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - ca-certificates=2024.7.2=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - ncurses=6.4=h6a678d5_0
  - openssl=1.1.1w=h7f8727e_0
  - pip=24.2=py39h06a4308_0
  - python=3.9.0=hdb3f193_2
  - readline=8.2=h5eee18b_0
  - setuptools=72.1.0=py39h06a4308_0
  - sqlite=3.45.3=h5eee18b_0
  - tk=8.6.14=h39e8969_0
  - wheel=0.43.0=py39h06a4308_0
  - xz=5.4.6=h5eee18b_1
  - zlib=1.2.13=h5eee18b_1
  - pip:
      - absl-py==2.1.0
      - accelerate==0.34.0
      - addict==2.4.0
      - aiofiles==23.2.1
      - aiohappyeyeballs==2.4.0
      - aiohttp==3.10.5
      - aiosignal==1.3.1
      - altair==5.4.1
      - annotated-types==0.7.0
      - anyio==4.4.0
      - async-timeout==4.0.3
      - attrs==24.2.0
      - beautifulsoup4==4.12.3
      - bs4==0.0.2
      - certifi==2024.8.30
      - charset-normalizer==3.3.2
      - click==8.1.7
      - coloredlogs==15.0.1
      - contourpy==1.3.0
      - cycler==0.12.1
      - datasets==2.21.0
      - diffusers==0.31.0.dev0
      - dill==0.3.8
      - einops==0.8.0
      - exceptiongroup==1.2.2
      - fastapi==0.112.2
      - ffmpy==0.4.0
      - filelock==3.15.4
      - fonttools==4.53.1
      - frozenlist==1.4.1
      - fsspec==2024.6.1
      - ftfy==6.2.3
      - gradio==4.1.1
      - gradio-client==0.7.0
      - grpcio==1.66.1
      - h11==0.14.0
      - httpcore==1.0.5
      - httpx==0.27.2
      - huggingface-hub==0.24.6
      - humanfriendly==10.0
      - idna==3.8
      - importlib-metadata==8.4.0
      - importlib-resources==6.4.4
      - jinja2==3.1.4
      - jsonschema==4.23.0
      - jsonschema-specifications==2023.12.1
      - kiwisolver==1.4.5
      - markdown==3.7
      - markdown-it-py==3.0.0
      - markupsafe==2.1.5
      - matplotlib==3.9.2
      - mdurl==0.1.2
      - mmcv==1.7.0
      - mpmath==1.3.0
      - multidict==6.0.5
      - multiprocess==0.70.16
      - narwhals==1.6.1
      - networkx==3.2.1
      - numpy==1.26.4
      - nvidia-cublas-cu12==12.1.3.1
      - nvidia-cuda-cupti-cu12==12.1.105
      - nvidia-cuda-nvrtc-cu12==12.1.105
      - nvidia-cuda-runtime-cu12==12.1.105
      - nvidia-cudnn-cu12==9.1.0.70
      - nvidia-cufft-cu12==11.0.2.54
      - nvidia-curand-cu12==10.3.2.106
      - nvidia-cusolver-cu12==11.4.5.107
      - nvidia-cusparse-cu12==12.1.0.106
      - nvidia-nccl-cu12==2.20.5
      - nvidia-nvjitlink-cu12==12.6.68
      - nvidia-nvtx-cu12==12.1.105
      - opencv-python==4.10.0.84
      - optimum==1.21.4
      - orjson==3.10.7
      - packaging==24.1
      - pandas==2.2.2
      - peft==0.6.2
      - pillow==10.4.0
      - platformdirs==4.2.2
      - protobuf==3.20.2
      - psutil==6.0.0
      - pyarrow==17.0.0
      - pydantic==2.8.2
      - pydantic-core==2.20.1
      - pydub==0.25.1
      - pygments==2.18.0
      - pyparsing==3.1.4
      - python-dateutil==2.9.0.post0
      - python-multipart==0.0.9
      - pytorch-fid==0.3.0
      - pytz==2024.1
      - pyyaml==6.0.2
      - referencing==0.35.1
      - regex==2024.7.24
      - requests==2.32.3
      - rich==13.8.0
      - rpds-py==0.20.0
      - safetensors==0.4.4
      - scipy==1.13.1
      - semantic-version==2.10.0
      - sentencepiece==0.1.99
      - shellingham==1.5.4
      - six==1.16.0
      - sniffio==1.3.1
      - soupsieve==2.6
      - starlette==0.38.4
      - sympy==1.13.2
      - tensorboard==2.17.1
      - tensorboard-data-server==0.7.2
      - tensorboardx==2.6.2.2
      - timm==0.6.12
      - tokenizers==0.19.1
      - tomli==2.0.1
      - tomlkit==0.12.0
      - torch==2.4.0
      - torchaudio==2.1.1+cu118
      - torchvision==0.16.1+cu118
      - tqdm==4.66.5
      - transformers==4.43.4
      - triton==3.0.0
      - typer==0.12.5
      - typing-extensions==4.12.2
      - tzdata==2024.1
      - urllib3==2.2.2
      - uvicorn==0.30.6
      - wcwidth==0.2.13
      - websockets==11.0.3
      - werkzeug==3.0.4
      - xformers==0.0.27.post2
      - xxhash==3.5.0
      - yapf==0.40.1
      - yarl==1.9.7
      - zipp==3.20.1
prefix: /root/miniconda3/envs/pixart
================================================
FILE: PixArt-alpha-ToCa/environment.yml
================================================
name: PixArt
channels:
  - pytorch
  - nvidia
dependencies:
  - python >= 3.8
  - pytorch >= 1.13
  - torchvision
  - pytorch-cuda=11.7
  - pip:
      - timm==0.6.12
      - diffusers
      - mmcv==1.7.0
      - accelerate==0.15.0
      - tensorboard
      - transformers==4.26.1
      - sentencepiece~=0.1.97
      - ftfy~=6.1.1
      - beautifulsoup4~=4.11.1
      - opencv-python
      - bs4
      - einops
      - xformers
================================================
FILE: PixArt-alpha-ToCa/notebooks/PixArt_xl2_img512_internal_for_pokemon_sample_training.py
================================================
_base_ = ['/workspace/PixArt-alpha/configs/PixArt_xl2_internal.py']
data_root = '/workspace'
image_list_json = ['data_info.json',]
data = dict(type='InternalData', root='/workspace/pixart-pokemon', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)
image_size = 512
# model setting
window_block_indexes = []
window_size=0
use_rel_pos=False
model = 'PixArt_XL_2'
fp32_attention = True
load_from = "/workspace/PixArt-alpha/output/pretrained_models/PixArt-XL-2-512x512.pth"
vae_pretrained = "output/pretrained_models/sd-vae-ft-ema"
lewei_scale = 1.0
# training setting
use_fsdp=False # if use FSDP mode
num_workers=10
train_batch_size = 38 # 32
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)
lr_schedule_args = dict(num_warmup_steps=1000)
eval_sampling_steps = 200
log_interval = 20
save_model_steps=100
work_dir = 'output/debug'
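
The config sets `lr=2e-5` with `lr_schedule_args = dict(num_warmup_steps=1000)`, but the scheduler that consumes these values lives in training code not shown here. As an illustration only, the sketch below assumes a linear warmup from zero to the base rate followed by a constant rate; the function name `warmup_lr` and that schedule shape are assumptions, not the repository's confirmed behaviour.

```python
def warmup_lr(step: int, base_lr: float = 2e-5, num_warmup_steps: int = 1000) -> float:
    """Learning rate at a given optimizer step under an assumed linear warmup.

    The rate grows linearly from 0 to base_lr over num_warmup_steps and then
    stays at base_lr. Defaults mirror the values in the config above.
    """
    if num_warmup_steps <= 0:
        # No warmup requested: use the base rate from the first step.
        return base_lr
    return base_lr * min(1.0, step / num_warmup_steps)
```

Under this assumption, `save_model_steps=100` means the first several checkpoints are written while the rate is still ramping up, which is worth keeping in mind when comparing early samples.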
================================================
FILE: PixArt-alpha-ToCa/notebooks/convert-checkpoint-to-diffusers.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "2878bb5d-33a3-4a5b-b15c-c832c700129b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/workspace/PixArt-alpha\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.\n",
" self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n"
]
}
],
"source": [
"%cd PixArt-alpha"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7dd2d98c-3f8f-40f1-a9e1-bc916774afb3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of transformer parameters: 610856096\n"
]
}
],
"source": [
"!python tools/convert_pixart_alpha_to_diffusers.py \\\n",
" --orig_ckpt_path \"/workspace/PixArt-alpha/output/trained_model/checkpoints/epoch_5_step_110.pth\" \\\n",
" --dump_path \"/workspace/PixArt-alpha/output/diffusers_trained\" \\\n",
" --only_transformer=True \\\n",
" --image_size 512 \\\n",
" --multi_scale_train=False\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: PixArt-alpha-ToCa/notebooks/infer.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "8b2458c4-c461-4ddc-af94-fcd837357da4",
"metadata": {},
"outputs": [],
"source": [
"from diffusers import PixArtAlphaPipeline\n",
"import torch\n",
"from diffusers import Transformer2DModel"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81a5bc0f-682b-4ff9-92e9-43b68b3df8fc",
"metadata": {},
"outputs": [],
"source": [
"# for comparison\n",
"\n",
"orig_pipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-512x512\", torch_dtype=torch.float16)\n",
"orig_pipe = orig_pipe.to(\"cuda\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efc07821-5479-4ca3-a2c6-114ac484fd1e",
"metadata": {},
"outputs": [],
"source": [
"transformer = Transformer2DModel.from_pretrained(\"/workspace/PixArt-alpha/output/diffusers_trained/transformer\", torch_dtype=torch.float16)\n",
"pipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-512x512\", torch_dtype=torch.float16, transformer=transformer)\n",
"pipe = pipe.to(\"cuda\")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "57da873b-2c13-463b-b558-ee69522ccefc",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d69c7683773c4c25914764800ec1ef4f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/20 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAEAAElEQVR4nOy9daBlV5E9vKr2Ofc+b3d3i3VHOi7EiZIggcECBA06gcGH38DA4PINMDDIEFyGAEkIcXdvSXfSSbv36+fv3nvO2bvq+2Pvc+/rJMzAIEn6nQVpef3kyjlVtVetWkUuS8lEABEBABQgQBFAKFCgQIEC+x8UQiKSx/4CBQoUKDB8oNwo/AsUKFCgwDACMZSAguopUKBAgWEHLkJ/gQIFCgxP8HP9AAoUKFCgwHODIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYokgABQoUKDBMUSSAAgUKFBimKBJAgQIFCgxTFAmgQIECBYYpigRQoECBAsMURQIoUKBAgWGKIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYokgABQoUKDBMUSSAAgUKFBimKBJAgQIFCgxTFAmgQIECBYYpigRQoECBAsMURQIoUKBAgWGKIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYInquH0CBAgUKPA+hjT8IAALTc/ho/kYgVf3fP6tAgQIF9lMoQM/2N1EoYAElNAGiIAL2+eQXPIoTQIECBYYL9o310PAhVSUFVCCiBBJrs0xqA0lPpbqpd/fDm57q3rNt7sRJ55586qi21v0pBRQJoECBAvsfnkZsqIZArwKCqhAyZzOX9vf1dO7du7e7b9Bmu/r69nR19+zt42yga/fmWn//rv6eQU339Fb7Ogc62kfA4LVnn0P7Udjcf55JgQIFhjfygh4kgKooAFUnNsuy6uBgd0/Ptm1bN2zcsHHLpo1bt23Zvmlv556+gZ7UZQmpaY3QwqaJS6Wm5hLHTTqyo600uW1kS/vc1qlpf+2hu7Y0t4wCSEWI9xP5TJEAChQo8AKEQmnInwFRslaVaGAwHaj0Pf7kkw89/PDGTRu3bl6/cdP6gVrfYGUwc1lc0lJTXG6KRoxqGTW9deGkRa1t5XJby7gJo5qbS83tho0Kab91lSSt9Gra7TY/unHl3Y+9530fPOekY6HUiP76gmeDigRQoECBFxBU1REZECngrHPOCnSgv7Jl+85HV619dMXK1WtX79q7fc/eXQNJf1tH85hxHR3TW2dOmjd23KhR49rHTRw5esyIEaPbSsZETERaLkeaSRSpMWnV9leT2t7+/oGq7e7q7N5ZefKe9ZtX9x5zyClvvOiNJTI8NOS/wKM/ChVQgQIFnt/QIXpMqAJEA4OVvv6+zTt2PLzy0d9fc9WGDU909XYOpIMwtqM9njl34px50ydPmzxz5pTRoztGj+hobm2Km2LDTKzKKioOGRROkIFt5khSRmq1P+V0Z1ff5h3d6zZ3DQxkj938eLKlrz2b+PtfXzd58sRSOdpPqJ8cxQmgQIECzzmehUxRFSJSkKhasdXqYG9Pz7onNq5c+/
g999/x6IqHd3Rud2Xb1BRPnDlm+XGLFyybN3nymOmTx4wbPbLJlGImA2YDqCg5qFWCqGikoo5EVGAodgpRMDRTV0mT3X3dj2/etWlb75OPd9kB6u+Uvs3Vb1/+xQmTJzSVo/2A83kaigRQoECB5xyEwOQTAAEECuLe/sH+/sHHnnjy/ofvvfnm6zZveXxwsJdKOnnqmOVnLZy78Pipk8eNGzdy7JgR5XIUl0rMEFUFjJKSgkhYoaRgKCmpA4FUVKBkGE6c1qqsLDEPWGzb23vPg2u27awO9IpJm3t39/XtrLzqNf9w4vEnNJdM/jD3KxQUUIECBZ5bqNZnr4Aks32DfRs2bnpg5SO33nnXmpWP7tq73cXZ6EkjFh884/DDlixaMnPsyPZSOWppiSIiJiFRNhBhJYif2VUV/12ZBKIqpCBDDlBVEmWoQpIUCrGMXZW++x5+5IEHVw70ANoWo61v6+C2NVsnjxj/u1/9Ztq4sRGzKoFoP0sBxQmgQIECf3+oBtUmKUiBWpLu6dq78vHHbrjthkdW3Ldp65ODyUDr6HjBAZPPWHr0vLkzZ86eNnbsqFLMTKIkCqhaYhaAmEUVkQIEgUKIxBCDoHAKgYANFEpONVNjlAxltmRI1CSbNq+/6a5H1z61o6eCOGpvx4jqrqTvib2lbvryVz83efTYiBng/S32AygSQIECBf7O0PArKSgVu2PXjgcffujGm25e+cQjO/t2SimdOH3kWSceu2DJ7FnTp4wZ0dzaFMcRKTlyTo1V578DMZjEsOeNlCBKgJABhJTDWIASgcgX/FaYTBSTqrUZLKjCtfseuOvW2+/evdsqt5fiEca02ordvWVHz57et7z6jUcdsbwUsyoB2C8TQEEBFShQ4O8AH2hIFKKwgj1dvQ+tXX3V9VevXnFXV9fOUoeZs2jqIUvnzZs7ecrUsa1tzaUoVnJEyJw1xsCAlRiqIsxEBCgYjJya0dBEIJCqECmUVAlK6pwzxMRGbAYi50xNaUd351U3X7di9VpDpShqMlmriUeiqjvWbtv51M7D5h/5X9/7/sSxI3zlr1okgL8LRISYaP/rthQoMBwRpnN9/9QquvZ2PbJy9XW33nL3igc2795Q7sCM2aOOOvKQRQfPHj9mdHOM5jLBpVyCOJiYnarhSJVBGpGSgIwBhEFQIWUChWwA+IpfVVmZAKHcAQJECgWpUwvqyir3rlpx1ZU3DVSzUnszwZByE1rKWdve9XvXr9zULM3XXn3t/LmzSsbU4+N+GZKeXxSQAk502/ZNI0eOLDW3C6RsfPcdtJ++AQUK7I8Ikn3f3XWgzr09D65cccNdNzzw8G3b9mxxpWz6/AmvufCQJUvmTJk6tlSKYoqIUyiEQBFbVS5BCaxCbJkNKaCWDMCqqqJg4pzmyaO0+mAPL/0xgBAUFhyJUJpJprx66+bf3njTxk2bI+oodZQQOWthuCWSlsFttd3rerkWf+rz/7Jg7sy4Ti/tp+U/nm8JgAAR96WvfHVX584Js2ZB7bjmltGjJk6eOHvCuI6O1qZRo0e3tLc3tTRzVGIiKAwxQYsTQ4ECzxPkFD9bla6ePSseeeSGW+688+F7N+7ZhLZs8qzW0884cvHSBdMnTRjTRE0RCE6NgwgTAQwlEEOJlQlKZFQ8lwPiiFRViAAiBgjsiR8DUlIV8mx9iNoAlKBMSeoypq29ndfe9sD9j65UF8fxaOayEDmkHEdGSq4f25/auXfDzgsufOkF57w0MqZO+++v0R/PtwQAIIrjKGr6/S23xg+vGDeyxGmXEokZxRxFqRsxcnTTyI6pM+ZMGDNl8qTxk8ZNmDxx8vgx40aP6WgqlcqGGGr25/erQIHnKcSX/YBT9PQOPLhq1W+u/8MD99+ya8cmjbLJc8edeeKSAw6aPX/O5JaO5lLcBJIIIhBwrCpMBKV6s5UNBSJYKPxRCT70hxRDgDLAxAL2qqCIhBnOOWJwzE7UOmQa99rsmlvvuP3BR/
oqKDWNYhdRbKxmqsxoK3PcmsZbHtvQvWXrkvlzPvmJf2lpioZJTfm86wEAevv9d5/76ovi8a3vfv9rJ44t9/d19/bb/sGUUt25q3N3d081zQb3DMKpIVT60vETJrd3jD5o/pIFc+Ysmjdv9pSpLU1lo8ZEjCErHP4qQ3z73SRggQL/Z2j9NwVlont7e1asWn3dzTff+eBdG7evR3M2elLzwQcvPGjZ7FkzpowZ3dZSMqwOsUKZSckz9IaIwAoONxcRMRMxAeoJfvUqTCWi3AKOiKFCALFRJVEQFKpMCkMQdcoOmoHvW73u2lvu2LylS0yzK5UMlwwrGU3VGuJmtHRErf1PdK269aFmW/r+t79/ymmnGq8ZHQb3+vMuASi0rzrwyje86prrrznrlS9+1SVnzl88JWIoIamaLE3Zqh1MBnt6e3s6d3Xt7e6vPPnklt17Bmt9g719yYiWMTMnzVs8d/FBiw+eMnnqrOnT2pqbGAookbfx2P/f1AIF/nbwLpwaSn5NkrSzq+veFQ9fe+cNDz16R2fXTilh3MyOpQdPnztn+txZU8eMHtVaKisrEQkRkXGwzDAKEmGwRqQKhjIpgQle48MMINC7BDDyI0HIAYq890sAMYNIRACyMJGVOKOos7/vR1f8/pHHNqhEXG6OuMTEFJFjcWKZTHNUbkqbsj3JQ1femfTWPv7BD733Xe9taWkaPlHieZcAAHWK6274wwWveQ13YN7S0W+77KULFs8tRcrUUlJuQ8wCFRuVOVHnJKokNklcb9fAtu17t2/vfPyxjbu37HWZtpjmGZNnzJ4ye/7c+fPnz582fUpTqclQCYo8GRQoUOBPhYa4rwKtVCvr1z9193333HnXfWs2rd5T3RuNwuRZHQsXz5q7aObUSePHtpdby3FJoQxWgiEBEUIPl1nrjA+IAK/mAYFZwT6eE+BJfALBEEDgMD8gMACYVaAkkTHOOd8ZIJZUokpk7nz40V//7qbu3jSK25hbyUhsWKHMxrIA2sTlFpRbspa7f3fLnnWdpx570g9/9F9jRo1g5iIBPJcQ6EBl8KJXveq2R+9rmkKzDmx7yctOOuaE5c2mXMqoRQw5iHOGQcgIBiBmFscOVE3TWmb37hnYsmHr5vXbtmzZvXlDV1ePi0yppWX0CceedOKxLzpg8YIxHa0MJuAvyQMFHVRgv4bCx1qwiAooFe3p71255olrb7v5vgfu3bb18cwlLWNaZiyYuPSQ6TNnjZs6bUxrW1McxQpf76sJwZ+IILCRiUmZVH1gBxHXy3gChSKfDIjBDFLyLA9IDUEocLlMnuoheFNosAN5oScyV9rVX/vZddfc/eAKE3WYqMVoZDgiVjKkClWCQYSorKUObnngt3d1bdo7bcyUX/36lwvmzC6VSs/1y/53xfMxAfjAesNNN130ltdjfLk0pjJ6NM55+UknH3/suPKIks2Q2jguEyA2M5EhYYUoEZisE2LORESpWkv7+9LtW/ufeGrnhvVbtu3o2rlld+fO3jHto4496qiLznvFsUcvbzJNTmD4z4vkz/qSFcmgwAsfQfcIv0SRWKAC9PT1PfDAI1ffdMMt9926vXOHRW3UhI4Z88YfcvDCeQdOnzh2REfZNJdAajUSVVb2rL0Rr9MTMUQUEYhIiaAMgIi8Zt//QCYmEJigDDYhI6gQALAyQUL/Vw0BRPBqfwdlow5klBPLa9Zv/8lvr9y4oztuHalc8ttbYsMWwmQESswRTBM3tVHrY7c/uPWBbaWK+eV///KEk48tRfxX7Be+IPB8TAAAFJpZ96a3X/LL269a9KLFlcFOW9p9zOGHXHjmmTMntEWZRKZdRdWRiVidKoGZASWQgh0pEdssI0Jay6ppbe+u3p079nTu6uzeXVm7dtvadbsGu5IpE2e89LyXnXve+TOnTy8B+PPaPsPnIikwXKAQ33n1kp6evv6HVq/672t+c8f9t+3euZFapH1CadaCyYcdPG/evJlTpowul0
1Eka/QlUhBVjJlIfIjWazgiDiCGmVQpEogZZ8giHJ+Xxnw9I0v9xnEAJHR0BhQCjY/3vPB+OQAFuUMbJyyFTOYuD/ccc81N9+pWYmiDjIxRWAGK4hhQVYdxxKpaZLWjqh94wMbV9+9IuqXz//Lpy5+y5ubm6L9ZtHjn47naQIAVIAn1j1+2NknTjps5oyDp3b1bqr27Jo3c9JFF5x40Mx5lLWRapmNnwtXBVhVwCBRkjAKQoYAsUROGbUkqVQG+7vt9m09nbsq27Z23//AyrWPbWRuPu7QYy5+7T8ce/QxLXErwOYZ10ER7AvsN3jWsSYNJb8SuJpVH3vi8Rtvuem6m25ct+XxNK6NmzVi9pKJixfOnrdoxqSxo9tjilkZouydlSGOlViUvAenQkREFUoUs4mYjTJgWEGGQELeqpMYQM7lgInzHKDsNUFhwktZVUkVTARGJOqiyDhywlaFE+Gtu3v++8obVz65Po5GMrUwxwpFLFAXwTgRjWNicqiWqNTm2qUzueXn96Y99h8uOPtLX/7C6HGj/1waYP/A8zYBANBU7Zve/aaf3vb7A08/pmNSU5p0dnZuHN9uXnrmGUcuPbYVUs6sISGKRFVZCGxgiMhaJWYQiwipMhklIQAMB0lqFZdJb1dl187uDZt23XvX6kcfemr77sG5Mxa++eI3n3/2S8aNbCeVEj/vhiQKFPgrItfzqIAVcNAduzuvuunG313/+yfW3J+kfeOnNh+4dNYhS+dOnzFpwqRR5XLMMEpghkAQqCLx/4FIvde+qoiEwEJgMmw48vemEhmIChERgcIGAOV8nJPJMEBM0GDjQzDQMPzrK3RiIQOnIoatmP6KPvjYE7/5w419g1mp3EJUYjKkzExOPJekKiT+OMHUxk1Ng+aO396x98nKsYce9Y1vfXnRwnmRAcDDsNB7PicAKNy6jU8de/4ZbvqI+csXR+X+xPX29exoYzrzRaedefTy8VwSyTiCiqeAiJVVAZCIkjcVInbWgUBsxDmwEglIVcUqVbOse2+ybvPu2+5a8dA9q9ev2zalddJrX/YPL7vworkzZ0TE/MwGwfC7Sgrsj/A+OVBQxaYPrVjxiyv++/b7b9+xd1v7pPLcxROPOPSAxYunjR8zorVkIiOkqgZQA1JREEhCAzfEeiFAgkZIFaoiECgxQIYjNjFMoHz8DUThRqprMZjYgJly5x6QkgbVkCiYoX5SOFMiJUqVd3QO3HL3w/c9+FhmI4rKbGJlS6REBCElIiZRZWaBEBCTGW06Hrl69aYVG8Y3TfrKV7583stfXGL/fYfjBOnzNgF4BYKz4r7y1S984FufPfrCF5tWrlAnRVVbqbi+yjHLlrzmjLPGjWwXVyMVopiIoaSqXselVuF7S0San3tVVdn5bx9gKMlSRLxh3Z4V9z9++0333f/AU+0ds88682Uvv/C8g5csboWJoA1p2LAYECmwX0I1WPR4WkW2795+w523/OoPP1/1xMpUq9MWjjn+6EWLFsydPXNqa0s5YhYoEzkIgQTifdcg/htQfSRAFQqGqoQjhYpfyaJqQGATGcPEBjDh1iYlsBeCslf2gJW8XghEgDQeNJSZADgVMpEjscoDSbryyY033/Hglq1dbFoMl5hIWYXJc0oMcuzJIyEYoyhxqYxy17qdD165oikrf/gDH3zne99VLoPZDNsb+nmbADxUIT29XfOOPRKzOw477fCB6m6LPlat9iTsBpZMGvPql7x8xuTx0AyAihiKCQAcAApBW5lYc4MSkNe2CYiIFSIwUJeRYUtswV29gw8+uPEP195/650rylH7iceecN6JZ55y7NGjW0fFiKLCla7ACxWaC/mpu6979crHfnP1b29/+I5d/Vtbx8sBh8479JilC2fPmNTR3FwiEhVfITFBTcgbfhQ3mK8FRwbNlzmGk7eGsVwnImEZI4hNxMzMBjDepjnU/hwmu1SM1/4D8AofFcpvXgchAwdh5gym5uzO7t67H1
j90KonBvtBpoWZ/bEeRI4ARaQEgiNVIhhloRJFJRfFA003/Owa7ecjFx9x+c9+OHnSOArzxs/Vm/Ic4/mcAHy0Fqvy7R/+4B2ffN8xFx49aurYfttbS3qNGmROB/pGt7W+8sILDpg7LZKKIYpMOXCSREYZBFXlELUVyiDVoDqAQry/uDFkrcBAGDZLTak0OCAPP/zkT//7xrsfWBNh1EGHHH7hGReee/pZo0vlZnD8HL8yBQr86dCg6lEk1q7fvOUPN//+uhuuW/fkCtNGsxeNWX7kwiMOXzBm/OjmplZlYYGSUyBYbPpObmDsxVvuwys0c0sezVWjCHW7CtR5H2gFEzGzIWImVjAhrOsN/j5QYoWE0V/fOCCFEikxERkr3tOfySn1DqaPb952+wMrt2ztsZZBTUwEY4mUxEAg7CcQBMSh18waU9xkSy22vOKGR3as2zGuffz3vvf9Y45eHoXFAsOX130+JwD4t0XgBqqVE08/abPdccS5x0k7Kmk/skxsBotksFLS7KXnnHLCsiVNMZgMgQwZ9leXejYQAIPqQ+weEsoZFf8hcYCBISg5QFOJdtfsjXev/s2vr1+1el2JRx9/5MmvPu9Vpx+9vIPLJcTDTjJW4IWB3MPYc/xEVnX7rh133//Qb6++6tHV93dXdoybMfLgI+ceeeSyBfOmjGwpR2TBVsEwqg4ASSjOVV3uwqD5xpV8gTtp4OilwQKFB5ALSb0zIzHBMMEPBHiLT38CIN9JJrDPKArJ1aFgQwZQ5VQNRKnmsLu3775HHnt45VPdfRamhVACCaloLMF1GqxMBCESBYtCWQ1zE8otSfuOlZsevv7+Nm7753/71JvfdEkpP33sd4t+/ww8zxOAhzrIjdddc/6bXz7r+IUzDluQIRHNsiRJrZSoVBvod7b39KOWnn3qSSNiNipxXIYLlxSTgYRTJcK8oH+/NdwkTAIhsO8bizLUmrimECuCUqmzUrnr1oevuuquhx7dVG6afvxRL37LK199xOID/FHAPMcvToECQ6EKb4zGSugb7F+5etXlv/vZ3Q/csadz54h2e8CyGUefuGTeolmTJ4yNo5iFPA/vIJ6SV+sMsxD7kE5e9ONrqeCxTEqNbKAgUdH6EcDH9rzCZxAp2Fu3h1+UlPMZXg1LvIg0zA4zkUK9I5BTUsdqlfursm7bnjseefTJp7arlMlvg1eNiCDqWJWUoazkQutYAN9rMCVTbqPWrjUDD137IPUn73rbW//xIx/saGkO7tP7tdvz/4oXRAKAQNM0ueg1r7zq4WuPfcWpLePLVlMn1gE2FXGWYDGwd/HcKS8948wZ48YYUSWYiGDBahRKMI1jnobTHsGLy4acasVrjYWQKWxEBILAOMRd1fSmOx795a9vWvX41nHt0846/ayXnfXSw+cuaAFHMPQXmUoUKPBXQB6RVcAbNm+87sYbf3/z71etecS1JOPnjjjy6EOPOnTO9Mnj2lqjOFbjnLKKMMg4JQDMrF7SoyFaayj8KWQJaH4PkaqKbyoo/J/zEeLcvI0Yub4zjHuF2S9DflQgWDTCE7MUVP8s6iJEIAJlqXMZ0a7eyqNrNz782JNde2ugFoFRODLWwJEQKTuGQAyYIAJSw0rWd5KNiWLXYrronl8/2LO596TlR/7w5z8eN7YjNsY/oeHJ/NTxwkgAvvLYvHXbUaccl46tHnfhGZZrFRkQQ7CSOUdQOJek/RM6Wl9++mkHz5naZOCnxJkiUiIxOfuT+5CoXx7qV8VB4EBEogwoqacyfcMYEKUIStJU6k8rt956389/esOqx7eNGD3nrJPPf/35Fy2ZOceAymAz3C+nAs8BPAMvBAL115J7H3r4J7/+2T333NLVtXHslI7lx8497KgDFy+cO3JUeylWC0tgJTFKAlEvugcF8Sbys7K3XPOGnAjVP8LtQJ7xCb+K39XSeCQgGDYcCnsf1sNMgKogDHP5b8UAWJmhIFU4JsAwSJ1SKjKYuie2dt
636olNm3e4RJRKpExEyhAVUiFiEpLAGoFASuxUNEqIjYEpm5grzY9c+9CutbuXTFv0jW9964hDDzT+KxqK1OGLF0oCgEKc6m9+8YtXvueS5S85euKBEweywUxr1jkTGSsimYhTSapjmnHmMcuPP/zg5sioqiFWq4YiUi9fUKY6W0l+16iSqOeJQkLQkCIAf7QVkGFvSshiSjsHq9ff+vDvfnvjunWbxjZPPe1F55592kVHLVnYQRJTxODhflkV+HtBg0cCtu3a9btrrvnlb3+15slVKNfmHDDx6JMOOPzgA2dMHdfRGhtAyYpYRCzK3n6ZQNrg9sOlHwp58dL+oeyIKogRVKCad8+cOPWObBCtMz9EOfvvg2ygelQlEC6ihpkaCYCJXJgpBgmhkrrdXT1rNmxd8fiWzt6Ks8SI2c+rAQCcF5DmmSUXe7A4QgTiRKGxa22V8lP3bXns7sdGNXV88d++9IqXv7RsfKob7uSPxwsmAQBQSJLUXnfJW//77l+/+I1ncmtUc7VEqmqYQFnmDFitE1dr0+SE5YeefMzyjnJkSNmByUC99hn+EieVMPwRLnu/OVoJ6isjnyoUKl4z6n2sSJQYzGzivqzy4D2P/vq3N9316GbTPO+0Ey98w/nnHDp7XiviyC+tLtJAgb8JNHduIAd5ZNWqy3/5o9tuu2FX7562sbrsqFknHHvAkiXzxo0ZbZggRqBsIOJUlBhKBCHAASKob1jxpEzO86uGYVxP3AfalOqTXKoQVafqVBVB8h/uLCYi5sD/UBBeh+8afjHwds8c+raqIFViq6hmrqdSXb9916rHN2zevivJCABT5HVHnD9QZ3I9h7+Hc0WRH0s2LOWoyVTatq7csvr2lc2u/KF/+sClb3t7SzlmMvvxjt8/Fy+gBBBGwzZt2nLCmSfLBHvEWcfZkh2UAVERhTHGicIpaSbVWiTJsgPnn3XicRM72iImUoGCEEFVxYC8TgEA2Bf8pEJg9aq2QAH5M64fTBc0IrohGBAZI4Z7s+yOFU/+8re33LNi/biWqaccecL5x5x14qGHtiA2iIvLrMBfF+rDNmMwrd1x1z3/9bMf3P3A7VlUmTJr1FHHHXrkcYtnzxw3qpkiFrIsTE5MfbmVaDjmKgAWyfX77G+CoPn0TL9nRvNR3Xz5ih/TUn8sVlXAioqKk9zc3zM/FKx8hsTZ0J+goMZThTIZUTAYgACJo76ksn3P3pWPb3xqy87+fgvEhkPlBYWDEsBCgEqQcxAgvnmgJAQGRAmRchtaBjebW6+8OetJXnLmuV//96+OHTPCUCHa2AcvoAQQkDn73e99+72fvOyQM5aPXzAx4cyRtZKCmdQRGcksKUOsrfXOnT7p/BedMGfKxJgzEjDFANgZEMQ3oAQgIWVAXCMB+CvUZwgl3yyGMpMvmZgjWCJSYgMRbTIDWrvj3ke//8OrH9/Y09Ey66Rjznzry1934OQZJUi5EAoV+IvhY64K1FBP3+BVN/zh8p/+eNWqh6lZlh027tjjDz3siAPGTxpVLkVKNiIn6gBDnm9XFXGBepewLd0NEW3mPL0v48kTQARiXyl7qig37gF5V33fLUYmIip+t6PWx3ihnt4JS9opby8QSAXhXAGFkiHrkAkGK7K9t3/Nxg3rntrc05NaR0wxkwFSIkcaEUjyB5prkpDz+KFroQCRxGzipIX66NZf3ZX1ZYceuPQ/v/PtOdOmxFHsX8riZF7HCy8BCLRvoOf8l15457p7T3jpi1qmtlVsTcj6QoDViIoK4FSylCWdMCI659STDpg3o8wcCasKq/HD5whUoLfA9ZoHJfJXM4E0l7eFHEAUTAzh/QrBCopUwQrWxNCONP35Vbf+5orbdu0dWDhp6WvOftlFZ5w9uW1UjMgUMqEC/xcooCKBWN/d2/O73131Xz//4ap1D7WMbVl86OxTzzjhsKXTJoxqM3BkrCqDxFfUykxM6hzBl8YILL83aSAOKszAffofpgoIlH
K3fqoX9qp5A5gCAaWkUAuIqnjSlPKxKg01O7zcM3wTBohhRZWYhMiJy1SrSdY7UHv8qe1rN23dvqcrzWBMEwkRiFU1yphInQlSvTzw54d2AljhXyElIoaUyZQHRt171V27Nu6Z0DLhu//1/VNedDx5FVIR//fFCy8BAFDoqjVrTznvdB1vDz/35KglrUrFqgMROSaGqLCFqooVSLW5TGccf8zyJQs6mqIIYkDiW0XC3oiEIIAIsdf8AP6ylXBsDcdm0mB6yAohYvGeo4ghGnGmrA5OSmbjnr2/ueKGa294aNdeXbrghIsvesOLjzthFJVKGH524wX+AqhK6E+R2bRj+/d/cvmvr/nvrdvWjpnRevTx844++vBlByzqaG9mFmElIieOjVGnrMowjklFiMQQiSOAHSsJAMdB4uALHIiP/76uz+v3YM8PwAdezxeR1jWc/qBsxY/eE0GCbiJ8N9Ig9vdnbTLqHf0FIMeSqVYz7R6sbtnVuXbT5i3bOgf6UmPKZIjgGGABhWF98sPDSs6rjBp6paAvVSFlEkZU0lKzLT9xx/p1DzzeSs3f+MY3zz373LgccejyPRdv5PMYL9QEYCHf+vo33vPpDx1y1lEzDhpflWqCTCSLKAKROGEYVVVxIiIubSJ71NIDTj582biO5shvJxWCI7+0zn9XCaYRQ1Rr4g+bgrx/lT8AkNc0g8TvsFZLgCEhZstcI7N6w5YfXXHtbXesddR89vJz3/yKVy6bvaQVJVMwQgX+N2hDk0kbtm/6rx/+4hdX/mzX3o2j57YsO3buCccdcciBs8a2tpWcg7c8Z2/FHGQ33vTHF8UIIhlShV/UAnUEGCUNzp1BCZFf93XvBwBhptiffTUvvfNmMQl8H1i8Z5DPFErBHVQp6IN8i5aF1AkzZVZqcANJumtv37qt25/asn1v7wAkYpT8KklBLSJiMaTs/PYwMCAS+hLqJT8CAgkRqYoSDEsJ5ajSMrCp59bf3MtVfPL/ffRd73xXuVwKr0QR/Z+BF2QCAKDQ/srgq1772utX33LGq0/RVk4pSZHmI4cMJ0SAaCYCkNiMJFs8dco5Jx8zfXIbO4koNiCyDIILs8IMWH+NiRc8i+8Q28bx0ZdB3sDQ1zee5gzDjewHGv3S6wFj71v5+I9+8bvHHt0ybuKCV5z1xte9+BWTyh0leHuK4jxQ4OkIgVhUmbfs3vX//ee3r77mt3t2bR47o/2Ek+Ydc/yyJQfNa20tRUDYkwvxXdzc/SEQ+t66JxfmhEzgzSHyQXgf9OtnXglCTT/K5fkerhf89Q4AEFhRqIq3/BQRqFfUGSH2oqCYIBCnjggckSrEKoBMMJC63f2Dm3Z2rnty6+7Orky0ThSFqRwmUoVwECIhtCZCC8BXbd6nlMWpEGesJtZSs5Z7NtQeuumRyp7qG1/zmk9/8pNtrS3GRKFyKxLAM/BCTQAABPrUpqfOuPCcWsvAEecclzRVU3IQIYhXi0FFFETsRNQRCzgbnDSm9YwTDz9k/tyyMpwYYnHKbES9/UlQB4VBMPWXoPMfIZJgfhgEEvlccUgN8G0z31swpIjiGsuOysDvrr/hiqtu3bHFnH3cue96ySVHLjqoTEyFIKFAAyFEiQgz7+zZ893Lf/zj3/x8264NY6a0HXHU/FNfdPTSg2aOaCsTRMkvQfXB0LMj+Z2seYtKxct98pkXX9zXjwcauqh5ad/w86wrPYc8tiETwCFteJWoiCjI/05goljA4rl4tQJRUmWIiBOxVtJU9vZXN+/eu37bru079iY18ZYVxKqwgABMavKEQ/7wTfXphJCglNSvnCdVp6xMLkYUJU2y1973h5V7NnadeMzRP/7Rj8ePHRlxpMED5u/0Rr6w8AJOAApkyK644oq3vu9t4w+ZuORFSxNKrKSAdf7KFvhLiADnhJVUXZbVxrTEpxx95PID5o8ox6RCApAJ8wGhCSyAyW+mIADVoDbzGranXU6BHiKqM6tEbCAgIo
0oMdWntm/+zuVX3n7XxinNiy59/WUvOeXFI+JyDCmawwXgqRgBM3UN9Hz/Jz/5yZU/27rzyVHTSkcdOffYo5YffsCi9rYykzoSeGIcyA+jyDer1G9kyv8UpnsDsZPT9koqon4QHgj9XBFP1/glSkCdM/EHEv99Ke8ThIRDXv/jVDRkJCMEcWBWglNC5pwVcUA1k56B2vbO7k0792zetmtwMIOqgd/+4psQyv54ovnzCcO6AhDyif18IAHwlRqlBDZqmqkJPfGDNzy4Z333wXOXfPNb3zxw8aLI8PC2evvf8QJOAAAEUstq//KpT37tx18/5uyj22eNqnFqtaZEqo5gFGAJ9I0TpxASTWvVFoPjD11y4mHLxo7oMETqREUIkZLjBhEa7i8ln0xECeHec0RhX13QBWlDYoD81M0qiNiQOI4ki6gzG/zNzXf/+pe3bN1izz7qle98zRuWzZnVFLoCxUU6HOHX8IoAzP21ym+uvOq/fvrd9TufbJvetOigSccfc+jhh8wb39FRsqrsRAj1KXYf/clLeHKJWj3U56RJkOvkPS0o/KZe9QqHoNgRbxzKxJ7HrF+L/svrwmj/dyAMDWsw0yUnTgC/R95bulmxRJo5l6lUreutVHfu7d24Y/e2XXv7uqrQCDBEatj5l8AnGIaycP2+ygNTSEU6lHSlsOeFODMwURaXk/ipe3euffCJCW2jvvT5L5533tkRc5hgK+6tP44XdgIAoNAdnXve/M4337vyzqNfdgqNsRWXgpxv4YKUwXCOiKHigqoTBKDat3jOrDNOOnL62NElIogwGQCqjoMHlm9twUEZwgoldSSkIGFPSQ6pkii/yYQoSIiIoGpY1Te+TDlOIlq97fF///Yv7rl548Sxi9/z+nf+w9kXtnG5uRgZG27I624hZILrb7vlW5d/Z83aR7nVLT1k4lHHLjv++IM7RrSwKBkYQOAgPkLnGsw6K5KH8vrfNBTO/mNat+7XOgDJc4aXcALeuJ+CnUNOHOXNYfH0S5gBU1aCUMgkDlAigTpYUQdRq5KJ1qz0VJI9PdUtO7u27tqzt6cvS21EhhSG/EiCI5AGGYYy6n1qYMhTyo/i+RMm369wIDJEZSppX9S1fu/DN61p0ZYPf/iyt77pzeVSFG7nIYej4hZ7JvaHBGAhDz744AWvfiVP4uVnLsvKaQLr95ESK4QZ0NzOwVkHYnWi4pytTBnbcs5Jxx4wY37JEMFChWG8KShBhYzm3BBBPKfpDUIBhJqfJJc9U16ThauOCKpBbS3KEbMS0lh3JNVfXnHND398TXVXfPbJF77/kvcum7OgWSNDxRr6/R4KqLrQYwXTqsef/Op3v33TrVdmcW3qvLGnnXnSqScunjS6g52FcSJERutLVLya5xl6lnCN5t8fOTeUMzx51siHt4KBs4fkS32JiL1ZP/mrOrSVw4/IWwq5hSI5iKoQOFMIkXWZlTRzqapmTvqrdm9/dUdnz5Yd3bv39FWqCRlm8tt5XZhJo/p38021MIhcd5sIDQ0EOjb3lFYhEDkoSjDNtty3Ue67/t5ql73oJS/7yte+2NHaxBSFs1ER9/9HvOATAACBJs5974eXv/cTl80/ZOqi45ckcVKVhIj9ajt4Z3JVEiWosMKReJGEq7SV+IxjjjviwEVtTQKbRhSrEAuBfY0TWgCEUDYB6iUT4Vr1wwHwU5P142YghoRUVISZ/eiLMkFdRBWu3b9i9eU/uPquO9aNbV/yT+/66KvPOX+UiWIwF12B/RgqEiifaPue3d/58fd+ftUv96ads6bxCS9afvrJx0+ZPDYmq6xMxqljZkggfpSG2LMN4RqBIbR//efkCp+ha7xyOogU6sSrjfKLO6+Q6ttfNGeHKARnEmj954lCBU6dgDMVqy7LMutSByTWdfdXd3YNbtm1d9fu7v7BQXA4SoPhVBnMwoC6XObDQV6Hhkln/jTyv3gTIAUp4BQwoJjiKCm7Lnvf7x8c2F098dgTvv
kf35o8frThopD6U7E/JAAADtpbGbjkzW+76pbfHXn+keMWjqogcWoVDmxUlARMhkQ8exh0bCIiakhKNjnp6MOPPXRxRymKxCjEIBZ1YBZ1/t7z9uFhDEUlmEkg9KoAhJkAYEgVJl4r5A0MScFkmKAOHHFC2Ny96/s/uvK3V9yfDERnnvDyD7/zHctmLygXR4H9E6oSLEYSMr+++nffu/w/1mxaGY/SZccecuG5y5cunN9KRJyJCAwDLKLGH181X6JYL2hzpSfwtIDZ+D1IPP1yRnizcwmfn3duJbBFqgrv3EmU93qD6afXRJPPFl5xJAoRQGBVrGotSTJ1zrlMtL+Sdvb3bd/dtW1HT99ALbOqKoYc+4YxUUbKYOOYCJYABYeJm/qKeWo8g/rsbtAs+d61IyWjVHKluL/j/j/c07W1e8rIyd//8Y8OX3awKSj/Pwf7SQIAYIH773/gwosu2FvqOelVp5ZGGStJhlRAhplBEOXgRAKCKPkbgNSRUUhWOeSgmeeccOKYptioiyhWgnNKfkeRCupug160AZVwLBfKZdMCAwgFsTKFo7hPFRw8c6FMyszkIBTxIOxtN935ve9duXJN16gJC975+ne/5cKXj4ybS36rZYEXLvZhH4IIJyOs37nzS9/4+jU3XSPZpvmLpp//0lOPOvqQMe0lQRYxhKwTZmYi9ttNPMdfD4oAUBfCPMvPDPy5Br1Onf/x/y75b6qqoiJhbMCfBcIqXg2eQFAiFSJlUUskQmRFxPOhKjZ1mbWpuNRmDlFidU9PdVtn97bde3p7+2tJJk68HSiHgwY7qBIxhLT+hJ7xPOqyVVL2ev/cj8LLjpg0QhS5KKpFT9y5cfPqLe2m5fOf/+JLL7ggMly3oCjunj8F+08CEMDBfe87333bRy8bv3DkoWcfFrVI1Vn1x2aBIVIhDmJiXxyJYWOdQkidZRpcOGPyWcceNX3cGFYDUoZRtewnykKh74U/gRtVKOcyCSj7+1VJWf3p1X9OzmT607T6ZXYAGWYlEznI41u2Xv7Lq664+h7XM+olZ1/4sbe8a9GU2TEiQwUd9MJHMBRHJRv86bVXf/Py761/as2oia1nnHLwBeefOnPauAgJqYpxfsZQyG/I9bJjqosi95F6/lFtS+NAClVRqTcAKHyRBiG/wIVOQCCGcjsgN1RIRGA4EhKCONJMXKqSOZvazFpnnROlJJO+Srars2/zju49XX21LLPOMSl7ZZDfUUmsfr/k0H1LVM8BecxWaONko1DiYCnh638HVqMUa4kG4l2rd6y8Yy1XzQc++L7L3nuZn/hVBReh/0/G/pMAACi0d3Dw3Ze97/Jf/eCA0w5asHxWgjQlUbjALKoh8hPs5PsBSmBQ5hyBnE1J7cyxHWefeOLsaRNiOAb7nXaqqjD5xepPA6J5saX5vZPfQn4s3vcGgDDDmIsbvIOtKBkGG3VgApWiXtt9+933f+O7N6xb0z138tLPfPBfT11+bDMoghZHgRcoQhEOqOrj65/6t29/4Q933FDlviOWTLroFeccc8QhLc0Ra+bUMbFAocrEEqbZBSR+whxDI38eK5/OBw35LM3bwHXVfqhfwuaUBsUv9QGwXDfqt8IPPTKIwu9LytSmWZaKTVyWWmcdWSfVmnT3Vbbs6dzVubdSSdPMAgQiOGcCdQOXG8/lyyDzo3RAo3G27wf9/ZYvoVElcowolnIpizvX9a64dXXSnf7DRa/4zKc+NaKjPYpi5FmrwJ+I/SoBAGqhG7dsPvOcczb1bT3m3ENGzR6VRJJKFvlChKJwVQVPEb/JQshQlgmRsABZdeKo9pOPXX7g7JnNJiKjJKKqTHFwjSPJe1X12RQ/yZI77SqDIGFYwBc0QTEdhvRzkZCAVE2JDakzRmuRPrRl039c/vvbr3+s2Yx9w3lvec/rXjttxKgIUdEZfsFBFArHxJ29fT/89Y9+/Oufb+rd1DoSLzrlyFe85Nh50yYZB5BTsSAimCCIVw
NlgoDdkIsFuXgZCDXHkL/to3Wsf0b9EEph02/e3ZV6Dwy+uSD+RBxEniEpiAKicBDnuX7RxKWJtYnLrKgVVDPp70927Ozetbenp38gTVMK/kOsChXHAINV4cJqSM3ZGR2SAKjxNMKzUuRpwjuuKDGpAxGTMxpHSVOyPX3k5sf2bOo6+vAjLv/h5VMmjotNVAh+/g/YzxJAKLx/c/Vv33jpW9E2eMwFJ5lxUU1Tz2eGLXjqnQvZi5pJRcnLIBRKEFZx7SV30pFHHHbg7LaSifx+CxdxsAtVABzOAarCSiocmFPKJdpK4duGK79xeedGWwQvr2aNSRExrGrWZHbb6m+vuvEH37+mazOOXHbcpz7w0SMXLC0BMYrO8AsFeZBlvf/hh77+3f+4a8UDA9I9a/7EN7zqxccedWBrkzrNmNmpsFLExjkhVYoA4WC3kHsUYt/Wbh1PD3b7KIKonjAofzC+caAQUXj3NB/6pXEgIP+o/WJ4B7WqVjVzLnNZarNMbOrEilYzqdTcrr39uzq7u7oHKtWqN1JU56IgmQbgAm8DkSCKftqL9MwnsM8/UW4OpACRJeJIo9iadI9Zc/uaXRs6F0yb96WvfOnYo5cbL1/C0/ouBf537G8JAIAANal+9MMf+8p3vjlz2bRFJy7kdrVqhXzoVYChQsI+PMO7RhCUSKyIt/m0tZKxRy6dd9QhB4xrHxlHETsip+wvSxBQnxfzU5EC1Osq3/jN98n4E4AfsMmdWZCLlJnhHDMzA06ImWxEfcbe9vCq7/3Hb9Y+tH58x7wPXXrZa859WbtpjhEVl/fzHKri25Z91d7/+smPf/zLHz+5ZW37pNFnnXfcy1922rRxreyqpI5iOCL2VYhzgDHMfhOL10Tmgk/Vp0dKHVo1h9+GBr59FKIhqtdr7nACyHU+IlBVbwetAoH6UWGnyNRl4mrOZS7LXGadS61LrVSq0jNQ2b27Z3dnX99ANSib4KDeBAJO1JBhFYTDcVgvU3ehG6LzGYr8aeSpy2/rVlGNwGSNcmyb4n6z5vaNTz66fvzIsV/47BcvvPB8w/4r8h5HgT8H+2EC8Bf3Uxs3vvHt77hzxe0HHj9v1qGzXJwlsEJgZhIiOFYDeHEPsThCGH0Rhljx2h8jgwctnHPs4QdP6OhoNqbE7J2kvfUuyFCYyZT6jVov2Hwe2Hdwn3Nbq/wQTLkIj1TFGKIIEDiJ1Jpo4+5dv/jJDVf89s6B/ui8U1//wXe9b+GkSS0wcVHkPF8h6pTgrL3v0Qe+9p//fvMdN2bNtcOOnPnq155/+AGLmmJm40QdHEUmFlKxlgyIWCVWxERCmqlaHiLz2ScBUFCiDS0j/D80PkcRvBXCIUJFXW7M7+94AUIHWAQKsYAqiaj/n4Vm0ExcmmWpuMylicvSTAbTrHsw2dPZ39nZ0987YFVBpC7zU2EgBRnx37LB9/jjMlCnf+iZoX8ocuqfAvsDqJBGTCWUqKe0fdXOlbeuaTMtH//Ex974+oubmiKGKer+/zP2wwSAnAi64ebb3nTpW7or2444f3n7rLaEMkeqcAaGoH5tO4XaxI/LiD90OlGAWEVExFbnz516/LIDZ48b3xybEjNISQ2UVA2C0bhQ0FKT1tdO5BVQeEBh6TzqUoyhgo2wi8zPxTMBQmwkMnv6kyuuue5Hv7x555N2wqyF//T6d7z23LPb0FwqjgLPL6iE4tp19/d/77++85Nf/XRj5/qxk5rOevkp519w/Ixxo2JnlZyoM2xEDSk7ccawEsQJU+xUFWzgmBzljDyAXHcQCJ1QRe8b8yhw5/ULShsMY8POYUhTGjnP4zzbA1FVp1aciGbiMriaOBFXTZIks5U07c9sZ8/gzt29/b2VLLHOQWABB3GsIPUiH5Lcrk6Dj3/oYFBjNPl/BvnDj4alBOwTilHT4pr3rqk8cPODaY+9+DVv+Myn/19HexNTQf3/Rdg/E4BHNXVf/urXPvOlT8WToyNfsjweEaVwDq
kSE4OFySt71G+Y8Jeo522cMsQpawzi1PZOHTf25OWHzJ82oa0pJqeGS0RgjUQkdIMhuZoCCt9mgOeC/c0M5LeC/7PPDo2bulH1EAHEqkysMJFluX/12u//6Ppb7nmM0hGvOPeVH3r7e+e2T2rCcJCIDr046Zn/UFfC1GPlcxELVL2vDdy9D9z32a98/r6HH9KouuyIRa9//SnLDp1TLhvnDaYgKkxU32erAPtLRVSU/F89ie63ndAQ5ianHhX5YG94CfKHgIaJj1KQJIS6X+qKB1EVkBUnIPHqGhHnnFNVJ367S+qshaRqa2lSqWWVVLr6azu7+/d291UGa1nNQnxSCu0DViJl8akgZByw5go4Pz+jQxmg+jOov78NAssrRUGq5FQ1Io4RRdWWwW2Vh254tNJZO+3kU7/2la9NnjS22PD+l2N/TgACdPX1vu+y9/70ql9NO3DikpMWoY0TTeFXzJEh9dv0QjkV6i7UbxvykmXrLDSdNKZ09LJFS+bO7yi3lGAIYijyCmeEUYDGS6m+uZAXZTKkAxZWzWhY0+Q/7pvDoNCkBlgIhsg3CawpbRro/vGvr7vyqrt7dw4esvC0z7/vY8fNX1ymeP9iPfPzUkO0jmfkgJClRTRzjuufx0SkTOyJN/LZ8W/+2iigTkDMu7t3ffcH37/8Z5fv3Lt1/LQJZ5934ktfdurkiQauAlIxCoo4cBp1yQtI2St2csIjbNMiIl9BB5scFdQXIDaKCKA+KeAPlLnQJ99gp+EoqkJghQpUFFbUqrNKwiqSQURcyA2ZOCc2cTYTO1hL+qvpQDXt7qvu6Rro7q9WK0kwE1KHsHAgbEeC+E5Y/SYgqh9Scu6pEfr35bHQeC5hZAdgMPyeSRZplibZFd/7+3u6dw0umL74e9//7tJDFtf1UShOAH8B9ucEAECBNRueeM3rX//ImgeXnH7Q7GUzXGQdnIMFU96YUhYmSH3Sl+BUiZmcd7olBtTZ/hGtTYcfdMDSefPHd7SWGYZYRZmisHggXOj+HpGhnblQgGleq4a7eh/9czgH0D5nZaUwNSYxVYD7V6766S//cOedW2aMOvj/u+yTpxx+dMQcPV9PAs+s7v7HT27UqoCKIs3sQH//jl27O/fs7u7t6evtySRL06zcVM6sG6gM9vT128wmSTViU2ppcqkbO3bcvNlzx42fNG706IljRpfLUVyK/TpDQ/yMPSd/+fNTvyvugRUPfebLn7n99lu5xR162PzXvOYlhy9fWGqyhNTBhsXlBIIjcEPar/my9PDn0K0lzk+VnrbJXxxqhM88yIPrtI8Gk7ScbA8lNpGKP5GKqkCdinXi1FpVR8453/OFiFjJErECrWV2oFrr6a/21bK+anVvz2BPb59NRazzvlrqlP00l4ZjBITDo/B5zT+cp0X5ISNqz/6KEsLUr0KgBmCKjGXpliduW79j7fYx7WO/8Y1vnnbKyVEc+dehCP1/Ifb7BKAWcvXvr3n1xRenLdmyk5dMOWBCFkuKTMhBlWEIZBwxYHMelfPxGa+ZIyZPlopN25uipQvmHLpw3vjR7c1RHEURCef7w4TrjGs4CjfiuJ+DQaOWCwUfoV7Ged6TtXHP+54AVNgQwFSL5PHde358xXXXX79ybN/Ed7/2bW+86B/ao5bn4Z5hzfPfH79FNS9lVZRFUatVO/fs2bDhqYcffXjNujXbtm/v6u3p7u2vJbU0TVWsU2vVqYqIOOeEFAo2KJfLxnCaOYW2tbeQxi2l5mljx02aPGXhggMXLFgwd878OTOnx1FcMjERG/4rvFzeTafXuu/++Hs/+N5/bNmzoWNM6czzTnnlRWfNnDKKURNKvbuxCoiMhl5RfuTbR8vTODoOec1kSAcpBEc/eKKhpdQoN1TVEamKaYRZQnCo9Ttb2Hllp1hxDhArzkIyp1YU4pyIqEs0G8xsf3+tb7DWV0l6+ivdfYMDlTRLs+ClSKHYz+t5/+DUe/l4+TPqyqV67Y+hXQ
t9+pMe8ioQQMwqKoAhjTV2vbzjkW1r715fds2f+MRH3vaWN8eluBiL+WthP08AAARazexn/u2zn/7K59onlI44+7C2qc0Jpc4IEdSCiY0w+/KJnFNiCn+DAhA/jiJKRCRZzZCbP3PikcsOmD5pdAmmKWplgIVUnd8EELxCwyWde+gC+dJ5XzkF/p/UazYoN31h1Ft3DAAOSgBzpM4quTQq70Xt+tvu/9Xlt+/YUDnrqLM/+5FPTG4dU6Lnd1u4MaApAJxTBVmSgYHeNWsff/CBRx56+JEn1q3t6u4eSPoFKbHjSIVJVRhKjFLJNLWWOka0t7e3jh4zcvSYkR0d7c3NTa1tzeWmUpamLW1NAKx1g9W0Vks2P7F9/aYdu3d0VWuuOWqaPnnGvJkLTjjuxMOXHTp+/FhDbGCYzf8pkKjXem7Z0fmFb339V1f8rFbdNnvhhDe86cJTTz+mqcTOJcziJwENsQuGtDk9Ei6SeoYHUI+NfhbFR0y3z8+sSwf8xrl8n7tI6F45v5U6/N9bx7GCnDoFWYWFs5I3ep2IFQexKpk6kHMqSWYHklr3YNo3kAxUsv6Bak9//0ClkomQsoqy99ESQjjghg15+fPKj8GN6J8/y30+0vho429K+cSv9/4RhjFZFGfxtkd3P/XARhmQi1/z6n/+2Mfb2lr8xO/TvkWB/xv2/wSggACdPV3/8OpX33jXLdOXTV183Mx4VJONnCPHjgggjT3fn7fL6ptWw4yAKphZRFUc1JFWp08bc+hBi+ZNmToibo2ZDNjfvXXi2u9L8qM1LtyWDAVBcqopfCYNuUG8+EE1d7QKv4mSUVFiTS20HPXB3vzAI1f//sE1d21dNHneVz78meULl0Vget5VRjrkV6+xhUbo6u+//vpbrrr+D3fefXN/X49zKUgQaxRzeXRpyuSxU6ePnztvese4MR0j2pqbSx3trU0tTU3lUmRMZCIiEBOTP5z5Jr4wE1SdCEVGBWpN6rRSSbZv79m0fvOdt97/+Mp1PXv6Jo2asGD2whe96PSTj3/R3DnzWuMSEfhPPhMoBCrk3Ion13/kc5+57rprmjtw+pkHXnLpy+fNmqKuAhGw+tWKwWsnED2SE1zkWSnsUwcHm9kws0X7MP3wn+tFx+y9hSAQERWR3CjO+GuOgzKNoSyAQC1JJs6qiDrnnHXirIpVUacqjpEhq2ZJ70Cld2BwoGIHU+kfqPb1Dg5UE7HW20SLn3iE73g1+rh5By3v+Ybj69Mq/GcMMwz5ECEcHZRJRZTFQEqIqb+pZ93eFbc/VenMTj/lhG//57fHjfE7fusvaoG/FPt/AkDIAXrPvfe/4W1vemrPhvlHzZh/+CzXJCllngLyigsWZVUOXVtW0voEL/tGmyoYXi/nJJkweuRRBx+0ePqkkc1NJTZkQELqHLEBIEHOoeEOobp/hKuT0Y3bJJynG6gvbAWHKWUHIsMAQa2QVCJ5dO2ma//wyD23rW2vjviXf/znc089pwkxP78KIw2eCGCxApg1a9Z+5Tv/ce0t1+/esYmbyMSutZ0nTxyzcPHMJQfNmzpr6sTJ45ubSnGkyiKxdxTzbRjyByo/IOU7JeS9AohUhZiIyO+2ZWLNlI0RAbFxlpIk2bl5z4r7HnvozhWrH91YqWhH29gD5i0554yzTzjhhPkL5sRU4mDU9D89HVWr0Ntuv+2yf/now+tXLZw1+vVvuODFZx/VPiISTQTC/gRJpE680pg0tzILuyWo/n5T+KjmjW/NB8lBqn7/lyJvGKt43tCJSpjd9TIzH5ENUd02M8ykCyiDpGSty1ScWOecWFUnoipKLM5kkO5qz96+/r7BWs26wUrS1zfQNziYpZbVRGpUoCJCYXlvLmEGDeF2/L0S7CfCqsh9r4JwodPQv/jf2Z+tlQVCJlWgJKWyLfWsq625Y+3AruTwA5d+5etfO2DxPGaf4Ar81TAsEgAABZzqz37x80s/8I+2qXrIqUsmzR+TcOZUHT
kxrAoWGFAkzFAXump5p8lf86TELM4KQ1WyNB3T1nL0wQuXzJkxqrUtirlEgKgKExPAKg7wDrh1DyLK6R/N/+YLPp8LpH4eCExq+FRhZgcIERNYBVAhVJVXbdl2860r77lu1cCe2sUXvvldr3vL+JZRudfEc4zQRiGoaC1LNzz11De+8d0rb7hiV/eOljGlaXNHL1w8b/EhC+fNmz5u3Mi4pJEhGEdWhRxBxKgwwUBDG4YJDFEirnfN66N2dbtkf1ajsD1FVMkYoyoRlURALqrW3JNrtz9w94q7br1vx1M705qMHz1+6YFLTzrm9FNfdNK0aVPLpSYQ8TMkhv4SsEl65bVXfPwrn96w48mjT136lkvOPfTAhayZaE3VMbPhWKDihCloC1iBYD2I8LgDTa5ELCrBNyInDInzPsAQw7eQQER8Izc0m7xOVOEXuSiUvA8PIKKqZElT2EydtVbEwqqIOIVTAcOCBit2b3d3V6WvYm3iXP9AOtAzkKRpBoVTcspCEJ9g4TX+aAyxhARACDPMqGeFZ/Z5h8SZxnMKRwcikH/X2VRjirUao0sfuXZd99bBaWPGffUrXz3ljJNi5vAjng8X9/6C4ZIAPAbS2gc//JHv/+Q/W6e0LT3tkKbxTY4yx9axMhE5IihLxCAhF8bnwwZIX1rle/JAorBwpGiGO+ygAw45aNao5lKz4ZIxRiKIBTgch4nFkZeBB59HL5/Tes1Pfj2efztybgDq7YaUiYKo1BKIYEBgkCID1SKzYXvn3XevvvKqu3Zt6Dtz+Ys//b7/N3viVCLhsETpuTkQKMQfl3p6B2++/e6vff2rq1bdN1jrXHjw1NNOW7Zk8dxZi2Y1jSwrG6vWsTg4JSg7dhQWjOdREABxUEcRyK/0ITCFT8ojAu2jq0Hg0zhvhhoVIUTOOgFHHA32DD65atMd1971yH1rtm/vTQYxYeK0Q5cuPf/cc4478biJY8YR4hLHwdlJAYWDXv7jH37iXz+YlAbf8PbTXvaKMyaMak9cQv686NSYsN0EREYJzH51kFLuNIUg/lVAxesyVXNn2aEL2Sn/my+5RQGIKETyPS3IzcVBCmUVJf9jIAQnkjmbqXXWM0Uk4pfDWLDRGH3V6p6u7u6+gcFqasWkVgb6BwYGK4m1NpNQ4CtYPemvrn50afA/ucOQEvx6JKrnh32vhfyXxnuUW6OD/NSYwjgSisSUtVzZiXUPPLlz7d5xrWM/9JEPXPza18TRs6TkAn85hlcCcMDmLZtee/HrH1736JQDJ84/bh63cyqpshKJOmLASOw/1wddH5bDOdd3CIgIZJ0fVCHNsiiSJYtmLD9w/qSOkU2xKStDJDKxN11UqN8nFuigOuUfiNq8lAp7OQik0jgU+PpOGQh3om9PkIDYCRBFFmZbte+mu1dc+7t7tq3ZuWTEQV/8508vP/hQCuvJ/s4IT5kJvf09V19zzec+/9UntzzRPC4+aPm8C151/LypU8d0NMWxJC61ZbUKMJwTP/rARCriN3gidNL98/ADejnFETIDPUNJQ0MTAOVyFaIwoeS3njg/leQgIM1MT3d13ertd9503123P7Rn4142mDl56rnnnPPS8y46/JBDS2j2KxwrSe3b3//yv339C2PHj3jXB99w2umzmyLnRC1UyRAZdX4YUAlKZAAwmTAllh+FRJWYfDnvxI/f5poZ8tu48iZSPtalCvEObsG20wvE8qQXagX1amHrVKCZOutc6jKFSgYVwBlVopg1wmAt2dm1e09XVyVNrKgVrlSz/p5KUsuciMvDOKS+8y5E9XyjUZ4dQjOg0dIeInxD/Vns2/zNw79fBObbyKowAJIIxtTKcT8/dsfObY/vKim/5lWv+uQn/7mttUQUFbX/3wLDKwH4p3r7nbe/+R1v39G3+YBTlk5cMj7h1JEVzfzeMHYmfC4pS51u9rFI8gtaSUUZTkRZM+fYpQtmzjj8wEXTx40a0RQZJQgz+/vZz8oEjtc/kl
DcNYYj/fdVgOtGvoH1DmE/TK8Fy174Da5Myk7Flajbpg8+vPam61Y+csuTLRj9ics+dd7pZ7YSR3/Hm6YxC6248qqr/uVzn3hy06oZs9tOOe+oY05ZPm7SuJZSRBCIOlJLyMgSlIMclAWqQszwpS7q7cHQPRFfytcrx8YTy1vm/gtUiSjvoEoQ2gYqCggLdlnF+entWIWVS+TcYHfvqvvWXveb2x+5f01flxs3ctzJR5182bv+ceGiA5XSL3/lk9/53ueXvWjJ695z6ey5k0eUegQCihQsAsMEziWX+fsYrh7/8L3Al/KTnap1TuH3w4cukWH2DWKpXxC+PPbsvwben4jgt1SAvJ+5KhFTps4661SdU+esSNjHogpCzJHpz9z2nq5de7p6B3qdAxFnNqtUqwOVmkudComQhBe7XsrXtcxDq/v8tW/84z4ORX/k6sgvdICUmMiFAXwHkFGUJOa+5nX3PbFtZTcl5txzzvzs5z87blQ7m+e3wu2FjOGVAAAoYNV+59vf+cDHPtQxdcTikxa1TW9JTWY18e1W4ghKrEJgrxR1DB+aAD+oyYAFQH64hiwIWerE2umTxh5xyPy5k8a3NzWVEBs23uXTMEGEQi1oglYCyG+khl2EJ4LzjzABAgllYdAlCUBgf/OLIQbYqUUc94u7f81TN1374B23rs32lt/+2rf+4xvfNKbcxvirz0A9C0SFlIhp1drVn/vc52+4/ZqW8fTil5/44jMOnThxFCIVdmJFCSpGhBz785GK39JZX1dYz5NhSy2RkpCCBUTsY2n96RAQhE95WyCEXE+0UENuFag7VbATqz6PgFKnRJEyomDDibSqG9dtu+3399x+9b27dvSOiMctWjS/FCe7dj1+zgWHn/8PL+6YNlPVxZwwVCXyuq1QCqsQwYmru/D7xqxn/YIuH2Ebr/PBO/QwiABmDpaxwVc27w/4XKB5gZ2fcvLmgThRUcrEZmpFVR1UxLtdMRNFxlrd09e7Ydeuzt5eK2qisqSoVpK+an+WZdYJBEZZ/fWm4cnk15t/gRsDZqE0aRxknyliyL8o/EL12iAMbsO//XAEQmaUjbIZ4F0rKmvueaJs4/nT5vznD7+7cMFsJi5q/78dhl0CAKBA78DAhz7ywR/87MfjZo868PRl0WitaVWMAmJgfIBhgVFvzAaFOMqjSl6VB+qT1akDiMBpbXD0qLalBy1YMGPK2Pb2EkzMkaGMVJiIvD5FYhB8TAvRLlD8+UR/PmUTlmgA4XAdPkcIrKSiYgyp8y3iKLUO5bhC9MSmjTdc9/ANV6/o3F47ffkpX/3kZ2aMmRAh+tutFROIWuXI9A1U/v3rX/qvH34nS/Zc8JrTzn/1GSPHdBBniVTJUL5YMCcQmCDOKVRFfVekHrrrRsh+E0Pwx5dc5EL5wDbymN+oOnO7zCF9YhUCe6mXd1gTqFM456xXtxjjxJo4surNeqKylpqpqbpt8JY/3H7dtXd37u2cOX3CP77/jVNmmaY246IyxaUYpYjInzCUXJiBDS4MYd+6eNmPf9CB/AlVfGj2EIkIhRmq8PD9XKBVp+GMKKE40CHDX0GDL1acc+JERODUibh8GzUBkaqpOe3q792xa8+evs5EbKkck0ZJ1fb31QZqSeKsim/DciAjneO8c9QI8PgjOk6tv1VDvqDxCY1PRH6kITDlD15JlWCUIhfJAPq39D9281NRWp40cuRX//1rxx57jDGF0f/fFsM0AQh0/dYNb7j4jfc8+MCCYxfMPGISOshypqykhiFQNgArSEnC4C5yWkHrwy9KyvnIpmf8rU3bO6IlC6cfOGvOuPaOlrhkyDVFrCIMJjEkkebkZwhkngjdp0ryN12eF/KaV/OyKwhK/UYnFTAE5EAOaiPdsLP75lsfvuHaFduf6DlszkFf+NinDl+4NMLfYsu8Oieet3jgoYc/8IkPPvTIrSecduRb3vGS2bOnWJM4JKl1ZAhkAInIqSMLqFfJixP1Hg
jk1FfDSmRyrkGFiUB1m/xAkTdmJNBom9YpoPDnPJEE9Tzlk3XkxDmoU3Eifn+hZ4msqIthIiOixmpJo7LGMUU9tgoXlazEJalUt8fNOnL0+BFNowzY5MYddfbLT8R6aaaqOoV4foYA7wxI9aga7jzfqc5zvI/FUIULjD81Jh1CzoA/CoiKc86GF1HVqVMBlJmU2KlmmeuvZVt2de3q3p05GzdFTJzWXLVa6+0ZrCZOlYkJzNY5hBkDUJiEx9Cgq89MAE/n+huJImTxQJbSkK/0l60B/HJ3QB0RIhfFtbhnQ7rq1sek17Vw+Yv//rmXXXAhM/t37/khats/MRwTADwRBHfVlb9/+zvftSfdfeiLD5u4aLwrJZYSITXEJAbQfLuvI8A4Y1QBCMPmtzEIJJqHbN/Molpaa2riBTOnHLJgztRxo5ojU2IyhAiGYEhiDZ8vQRUaXIRyLrguF/Q/QYVyoiMvAUkhYPbaEWVkYn3tSCAll0K293TffttjN1+3au1jOye3zfzy//vc6UccV6bI/PVuJYU6cRFHfX29n/3C537wi69PXjDikvdcePAhS9rKpcylGrtMLCM2YFJ2LkPkoGTJKFid8wMOyH0lnToQGFGDV/Yew0EGD2JDwf2mHmL2jQ3hYKBQMHnNOgcGBhAY55ySJtam1qaaWQlbGaM4hpCyOhKoRCYicawaUSkTtaxEqNXSpFYj6PixIya0j2pjikCqxuuRSASo28oq/CpdiCBkAL+hHb6nj7rXCKmqYQ6nndz+QaHO+4vktYaX/qj4nrkTVVHJrPMsEggK57cRgLiSaddgpbOnp7N3b6WWxOVyk2mWVCqDg339lf5a1TkFmBwEXuXmtxtRw0cOQN3Q7Zmx/tl+p3qv+GmHsvA8/LtFIhGgxJmQsMJoxNVSZXt17e3raztsE0Wf/rf/9/JXvKypKSY2/qcX8f9vh2GaAOAXh6Xp17781X/+/KeaxpaOOOfItmlxQomjDBCDkpA4kAETCYsaMaxKgKPg7wUguPyoMrMT543bMoWqi5HNmzV56cJ5k8aMaI+jplIcc6wOhkqAAk7hOHim1/nUcKrI1R2Ur9UISiFt3HUqBCVRJQcXblUVE7HLnDJZoj29tTsfXnXtdStX3b9tjBn1+cv+9eVnnVsi+qucAwSq1pkoevCRhz700X969MkHzrpo+Sted9bkqSPJpc45GAWTOKgQw7CSkrMkvkgGVNV4YTugfouI+vAWFoiHJriS+tYLgUOl7Lu4mrM99WhDoRssUCYmcaRQGP8OCSCAgyZpkjqXOmvzOV0wFGwAJvY8vkCJYJjUETESdk7FOZclKmnWFPOEjlETRrSWyDAMERkhalj3hAXoouJC9PfJW3KNr2rjnfTtAS/+r1NX6vvh/gtURcibOfvCX0XEirNOVMWpsr8OWFK1SeYGK9nu3sG9A301myCSZtMMZ9Ik6e0a6O0dTJxzxkBh/CMKdLwlApTz3oOEdbxDdP15DH964G9An+VP+ZNE4zyrRokICaBG2WRxsptW37xmcHui/XjHO97ysX/5aBxrxBFQ1/UW+Fth+K6ZZaClVHrjJZesemzNz3/78ycffPKAMYui5piMOmRWHYMjMqSB5nEkru7HoqSeNPUEKpOI5GObYgypGBWz7qndtYpdduC8mVPGSKZNBhFHUAs4Y8THBIIR3yAINb/kwz++WvRWYn4sWZnCmI/m9kFQJYGQhBFla4lJBIYxpq104tGLW1qbW/n+Ffdt+sePf6AcxxecdgbB/IU3ladklOmnP//5Rz79z+VS7z//6yXHn7aUIitp1THYGBBsZg0RxQbOOREoWJlARsWJKBkFk6+cNbRdNETMXCyr4fUmMgINvWz1g9nIc0Cuoc/DDkNzv0p/fPDrbSVzLhOX2sxZv/oTxKwQCEEteQlA3cEDsAoRMWBYVXVRzGTEGRLVnspgZGRES2tTFHxko9C8DW0aBQAm7/4k/huGhdH+FQz/+beaKe
9n+KODijrA+PwYTgCkAhWodVlmnSiEVOEE6sgJaX8l7avW+gaqA4PJoK2RQUtcMiZymevp6d3b051kmVOGGhJVqFXl8FRDA6N+8MoNRRpXiQ7NBEPTQP2KyB98vSXc+AaBycrbPOQlTsIambTJ9cpTD23s31EjR6+9+FXv/sd3l2MwGww5ShT422H4JgCP0aNHfuyfP/LkpicfWvtI29iW2YdNM63sNMjxUGc/A9ngl+xxnYj3NhL5PIyv5eC7AgpWRFt3dqqk1WTm3KlTRpajmLJSxICSeDoXCiGONIw9he8TKtMGFdRw2fVVsYQZAW/wjhAlFN4IgZidKCna4/JRSxePbO9oa7n7tt+vevdH3tMa//uZJ53G/9e6ytegpFpLs89/+fP//o0vH3DEgg995L1zFky06Bepgsj4c6VQxIaI1AEgQxQSGqCAYeM8EYz8oTdK+qBwQSCUc8or74XUmyH1sn9oQM3ZCxUJZFDmNHNZKtY6l4lzGuIpKO8y5IIuoF6Fk5KKEBE5EYgaitQqM3Ecq3VpmnUPCDGhzKUoItJMxYBUNVwcjZk+T9145S78Y8x5/7y5H5Kcp4Ygos7/FggVFlXrxIl13glVRUmFNHM2yZIkTQarSU9/MlhLUydCiOMSE5HDQM/A7q7ugUpiLZGJiQjsd2zljdj8vBmC95DY3vjj00yJ/mfKoP5u5FNsQcnlvwkbFVESIoqsSbvN9hU7dq7pRM0cd9Th73rvO8eM62Bi3yzB0wm+An99DF8KyMMXPrfde8eb3/a2zbvXH3nu0aPndWSUOBaQiL9JlfMZYG/DQCwNzVveImYCERyF7S/qvM+ndUzS0Vo+eMH8hbOnjCg3laNyOTIxC5MwRVAFG1+sciOQBQWL1kdrFMhXZAPsRPwOVwBQdQjNAT8loCQARI0xnFmbqW7e2nXjHx684ke3jy/P+u//+sUBc+aXoj97Skyhos5QtH3r9re//13X3vn7l1109JsvfcXYMSMECZFTciCoX7QcJPjsU6gXxYs/wXgneYFVdTTUCKERfLSR9vwHBGRC3qrbPvjf8iBRr2F9KsxUAXWimbOZdZmzTkRBVrUxbeWVpsReaRoma0O8Ch3XkKHATpwSDEMzB4Uia2tq7mhuaW9qKkURKQzIic3XVAWaSnPj5tzBH3nrPxS4UufZKQx8hckAIe+7JlDnxD9+UaeAg9Rsreay/ko6UK1WUttbqYqwiESmBCicpDXX0zPQ29+XWe9sayhcqGrEKDTMXzytnn9m9N/34/8TBdT4eL1waXyM8gwHWFITu7Kpxevv3b51xY5arz384KWf+bdPHn7oQVFMBFMwP383DPcEAEAAC/vDH/zwPf/0T6XR0dJTDx41szkxLkMixh/gIyW/PJLDviLHAMCiCpcbkkGJQnuYxY8SqITpGWs7mpsWzZ2xcOa0Me0jyyVTMlqKUDJ+QpjJ79UD8s4bAH8GkVw44jkEKFRATlRgvY2Mv1e9JZioQJRJmdk5FRE2EQiZ4217un7zq7t+88O7l81b/otv/WTK2NHRnzMp7BkrIqxcsfZtl136+MbVr3jzWa979dGjRpecAAYQFWs59+mB1qML+3jmeStVFThVUlWr6iB59Kf8B0lOQQRvgaAnGbrjS4MZRk6pExGUyL8gouIgiXVOxIp1zjlRH/0B8n2GuoQojBN7Ct+TTxjSb6GGMlWCEF/C6iB1hqg5jtubm1uamkomMkQqwnVxTx73FPm4rK+2cz6F8sfjtQWaR3+/XZqVBCLeflbEihPn53tdkqX9SW2gNjg4mFZtVlE4oogidQqntaqtDA7091UqNaeisSn5I5QKvHyZhX1Txbs6h9ezkVH37eE+MzzoPl/zbKnCv3T1Fo0qACavWWOWSCLtbela17nurs2De7MpY8d/+atfPvXUE5pK8AYqRfT/u6FIAKEo6x+sfOYzn/vat78xelr7stOXlcZqlRNnLBlvzwAjHCgfVd+89fILCS
yEL+1s3sfzH3OOPa9jSBEbN3fG1CXzZ40d0dbRUiLVmKNShIgMqzKY8/GknP/MDYJyqzgh8tHBivOhQYI40A8FQ1XJubprlnUAhQZ1Jrp9d+13v7zz97+494SDz/zpt7/TUY7/dH8VUVGi66+/+e3veXtWtm962zkvPmf5yJGp1VQ59v0JEqsMztmPvNXJPnRC1SnUWyAElaQK5QEjf+FCnK7X+lQvpvMmS0P6GXoCoqRe2aniRKxzVlzinPhBKh94/NMILF0YJPbh2SuLlAH4cp1V2R/tiCACJnhmR/IKXsURQ0UI2hSVW5vKreWmUmQMMcQFkiofAwu0Va5/ryc2BCOgfCxA82QYBELqxFmoVZtZmzqXplnN2sFarZqklSSrpWmWqSPJfF2dps5qpb9WrdaSJEsTxxwxGSgJi09CpBpUtAphNA45Q+6Fp98bOejZP6l+XgVC0m+odD1NCar7KgqIYjVRwn3r7cM3rMAAjWwaedk/vf9Nl7wuNspBAbzvYyrwt8Rw7wEgb1e1t7a+9x/fs2nL1l9decXj9z654JgZpZFRatSKzQO8L+VCEFGg4TjQKJbYk/L5PcAMdk5916CWYc3GzYNJ/4HzZ08bN6G5XCIFE4i9QJoVauqGK1D25gW+pvZuYDnJweQNBzSs5YayshcG+UoLIKcCYjAlWcqRAevoMc2nvOTwvX0D1/366m9+/z/e/7ZLVa2h/+Ua8C4EDu5b//ndj3/q4+OnjHrHu99w8mkLyk1Vp45M5AeIKPApIQoAoZ3qGR5RgZJT50ekcmrLh48htWYe/cP0FAKhE96oeqr1YSUkYBUVJ86KpDazIkEk4wUteW2bsxLBRMKHZ0Dytez+gecckt8GQf6l9qE6lPFBhBqa1qwiqc2QKIBMopKJ/crH3LoOvtQeSmgPJa3qrs7+tdAgeYVCU3Gpy1KxaZYmma2lWS1La2k2WE2TLEs9gQakiXOpy6wd7O+3mSSpdVaYo9iUADg/EKzeTDt4miggXF9H/fR7Qeuhe0h5/8ywnItx6++JT9LIj06aH7F8VoeyEtSoQcLJ7mzd3U+4Ptds+E1vfO1r3/DqOAb5/lrB+/99USQAIL+Cx4wa+eEPfWjL1i13PnovlWnR0XOiDnVqwV6u7xSAMtTb0KtnjY2DD73CsEQajMfYH+WhGhkEjSPDkVm/dWeSunSuzpgwES0RlMTAsEYMA84PAGAKc0A+YNSP0pSTyGHXTG4nKWp9ZDEMdeJZBgfrHCRCqilElGTM2Ojs8w/fs3HnJz73sRlTprz8nAv/51dGAILLnPvAhz/0rcu/uejwme9+36uXLp1Tbhpwxqn4vVpQ55SViTSokzwX5We94LyjATRPnHXqJrz49SBIeeJAENj6GJqLZDQ0BzXM2ZIV58Iqc5c5Z/1IrKoSnPoRUp+g698gL1DVF+ZDzOVD9KW6OV8okMOkWm444ckmMiKu3uRNMqualKxtaQIr4sgYIDDe+a5EIce5J5E/1/ljkbeabag8RZy4TKWqrppWa0mWplmaZbXMpS6tJGkttaJIM0mtTZKkVstcYm2aicJaESLiSAX+BwurNyJkQX2wXEioEWYb19WQe6ExO7fvx4feMlT/NT/ChYZJ/mfOXx6BcQCMRpHG1U7Z8OD2vVv62svNl779zW9/99taykQNWq7A3xUFBdSAAgLcccedr7vkrZt2bVx62oHTl43TZrGwDpYMqyhJpAxA2DvHCIwSqwIiTBn5ydVGYRsGOz2Fwl6couLs+BFtB82bM2fqzJaWUrnEpZhKFBtoZBikpBr5+0/AFD7itf9hEaBKpqrQTNSpiHjzFxBgIL40E/iWtCpD4NI0IWOy1A3WsG7tzu988de8p3TjlTfPnTwrMs9KBAVT0p6+vZde+u7fXP+7g4+ee8m7Ljr08JllqsGIEoBIHZG3rWao1uUlEFEHJ4rMiqjk5xc/BpU/t7AbERKOTVBf/hPUT3D59rofcFNyCOFT/I
JgkczZ1Fkr3vO4XsCH6M2EsDPM7+cK1HQQW+WplHzLQdgq4H1XFUz+P6+VhDoYQLySx2/0qQt4fGuEROMoiqOoHEWlOI6ImE04HYZ5Qh6i7lLVYNXGQYKpTsl3ep21SZb0ZbaaJWlmrdg0sUnmajatpalTSWq2VknT1GZZ5hzUkjrnD0NK/gUSJlaoBqkwWHzTJNef5VvqG8cr3cc2qXFL/LFb5Y9+VPNur1ECNFOjRI4QcRLHA9H6e3Y/9dDGqKYvOfusL/77l0aNaDNeBVeE/+cCRQJ4OhIn3//eD9/3T++vlipHnHXopEVjtJRllFoSJmKJAIVGua5NAkfij8jKIGI4P+jpFx2FEUsE1wbAACyu2tJcmjdz1uL5s8aMbI5Um6NyZGCAKCLDRKoRIvZfR6FAFlIBfKvTKZTUDyg5EVVYiKoYAoMVCJZnRkV9LxRW1MIlie1P9JG71//8K1cfPuuYK37867ZyyzN343oGZdveXW9665vvuvO25acc8Ia3n7/kwOnl2AplOSdm4Byz33LpKRXvHcZWJHOZqlqRBovjOaLgaIS68kdznj/PEGBv+s915wQVsFNx6vxCKy/ssWGbudaN5Oo8Efmf9nQ5IeUVfr6d0WcXwLEAMEoUDJ+c/wYqjgmAUfiQLaxGya99Rm7QoFBihiETGY6NiY0xxsTGQJSJvc01RKEMkjDgR54aYVGkYi1cLbO1JE2qaeaySiaZuExcmmW1Wpr4f0oy61yaZDa1zop1pEokXvzlwOQ8PxVeNFIg9DYk/OA6p1YfPKF9gwANOf38UTztH582AaAgMqosBDIVhRqNShq7rtKWR7ZvWbGj2pVeeM5Zn/7cpydPGGuYiVjzwb8Cf2cUFNDTERl+9Wsv2r5t8ye/+NlHb1kVtx4ydmYLG8ORimfZgZzI9a01/3UNW5icNQ0Sdg1/aLCcgMCUBhO7+sknara6cM70CaNHQk0MjhnixGhkDEOVlRhhaXyumvdtYG81Cb/0VSQv5ZQsFCQQEVEh61mFzDmxSGxaE1ut2QyYOH/MoacccM8Vd3z5K1/58Ac+ZPa9+Tz5sXHb1kvec8nt995x1nlHXvzWC2YvGc9aFRU2JI6C145hhYMSEWsYeVKBcxLsCjTng/PIXM8GGFpIBy4iiMfh1Dv3+KgiTtWpeC2/dWKdy1wI/SHoB8+wITpRNMaEkTdvck6aQic597IPAYxAoeIXIjhSVRcxq6jQkCVcqgALHDV+IPulJgJJrTgnNpLIuYw4YkOsjDB9l58+yA/0OohYl1lXzWpJltWszazLrKtlaZJpZm1mbZplg5U0zbI0s1lmnYhYJ85PguXXo0+c4ak1avn8POY7sHnoD699vfXin3hdVIVnBvinf2ToB/YhiQj5oUtFYRRABEJqODF7nujZ8OBmW7GHHbjsAx/+0JSJ43yGblCBBf7uKE4A+6D+WnR2d7/v/R/40U9+OGLumCPOOLh1QtnGaYYU5Bevc70NRr76pGA1ibo6sbH3O+8J+1zg5aEEZnIuU3GTx409eMGiaePHNMcmiiiO2LABNI7KkSKCRCQNMSJUFZnXBRJ5V8vMOfGrZCBK5OBrZc+9iIQ14FpJk0FrByoW4MRKpcte+5837lzZf93vbj5k4SJjcqMgBZzs2tt58aVvufm+G8582dFvvOScWTPGRpEVEjgyzF5eA1YEr84QeZx4eY9Ya506EDnNDZsRXokwzEaNFoCG0bkwQuu5ES95ciLWSSYuc5Ja66xzohL26MJLbAwREXPePR5iFaT1bLCPohQY8i/+DVMJzpRMwbEHnpYyRkRUEBN77s3l3W1+2kCCQplYxXnnV2ZiIsPGnwBi4ymg0LW2EtJYZjXNslqaptZm4jLn0symLrU1l2RZktkkSZPMWSfWioj6wTSIAH42nXJpkeR9WD8wqIFuCxek1jMwDXnMTydeniX6/2+3ytAuSqPtCwCOxJDGKElf3LO+d+1t62yPmz9j1pe/9q
XlRxxmDFMYuiii/3OGIgE8CxTqgC3btl188RtvvfvOyUsmHHTywvI4k3KmECVRMoCSmnz4iwLxEVyi8yKX6h1NqX9v7wdAxH4DmULFpiNb25YuWDhj6oSW5jIbjYhjw4bjiDkiFwMcBpYI3ipAnFM4dU5Z1CUq1llPRjBB/GIQp9Y5EScuTW2WCippOpDZWmZdKgLDWtr8yParvn/ThSdc+M0v/kdzXCKQCoiwc/u2N73lDXeveuDU84953TvPmT69LSYLOBCzxmLBsAAkSPxDVesDpIg4Veucp4PyZ05Djk3eeTh/jfLGhhKEICJCPoOpVUltljqXOJtmznd4QRxKfmICMbMBKLBFOeNDyDntPCDVtZdDfq6X54e3iyEQk3fcnRPDRkispgCBo6D44rD3i8OsGkJXWusLvCSfIgeTXwlEJorqnSEFEmt9RZ/YzGbiRFIrmbVWXZq5apZkqU1rWZpm1rosvLUKYYWKiL8CRAlKnJfQElrYhCDvD5V9nmL9i7NvoNUhb8s+H/8fYsLQDJr/2nh/8+8b3mhlZ8pZuX+jW3Xb6t6tAxM6xn/uC/963nkvjkuGC8n/8wBFAnh2+JLpznvvfvNbLl371Jo5R85dcPQsGgFl6zhzrIaYnKEgCfR6diXxMh7JD9iUW0PuM+vEpGHDn+dAWAGNxS2YN3vB/GltpVIpjiNDpBQbYwxFxjBRTCiR/1I/tQQrzjp1qhmpU+eLfgTHGJdZZ5211tmslmbZQJLVVPpraaqZOpckrsm0uIHokRtXrvzDY7/87q9POvKkUilipg2bN77jHW9/dNUDL375Ma99y/ljJpc5skSelWEIQ4jZL5pVVZhc2wPfnxBxEiaYAPIsTS7oCdJKVVH2RqB5gxuezVan6kefnLpMXOpsYm0q1lpxwb3NMDODvV0wgcKWTgDIp/LqYTBgSELI3986HRRiuH8H66c4gTFgQ2ADaCapWAiBDXnPTQwRsgaxKXmle/25MoUSFwqoUOpsmmXWucxa65xzcM5Z0cxlqbWZy2pplqQ2SbM0tWLVp28f/HMFj4pzfmFaUCKD/MN2DUpLNVx4hLy720h6z3atU96QqVNHf8Z90mgd14fdVEgNNFZjKu3VHdW1dzzZt32wzE0f++iHXvPqV7a0lIzXHRS1/3ONIgH8UQjgID/7yS/e+/4P9WZ9c4+aMXv5FG6RjJz4qTAxBCWJCFB2ef019AUdetoO4hT/NyJvb0+ZdcQCIlVHcFOnjj94/rwJI0aDHQiRAQmZyMSRKTGXCBExfGuZRMVlIpm1TqEEFhJI4pJE1FqXOZtlzma2llZrSTqQZhVns0xTmzi1mdiIm2Jq6dk8cOXXrjxy5rF/uPKGmPmO22591/vftXXHxjdfeuFLXnnSiAlNikTZESmzUVFyhkBKGSBKBgIfET2xI/k0loYFjGElShhW8Dy4kgKOXHhd4Hu8UCaBqCIMuBE5DX73mXPW+b1XIuKnIyi3evZDeTlJ1ojoYZNanYir0yCaH8zyPjAQ4raCvDpGSBGDnXXrt+8eMWLUxNGtmbqEMhVV7x3hRanKAmn8eB98BUzEzD7Jq7jM2cFammZZar0jBUQhFjazaZZZa5MsTTNXS7MsyzLnHf5zDwnNKRywiojzmwKgwYE5nDi9zKcuscoZH3+8yVmWIS2AodcoGlet4v8Uk0PFD09WKpEa1bKNdXfLo9c/3LVjQPr5rZe+5aMf/1BLkzFsQmOiiP7PNYom8B8FAQZ8wQUX7Ni1+zNf/MLGhza2j22fuHCkiaAMhQMscq15cB6rl05Bjhc6Bf4W9ZrzMDmsDIWSjYwIwTlrODIcb964J6vowtkzpk4c01SmLLUqkWbOGFMyJjZkiExkyPfW4M0L1FkrTlmZjVGhJK3V0sxaZzOXprY/rQ6kSSUT5xylTtUJkXVkKbWC5o7S7IVTH7n/kcv/8zubNzz1g5/+KG2uveNjLz/33GM6Ohim5r0ZGOSsY7CSBu
06DHm2nsIT01xWmE++ksBPRWijw5u/QIzgqOTJaobnr1jZj75512qj8LlCrYgoWRHnJHXWiWRO8tczt8tuLFFj7z+WUzN1hqJua+k/2HhEQ2bYiARMTEqGS7ffv+LG6+/8xDvfvmjhVOU0NarMAiEHE5bEBbt/KPlpPmYG1ErY2ZIkaS1LK1WbWit+gsGp8x4VLktqaS2xfoBLBM6JEsAmN6cLhlAUTlfOv775AKLAj17k3RUMIbzq088Ir3L9+eajEPn1mnM69VfmTygKh9JHRPmP9o9ODGJjI9eLJ+5d27u91yR481sufs973tncFBlmbTSMCzzHKE4A/wsU6K9W//nj//LN732raULLQactGDG9ycVqxRIrE5HGAs+dB6KhHvEIRBDNlUJDUoT63bZCNlel+6UerCBV19pEC+fOmDF9QjmOoZEX1LNBySgrxVGpFEcRERMbJiayWZYkSZZkmUgWac0m1WparaU2tUmW9blsMMuyTCTLWCwTV1IRBoGMmpLEXY/23P7T+6IqJYP9U2aPe+W7LjjzpcePbFU2VnxHFiZUtw0HmMYAb+AdxNe7nqf3tL5XvyoFD7sGaSzB4x4h7tTzBteDyJDQLKSAgzqoCjvVzLnE2kSsE2d9A6Hee8/DHA+JYtQgOeqxp94crv9VFEowyEc5mIgM79xb/cilX2gHffrTl42aZAZKWWZYnEZap7eC1USwlBAlMqouTV0tS5MkTdK0lqZOjCpZkcw657I0TdNazVqbWucUzuU2T2BAvCMQoCREfleAekcjq6GgCMRSvXZvbBPwr2qebrVOcGn9xanH7MZF/ozmwL5/e7ZgTXlC9SMXngoVQgTEruS64+0Pb3787g1UwUvPP+dTn/23iVMnmIb6tMDzAkUC+N+hwPZde//pwx/48W9/NXpmx2FnHlIeSxkyx+oVmqzsx2CDvsVTyMgr4ry0yu/YeoQJN3LOnJIIiJiECY4iO2HiyLkzp49obyvFDFDmwpYCZBoZE5dMXIpjE5fiKGYiaFqzvZWBvsFakqWpS2q1pFqtVlI7CGctaQomgWYCscRpljGM0bictnSv2nvnz+/GgJ05Y/TFb3vpsRce2dJhYqNCGZC7WTaaG/X2X6Ah/JP1k8sKr5nJp36R214PITN8R0Rh6sHLvwIRON+OkkeJvMRWb7tN4d8ykSRzmbhqmlrfE/VZQBuxvh7hqKF49GGyHn9Cks7JEgGCxZwyQ8FMAEWIV9y++l8//p/nnXvGRW86NWkZpNYIjiJ1qc1ETMQRhEVEIE6t4chzVpVKbcDrN62fVDBOXJIl1Wo1S7PMOWfVOXXBOiRIJyn0+b3hHAUaR+DEWXGE4E1EWrdcGvLM6kkun+ENxX9OQO4beZ8eh+mPhvpnR547/SHYCQlDWE2UNce1+Ml7Nm1ZuSXpSV/y4rM/8clPzpgxJYpyYW+RAZ43KCig/x0ETJow5mMf/9CmrVvvuOfux0evn3fUtNIYVoLzdDCJupxUptxwII/1dWlEOBk0AiqQhybxYwSsTCTqlFkcb9u1t78yOGX8+MmTJrY3NYPZWUklUyeaCKowUWSYWCmOoqgUGSLLREypdZVaVkvSWuqS1GZwzoIsKVSATJ1vTTOZyDLVovVr1qVZdfaMSZde9ppjzzjcjFJFknuyN9qbnnSmIbV2oCfCE9dG/GmwChQif6NVmH/lM9gYCi8HBcUr8vgMEqgBJBcPRUQwhhTKUQLJxDamfD1tkvcB6hR3QwIafmT9AFA/HAR3NP8zyCgpMxmGLD1yybEnHnPTzQ+ecvIRLZMq4uKOkWMkdaVS2TpVayElZoIymyYRl2bJQKU2WE2SJM2sOqvOudTWsjSrZUmWZpkTJ0IwuT1P/lgMQeuy4sAh+naAX1VDQl5M6jcC6T6jDvs0nxoxVutv4ZAPhfcll0LlZ9Shx4L/CeRTDIXRFO94SEqCskau12x6dNPGRzZXutJjjjj0Qx/76IyZU6KwT6/g/Z
9fKE4AfxIUsLB33XX/6990ycbO9XOOnrvgiFlRm6ZIhYQZEOODiGdjNac1GncnfAmWswVD/4l881H8mD4zrFMiIqMEjY2OGNE+ddK4sSPHlEola62IS7IkSVNxzrABqctEGXEpNqWIRQcGKoO1aqWSVm3qrKpAXajWHYtTRw4GJtKmFm1adcualXesOHDOjLde8trDTlzc3BYhSoWERIlln8tjCGdQZ29ypgfkN1lBnXjXT/8JJCr5sqwwOYFgqxnsN8N/5Ae/8ni1T+70aVIE4QQggFOkVtLM1pxLrc1E6jZvQ84VlPdFqfHAaZ8XP7wX5AfP8hOcP+aQITCTll15x9bs/33kc0cdtuh1b3jJDbf8oWVa+1HHHp1VUmHnXKbi3YHYqSZp0j9Y6euv1FKbOWed2kxSmwzWqja1WRbWb0mu1Hn6DZg/9kbyBFTgnAuK2vyqyl/h+hMd+v4MfSEx5Mt0yM/Ijz5K9RSRS6X+BPjOCxFgHIQ5IzVRFpuK2bmia80dj0nNHLn0sE9++l8OO2yZMSA8Y9a8wPMAxQngTwIBBtFRRy3/4mc+ffGllz51x1MdLW3TDhoTNxtL5CRjXz/6UFafTsXQsozye5bDP+QcLfIgBL9TGDCGxQlpRNDM2r3dvQODAyPbOseNHtvR3maiSJiFuJrWaukAOHxd1mdFVFKnThObeXmoOkSOSeHgQCxQVhOriVzMSXntg6tW3rl6yfxZH//4u+YdMEOjTKPU18DMftBUhx5onhZEGuV/Xlc3Bh5Ql5/k7AzqhFjjO4QoRvUGYv21QuMbqz9W1ed4Pe+gEbEYEyms10D6bzc0+ANoBP96TV3/98YfhzQHcg0NiAwIDsJWdPLUsRe+9JRf/PK/jzvqmANnHvX+f/14y4fHHHTIHCdOI4YLgp80Tfr7B/sHa7U0yzJJJUtqLknSmk2TLJPM5W9zMCca4oFUvxZCEA4dIw9psFiNyd/AyzUSwT4BH0/7S/3Eo0Ne5fytzT326rT+kLPTM74oaF3D8clLsgCNlKUa7X1892N3PUFZae60qZe9/x8PPWyZ8WLpQvPzvESRlv9UMFAiPvOMMy97xztjF62+Y/X21Z08EEcuIjEKIvLlf1h56gmNPA9QHl1yZgL1FgGg9dubiFgd4CRihnUiAEeKeDDRHXv7nti0Zc2GjZu27+zp609SlzodTLLunsHOvf09vZVKLR2o1AaqtcEktU6tI7WG1HihijKsOnVkJIrSuDlr2vzg5odvfeyYww/+f59816Jl01BK4pLzc7gM8i5z1CDV/SMcGv39Q84fuv8w5V/BUNZ8NPVpt35O0AyRsiPnYiiM9+Zf4wmP3L7Nc+IMYiJDFDH7YMp+I1seDPed+qJ9E8vQB0ND/+rnf8mf5DgfTgMpq+rgsS9aNmnquKv/cMOI9klzxy344me+2bm3s8nvdiMiImeTWq1WqSXVNKumaSVJBgZrfZXBvsGBajURJ0SGlBV+0ME5Pw0y9CE1Xte8tPdzg/n6+nARDWnA52xXmHv+E7BPmqAhvzY+Vn+T8z803gtVgm9TESk5EVASqS1lLaXBpu41PU8+sNlVdOaEqR/88Ide9KITDeuQG6DA8w5FAvjzUCrH77ns0rdf8kYd0JU3P75tTadWECGisHLJM9m5ApQa8RHwbbo8PgJD6lXypV4eB5mJVYUNE1gVIAZFzCa10tPbv33n7qee2rJx0/bOPb02U+eQpW5wMBkcqKU1myaSJjZNncvEWefbjFbFOlWAQZyhWdq3Pbb1vmvvmTN+yic/+r4FS6YJW+JURCgcYYLRQr2zGuJCnfwf8poEpkLz8B+Gsxrx+pn3vq9th/xLyAchztUZ6SGfH2pnMJM/8xATmOAdF5jyzJP/vPDThwj+kX/TIf9e/7f8BOM/JwRhyvWrVtg2dzSd88oXP7n9qcceW/3yF5/Xuxk//8GN6qJIWFUTl/XXKv0D1cFqWqmmlWo6MFjr66tUq4nzLp
3C8CMDQJgE8KudKThP+Esk/JpLpLz16z5pKuip8isnl1c1wvQfR734GOLe9oxg/2x5RILPkHfBCCyb+nY0NFbmgbh/Q/+aO9b270xGtYy45K2XvOSC80ysTKbgmJ/PKBLAnwcGWqPmD3/kw//w0le4/mzVnY/tfLKbanEkhsXbGlN+aoc2LG+0Ee69gVeDykAkMAoD8lI6Yh81GWDDpE6cs7AKNQBDjRPOrFarSf9Atburr9KfpjVxGWyqWSqiUDKiDAULq6oja9WqCAsbG48wHbvW7b796rvHtJc++9n3TZzZRmwNW0BVvIG085nA8w20T2zcJz7UCRffmoX3xt6nnkUgkYZElnqi89+4vtQyhDyEfWFDdgWHmt7HSYbvmFNwHVYlnxK0ngFQ3/f1LG/h09JCo9Xg9UcN0ZBPhKqqpI5Tp+nByxYtPWzmzXdcO3rM5EMXLbv+t/fff9dqdrGmbrBS6R2odVcGu/sH+voH+3orAwM1P7zmtzN4D1fvBELeDhS8DyeFehAn+BhdT0dEjZjfeEPoaVF7n3dmnz8PPb4RwN6UCmoCfbPvq/MsL12uR4ISEXkfUxgL1cg1ldOWwS3V+657UAdkZKn5be9468VveF1klCiqP9wCz08UCeDPg78Rx4wY86lP/uuZJ5+edqerblq9d30vW8NqWFmcqxehjfGnenU6tILOb0eiuhgRgXFXIvGOkS4ijlCKyJBTWKgF/n/23jvekqu4E/9Wne4bXpycR6M4yhrlBEISGUnkHAw2xgFwAmy82NjGgAP22us1Tmt71/7B2gavyZiMEMpxpNFIM5Im5/RmXr6h+5yq3x/nnO6+b0ZCxlgSmlcf6c19993b4XT3t6q+lYRIjDhSBasJWKC+zxxEKRcrQtbBilNRB1+jwGy5YRvdA+1bv3SLtvIP//rPn3PhKZQqG4GCYzlWwaVU4KUKEeW7EZkKDAUiunqYLYCYys0Vln6kv8q/UQH2lQh64Usg2ul+JcOMGRBijUK5j5mUU/EvVTYSkJWqfwjRUyVSIg0VaSAViFPX4OSGV71srD2+fefm177spX2u8U9/98WxsU7WpfGJ7uGpyZHJ6SNTU2OtqXae5+qrthgmxMAr1JZXAsUlnyGiKqJOQ9zDE/SCSJqFDVTO91gnfawFeJw3e5y7sNUZAYTY0Zw0Nv9ThdSINTMTe6bWfW+9mzSJ6Ote86qf+7mfa/alzAn1bGNWnokyqwD+w+I56GWLF/3Z//gfL7ji6s5I64Eb1x3ZMZXaeqopg5zCUZgiq6HQK4YZCSFZ3hvGSj7DXUAW6kBQw0oGMIXGUGZBbPVLTMQqEDEg31hSRUQdxPnW874bpfpJMgbW5RAFMSNtah0Tyfe/cEdnbOpn3vmK51//HKROqOvbicI3u/bHBUR81hKkZ8JIZIYoRBliezTvx4S/MhOxx1v2hAeDfMmT30bIGELElcgmFXx4RDoJQ3N9zZ2HbtaKi+E5uABVxzhgqhrKPdyHllY3CvfBM/DQMGeFUljbnr9k/uUvvPimu28antf34kueu2X9oX/6py/sn+jsOXRk58FDB8eOTHdafjaZdwKJfMNuxCB09C7iQSlmwiT5Fv/iF0qhGob9lOmt4QR6CLlYd1fxt7TcSfW1Vt+c8bEZEugmUiJJSGuqZMWpyRiuZptpu9HZY9ff+tjUyGRfrf7KV73iv/3Wb8ybO8iI09hmdcAzW2YVwA8j/oFbtWrlX/71J6+6/Ir2/qkHvv3Q4R3jnIPUxJQWDUl+BCVwaQ370mDy8wg12LEg+EkhIRzgpwVTSC4SEBXzC0MtqM9XJAaIlaOJJgRWR8xkRcQhMUzKcGpyk+a1B2+8f9cje69/yYXv+vk31+ckllq+i4zClVZ5hbaP6fTRWJ1pJ5ZuTKS3gGhec/kfc4iQh0wef9i93HaFn4/brx5S4WkoYuUXVL1OQKF+IsOGo9GH4oaqTknxezzwEl8JvrWE9wMUvoUGFBc/7y
LXnz/wyL3XPO/KlQNLvvvF++9d+9C+scOtTsc6ESUBhKGsobA57M3PmdEK846YHVtZbZ/eo9GtEY/+VYas+E5cGEUMOvWcYGUJopMVSaSC5KGeBZ9xiWOKUPABVVhVlVhVJAVzO7EH3Kbbth/Z2mrWamevXv0bv/XbixfPZ2KNWv2J3JJZeQbIrAL4IYVADD3lxJP//JOfPP+c86e2HXrw+w+P7W6lWS11qRFWja0aPU4Vz2IkmgsaWsIgLW8eCuAbbvoYZBHcREjYLrI+YoolFUyMT2JXQ8SANSSsFo4YaZOadTu07ubHNt738OUXnfjhj/5S//xajpaQ07AfxCycHtTUmK7aa8mVZE602LXoTMAgPzDBj3L0QwmJQvdOH7alqFaiwimRQov9oogB+F34McCll+CxUaUgg0pFdBSu/UDRarCatHRAvGcg6luwqYpbMH/O819x5Z3rbp3oTF96wXPcAb7v2+t0gjljsio+d8q3to7uFAV3oiDrywKL6HUEV6rgg0Aqvr9GL8dWKIfo65RKy6s+mom61PtvxfcIP6tfqn47qkxiMCvDqVWTAWpcmnTrmDAP3b5xdNdog2trzj7vf/71X65auTy0+ZzF/R8TmVUAP7x4eD/7zLP+4n/++dJlqya2HF737Ycm904mlowwg0RcsNDiHEIGKJAjfgtVeyu8WaBi4Iy9KxBNyGCfxyCDz9eJOAwAqo4UIiCQIUOgRLiR1fc/tO/hO9adsnL4w7/zC8PL+yXN4CcBUzAxPU199BnGw5qhGOJBRmJCe7/HFCz/HleAS+Y9muOleqlapJEIKv5Y7J8AiIgVyX2iUzkS8hhUxgzw69lXz2eo8kdooUmC7exdMSjUkbLigvPXnHrOqf/2tX81jWR+38DW+/a3R6whX7ILVacaeZkKYkdIjdkAcfGCIxhoLA3j4WMKUOUwC5WJHj+m6DY186SK468cwMwVOGolgOgJVUPSAIQIAklAidTchGy5d8ehbRPG4eRlyz/y8d8768wzTcK+nVPpYc3KM1tmFcB/ShhkwJdfevlf/PmfDw8vGN8+8sgdW1r720mWkONI1DApcUgp17LoaCZFG7kB9U0eA5iEagKvDTRAfhyjFz1tVQCsREpMpI58WwNSk6ppuMGDm8fv+vZtS+cN/MaH33vqBSdYyp1adapOPPxTsJ6PMvRjRPdosqYEsohe8QX8iBfW6AdQNTIcabCg/7TcVzyhMtIZE4E0KFE/cwYWyFV8pqvrdQB6aJyjjrvXmK7uuSQtNP6IfkAMjACqKlARmxC97V1vG1zUf+u9X5u3eBBjnZ33705dwkIAM5lA9pB6ZygUKAe9EpVAOQEzQL+Ii7PRfL5QgfdaaITSVj+KXq8mgh6Le6fwraBji51XTr28GkrRBoGSiIAygq1JPe2mPFbbvX7/9od21Kl+0ooVf/Anf3DFpZdAZbbc98dOZi/Yf1YYRMDLX37DH/3hH/b1De57bN9DtzwyfbBtHBs1oT4g0jZaPP8hBcSbw8zRqi8c/ZjtXbVMtYDPXhgown+igBPys1YIgEMf9U3snLz9azcPNeTXPviui55/nksyQQ4mVQnp9BrxqKqYtPRQAhz2CsWwrf90cExicicVcBKT3AtHBihSQItdROa7yIwJhxDQX8owrzqoFWdd+C9OiETcoWImwleO+Vi4eDQ5gsiFxWQkJpD6OfbkrKpLMuqX93z4p1/9M9ddePUZA3OTnRt2tEdsDanx4A1lgJRjSn9InPen4SPhgQbzkXxPo5WEWTDYCSGiVCjX6FAcdbw9gWWU2cZa+UxxZbX3VwrtSyopXgi0HgGqfs5Rokhy0+w0d99/aNNdO43jPk5/8yMffv611yYJGzZFIces/LjIbCuIH4EwoKCfeOtPTE9PfOT3Prbvsf31vvrpzzm1NidxRpRFyBkKVIsGpSuAeEc74B+0bNqL0B2B4NMFC0OcSps5Ouj+a/AD5wXGM88ENlzTwcO7p2
//8vdSyd/3wZ+84sXnaiN37AwlHB9VlZBJGZFm5gN8DEOz/BmQQ70aI1U4769U6BuKxAfiV4CQIuuPgeJ+ffv+OF/Ln52GzwkcMavCilonuXW5c1ZDNW085LCkMw53Bu19lARyJLRp0oCIXKgxjaehSkJgsSRMndoCft7rLnXTVBvif/6Hbzx2584LX3yu9k13kKsSF+xcqc7C5mK4JtQxqPhKbQPRcCViEZr4aEBwBQo2jqpqsjiLSFpRtSNScD8qSVUz6TBU6alQR1FcK1UlElUklNRcmrYHtj+447F7HmFrFsyZ84k//eMbXnY9sVLopDSL/T9mMusB/GjEAHVjfv7nfv7d73x3nfv3PLpv812bZcolMP5ZVhFQ6AMU/QGq/ESRih6BT6AivpG+VnjkmAIECsMUAYBC93hfR+Ztz4ZN7CF75zdupWzqve998/Nf/pykX4Vy9kHiYFFr3GmwKwsCpTTWQdFB8QY+ozxmUKiXKs6IQySymPoe8l5KdPDMT2n9R/yNnwnLoqqiEqx+FQWsk9xZKzZ3zoo4VRfSgYByRXUGLh5DjvkJKvGw+Fx4wxdtAVBWIcTO1KpiGpoOyMve8vzTzljx2P1bjmybbmjdUMqAig1uGYmUVxYF1RbDNgF3VRygRFy4huWVQcEBFi/C3VJm7MZrFI97hhovIsSFB1D5TC91FDQVk6eclNQQG2eSTrr7wUPrvr+RcgzV+j78kd9++Q2vSGrGsDlq6Wblx0NmFcCPTBioc+2DH/z1n/7Jd6SS7tm4b9/Gg9yhmqYMVkIOEQYTGfXjYlkRsoO8u80h4Roexv1jLM6FGlIVwMHXGBCFrHefTkqJumiLq7KiKf1uNL31yzfaifGfffdrb3jjNaapAmtMAivwXDMFaxOosiBRCxR4H3/GhH5wUApMZILmYoBE4XxqqidqHKnA1/UWdEPMhWWWaBCX9LMW/wUNIkxCcIAvc3DQXNFx0nGuI66rzkIdVEjFFyIUiqWg0o4GpcehiCqgqUoihLDZ8GcJDFysDPANVp1Yl+T1efr2d7+8Uac7vnu3narVXIOQ+KGdYCZNWH2EpjCSqUz38VV6EoLlBVOkCh/d9gMWAs6zR3KfaxqTjONCFusXoxYhkQiikdISP77aa4kyplwEG8jrcgZIRJUtIImamqbUru3fNLLu9rWp8PBA/8f+6Pfe+IY3JqyGZjHkx1hmL96PUgiYM9D/8d/96Ote8Zps0m65f9uuDXupqwmxHzDlk/YVIn6sOyMWSZVOgS+YKoMAVFQDRGufVDx8Q4hUVcUpGfa5NWSpDw13yN3yhe/l46O/8PNvfe1br0/7Ich9BndpeZd1SYWU2Zw+pZQiRkc/pOR/KND7ARUrNIL/Y8h7j4hXBjM8uBVMhFbQJ3gi5R7VqToNxn4uNhObqctVrB87U1rzlXgEyndn8kHFMRzbTaim2Wg4hfCaoD45lwH2A+dVRB1U1ZE78/LVr3vLiyYPj2xdu6lm2TCB2BCROFUHYpLCKwocWOTOIteHXl0sPqajkZ2aEQWofPQoHe4POdYz++tSKdSITkWhiop4RLmaGphNAxjLaSc5uGHs3m+vt62shuS3PvKRN77hjY1GatiIHMPfmJUfF5lVAD9KIYChQ32Dv/exP3zDa147daS18Y5NB3eMUkapmEQZDg5WjYbKrch2BMAEQu2X1xdMlfzwgLcFFSQAmBwJSJl9HS8Mc537pg/L97/wne6R0ff94lte/YZrkz6ybMn4giahas5ggec9OZfV172nV/BDhSHrKxiouslqAmdglgoaPfyMiiHia4VS8tglKqJO1PqMT9XcqXWaW3E2tNUpVFWxE4931Ujvsc39+O6xtIASqLrNqApR0CMCUQjIeRsdCrVgQ694ywvOP/+Eh+980I5I3SXkr7ELtna5sTDtsTKogDzbXkl/kmigRwouEjgR+Gd4OhRuICqcIKVQW1JcPSXf/Kc865hWXOyIiJVIoGqcqj
NqalJLOo29Gw49fNvD2qKhxtAn/uQTb3vzW2sp+8gwVa/9rPy4yawC+JELMfGyRQv++BN/fPn5V7T2tx++bdPejfs401QMgYVUyHEYouoJFo4/TexsWebOF5nzQCS6hQzYZ6X4VMLoQmjNcXskv/mrN2t76n2/8o7r3/BiHtCccnCglSgWD5e2o3oKP0BxNEXLlMWodmZazooq0kvxMtYj9C5KBXG1YEKKzxS5nhGbBSoQF5P9c5Wuc13nMiee/Q9J9MUGKukrvfvFMVVA5a8zZUYubJh4XCniihqYfTs6Ayioa/OBBbWfeu8bnM0fvmujsWkSSDwwKyQXEfHpPwjuisYXcdWiggnd8MK6EVDmBQU1Uro9Pdk+/ijFx0YKb63C+xfnUJ5ixUcqfQcAasDGMbVoZNPYg7dsyCbzwaTvt3/nd9721rfXGjVjkqofMSs/pjKrAH6UUvjlDFqycOmf/NGfnnTCSeO7xzbfu2t81zR11eSAkIiq+DRBZd/wK9TiYkahMEXDrDC0CEiUjXCClFzCIFUhaEJa12ZnRG/58u3Syt//qz91/euu1YbrwpIxEKg4+O40CCko0QIPqqiKlSET5Jj/zTjbYLFqIREXZgQYfV581FbEoBI7FIh5jj6ZVUUlzFIUtaK5Rddp5pCrOG/+Fzx29bhm2P8/8FJp9ddqGCQ2gtaYIYsIjMGU5gjHGvqmssu1e+qa017x2hc9un7D5PYp02oaTZSQuxwAwaAk/mMMt0wR9YehRVAoKkt/hlVd1/Mi6NFAnIUVDDeR+kgBEHpJaQ99hLjHUAlOpBAIyAHOKKeamHZtdOvEQ7dvoHYyt2/wY3/w0Xf+5E8lRhM2T2qRZ+UZL7MK4EcpFfAhBl1ywQV/99d/Pbd/7uiOsU1rd4zuPsI5JZowDMQxA0RCCvJcrwoVTeKCOVyYah7jYkMhVYLzOaMsbKBOUk3aB/Ob//02abc+8Cs/++JXX631LEfOJlBMKuLJF/FjXmLEoQBiLRj4o6zgGWeJCmAWwIWSwwggXskoQgx0gIiYTElvxYSiCMhFSSz5k3dQKyLinIhV0VAn64mZqgoCKv88Gelhriv+SLSbC3yN74YjIo20uaoSjDqCCEMz2Ny0X/2Oly1btfDmb9zRPWxT+H7IRU2XP3aKtQGowDHFCjCUJH3cdcwULjRdcfm4gv5Q9SWExaZLrw4zFUj8SZWDCN38QIAR1GwyuavzwE0Pjx+Y5sz89kd++x1vf3ujmRpO4jV6EiMIZuWZLbMK4EcqFZOWQQR93lVX/9X/+OS8/jn7Hjuwbe3e7kiHLZEAgIgKq7IHy0C5xCewIOpLugYI/LEY3wDIKZzH1Rqa04fszV+9LZvufOg3fuYFLz9fGllet5oQgWCFnItxRj9kRo865IDS/ggqaKGRj6gk7FS/AniGokryA2WCqjf5maovEFlq3/BMZ8BakTKkChXx3oALtbDhCIqy2qJ4ye96hoH7RNIbNp7xF4oJNZW115i0H8CcYo8PIvbWdoasfyG99Z3XTU9ObX94s7aIc2MoJZDAKnvvQTUGsMMA+ML9Crk7oQBEK3mf5aWqsHfxhgjVZPDxHYlVaGWVYLhKlchBmVJMIXSiyqIQBmpqaq5vbPvU3d+8MxvL5/UP//f/+UfveOvbU0OhzSeCLn+SKz0rz1iZVQD/JVLYbobMa1/z2v/+8T9saPPQptFH79nePjiVSmJgPLCKOigITKpMZYw3bkU1NgdAbCstJGIERgwRuaTWTfNRvefGe/PpqV/9tZ+9+rqLqZZZOGJv+zNisj9iugeVxnmJ9lQa/jGbvPRAHvdEKRRLaYSbeOYzKaCAXN5XIMCPxS1DiMFa7TF1fd5plV/ytnfV5kcBoJWD1fh37Xn7cS5T+Z3qb947Oeq7lXQtViY1PtYqTuCEmQjusmsvvPKq8zdv2DK+YyLpMsN4mFVnEed7URH/jwfpT6HM+ew5lkobqGo8QFChjGJuQKSDgm
4t1yGeUBmm13hdfMKrssIo1Ww6ub1z57/fnY3bJhq/8eEPveXNb6k1ksD7V67trPy4y6wC+C8UH29Nid/81rf9t1/+ADru4OZDW9fuyickQcLKChAbUoKI54gR+Fjfm98bar4ylgkJaWI09c3QmNQQ1fKmG6vf8fV7p0am3/9r73zR9Rcpd5UlSRK1voFmBuMcIdZMBTTRgiKoWITe6AYVGSjQHvwM7FBZ3uqLTFEBckLZqCiy2JF5LnkNIk2IEja+UzTHiY6hb45nISh0yGAwKYkfokARH0N6VKktKoqsVMBPLFr9zLFBjXpea9U7QuHokIKZAIizCm00kjf/5A39DbPhrvWNrM9Y45xqtN0Z8GfEIb0nrKkTJxrUW+wNASBmcoIJxEoch9WEGRC+VsAh+JAUisqr517G6lU09iWk0OGOnIqQBYSVDBJktYk9nXu+dbcb1znNgY987Ld/5qd+upaooVne/1koswrgv1wUSJPkfR/4lXe9/Z1uWg5uGtm3aZ+d7hphBosoSIkrJHSFjI5FOwBJTAMHKSXMIpoI2XHc+a07pw6N/PJ73vmS664Et4RyGLVivcUngEeJOE2gsCHLHcUqX//nCttQtoKuYHfg88tDJRAXUWTETVb+HrUJxa/HQDCRARkfD1AgtkqiUqmQQWgwGZOECtqpJGiK0EvMaOw5AHocaC8P8PH1xFHeQ0E4eV5eg97yXIqGwLGV7spTF7/qddceOHRo39YD1IWhhBAhn+AJm3gC3nKPfZ8UoQ2sV4aR6w978TUgDr4GOvQTESqZfF8CViSz9pyJ+mTTcrgY+X5QIAUrEjF1m0ztzu74yt1juyfr0nzf+9730z/1082+mqEkhqMfT1POyo+lzCqAp0IY2t9o/tZHfuflL7nejnV3rt05tXfa2ITFQOCfaEuiHHnZQJEXProSlEnAVtQxMVPCarotufO7t4/s2fdLP//G62+4xNRypNGlcEKq6respfaIbEOVVQ7IUcmrL2iZGTROJRgRvhg1SYUw95jvTdGy+V0lZhjL3bx5H3uFVlVKJIkohohLGqkC2BTDCVWGfCY80RNRQE9SKopZCycgqshy/Xyxlqhaspq0X/Sq55166tL1ax+UMTS6tUQTOBWxcUiAqC/r1XLSQXSriguiZWWDnxrhpOjhWgQjfJyg+Epg2XoOnjQ0X4oqBxQKlcmpOgZSmLTbGN/auvfr93UO6dyBef/tQ7/23p//hVpqOPb5mUX+Z5/MKoD/cgn2rOrcuXP/9E/+9IKzzps+ML35ru2T+yeRewRVgRV21k8fRMDVYHBpeHj9htgQFMYqOrV1t60f2b3vp3/yda98/Quo2bXGwhARQTiwPB4ofO68WN9aRwu+XAMXXSQFRSj1zgNVQbhq0lfOLLD/RMTFEEhQbGWG8p2Cko8h5UCj+/kw0SWQuPtAGxVx0Miaa3XvYZMROGcQH0cf7xPIEzkBR/25zHcqrxIoJPt4LM9NVp+X/sTPviGz3cfWbuY2pZyADTOrcwDYsAjEl26FthgUWgxJeE1KKmUltFhV53tfQCP7X9HJxSEV/SvKoDnFjnJlh6YYR2eQETJdOrx1/K5v3NvalzXQ/5Nve/t73vveZn8jSVKUemlWnm0yqwD+yyW4zsSGzPJlK/7gD/5o+fxlI5sPbFm7szPWIUuszGr80xhiqRKQQOEQI3wOSlADSl0dU7WHb9q8a8OhN77+pW95x3Wu3s2NJZOI9fyxU4IQK0j8AEUnuRXrG8+UdniJ/hHEIuTPAFst/IMg1a0AMbexh1gqPgjVnjKx6DNEQiuqF6Xwv4QzFytwqqIikCItk0rl1Gv7V7Lle462MNIf7wJVPzZTZkYTojtTiXYXNnv8iKg4FTH5GRedePU15+7YtG18T8tOk1iCMIFJBE6gJNaJEyq0XLxgvtYYDijQ34kvgIvaothb2dihYMeodJSCRldVUibloFegCgdyBpQoUzvdu+HQ3d9em43QnI
Hhn/uZd/76h/5bf18j5vvPYv+zVmYVwH+9VPJcFHrlVc/91V/5AGt6cPPB3et3yXROghibI8R0cEHs6U8KhkBDdNSBO7WNdz66Zf3WF1979Tvf8wapT1mTU0qi6hu4EEEQWrMp1DrJxebinK8xLazaOGkeBdVS5nGWJHr1l5lnVpBIkXCOXY38Nv1Uc40hg+BYoAyk+iRHAsAU8iMFcOqcOitixeXihHxVQTS0iRi+eJqrvIRWfs6QMhvmceQHaofiVYUcK92OwIX5pRIwQWEcJGnIK9/44r7Bxn233z890qmpgQgzASQi1ubiSXwHqO96SnAg34jOqTgVC7UqceiBSmR5IrWmRRe86kWKvSuU4jxhEp9N5i8VgYjUKBmHtIsd6w7c8411nf0uxcArr7/+1z70/gUL5hg2RS7ArDxbZVYB/JdLb2yUGOZtP/lTb3/bO6jjDqzfN759SjNiB7Kh+Zf6mcAGIDVKJEasMpQILCln9Q13bNzy0LbrX3zZL77/9TyQ2YZoyqSM3CUaUvJJDITFeeMZDpQ517XOOhfol5hfFBPqfWy4zDxCAOuiT12pEgqmOb6g+FkAvnRBY6JRZdhN6TaEoK+GlqLECJjlRKxzubXdPM9snovNxVr11bbkN+a/U7TRi90yenLlZ4BWZNUf9wKVn67EGB5fp3h9pATh0Ie0cIIIDOecU+tIhPJFJyx449te1m6N73loO0+mqdTEqhMREVaGQpRVQ+8jdSJO1HnQV/WknRAckWMWQ0JQVjX+kmmldx+VAQNijb1klUiUQKTsu7h6I0Ihhkzqamayb+Mt29d/90HTag72D7/1ba//+O9/bN7wEM+MAs3Ks1NmFcBTJGX0DWg2mh/+rd+65LyLOyPdnffsHt85aXI2ltQG0lfIScjMVlY1SqRkFImt73l0x5aHHr3ysgt/5Vd/sn8uHDuTGIJRIQMGICJaIcglps9n4jo2y52LoUx/QEVg9qhIqgaDG1qU8RY4HqKfWhA50WmIiMwMjtAcin4JMf8nkkVFHqePkYiqEnJnM+uyPM+czZ11ImHATRy+VpjyCgl0UJnX2BOKnrH48byfwBXQx3k9Q6hove3D1RrXEkKkjNA6W3PnYOTaG6689Ioztj26fWJPizJASK26zIXlc+J5fxWNTcFDDmjVX4tBAo4dWn3fN6/DfZVBqFQOjKES+bwBEGslws5C5FIQO9Wp7vqbH3n0jq2mXW+mgy96wTW/9lsfWLBofpqYMOBlFv6f7TKrAJ4iKWECxITFi5Z+7Pf+6MQVq8Z2H9l08+apXZ0GagaJOg5zpEAKdk6VckNKjuu2cXjzoXW3PXjJBWf+6q+/MxlWYUmTmuakThMiIperKAyibR6IIyIHVUJXbMtmmQiYNcZqA/FMDuxZZ0WgIWIlWOChCtK9yOuHBMDx1UeWyLE6VjFAAjWqBhpsdo/8IcQpZTRTRMUpBAxloyaxRF1oR6UrkgOOVVlAagpfIkab4+wUxLBnmekU1cRMAqei+7S4Lr1XqWTVj5ZobfsQLDninDhnOBahUJUbO12LiCMml+RmKH/5G5/fbKYP3bNep2C6CZNhk2Q2EyeJGHIAmHyNQ1AGRdVD2cUHgf1XFRGIkFSLqkFQ1kIRR6onAYzzKw0BO4Imkqa21jrUXfvd+7fetzW1/YNDg2998+t+/xMfO2HJ4nRmRu+sPJtlVgE8dVLRASDg0ksv+cM/+qOFwwumdkzvXLvj4JYD6AoLkRAJFOLUgVlFrOs2qNE6NHXnd+48beXJv/GhXxpclDjucgJnLcBMLCrKyh6ffSp4aF0QYFGgVlzbdru269QG21+LRjGhpism3gDR4i4O3v8T7MtAIFHBBcVCsB6bv1ocHN/jwlVAzKcJDAqBiBOTGjYgCrVREoaXx3Z5paNC0JCWT4WJW5zGE2PYk/iM/iAQpMqHNG6yiHWElRJVEbYnnn/Cda95/uHR0b0b9icdJiERNeAEBHEi8A1Oo3
MVXIvoJcWeR77yNyRPFQ6YCHx8AKqicAInECUoq7BzsEhIWcmAFakwtXls2+gD31y3d/0R6tb7k4E3vfq1H/6tD5588qoacygW1sfRgbPy7JJZBfCUiscLX0vPoBe9+GUf+Z2Pzmk2R7fv3b9+d7avRblRy2rVU8JKTsk0zEA+obd8444l8wc+9Os/O7y4IcZxAkDJdxX1lcMa5gYAkJAWEiz3or1Q5mw773addT7hqDCci3CwDzr76qCSOK9k1ACxM0WAvZhoUmF0ik0VWYe+w088f18moD7eCYgQAFbUiOpJ0lerN9N6aowhYvUzrDiMYynM0wLxi5qxGGuOKfFPLI8fEyg/cmxXwPsfrODQdDPCfqzkqkRUREScOmJ96WuvOuP0hY+u25gdcSYzkgNi4MjBOnIhpdN38hEfZ2EKXL8q1JE4EmGncKRCEoj+0A1OldT5rC9GzJISUXEgK5yLccpKrqbj9T0Pj6y7aeORx8aR1xfMX/S+9/3ix3/vo4sXLjJQgFVj5GdWjgOZVQBPgwSYVK2Z5E1vftPHfud3a9300IaDe9funtgxqpkyDJw3MW3KnOS89nvr8mn55V98xynnLrMmp0SZDIG96e5ZAZ+EHguEpFIqqwCYmJisSDvLOnlm/Yj6mAjks5AqR1hY29HGjaQ/xVY04ZMFQRSTi4pOERoIn2hRKmlki8QnIQIxqV1UlUGsqJMZTBpDjf7Bel9/Wq8xs1Q8hXhCFI67fAfVVNRjrfmMK/DEFcKlVHQAVd+lGIgFevwFv2hCvusPSEUgkKF5zbe96zUW7pH7H9FxbmgdQk6gBA1zZQAF+xI4H533QYWiSJgE3i0LTlHBx2lZnxbEKfyEYYSoscK0kO93W27b+sjNG8e2T6BTO2npif/jz/74V3/jA0Pz55okFnzNIv/xJMnTfQDHsRAztJ7WX/+Wt+/edfBv//bPDm8aabU7J6er+1bWQZwI102SZn1bHty5b/uet7/tVRddvaarXSQ5AFHh0OjdQw5JKB72iZWBD9cCKIgIrCxdsVNZN00SkzIIXHZXAGZgXHwRYp2R/i8JnxBs8FFRoeBqRE4nNoEIdInvQqDwBQ5Ww9grlHEGJcAAYLAxNdPoGNNWalvr1CkEYZh9KJPrIXECvxb3+CNrWBZ9lvJ3QrXRQsgH9ZpR49HFWKz6dj8QciLdMy889ZWvvfbz/3zjkkVLF61eKn2aIc9VDce2bqIQKIcVhCj8OSspg5QB3wDa009clg0SWIGocmI/URVisjWXiUxmk7tGDzxycGTX4ayVk+XhoeHf/b3fee3rXl5jFueY2R/trBxXMqsAnjaJzIk0Bvp+/aMfWrZizkc//rHRHWOb0y3LuguXnryoyQ1jaWoke/jOjReee+Zr3nAd1eGoQ0wiGq01qTDaAa1i+WeYPRh2p2AiAlvotOumuUmYGyYJyBXhS4sUdwqxaFAs3iqoldBNIqI/EP+lkhmKWCwa/AyNoWlVOPWIHoucA9+tvkaJCMZ3TDMppQBsK+84iCLOntQyIF0Q5+HY/LISxRBFr53fU4XwpKVnK1pcvMp1rLxT9MwBkUJECY4NxFhu8Itf99y1dz24fu1Dz5m7IF1OecJ+mX2pFoViD1ZyJKWXVeg1IvXlIUTwU9iK+ggVB9+STsgQkSMSshnyiWx079jBrYcObj/g2tblwjAM7RtKzr3ojIQI4owx5e0zK8eTzGr8p1uIDcFw8vqf+qn3vO+DjObuh/Ztv23nxKapWjvNDts7b7xjIDU/9bNvGF7U55DB+F6dLqCqsipHBqaoJwXKB9oHDX03MTAzDGcik1k2nWeZc5G5iV8JtUJl6kmoSwuGetn/Mn6k6ANUBogRtxjYHlEralWtaq7IVS3USWX6FQEMJRL2A9NCaiSD6kla44RjP7QC5GNZGGL8GkXTtBC7DsGBIh4RM1n/4zhXFHkVUokwFCHugm+jyEeFJnkKFXGi4tgOrUhf+aarDbmtDz
7KrSTN/XSgMNYtpmSJhuNVIgX7UZJK7OIF9y1Aw9VRFWF1CVyNLCfQBjp1O5aMPdredtueu7+09oFvrdu9YVc21XW5MINIAJo4MvGXn/izkSP7Z1QWz8pxJbMewNMswWCF9Nf73/ue947s3f+pT/3f0cfG7ht5eNfi/bnrHtl/+F2/9OazLzxN0IYJ/INPtCnaKlRMcYrIGyfKFDPCi6/COaCT51PUNkRpLaVg9ftQbFABFPl9ibxP+D5Y1PmDL6q9IsdDRKHZg9cFQhQGUar6bhYa3Qq/ARBUYxjZG/hxchmgICPhaCjwLCjP2JMxkQ2aiWBUWZAC9H8IG5dipLvad4fKP2LGrn3UPVQKCDi2ZVCoE5cQXXLtRQ/csvWOGzcs2bpw3mlzJREr0Vnwg+LV640QBRKoL5cjqKiS54RUyJfFESDEjtnVBMg63enRqZGtR/btPDh1aNy2umV0gAkQNawOgEy1uv/62S/Onzf8B3/034v1mekwzcqzXWYVwNMvokpkANfXaH7kd393/uDCT//zpx/btnn6cIdTc/aFq1983bVIrDXWGKhVIhYw1E929EwBldAY6oE8X8wKcmqBgnICwMwgoVaeMZmE0maS1rhI6QSR8fFkAkJ/eVH4VHJPPQcSSD0uxsmzTBTQXUIFqwpFrh/wEeDIFmmY5gUKLYpVAFKICWSUKEFUHMSRC8cQh0AWzTWKQrSQxhQ6HRTuTBGE+OElRmd7Ygra60doQf8XnFRRYOe1Iqmq+ChrfSB51dte+tDaRx6778HLF15r0jQzmVEQhYYfRgxEmVj82hIxWJwQEZwSRBl+qqYRk+SmaVPbktaRfNeOvft27R4bOSydqJUNuGYazXqjr9E33Ogb6nPWjew+3Bpra9dMTrX/6Z8+85Lrr7vquS9IjMEs+h9/Uozum5WnTSJ2QNURkYXs2Lb7c5/93O9/4uPNObVf+/13XfmS87o0jcSRCgucTxhkQgBNFJwE+cnewTwmQETVqnMQpyoQzxwroOIgWjNmsFYfaDT7kyTxQUUK+OX5Fh9WFonJPNFoj60nix7GnnD3m+BAeUAtxIXAbwmZCuWCmwkxzkjNE7F4qkadIhPtOtu2rmNzqy62iwskTxlpDX5OYMK0J8PnP2vUHst30Oq5ACiqEZTgU3AZhQ/iXSelwNCbVFPq1L7w11/5p7/51lnnnrv8slW2X5SsiiqDAN8kHARhVVIwwYLARGLYqIqIGEoNJWQlm8wPbTqwa9Pu0QOjIq5vTv/w3DkLli9ImmZg/tDQ/KGkmdQaCRE4gaowG8lwZOehtd+6//CuCYPk2muu/NSn/mnBvIXGzI58Oe5k1gN4JkikKYgIksKsXH7ipVc+r3+gcfHzzrr0eWeq6bJxTpw6ZZ+oQRwzLJViKiJFnkJjDRYpkSrHLpMMVoWBioqCHGumMpl1lJlIG0masom8TkhijF3HqCwvi1mHVLRF9sFeiHdERJ36Sl8VBVxBCQXVEltBBIcFgV9BrBojBXxGuzoR55y4MDVTo0PiTzxEhDUkrRdB3xKcf1jop9BA//EciKPej2GPno5zUa2H0DipCMBiyTXq2TXXX37Ht9c9+vDDi05c0rdsoFNTIadwCiUlJnJUxHMIDDLkhIjY2NqAabq2O7j30KaNmw7u2A2HxcvnXvjyc+eduKg2ZwCJMQmxCQEasHbFkq/uAKmTWl+ycHjp1UsG7vjcnXsfGfv+TXf9679+9t0/995Z/uc4lFkP4BkiZTq5ABPjeMvb3vzghtt/73/96uqLF1maViPiBEpGId7QL+On0Q6N2FhkoUBVVKyKI3W+144nI1REVUgBZdW6SfrraX+t2TBpwmwgEKfE3rSOLoWq+t71MXUzKIDYfNi/TwBCMUKoR/BHRsE9QZgBA4lFveTTZcjXP/ugAzvVTKSb2651ue+DHBNbKegeUfLzEYGKJxDX8z8EZVVy/6gvHquNqFZaAEWHoOCnirBwdfOhYSoRpS
Zhq6Zbv+NLD/7Zx/9h3txFl7zoimye61BXWUglEcPETonUkIphw2ASAxjNJZvMdj62Z/tj21tHxsxwetrZp5x21sl98wfzpnV1K6QwrKox6B1XFwBDICqamoSVajDtHdM3/cudrcOd005c/vl/+8IpJ59qeNYiPL5k9no/Q8RDIhQkIl//xjduv/OuV7zx2hNPW+7clNaEVH2DxgJlKz3fe7dTRn9RaoJgO3PUNARSP4xLVHLRVp4rjNZQU05DdilFYinWCxMktqMXT9Mr/ExBrxs0BgUKtPZfL86QQl4MAPVFvhxDtRQjuapkIbmTTFzmxDeDIxQGf5mAFAzrmBjau5g/5FWoKlIUfouWybHVHZRvaeW3YnpPjyfhTxmikrs8TZIE7vxrzrj0xtO+9/VNJ2zbt3hoWW5EYAmsSkJExIYStmQ6nOYmm3ajB49se3TrgX378k42vHDgzKvOXXT+qmR+A6m2UgtWJ55bEyZWEd+TzoeQVVVFATCRiFVQpnZw5dCFL1hz59fu2bxzx9/977/73d/9aLNmZrNBjyuZVQDPFPEUh8B1u61PfvLPFiwfeu3bXpHUrLD4lEnmQH17vz5+z5MeFBsRAJEF9zFXb3WSctEhngElsHo70YeU0XWiyABxSSpMTMTsC7YQR8iG7zsNrWeKojAVQEUq8NcDo1rB6BBeCO9zyHdBZIAUQK4uF3Rzm4mzsay5zPXRGG4GV7dXwf8nwq9jZAtVoP3ob/ZEegOJpUd9uLqBHq9Ai0B0pLB8+zqXu0x1YEHj+je/6NZbN21Y/+i8kxbU55gcztcCs+OabRAYue1MdLc+tnvHlh3jB8dhMXfF0KlXnLvyjBPrixtTtZatZ0LqQ/gkpCxwUHIMVn9ZvJcvgO/uEYu0CSaDLD5z8fLty7av3/mvn/vcDddf/5wrn5fMOgHHk8xe7GeEaJiaQozkuzd9f9Puh6979dVD8wWpECt7Jh+kIgQiDjaphBhAbOJfGpvwSfQCBgTkKwWKuKnvxQ9RCq0riUQ1V7QyK0KasGEY1eCUKIK1TwF/JUSSoxcASMgw1TJPh1QDtQOEWoRoTEckJcSUJUABgWaqmdOuc13rBOo8laHku4UGur8sf/KnwxUnI4YnnlBmqIEn/kIMT5efib9VdHD4mxRKjuJCxLpp9Qn7YcSWMUps1Z15xTnPv+Gyb372riPbjyw/ezkbtYBTqgtjLN+7+8C2x7Ye2j+iFkNzePVlJ6w6f/XAsnloUsZ5x0wRROE4nLdnfNi3WRIV311PRWNkRGMYRRks6rrops3kzKtWT41OH9g28ld/+clLLr6cG8lscdDxI7MK4BkhMZwJAb7yla8m9fyKa85rzDGOOkDs6xMjp94ajtn6HIMBQQmEDQbdAMCnWfo2nHGMbIGc4Ng0mlTEEnXFwYph1Dh0IQBFLkhCmzIfQ44UkJ8qo8xMiDAUs1I51F6VzcUoEiO+CY6qglihTjRXl0M71uY+juyz/kPg2PeJkBjujnUPMbuoKCY+dtQ2ilZ+Vt6j6rrN/IrXnTP+UO4dlWhADycUrijFELcfhKDk1LJhkHZtNyF66Wuu+f637nlo7Ybly1bOWTZnQqYOHT687oHNI4/sz7raPz897bwTTzvrlKH5c2QAeYNy7kiiTsV3DTKe9SHxKQQQJUIYoOPgbYJwNUo9Rj6EDnFM1JzTf/olp903OnnTbbd+5zvfuO5lr2AzCwvHi8xe6WeQKHT/wcM333TbOReceso5yx131SiJhuhpSJ0vp0VWiBWOpWC+aaYvG/J5md7eF6Fi5HxI4Ve4IlgcjD5V57QLMWqcCAe73cdxw+487+9RMdjuRR/LcjBXQJ0yjOBD11WyJSARnCKHZiqZtVYldyGbCMRF1Dc0Air7TwcqLIZFQsZS6Rc8jjz+X2fwU0f/tXihsZ9EpX1euBKV3yvN6rSyEa8pRQSqbAyxO/2CZS96yUVf+ed77rnxvtpQbffInunu9Ny5zU
uuXTPv5EVm3hBqDHKtRJXEz0Zj7y9SDLSETI7emHQvfaVl9hRiZw4HsEVuarzw1Lkn7l6+/b5tf/k3f3X1NS8c7C8mgs3Ks1xmFcAzS772tW+Nt8df+KKX9Q2xGqvkKFZnQTWkQgIFxFSZicL6ZQJ8l+gQlA0EDYFDyzgCAAaXDL0qQiswdULKISJNnkgIPoLffA8ZTrE7KMOrgWjpF+AdVZXGfKAQytUwsN466ajNnBMRW0JZkc1ZmPsabfXIAkWG3bs7CCrgieQH4fsPkKOoIJ2hUUIjOF+jTVW1EQarAepEDTMRnLMEbdZrL37V8777tfu3bdo6uHjggsvOWn7WCbXBPjF5t5Fbto7LaEtQP8VyIFRuU1ymyumXYYmq39Pr74iAc9han1l13gkju8fueeCBr33931//mjfwbE3A8SGzdN8zRRSwar/89S8vOXHBeRedRqyAg7rAeSj8xQqsjjeSe5sQM8KwFQWp70JJAhKEpjgUtEB47ee9cAVnfRNiKFSUfDVvGDqi0Bj41WJzTGyIGIYpYRjjh0AWh+EpaeIKL+MLuVSRi81VOiJt51qSd3NrnbNOoCTiT8dHCHy0m2Lj68jFlDmOBeRRoTPiJ3+IKzBTExQ7paLtUM/uysUspwKUtn/00KKaqLZREj+6LeGOdFadt+LFr7186NSha97wosWXndJdXh+bm030593ECgnUAf46hjpnLZOAFWEucFADJdiH//zOynY/VJ6lv7IiKhZZY1H9hLNXosb/88//fGJySmazw48PmVUAzxSxzu3bs+/B9fdffPUl/fMS4ryc5Vqa3ihgrsyAjFBbwKV/M76P8Bk/rTe+KNIxS0OxaOYW4B4ClYLARpi4Urzy6fzMTGBiUwCiT82XAFpwCqckIX5ADpKJZKJtZ9t51sqzjs2tH2QcghKBK/cN0FAg0Yy4bTXrqQS8H6UcpULiLo7eTw8ZVOjo4rMaDzKqLk+iKURdDqU+felbrj75kuXdQZnut1Pc6nI3Z+uHKJAwij5/xZ4UVQoqvqdU7K2iAbzaKIYqRGcCPqDjJM+dQ0LLTl82d8XcR7Zv/spXviTi/lNrNys/JjKrAJ4pQoyvf+M7lMill52XNCAsEBdy9o6GHG92xrJaZQE5b3sK/Kwqb3dytFGVS3vWI0RUBEAsQo4SggZwos6FQTNecSDMfC/2HouP4Xcdkno89FtVK7ACq+oAK8hEMie5c7mTdmYzJ9ZXp3lXwXe4JvL+S/QheEZkt6gi7nkHM+YD/NASVu6oaIEea+O9PkHsAxpfU9EtL4YnPIgHh0YVTkXY2aQ7uLx24sVL2v2tjNsMYfVz0iT0/uSwSF6dlE6PouoMzKgMmXH8WowU9j5dqCcM9YEucWY+Vp6x0tT5L/7yL1udlshRd92sPOtkVgE8I0Sh1rrPff5zF19y7vIVcx2rt5v9XyP1XJq8sdg3gHlI8yDfyscPYFcghAu50nqBQgRT1ecGhbrcCF5xf8VkMVFxTqUYOKbF/JiA1D5+7C19K2r9T9HcqXWSO5c5yZ1kzmbWdvO8+M+pOBG/m8pJFfR/CcRUxgOCl4GZXL/ODLjO/Ot/+HJU6yoA9KxPz5Yr71a5J4rbiV1JY1O5kEBbKAwVsXmW9qVLT17coinLztP6Sgiz0kqrXmfsqvJm1ezX4rCKldVYzKdliFhVlUFkSFQza8XIwlPmz1k2Z9OOLV//8ldF7H903Wblx05mFcAzQpzqpm1bDhzed8nl5/QPsRgVFngaJj7FFXgJpDP8OxqYF8AgUPBaoKYf78tKnppnAlH4lyMjzApW9s30KSbuK5GAnEAUzokV3+Ct6PDgoZ99bwnnxDnKnXSt/0+7uWS5ZrlkNr7pJHOSi+YgRwxlImZmTycBvs4tHBXFozEEo2SIQOQ/S+w/Gin3QiEdZaFHDv6HEKrGcHs3GfRuNORnRBwoRoK9Ji68Fa9ySYXIZ+34km5WEEQU0l59xuKBYRbpquYqEl
YknEKvmqn83huZqEoI4BTqQEt1AF/hQeoL+BQEFeTOYlBOPGcF1fP/7x//T9bOHqcV0qw8e2RWATxDRL7yxa/1DdROPmNFUrcw3AP4QPVFxZRHGRb1bAlMwCcGBcAJnRyo5Ho8zrL/j1BCf2nnejtRQszWKcSpFbHOOhEralWsqHPijf3cSdfazLrMucy5rrW5SC5iRaw6K86KuDDn3tcSAwixTFLP+YBCAKA4J8Q2dwr1rTRjkCGgYJyN+zhYrz0vtbSSe8IJvbR+JeJQMi3V9df4ZnxRBtg1Bl2KF5H4Cc4S4MuzKcbRfT8mMNikCZvB/r4kUmDV1J/S3EfZADsa+zOCH/H6afVIj1oijQ0E4dOFVSC+YeycFUPzV8xb/9j6h9evc9bOaoBnt8wqgKdfFLBiv3/brWeuOWvesrmSCKmFOpThzQJvPDVNVEEXFKmY5GkaEpBz8AFaw8RRGzAxMzGRIWKECAAFBA5bR2V+ozdbPWrnikzQFXSdeou+7VxL3LSzbee6ql0nVuCcBqfBNwSNXoyEwTJVm10KP6IkNCjyQRQSmPxEdJ+iVEagQ7iAI77iKFboWGHhsolnAZ5VLA0fp4JbL5Yk+Fxlak/YytGhGSVSMiDjjXsikAFYAFc079ZAAqlCIVBlIhXX36gtmTsXNhMnxMYHCWJM/XFunKNOuaSACscjgH81khQiw/FrEn0RdWx5kFaes3zaTv+/z/1bbmdDwc9ymVUAT784tRs2rD9wZM8Za05HapWskkPso9lrgc4IRcbSJwrPvLByAmZKDbNArVAY7cIUEjKjsQ8PehTKACgkk8dc0YDJkcMgAF4TBIpfxDrJrbNOrIgVFYRCM0XVkSiguMLnayWBMtZyxUBweWwF+19JKi1PPFJVFStZY65MhQTvET3qZ88Kh9daZNeUvH9UitVlOSoxtFypEmkJ8MV0vqgOiPyR+jWPUQInYpjnzRlMSFRUVAhV6NdeRkpn/Hus34q3NN42veoiVpGFfGJVQEVFWBasWjCwcO5XvvG1wyMjblYHPKtlVgE8/cJEt952G6V21ckLTV2EtWJbarTsCg+gKhGDlBTMlCQmtc7meZ5Z58hw2lRiTaAGRGA/Fh6RCAqZnCWzTdGGjeZjUexbwqbPGxGJY78KSzrCiUdhH6ao3F7VmgUtUxgrzEp0ZwK4xjOs0OwFiFGMfhSZNyqlijw2J/7EUj2cGZwPRT1bsm/lMZTKICxA5LT8gE0APpRfUQ/RnfMKs0jMTE0y2NefMMMqueLUKuV3xSH2rEa8Q452gGbqjKNUX+STKByniIqoM33pqjNX7Tiw4wtf/MKzOxdIIVoYWselzCqAp19E9bbbb1916uKhxakmFqyqJK5q5RV4WL4ZUCcy+wZkBKyaps3+wTnNpA6oak6qRMJcoHFAzWhfMxMzsQGxjwSUgmIfJRqHf6GFBRzejzEJ9dmg/lOkcTdl2VaIMT8ORvcyXlV8q5I6IXmpeqz+ozpjW9WNPh6Q9Zr/1TcLxO5ROxVvJmrHeEFmbJcIzLGywZBhMlSek6+xUD+sDUQieX9fOtDfdM4KyvDGUQdePc3Kbive1owF6Dmp3vP1PwvlpBBHaknmr1zYN6//3774b9blz+pQMDmg67LD00e2794m7rhzd2YVwNMsArd7z57HHtl05XMvTPodEqccul0e88GLj2xpAUYyBYCQYuJgZ9ODu5yr1bjGmiOxyiBVYiawz92PWfYaGgZppGZCKXGJCJEMqXLiPbhYGOhc0M5+m1Iaw9ErCFZ7VBQzTPqqeCVQsivVD1Te9TlNlTTRAMvHZkuOipf2fgAooL3YQ4/2Q3QFqDgRihRK+ZFCMYTDUT+1xkATwMQkXYq1dRycO2VVSuo8OKcPHCi90LePquRT1AmVii6dCfEzpWI9RFey2h4ilnl4b0RUnXG1heny1Ss3bnnkwYcecs8uWFTAqVpxVjE2Zddv3PI7H/voVddc+/O/+MtT3dbjGw
rPTplVAE+zKHDjd282DT5l9fJaXQW5OBtwrfT6ew3TKm0TLWViZQOTpOvu3v7+d/3hX378X/btGDOSJkxkrLCATYj5ImSs+BqrgvSPf0FBbZfmZS+wzHRHIssc4JeqbEnxsUDNRw+kiv0z+P1iJ9XtoyCRylWpfK0MHfRwQCV9ohX07KVCqmqmV0UEaC6WolJrVnXJoj8SPaO4vuV2w2LEMV2EULOtIsJgH99WRS2l4eEms6/CrhxZOHNV9EydPCoE0PsG9fzhWBxQz7cJoYm1I7UNt/i0xVnivvyVLzp18iyCRVGbU7b30L6/+bt/fMc73/ra173o3ofunLKd+kC/WufHXx8/MqsAnk7xz+RNt918ynmrhhc3KHWGC979GB4AHeOJJ1+c5bv1s5gje8f2PzT2hb//9p9//FMPPrCPu6YpJhEBWY3xVw+Xyv5eZxQ9+rVMUSmzG6OOiE0aCsiOqFsa8bHSgMrdVEl8iud0NEU940QrQQH0KgNUkLogsBUiUCH/X+hajeh1hHKH0nAvthKOpWfThZpRKvC/SgHFng4VFSzaq5YRmjb4RFdSZWVSsDI0gbL/I4MAoTDek8gYAobn9CU1gWZRZwChh8NRbX5m3gsz5eiPFS7asb6kiH6AqOSUDS4bHJw38NnPfebQyMjRG/8xkWB2OFWrbrrb3j+655vf//rb3/HG57/owk/+7QdXnNr51Fc/9sHfetPQILpTh5OaoeOsD+psN9CnUxQ63Z7atHXDVS+/JKmLSq4sxsdPY+/LY1nH/rsFSBIUAgHbNIVIPtBf16xx9zceOTwy9aZ3PvfFL76yXue2ZgIBGxXfsMFbk7Ek2ONciGXGptFF5+W4U4omtNcgHvdCcWlPFJJim048HpGl5b9Hn5/nJIpYeO9RUJUiK4LJxbSrsLDFEHn0kuEUFU+xuaom6IHLooo3kjtatjvl0F+HY9KsarnhXiIIpLFTm0hBJRFYYZBomNZIpKRO62mtVjdZywVAPibaH2O5HvftY3hSOOYtRQVB5NtMaQ2LT1qy6Y5H773nnhuuu55N+oTH8MwShRAgYFFxag8fPvzQhoc+/7XPfefmb46O7j5l9ao3veeG615z9ZzGUCOlnXbCoTM8fx6bhIorfHzIrAJ4OkWhGzZsaOeTZ645xaQQVohTKUhtjcVS0fikKmRqYf2GTxPnznWnWrCuv9FgJI/evffP9n5h1+7szW+6oTbIgpZJyQLilIhUmEjjxEI/+f3YN38ZY6WKTihjwOFg4w/ETvWVfPPYdI6i8V4uwbGfuMpQ4bDhALWVtBilOJuy52CBCjcWjqe6oZkXoTTwi2k5AcOp5H8I8OO1WMkxwEJNkzpCTtJVp5AA7aGTQ3mRhAB2vvGCgBAiMb7XKTtYIlLVXCwMc8KDwwOHpse1cjGoOLwnJ/o4rytL5C9NXJ8Z7pZ3puoy/4T5j93jbrzpO9e95Do1z2hgLE5ToAoIMN3KRsYO37du7be+/rm1a29tTe4/4eQVb37rtWsuPWvFoqV9c1LwFCWjHZiR8fGJltabc4wxpT1wfMisAnjaRIE8tzfe9N0VJy0antenpsJpeIvatxGO3MKMHvMlAGtAcVYiRWu6Y5KUOBkaHLIT+aEd45/65JfbE+6n3nNDbbi/oy3hnCjxPWkAUFAfFUZdCZFFOVqKZB/0IlSEfqrCe+WjKKib4ui1chI/QCo6BohOQEGLhHho5LC9R4TYbKd0T4JlrtFKjweoxSkEVcG+TY5CiWCISZ0yG1JlcqySaA3itm/ZecTK0hXLh4bIGRWCE5igqore3XEhix2EMu0QeFFhQZioTEBC3N9sHtKxyJMVZ93rt8xc+N71Puai9X6kNDFmYH/wq4xAanPSgYVDN95y4/jk+Px584meoYxxsTRW3cTk6MMbH/3+XXffetft27eut2Z6+UkLr3/rNZddedac/qFas7+/P3V515jMoZtxV7W2e/eedjevNwfifL3jSAfMKo
CnU0xq7rjjzjMuOWV4uOa048tii55oweYvVUDvfVnlppU8C24cjoxMESdEmudZPW3M0WRs/+T/91dfHutM/ML739w3nIrrOhZllZCsbzxN7etyNTTtjyyGUqwmLVgXIKgBKnyGwp6PBDlVnkrqNcHLc3lyz1mE5QpT4zWe+hEzRW+14hDLCYilkRut+Kgyeoit6GuVGik21iQGWA0JO0lA2jCsadfs3DT+jS/f+I2v32zb3ee+9ppf+sANSrkCxjBZ3yJVK6qSCo+CVbzvRJFgIiIiAygxIdOEuVmrU4WXOuqqz0Soikv2BNrgWIuqYdhcuUaxpZ6IU2Ye4CUnLd119/bbb7/jhuuvf8YAo7/koeu4QjOXHzpy6L51a7/5vW88sPb2kdEDSR+ffvbyd7/2qtNOP3HhvMW1WiOhlAyJAmRNPbViYQyjMTbd3bpzt81tYkjFgo8vSDy+zvYZIuqfNNU9+/aPjB886aTLag2yTE5R7YsA7XnUlWYYayWAUWy3YDM7fmhMhPKsS8K5s0zpykUrDo4f/sJffGVydOzdH3jt0lP62ppn1qphIxBmjdPlI9BWqSZVLWAhuCCFkd+bq+pJ6yI7nqITE/725Gz9o+2v8pSJwlQxP8cYkU6ageYkYbRZyUeVR1RcAgWOokAICiFi+GxZZWICZcSGU1LI5OjUg2s33fiFux+5fUd3+ygaKVxt132j2eFuc4G6xG/LROVWnEukv4JaiksR1Wro+aA+kCHNZi1NEkiMYR/bE4vn0bt2M9fy8aRwiCq+kd8ccQBXESWjc1fO3Xz35tvuuv366657JtjGBX8ooHanvWPX9tvvuu27t37v4S0Pjk4eGZhXP/X0Va+88oVnn3vS4vmDfQ2ppwxrFCQORMQQUSJmZw10YGKqs2nXgUceG6mnzWatrk+01M9OmVUAT4NQzOz+3vdvBGjZqmVKQiRMBRWvqOQRVk1T9CCf35z4jgkibNtu7579UHEsucpA//BFF17wM+/+uf0HDv3dp//h+1+9fe++iff+t9ede9EiIpurKpGSIzaxkrTI06kmw/U8FVR5z9MuJcaip9vwEyLX4yHJ4yNMqXMUEiZWVtimCtNEUigwX28VAtrBcAwpTMFG94a4hnw4BZhJhXw3HhJNmaXVnjg8cc9tD3zz/925ae2YsUPLTzj71R9489jYvn/9zP+Z3rK7fXhyYOGg83lVCgUTOyXxXL9q3F+8gkGdxjVXbxEgUEL1NKnVanmHVCmM8CxpoJK9Kdbrhwet6MZVzAot1aZREducX6/Pad58x60T09Nzh4Z/6F39RyQuVMU3CX6JAkSdzO3ae+A7N9/43Zu++dgja3M71bewce75J11y+YtPWrVy+dJlzf7UkAgrs1gnhMTbNIZIWeAAdWli2ra24+Dhb3173dgIpWlteO6QMccdHh53J/y0S3FXC9yXv/TVxSsWD8xtqnYDoUHsWwQA0UCt+AEBwEqiNvxFYhJ/d9odOTDhBLV6krj09a997a/+xgfmLpinTi6+/MKvfvmbf/WPf/XnH/3UO375uiuuOadGrqUZjBgnyim00genxHeUHBBAQJgp3MPnV7URgjVbJS8qIB3JoQgzPUsyY51mQluRGUtV2ItMi4YPROLMMxleGyj7EyKCZ9s5rqLfDBNT/CIryBmWtJakU+OtLZs2ff/rd95z4+ZDOybrZvi5a65/3uWvOuXUNbWB5ujYlv/3mb+bao3v2r53wWlnOWQgTsiIb/mpXFwylEddNo0AxFcGsLJPrCKFKpKE6oYzEeVwVFWvK5xZhRD7z0gV/RGOKlwQESVFvZEuWLF46yPbN2/ZfOGa8xN+ymYFK0Kjat9UD1b10KGRW2694/P//u8PPHTvVPfwohPnrHnhaRdfdM7Z5502f3i4lnJqVMURW3XqhyKRM8yGASZRcSBjDAmpk2Rk8sj377jjwL7WSaeevm/T7hrPTDw4HmRWATzVUgTfOll72+6tL7jhcu
IMRoh8C7AeBoioQAyqbiLmpwQLEhAig5z27hzptDIVw8Rvessb3/9r71u0aJ5CRXTlkmWvecWrmrXGp//9U3/xiS9NdWrPf8E5iSGLrhJ8LJhAEEikeQgx9ExhzHnYeUS0guFGmbOp5TH6f4PWKkzXo7GejvWyUH5xMYpNqFYxNH48KIzIsvh/SMIRCYVy16hUVQisID+aVw2pCBtAataZuvCebQceuHfzbd+8f9MD2+2EGRo+8blXXPGcy69dvnw5ad62o9KaqtfN3EVzDo2O792+/2J7HpNlZn9lRBDGpXkT21dje5+Fgqr1vbsh4qeyda06UXVqVPv7m5OT7RjGKLA/rkVJWpXvPK48oY9Q3lta6nuEAIWCyCWYt2Lujge3fO/m751/3pon2M+PUFTFdydUKBGPjY2v37ju//7bv9x567cPTx6ev3zospeddtVVV5148qpFixbV0hRiFAKyAgGzQo0BlEWVvWvllI2FilMVTSyS0Sy/bf36h7fvSRv94xNjzMnCeQv5OAsAYFYBPF0iKvfdd+9E5/DSExbW+g0olwJhS3grqGEgsEKRL55ByxAUzCbZsXmvOlKnffXB173+dQuXLfDbYcMDQ00IXfO85050Jv7qn//uf338M5MHX/qKt1zdYHS1AyOwCphoeOsMrRNQiChyU0XMtAeQSY+B7zEuUGwpnE7vxsO7Fan+QhT0UdFjDSSIg9B7tE6oaItmuEJ882ghjjlBqqJg8n0qRKxRMppylkxPZ48+sOO+79x/z53rR7ZPMw0tWnj2xVdee8HZly5auIzJtacmVTOT8EBzaO78eatWrtp3YPdD9z9y3ZteVKvVHDzHDCIGyQwqo1gHv0oCYTCTceoANcwiIiJJwvVGTbUF9uFkjfdCReM+Gdv/ydFDZTavbz8UNQF8E2+m5twmpXTPvXdpyBH4Lw0EiCqUWFUt3M5du2763s2f+dK/rt90X3OeufA5J73nha87a83qucN9zdQZoxASEu+V+hlGSkpgUlUV5jheM/i2gfmbEHvbYw/c8eBGa2qNYcYRNzk1lXB63Nn/swrg6RImuuPOuxyyE1Yv10RgFFJh/Y/BigQUiMVVRRYlPLNBoqTpxg07cytM9RWLV5y75qxgaGoo52Xp5O3ppYtXPe+Ca778jS//3Z/+2+jo5E++6yV9TXS0A0PwGSzKoYXlMY4iFkepluNjovVdxieCme1fR42lgPclCmA6yiPQ0NY6cPhF9DkawoE0F5/s43WRllUKKN5RIjJCAsCo8SumXptCCcwQVUewSUJsKG+3pw601967555vP7Bl7ebxkUxyXnnSmisvfOE555y/YM586WYuP5iLE+eY2CBRa+tJ84KzLrvrrnt27jx46MCRxYPzwI44BVwwpKGF/q74cSSRyfJVy2CGiDjxjQgE0uxLOBFRq9EDqHJux6TMjpInBf+FbkG4rdQ3CJGYJqxG+ubU5y6ct27Dg/sPHVqxaIn5L6mVLVrKkhK12t2771/7/770mZtv/ubY4d1nX3T6h37zDWsuOnPp0vlJjVWMQo1CnERfyi+yMrOCoBLoQO9IiGUiInWAqGmpW7dry0333TeV55bRYTtvsNFs1hYtWfi4PQqfvTKrAJ4uodvuuG3psgX9Q3U1VtRRQasg8trhpX8+A4oU9q9/dGNqtoqq5LJ5wxYoWHHepRfU+xqBQPVfEcfkaqnW2Qw0+gfTgUP7p//5z75k2lPv+IVXpn2NDuXEMAJRn3DkU+m93VkGIgoHpEwBKn6hmH8ZSIqyCozQazhW0Kli+0f7uHhZ6ImiQUbI6KG4BsXHIvejsT4ZrGrIOHHCZBTeuPQVvJJADSUiJhvNd+w48ODdj264bcfe7eM6rtKuwQo7PmXpuZeef1Ut7XamR7Nu7rUGKalJfB4qCa85c02jVj9y4PC6+ze+5NQrSR04BZhUwP68g74rKKoiTh7XM7BYhpjBTEziEoZopmTiUs20u38QtB9FGz
3+58pN+qzgkrEDg0iQpMmcxfN2rdt+z333rHjZDT94o/9RUQGIiLt5Z+uuXV+78ZYvffHftu1YV5trrnnpZa981XtWLlo60J+aVARWod7eF4Gq8asGdeIEBGYKHQ/VwTCIxSFJDUBORIgm83ztzk3fvfv+qcmMDNQ5p0kn6zRqjRVLlh9fRcAAZhXA0yICTLUnNjy88ZXvuKrRr0q5kpIUES+g8mSW5tlMgjy4BEweZujwofED+w6lCbNg3txBJjDIqRITBJrnedZptybz9kRncpxsXhfutviz//trArzzl95cS1sZug4+MAlRpTI3iCqjpQjoBfPghvQcc9HPODoFJeZT71lVT6mH24rKo2hKgSKZP2b5ICAsqYYAcIiPkIpXjipggJwKkSGCI5cAJre1I2N202P777tl8857t9rx3EjakDmStsnY6c6UgCbGDh46sL3e1yA1hhOoIyZmShIVUlHbyVunn3rSyiWLNh8YXXvHw9ded5npJ1VSSAxYkA9lFoXJ1TavhUKVMEcHzMzEIkiTOptEtKR+UF2WJ0VVxM0/OV6jPMbgtgT0J4Kkbnjp8Kb7ujff8v1Xvez6J7W5J9pT7G9aRI2IDo8evvueOz/1xc/cee8t1kxfdvHSt/3CDeevuXDO0NyUU8dCal0I6ITe5kxKpJbZt9HgoCtBcEQUeDiIMezEilJHaGR6+t6tm25e98Do6BSblFXShFPmsUPjanV4zvCTda6eRTKrAJ4GUdVHHtk82RpbfeYJtaZzHM2vmYG9ym8zHvsY7iQiqPi2NzseO9iezpNaqqDJrKXsVMHEEFLrsqwzMTY+Njl6eOJga3rU5Zkh1DXJR2uf/YtvNpPht/zSq1SOdLlDJrTsLJJqime1sOmDQR9s7vBKy79V7HQq+0FUW0oUp4WSPqq4ONU3i/MnxA5tKCKX5DtnFryTZ5x8izYOQXRRqwmxUOqSvINDeyfXrdv6wK0bDj460rADC9NhbjYAM51NtDJxZNkkksrW3Q9v27N62ZITa9SopTCUmARIQ5jBqetm7b6BRZdedNkjX9i8c8vB6cnWQLPunQ2vM4HC3q9cuR4KrHgRBhuIKkBprd5s9k23coWEkWiVc/6BUlnVJ6kutKLEiUL5X2DulGVgfh83a2vXP2Bdbnxf8R9KtOQQyYqb7kyvX//Qv3/nm9+/83s7d2xdccaSV7/r+VdfdcHqE4f6G3XkpKTOWSEJY8tIfbICkZIPnWtJiwIQVUMkPg2NSQlO4chNdbub9x2689ENj+7c1WrnRAbqg0B+KzJYHxzsG3iGFjr/V8qsAngaxKm76aab++f0n3DqMiVHcEWLHaBA/hmWyIw/+twcDzAqSMjxYw9skswlxlCzvn3Ltm07di2ft6jZ7FNwa6rdak/uHTl0YHR0++4de/fvJNiEjTqtc9Nl7n9/8p8G5vZf/5orkprVmhUR8jkqKFNR/FEEZj7s2h+Lgooa+mrPZH/AP9ikKlGqqvXKeOdRHy71SOCCIq2iIflHyDAJSI0jKDvSDDLtdm06+MAte9bfufXQ3qn+tP/UuacM1fpT59pZ3upM1k0nq0OnXcLsEjedj2/dsXPuwLKkv5nnooklsCHEhgHiJO/m2XOee9VnvvKFQ/s72zftPm/BasD3DAqeC1XOLCxVeZo+4BGWLtYqOEDSNO3r65ucGmUu9eqM5Xn8t6rUWFBXP8gRoKM3FG8zp0T1Oemc+UMbNjy4ZfOWM04/0/yHkVJFxa8L2IxNTGzeuumr3/ja92+5cfvOzbU+ufDys9/7gV866bSTl8wZ5LpV7XS7loJ1Dyaj5HyiAwMqDO+hqhjAEIV+tiriAz3EAlhxmctbmT04enjjjh3rt+/ePz6eWzWUClliUcskSg5q3dlnnZ2kP07d7n5UMqsAnmrxj+Jtt91xzgVn9w8ZIFe1qDLaj/e9o9WCKpgEUCbq8qNrN7MYkyYK3vDoo3/6J3/8smtecN45a7jWaLc6Y2
MHt23fuX3nnr37DrbamUkT7VgVIkIKI+3mn/7W386tmRe+/qoJN56ZjNVbW0VijyIyLQVgx56fvrPZ4wF9mchYsDdlEKOsMCgYo3hqpZTehUaMr4SS44FEjIUnrxnCOcgklu3Y5N4tex68bdv6O7Yf3p43zaIThk9bNH/x8FDdSXti4rCmEEeuC0cwSUKgBEkntwdHdrZba2pJX61GxhhVP+CLmRLAKDTPsrPPOnvBwkV7j+x+8O7H1lx2lsLG1N24Nl6VzTghb8f68Ekw/4OrpVDA1moJVLRY42NC/+PdHcWihhWpMnCPJ+V3i2QzHxRwqrVaOn/pgo2P7Ln33nvOOP2MJ9xOzyGqT8EBMZnM2p27d333xpu+/p2v3b/hvpy7y06e9xPvf/Xzrj5/2YKFSQ3MInba5U5Z/BghqBrf3IOCS6XiN+tnWbDvlC2hPEWdIgc61na73cnp9sjYkT0HDm7bs3/v4dEpp6JMGgct+JEMTCSYODy2+vLTkuOvCgyzCuApFgUEmju77+C+l7zoCpgc7EJxUpkkWCX9K3ZZkVLv4Y980wJ43mDq8NTuzbtZAWvIUHt6fN29d/azO3jgwNCcearamZ4enxg/uO/A1PhEt21tBmaTKKkK4FLUSPgv/vjTy09YcdLlK3PrKBEixGJURMoJFXQJzWw4HCbFwh3vqx9tUfZEAcJ69OCSB8qC29GK41FqBh+goF4dEQ4mRCqUiAwrpD420j/y6MGHbr1v3dr1+7dPJOhbNLx01cITlyw5adGyRZpADHZs3zKy/xBRTsjUiQLMxigSsVMT+8fHR4YH5hMSQsKUMiXglCgxnECNy+y8ufOuuPSKz3/j8xvXbxkdnRxu9AcQZYIgxFHKhKjyRD3I+kYcni/yrxwUcHnWhYLKiHrRY+OoqIAWP4vYwgyL/snw2lrVAfDQCu+ikGFetnz+I+xuu/PWt771J3x+6xNvVCGq4vX9dKt9yx23/9Pn/uXOe++Ynti77LT5N/zEmsuuPG/N2af39TeNqqVpp3DQpA6IL+YykoAcFAIVglEYQMiQivWOEnxDVRWr2rW242Q6s9OZHWu1xibGDxw5fODw4YMjh7vW5eKIEkANyICskqpywqQgq65jzzz99Hqt9iRW6dkmswrgqRZV3b1nT9ZtL1uxglIGclVlPx0kfKJqzXmmPdQAxCff44U3eiVhR4Z3bT1w5NAkUc2kSZa5mqFGszY2NXFwbH+XctbEZd2RsSOjk+PdLM9yWEcEYiInKiBxSJN0ZH/rF9/7kY//9QfPvugkka5vcQxFaH8fwMxrKi7ik56dlfiHor+YJ+Z77M8e0hsleROBkWayFR47S9wMnXO8aio1ZlEzCmKoqiGtI5F84NufvePWf7kxH2/lmVo3ODRn2YKhJatPP+3k089Jmn2m2bA2W7R42X133HWok9s8SziFZJxybnMm7rTa4+NH3LIum74kSZjZmCRlkxg2YCa2DtbSxRdc8qVvfvHAgckjRyaGFveDvW4mosJT6fFsCgincFKeLGIgzOgRp41GzTBKB6xnTZ5AZlr6FLRpjwdyrC32aIuiA5RAjcBAB4eapsbrNz7UybrNGjPPpIF8azkAChUnYCXC4ZEj373xW5/6l8/e98g9GMjPv/L05z//+WvOPn35ikX9jTpsR2naOYgfqyCh3RWzUbD3fwQgsIg3R6DqAFWGiGYOWa5dm3W63el2Zzzrjk61Rqemj0xOjo5PTE5NdzIrClETqtlFmBjimMgBECTGkOVsvG0zx/wDldqzUGYVwFMqBIjKxoc3c4plJywUGmcGuZgXER/eaO+Vt2NhescGNj7XJfj3JObRR3ZlmdZrxESJMf2NZn/S1580h/r7SUDQls2mOl3XdZyLdJ1hcgSBMDGJJGSsczVKOvu7X/70Vy64+AMd1yVD5bHA651Kr4g4iz0aqPEEAq+hkbcpIF3LOYfVeWclMlFxqqHeOKZ7KsCAUtlo35vVvsoWhf
vkd8pKLKLa1+CpHVsPbhuppX0wdeL+GtcWn7j01AtPX7J4xYJFi/sbfXm3a7tZPjZ+d+vwZGuckSaUpqaWSW7IiOq+AztOP/0sgk1SrdWolkjKtpZQo5bUTZImtSxzF1140YK58yaO7N384K5VZ65gViJyYn2wtAwG+wuoAIWRj+o7T0NV1HcAIRCUxGozrYu1SE3sDlL1C3tf9YiWrkDPq2OEnvF4W6j4oEwKlizvDAwPzF0wd8u2rRsfeeyC89YcvXcin3kvCgLTyFjr7/6/f/70P//Djh0bhxfxS1592Stf+bxTTlsxb7gp7GyOPOsSQwS+ZRIRM3n1qOJZHnGiSgyABSJqncKpdrO8k+ftbt7K3FQ3a3c7rW4+3WmPdbvj05MTE1PTnU63m7lcolr1BgsTE4QIyqwq3rcgyTSltL+//3FX9FktswrgqRbr3I03fX/h0vkw1o/fg7drwjA6QpH6H6WankwFIvscUFDOpJY2PbyJyOc1cKNp6o1as96/bMnK4cE5QwvnjI+Nj3ezgQGeNCId6zvjOFEBjDouSAOiGnjegkUu9KKUeEwzIhRFTwUqTflweOHwQ6xYoRXoA4p2cceyR2OoALH4tTxtjTkpMfBd9DArQ9KeoOKAdtZYSfMlywfJiMI4ZaNZ3yAtXb589Wlnnr569UB/XyNNRbU93RkeMIcndh88fMBMpcakKgoFGbK5HBkfydy0qc2HsZwkSY0aDdOop2nKaWKcaG6z4eE5F1140Te+t2ffviNZ7mpJyKD1RWcKKbHFF+VFQs03TWKi0AvIr5QSM6UpkpRtoNQqPCCOevW4UrSNOjYl9DjXoCD0QASBQKCGUNMFyxdt2LXp4Y0PXXDueUfBpaiqkgHx3r37P//FL/z9P/zjI9sfHl5Rf/XPPO9Nb37R6lUrGokSdW3WkQQOLErsHIMpSYQ0VJ6Rgtg/CwxDTAKXi+RiW91OK8tb3azT7XY6WatjpzPbyrPpbneq1RmbmJ5stbvdzFrnb1BDBhAVNUTs489BIbOKUwUZAOhO5XMG5p58ysnHJf7PKoCnVgQA45tf/+Yr33FVo985Z9k3KiPubXCGY92Mscyq2JpCWcik3VHZvmlnkiBNkr56XUn7++prLrrgqmuuXnXyqna7U2/w6NShe753y5FHt1Du6mltstMCNGFQ9K3ZkFXq5u60M08hw3CxriuiO0UbO5r9MXDpfw2wHunqIjAAhDACgePApkIKGI/Wf6h29kxAoRAi512mqhchh6i64ktf6AUQU1fzU9ecyvVv2W6biWuJzlu46LTTzjtr9VkLFw4yA8QA6vVkaOj0617ysr0H9rWmpjpd7ms0O+22QDnhTndyanqUsLTeMI16rVZParUkSRNTS6hmKFElm5rm86587ndu/Maux/ZNjbXn1htkwMSxuJUAPwWehCgyJeXJ+VOiCrOvgPpcdxFV5ri28Y8zVejjAVdYrvDRWNUxc1u95E9xKX1HICIliKolt3DpQrWPrX1g7Ztf/ybjLWv15bb+usmhI0c+9dnPfOpTn9667aEFi/Vnf+WyG1519erVJ9W5nrsWgTIRNSwQYjIKJiFITlAYw45IBaQggROBc+LEdaxt59lUuzPdbk9nrmNtJ8u63azVtu08n2q1pzvtVqubZZkVpyKhEkyIlJSSsJR+5E40Uoh8JQElzK6T9yXNhQsWHo/wP6sAnmJR1UMHRzsytXjZYNIQMkbUGV+zEgw1/7n4hWCQlU9lESAgMEGFwIKx3a2JfZOGKUl4oJGqk9NWnvSiF7/wwksvqyV1wAFYma8czuv7Nm55LNlDBmqMgMgh9CmDQMnmMrSw//xLznM2I+ZQolniBhDzWgILhcjxzOAleqj8aJdT6H5ZKQWqnKT/ks/3KNI5iYFYgUaRXELRmK7YI/kPiMYOzKRQdbaz8ozlQ4v6RndZl2WaNE4+6cznXvu8RUuGq1BKTKZmLr/yioc2PLxj2/ZWd9q2uNHXbL
c7gDrknWyq0ag1mmmjWasZrtXTei1NayZJEpOoSShN0gvOvWDO0NyDu8eOHBibv3gFyHrauTh6it5PcR2VfLKoVwhUhFN88z5j2BimohNoaR8QzVi48FIr7ymOUhRUXiNFr/6oSuVjcYtKIDh1w/MHkPAjmzdZscb33vGtN2y+99ChL371C5/+v59Z/9jaxny+4e2XveUtV59z9kmNFE7zLBc1LM4wG0dgSQgEEmUSgTooqZ9GJFALZ8VmNu92s3aWT3U701k23e62Ot1W7rrWdfJup521Wlm7nXe6WZ7n1kJFmMK9EyLngpjFULZUhK8N9IvIMMSuk/dzrdlszoisHycyqwCeOvG2yNZt263NVq5eosYSC4HEKXrvvuCCH1VDVJDrYWY8w0jCjndu3pN1xSSmZkw9x0Dad9Hp55615vw0aQLw4wk5TVasPv+y51yzddeRbYe2sTJLmE8lquJUIFazsy9bs2D5nBwTTBJKbFFCUPUAS97naF6iKMvV8vMB3JXh5zLNJCW8IRzgkRCznQjEHHUGe/SSGIhQXwdWKKdAGzGrsKjCDi2Zc9LZJ4/ufJi1MTxv7gtfevWqZXMqYFkAJSW19Oprr/n2d785MTXemW7Xa2m30zWkBmZsYrRvoF6vpfWU+5qNet3UEmOMIUNpatJamhpzwooTz1p99v1b7z20e/SUM1cyQyl0ONDY/z+oc6qO9olcGgCOhrSIKpjZJIacAKzqqMTxHvs9rnmxoscgMrSE9HILxxQtkdJXdpBSyLsUk9eHqG9OY8vWxzp5xoYMmURpcnry81/88l/97d9s2Lh2qC970+suft1PvPyUM05oDKRW865QJkbZkBgQsSEVF1R0bNrBIlCxJJlIZvN2ZttZ1s7sdCdrdbtTedbOs3Y373bzdpZ3bd7utLNOnmV51s1FoQ6EhNS7lyFKxcq+lk1ZcwigLN6wEAnhMyIHVupMd5f1HafmP2YVwFMpBDhnH31sS/+8gaSPVZ2LZsvRD+QMogQxwSbSLAEsjLgGma2P7W13aCDRWkr9XF81b/5zrzh/sG+gdwOU9jUvueqaRx7dfM/9j027LLFKVFNSH4IUUtRp1fmnuTqLE8MxBluU8oY2C4jdfkoCvuB+NOytcA5QoIqveGKfw0lFlOAYkMUInEeZORpsZJ+a6LEpppsixBoAIvatfjyHRA4uGUjPuvzM+76xTqR72rmnX3ntpc1GDB8UbS/9x5lOOOGEhQsWb9my1TCnaa2W1jqSqbrRw4cHBvqH5/Q1G321WmIMpUmaGJPWEjYmSUxiuK9/4AXXvPjO9Xfv2ba/O31ms+aNTVWtdNAMKO0Xxfs6CO3uKWR8+qZoKvDBfII4FY63RDzYclV69ULPLRPVA5NKLCQo7qRSdxythivankJ8hdWppINJ30Dt4L4DE2PTjcXNVnvsq1/44t//70/ftf7e1HSe+6Jz3/ML159/4WpTMybRrmQgtmKYmIhUjADWCrOBgToQs4OKiojrWDtts6lut9PptLvdVjdrZbad5a1u1s5sx+bd3GZd221neW67WabilKBOmBgMqFPfg0MhTATyDYO8CqDQca+83iAQkYioRdZyq847ua+v7/jUALMK4CkVIX1ow4YTTlnRGEhBHQ5mEAPAUSb/MVz94rGlojmO62h747bttaR28iDqjLmUrV46fNopJ5JSQc4UBuPgshWrzz1/yYKvT4+NwYoSKXPXKkity9JBnHXOiawZB+zXMgmz1x0gzw0HUJnhqlDPMSOWFUXuhskoNExN8cQPCqococ6/ynUFlaJBH3mtEb0hBUAhgF4eiSEncJQK8tUXntrXb9pZtmzp0sGBfhWnxDF4XaguJeJ6vTZv3kJjak5barVWq3eznIgnJicUbniwv95oJoaTNDXGpIbZEJskSbnerNXr9csvuWLuwJxH1m9+3pFLm3MHCNB4XAonMcMnZC759SUl5p7UV1XPhCRArZ6ilUdnCOVIySJ0UM
pM9EdE+tDYtbqDcPK9HE+4BkHtlvgPISiEQKopzZ0/tO/Agd3bDnzvazf9z3/8wwcfWA/C5dcu/vmffOtzrrlwzvw+C2fhOiChlJCYBCJW2QFCRIZCEW8O5Lmbdlmr05ludaayfDqznTzvZt1OlnWzvOtsN887eW670s263Sxz1kLEWT//jkSEfBABChUiaAi6kFcDUHjvkHxs2jdm9TebqEINcz2t5227aP4S1ZkLepzIrAJ4KkVVZPPmR1ZffCKnjjk8yFpQtr3mWG8goALkVIQVnTM8MTY5vmffipqeAszpS+YN8JLBtDZ3Ptj4B6DaYM6YZOXypacsHmod2DM5gXY3UzI11q7BVNedsHLhuWedBBHyiXLFTgubsQz7HkX7zHhZHH8wcRU+0q0QhGG73titfLUHxXrXQcqKAh8gJlIXhpoHRUDl90SVwMQMcStPWrJw+fwdo4czznLVXEw9icY5+Q0ogaGwzkhSm5bcgkFKjLSWZpmdmBjLsu7c+fPSmievKTGmliRgpIaTJDEgJpxx+ukL5y44sm9PdzpXqwqnpGEgge+f4wumY3oTASp+YglCppC/H7xT5lcp1AhDK8tZLk7Zd6NCa8V/oosV1q2SdyuRais8uKBHoUrMWknkjR8jwFBO5595zmA2593v/pmNGzeItpefN/T2n7jqTW98ycrFczlxbWnnTIKaU1aqiSUyIsSefHe5dQ6Zc1Pd9nTbTnWzqazTyfJ2lrUy185tZm1m8zy3xX/WWpuLc05EVJUBUhb15BQXT4N3AwMtKtG7i+5o0HUB+0EgPxhCFZpD2nZ4cJiPywAAZhXAUyY+gimkh0b3X716jWki+PwUDeDHsUCKO1OrkKgARARIahNjk43Dhy9eNHzWQNpUrklnmIFmUwPrUkELEANzBvpX9kPng4b6YGpTmYyqbh6bOtKVKy86d8Hc/kmZNjWQKyhlKjcR7NB4wAE7iI5x/BGuA/Ht+RtSwPh2Lb4/DCoJ6kXSPMEpuEyMUfgOXx5D/XFATVAnDggjFWM8FT4hlpSc2oEFzVPPXbVn8/jWTdu27T2IxcNDfQM1JkMEQESVWBRZBxsfO7hz76F2JkpgSihh0wCcs8h37d35ypWv6nbbYi1zWktYIWw4YUNEaZomCc2dO/zcy6/656//w8iuIyecMV+JvR4RCQhNyrFswS8umEOCq1aiKn56mYgQk4epYCj05IOipHSqv5YqUilUi2jVX/KZPQKNix0uD2Ixmki457yp7KurlblOzQWNhet3rF+/7sG87eYODz7nJZe95k0vuOY5q02zO9ltdfMuakk7k9x1lQlpnlvJutY5dPI8y/K802l18uncdqzNcpdZlznJnMud7bo8s846l3Wts9bmubPO5SIqSlABHDy/D8STiismJZeHSD+GJ0rV+JWjyJuGmxhsDKnVqfF23pWFC5ccn+Y/ZhXAUyYEEsjGh9dPtcfmzhuOfG+AIVTjb0f59kcn20TaWJyzozv2LzF8Qi1ZPJj0Q6XDJjVI0uLTcQchKNtf57l1xRxTp8TUkw4aB0CPPDCazO+//KWXuSQMKSm/WrHLKbJK5XsaUz1LIzTsLlhiMXtUC2UWc2GYWPwA9xDaFcQnNDoa4SQqCYyFnYyA9DARCqhIl2RlkIpAidKaOffyM2/97vqt27d99CMfO2Fx39lnnXvK8pPPOfvMgaFmalIIso5s23Xwc5//4mPrH6JuzkScsChYDBujTjc+tqHZV+/rqzm1YokgDCIDBjE4bSSkmqb03Oc89zNf/b/7dh8RB2J/iYVK+Om5IPE8JDBbIeKhvoEOA0X6Z5jLWb0oPUsdt1Qp0uu5hSKeR12g7JuIVI+iV8n7+jQCjJIx9SY1aALf/d73Nt74SJrS2c855eWvuu7ca85evHRg59iITLccHCXoijjl3Elmu1bzTFwmaq1mos6pOues5opcxDrJxeVWurnNXN7Jsqxrbe7EiopTb+QrAP+S2LcmDM
OmtfBKdMZt592/kjz0d6sQeZIIVDxnSsbw1PhUgnTxwiU/dH/TH3eZVQBPnRB0w4aHNenOX9wPzdWbdyWzA4+mx/QFingn4nPAMAROBQc37lrZ4FUDZrCW9zmrtUSycZme0Maw35AE2xoASK2bGmugO7+fTeLypMssQ5QKt0887YQzLzoxRwesTlxSFByU7HH4QTMOrOdD8YGMz2URpqUYUQw9xgIT7nnb8GSWVUuhhxjFpkB+Z7FVGoqsyRAR5lA74H0KhKiDEkEz1znpvJOGVgxP7hs/a/7Aiv7OxMYb71z7vRv/La8NDvXPWYCkeWRsYvPWjUd27+nTzpIhl7ddK1cLAiEXUqbHHnv0yNjoqhOWGmabi4ozxKKOiAlkEs5zC8aF569ZOHfJlo17Oq1Oo1/BQQuFow+KrLe9Z1y04PAEV9FTMj4xqKDHZkB/9UWPUqleCSUIERM7KymMy3IhTRvm6AHong5jw86332E2qDfRMK366O4Dd3zz+52JzkUvPOesC85ZcNrC/pXDm6a37tqVNgVKAmJi01XrZ7qJiJJ1JFYB+HpHcioi4nJk1mVZ1rV5t+OyLMts5kRUIE5jRMhrQhcy9n3jNhSdSSiuofb0nCrrKoqwk7c6SgezzDNTNZx0Wt1G0jj55JOfwAV/dsusAnjqRBTr1t2/5KShdFAKG0XDo14wuV6ir19RDQVFQH6cuSosWctH9o3Nr1F/U+oJsSEyppOPZft2pnOXwKToua8Fkk+N7KolebPJYtCtiSZuero9pfaa51/eN8wt2yYmKmO/5VFoz7ao55+jHx6tvh8iyN7UrRhuxCEmqQU/G7DPH66UcYiI/RVNE0zq0gTU6CP4tWMicZxrPn/1nIGV9YlD7auvPPU5F54wPXFwZN+BHbt2b9m8c2rPVsm12e2e3B0/YdCCeKxjxigfy6SVq6XEkebg0fHxdQ89fOKJyzmlRmoUBgKiBKHqTFnZMBYuWnLW6ec8vPv+rJ3VJCGjIApJTV45xVUMSYoxFyh6TKShiZqCwBxiACgGdFbdqJkLXlHYFaePfP9SAZMhR63x6aSe1JsNJzaQRQEpFeKrcNUIamgkLq1xbezA2EN33HVw554Vy5aeds3qhScv5aH6ATc6emicEmrW6k3Ua0lKKYvC1ANjRaSGIYBTiKiqdU4ya7Mst5nzGZxZbp1TiO8U7QlRUn/5Q/erkLhTlFNUrnD4rbiRgl9Z/iXcCF4bVPlTr2dZCaLTE9NzBhb09/XNegCz8l8rCiiZ22+958LXnJbUcgWIA9gBqGDsDCSNOBvslh4iwMC4rH5w8/6zB5uNxNWhKRylQonuuPNb55x6ChpzgVpPhWnWPbx/a427zQagtgtTqzd3bztCcxvPfdH5uWRqlFFy7yW2HmUvRtuqqrSK7gyItrxq9e89//j5wIVT7n123+cz7Fnjcxx3qYrq9hxFUkpUisefQKriaWIGK3E6qKecvXzv7XfpoV1D+fx5A3bZclrVrK/qYmrE5S3bHWuPdvPMyGQ3m6jlh2r5EcJ0Ql1HfXnt0Fh3Isu+e/Mt1730hUkaGKeC1PHHzkzEqDfql118xT2P3D12ZGJo2SL4ob6+KaiYaJNWObVyKcPpIZJfPqBBocspeuK9R4uWm4tbJF9NERaRCIZB40emBuf2D/g8KO9CxaQgZVUSFtTRTPIGd+iBu+7fsW3T8PDwtS+7qm/xQhpKRnXK6ZRJeHJa6sZ0udtN67VawjZlkxiHlEkFIg7qnIhVss52bZbneZ45m2c2z52zvuI3Fm4ogSHSkw5FPtirrMTw/Z41TnwujP3qCR+1GjHNoLCvNNbZsRrmhLp0eN/oWSetaQ40jxXEOi5kVgE8RSLQkUNTubjlKxYkNQE40CxU4GyZmx347ao5AwBEJKBIAhkLy90RW580Q0OcGscqzJzWSNGe3rJ293c/u+LqV1L/MqUa/NBBRefwzrEdG5pGGzVyQs4Q6rx7auqkS85Zcs
qCDNPwliCUYFA+aMegWsNBFz8QYaz8eOUcKt/yh484MSuEdSPtXLR5iIRTWJpYORTg0W8/BiqjwigtXw4ZRgR2lig5+4IzbzN37d2xU90ZyCdqdmwOj7WSCeoeanctO9fOWgwkCdcZdcODlrqubqU5OUjrWvtbubv7rrtb7Xaz0Uc+raWo4CIA4JQJqNXo8ksu+4t/NPv3ja44b0lkdByUi6SnaMHHAC1QcFsKFX+WvgLOK4uC8tIqTPW+rt4miLeS747tHUdSp5Y1aU+0B4b7AN/OtZwpDfhQOlgJTnc9tmPL+kfI2iuuumTOqmWuwS10BS1KSRWu49ghg3OUdUyHCAmZmkko6HRyIiJqnXNirYgTp6LOxfo+YgY79vaBkk/QIVXtte2VfbsJEMQnART3U9Uzrt6bke8paMMwVqDkIBWkIpIqUZuy8c6ShYvouOwD6mVWATx1sm3Xtqnu1NIVCyk+mwBQVLzGNPjSLCwkUichsMoAqbBNkvr+3ROYlub8OpM1CYFTlyTMef/k/vG7v07Illz+Gh5cgiRVZ7LpyYlH78TYrv5EayDrkNeSw1PTh/PsNdc9r5YqkZIx5Aq+payrrx5PQcP716TFzEINoBWedFNEHv33Cvu3SMsID2fcPsMzFmGco8TgRVUPUeEISGTJAjfkK2xLA5EIgGNDVvMTTljSHOzfuG13K5vuT1vkcnSpRk2yabvVyTMlTr37kRoaTLQubC2JQ95gXjUv3zl2YOem+9eufcE1z0WiPru+cp2CPW0YZ5x2+py+udsfG1nzfE5rBBJRYlZyQmCKDS2kOMIC5aM6UN9ZjUiKxMXyv16joMeBrOpXqnyWVZQShhXNJW9lqZhq0BcU9A/BAFyrmUcffKQ91jr1/JNXnnJCN3WTacexGCUVZQ1kXAImgrPIxRKBJe/4BFZAQ8Mnr7ElkPEKGC7C+36Gi68L92dSIDtF/y8kR7EKIF5vxNsnHvexTj4sDHG8QI4UYGiY4wYiYoF1nVHRtq5Zc15i+DiF/1kF8JQJgQ7s36cm6xtqMFSVON7sMbVSq89k+U9PhjIJe59ZyEhS55HdB2u5qbPWWDgJrRZSdim1JEv2r7tldOzgsrOfP7zyTDGNkUfuOfTALTXb4VQNJ2TTpJbu3HU4a9bPOP8MNoAaOIsKOswkfuJhzWQxZrztn7wwtkaLfFeU5n5ZMwzEMDA89e3Vnfpp9t4oLoneIgUkpjYilsSVfkiwnn1LT1JF4mjZkvnzli18YOvW1nS7mSJribY44X517DpCwsZQYlicGCcNp3VAEkHiVNG/au60w92HRv/ti1+45pqrDIAQUizrhwr13exrDg3M273zCJuGSg7NFWCkBJCGmiwJx8vqk1MicEKhSqKiqgIXFGlMH1LtdaiqEhRt1IhBCVOINEBJJQWbnKSdJZxQBWljHCCcTKeVnXjSickpRtO0lXQdORUYNYCwC0cDJacKghKRhH5NLmryqLR8j47Y9iIUO4dMTmJf2Ec9XyDfLrE4AxQPhc8fKDXkjJXo8Z17bkqNlQ5M4YYECUETMSMHpijjk044MTZiPB61wKwCeIrEuXzjow/OXTHQHKip+sZo/gYv7r4Z6D9TvFHFoTiXhFUlO7B3u+t2TZ6mnDALpyQKgmEmdGUgP6jTRw5tXz8xvEhMfWLsSD5+qM4Q32kGmjYaDx85cvLla+YuHBTqGoKIqgSNFBXUzGdjxu+BegnmfagQBgmRg5IR78dT6O4Z3IQQ3Q0hvxAippC+IUjAAnVFbqz0OCER3Nlvp0C+oEYRiQ8oIAwWlWZ/87xLzrz5s9/dtWv/wlWLKJ/MwS3X6mgHrJwQKYlTGD/1MWFiZeJEEkYf2YtOnN9m9+2vf33fb3541eJ5GuorCsgJNBVA/YNm/rx5Dx/Y0Z3OmzUGiNgHQn1eKJQ8IcQ+B1Z8Wz0CSWCoEbpziEIgMY1WVQvkO/YdUvDhEWJD2qcmhtVJIiZrZ5JRYhqkFM
pmg54Gk7cslGoApblR0cyTRIkynGhRj6YFOGvRiCOop+DYaInplSLEEO9HYe4XTnCvWzPjtgegcb2rZXBVM2SGEeI/SMUoOyKAyYX5qaIMk6oZPTDR1xg66eQTUX36jjM5TmPfT72wMTt2b5uzeJATg9Ljhb9hK3cfRXM4/C28W8kDBymzJiKJ5lMTByfaUw6JMwlISDRJmdmItcZ2a7abtFuYOJjt3tDZs4EndqaSM9QwO0OcajvvHARd+LznpmkOyVWViJlizk3kUX/AuZEWR01a5Su8ViiGuMAbjcU2C3DQGAf20FI1G4MGirUGFQqA4r793sqD1LidQhIxrO7855ztUrrv3keN65dpm0+08+mWZJlhY7hGwuzU5KgJp8QpoZYgZVejbNB0lw3wpScvwsS+b33920XQoYpYChGFc9i+Y99Dj9y3cN5cVuOciKqhRDVSJwr4JB+/tuILWAP9LaoiPRhbQbbiFojzFmJmS3mvaO8RFf/6hs1Os+kOhE2SxFADEFrmEADyuWlshER89ZoSOR9RBxCa7UCLtSaScieBwSqOKVzh8ioUp18e7VESUn4Kd8jfAlT8qXLCR39dqy9i3DyYFf7QhRjMbIS5g8lDk8sWL583b16RKHocyqwCeIoks90DY3vnLRlmE0zraLRyBTE9++G/4dM/4tNd4oHPomeSxDEGFqf7UtmbJdO2z1GNE8fe8ktJDByp/92CYLuJE4gzhATMbCip7Z7sdAf6zzjnpES7oiJOy+fmiYH/qL9RJaQZ4nVEQiqswiIsakTZKUvxpBYhAV+TG1Ev/BcYfo0fDN8g+PQR9ZRzSZ6UIYZSCXnAUQVZsavPX7hoxeAt9z2SteskLNl4d2KEXSdh8VQ1EyfGGMPGwKTECThRk4Co228mTxziy06a/2+f/uR0blFgMQgg5wMhRA88+NhLXvF8Hjzyxnc9P20KEQjG5RLij+G4vLEdWRotzjMo3VjtQIGZJ9942QdIZtKCx7pGcUli9ySVkILa6WTGmLRelxkeXIHOvjxb1QAkcfkqn6zejD1K0CO7X0gtYLy6jzgGIdjxGq9Pz8+qSivq5uJUTCrYsN7T7nWd48WvGPUEsJI6CICUUmPZtVx7sn3Ouef0NRvlvXL8yawCeIrEip2YGBuaU3oAJab3GCAanwwtfgZbmUJjLgAQIk0kwYUvOXv4wuXfOXJkG/dlSR9MYtSaMIyQACERKJMjOBg1RmAoMTA1TqRee/jw2IkXnrZw8QCp88n/IuqtvxkZPJUD9D+LZ7K0QIPhFZnawkhVimgWf6lsjnqf+rCHYA9XEkOOEYwuuYTqkVVf+r2LqDrVofnpc15wwf079+4Zmcggeaa2bTVTFZO1RZ2QgMGJ4TQlTpRZDSmRJMY2tTU3za49ebnd9cjae+4QKffh4JMn8ed/88+v/ulXLT93zgc/8e6V58wHtRGYLiZwKPUt+HwVpdCXLeb7hCwnjYlAsfdDacv3wn2h8GZeohnry+w5N2pNtpt9fZyGbqoxhZXCdaGK7e3KLdOMvRS6tvi98BRpBjQXny+ulvZup9QmM26DWHJe6v5jmBzR9SOt3kYV7Vg9Hn+qBAJSTdrjXZe5008/yyQpiiaHx5/MKoCnQhTIul0VOzDY54t7fKQS6DWV/P0aPOmep5hiYg0I6sFd1HXtaWtO+M2//OUL3/WqL+/f82iXp7ihUAPlkGZNhtloypQACZQMQE6ZmYhHsvoDo+7aF7yk7nKFEBNiQFJ7MvF7pdc8K0Cp0FshkAlIAOjYwFIqCB6sTl8iLDFBsrQMw0e0ZxlmHEvQN/F1JA0iGPlgQsArIYY6e+0Nl0sz+febvp113NSBKTclNdM0jutCKVMjoUYiNXZMeULSANWUE5gESc2gZtsnDZmXnrHsn//XXxSl1Q4wIHH46J/85fs/9v6XvPE5H/z9dy44pb/rjiDNocrgEgALvYkKpxUTF1XJtzzr9V1QBL+LeGZlKSL0Ve
LDJQ4GRUokzDBi3eTExMDwIBkqcnDLGy5QOEErUPXQKutb6tpCTfVeEK0UfsT74piGREWlVFR/mSfQe57VTZT+Xakuom9V2WKkF0MZHimHZB+nRmujIy1Wc/rpqxNDsZns8SizCuApEdXc5kilb6gp4gggcKzRQaQ4/Scp4mDZ0cSjQCREIUoqrDC5SzuWuamXve6id/zNbz48D7ccPDRh+vLE+LxSgkKExLGCBCJKYEOsDGuTdfs644OrTj31pFS7KqriSNUfGAD1w2KOSTGgcsDwKCxV4iXCeFG46pRE/cA/37axYpcV1fzVAF9pFGo0m2fiCiJZXHhTRxmloeMSqSoYzumKUxetecGaz9xxy7797bwLq9LOunnecbZbS5Mk5SRRTiStJ7V6UmNTZ8NsGCRWUuf6pLNm8cKxh9ft2bQdAht399E//cRH/+5Df/APv3rDmy6f5nGrLTIG4rtUgJkpZjhpHPEVYa6ntE8iAVRQQZWSvIqZUDnR4t4pDeuKW+STi0UcCHlmO51Ovb+pfgBc+FZMuymCuaWhfgztP+NYgqcS36q4rYVGPtpB6X2LClUX0rpKq6PnU1WJLFBcR6Lqb/6PPbxQ0AFEIHWilJvDe47M7R9auWRZzyEffzKrAJ4SIWR516prNBsgqEqo7SwqYIuxMKVhWPG8PVaEWKkahU8kJTVOkRPBdOYuT2740C92zjn1lgP7xmt9kjZUAWaFr/AkASyRISXnEsEYGt88OH3uSy+cPzQFOJ9jLRqMVPVdemacQ0nwlAn9gBBcbNfseScGmJRJmbWuIZEbhDIzXABVFrADSfQQiEIbZNGiG11s7+mPIOoBVSlcp5IlK6HFBwYo/EcKqAgsqNNxb3r/DSP1yW/cc9+RNlvbVK2ZpJ40G9aIsjKhkaQpJSm4Blcjl7AxxhiTkLjUtppiL1019//+7R8qh1Fkv/eHv/8Pn//LP/vH961e02w3xtHUnEDEhslDjogowGAEXt2vhE9UDVFg8Wk/KlrCL4X0fN8VAhwWP9T3UnFnIJLjFPOJAF8z4nslKQgkaE+3c+uG5w8LaaleqgBcuBylg1a1xqs3Qu+LeAHoqDePRQr1ft/vpcw6K/f7hFZ5YYTEJODqDgtui+KCeOh3loBamkiOw/sOn3ryyavPPC20TD/6yI8PmVUAT4V4my7LbLfVNsxEJL4FSsUALMxfLUyaUkJ6DUDwLYX9f6zqVDx0qVDTXPPzb5k8a8V3tu4Z0SFJ+gUsRn0/FhCDObMWoLx/8KaRkZ19jWuuv1bZgQ1EQxKIlo9UZCJ6rP3yWIvWy/4lxcafUIQcDIaqCJEaUiZNWJgcGzCDwsg+CURN5EFi54IKdxB8orD3imVXTd/ubV6Ego5WghBASuQk6QgPzxv4qQ+9+cuPrtt4eFrcYIJ+pjoZTZrgmqapL6YAAwwxoSsnJymnUeEluwABAABJREFUCVKVppGThwYf+M7Xdm3fScBXv/3Nv//CX7//o+9ZdeqSnNua5DmFAIH45jbBMQodVj1GB8euEsImZu9GCZUhVCeqHGC6+GIPMs6gZRBDJnERRURJmZkcWlNtkyRcN8Tq28yhXLIYoJ0BgiEuoTPfDC9m3BnRaz1aM8wUmvFv/GKpDJ6cPU5ex0ZfuZIqVWwt3B5KRExMgFHTnbCdqez01ac36zUmc4wTP25kVgE8JaKa59ZaQWpEHEiIo33iH6jYCLkw7opxLBrT63w/lEC1kIi68ASKaOBrVGrNF3/gl6bOXv3VDbvGzLycyZLAdBTtGgs5Q1Sfv/TEmx4e+fzDm9/y7retGOjA95cIWIzCqKxk41XwFPGPhdpSghpVBoRJiSEEp05JDVnDkmkNZoi0j/I6SdPogJFh2EbWyZ3kgKiEcR/qYwEQIRWIRkM+8hWxSU44gmikBqNRCw7bL15oteorGQSkKiQ20TzvPP+VL3rTr/3clx5cu1+sSQZNLpzniUo9oaSmaaKGchJLqkaRqk
0oNyRJAuK8Lq2lTX3peee85y1v+avP/suvfOJD7/3dnzz54kUtN+0goqJO2A/58o4UxZHvRcKqIlryPr+FiYyPjTsV+OB3dGAQgVVLy74Mm1YBFJWLV6huMuQHMdbRmDg0NTA0mPYllvKiTs7zRAVvFC/50eir5QtfrVHde3ljHOvmf2JsnZHQVNXiP0hoxsve7CMKSxDuCigZZhZmaw7uP8Jizj3vPKou5nEps4VgT4kQsiyTwEkI4CIMMCBEXLmXoyddxIjD5FhV+KHqUDgoMRJVqDoGE5ELSgDK/PL3vvnr2Re+eP+Dzzlj2Yn9C7U9leWSoS4mSRYO/Z9bHv6HRzdd+otvO++yExI3Zl1NySoQ0y5KlI3H7nG+aC9Qmo6BQiD28CyQ4HCrEis7bXXb37t9w4DOXT53sXa7R6ZbW3fv3bTlkb5hXHPt+WeefVpST1nL1sQ+WxDlA11SuiXRr5WGcNVlK2gBCq2EikiCQkVBJE4sYEYPT5z+wkuWnLzw//32751Tn3fpipPTTo1UNW/BaMKcNlJRB0sAG8PKhCQRAtVFBGlNhvrTfQcf/e2PffCjf/FbJ57cdPkYJU58y3n4HjfhKqoPqxTr6+tf/YH5xnca2qHF/MnAiaiS+vbIRMUZRRUYlXJp+x/T6FYGCUhVjfLU6OSiZctg4sCxgjjXGQZwxYc4+k4uVY/OVBNaPQ76QSgeLf7CvtDyAj4ZmYn+PfRT9EH9L6EEnQjKRDU1+7cdanLziisuN8F1PU75H8wqgKdMnErX5lPTLWNIRRQgEwYTFlPhe+zsALEUrUXPymtkhsXPWAcDECdKYIEIQV1OfbWX/Orbv/m//vEz377jmtWXnjp/lUlp38jh7QdHv3PL/Q+3D//ER3/m4mvO65dxB3EwgVyO7ZULBCCfpNP7XHltUBQroLAXSclPe/Q5j1BiY1A7vPfQ33/60/V2vZ70jRwez5Gfd/np173qurPPO5nY+Zoor3rCEEotHt2i9LSqeCr54+VzWz3CGEz2Bq6W0OBUCJ6Uspm26quH3vGFv77p7//PZ7933xnNhafOWTyvb2nNtbXTISJjRFKffG+QmrYxyixwIjKZH167Z9tFLzjl3W9/zeBKbpsJQ13xNaY+/yocXZm07i8z4ohOqqJVGNAoAidwUpaIS1DpiC5XCb4z4KpyzxT1dCCARFQZSWqmD0+1JqeH581xJEqiscoiLFbU59ASD48F79XvYObfw63xZBG8d7+Vo/7hLPJiU/Gf+FT5lwQIBAwjuY4dGjt79RknnnQCF6UJx6vMKoCnQnwqSifrdrOuqDVUNDAgYlYRPy4X6qupOBi/RVAMgM9kiLwCiEVDraYr4FKVyQlxy3abdXP9e95858pln/0/X7OZG0LiJltukOpXnPrhd//msiX1hnSdgzIrrBY4GczTkJxSYG3PP2FkTcUPoFhmCRAM1Imf2cvaHKq//SdeP0BDn/rkZ0ZHJlecsPTNb3vNtS+7lOZwhmkhS4WTowBYEbqD9aZl/P/svXeYZdlVH/pba59z761bubqqq3NP98x0z/TkPNOTk2ak0UigDCIYCWz8jA3YgMHvAfZz+gzm2WAERjbI8EBkhISEAsoaTdQkTZ7pnFN1daUbztl7rffH3vucc6uqJwjB9/mp9teh6t4T9tnnnBV+67fW8pdd/BpCu0vsxdJjQKSUgBAtbF9dHoDmDAPTtguG2zf+8Jvk3ste+txDX3z8xeQQT6TN4UbDqGZ5VwggyhRtdFsJLOc83OxfNbZm24Y7v++tndHxfCDtmIWcM6tCKkogMLyWKRatBFQiyhIFE0XKl/p8DRUrzqkTCAgOzsKJV/ykBewSvIslBrpSNKQLH1IJBIGwJvMzbXXcHO5T4xCgdhRSO0ynUFzloQtFS70fLf+UUw96VB7wrIMieNWjwV97LFGBld0rhyKKbxlIWQlEzsyfbmcL3SuvuKJeS2npXt9hY0UB/P0MEiVntTXXgY
AShe+AqMSGFTD+1WGoKlOoc+nfdyHfDpvUBR3gESP/LcF3E2dAVISInQiY2m5G077r33XvBbfd8cTDj6m1W7dsHVwzjFX9pKep08ryzBiTEllxQeiHw4cDBxtyUTw6ZKZWPP0CbqaAXMB7AaoCtlmeIH3v+9+6qm/gxW++8v7ve9/45pF5mmrnrSStMUEVhlggCM2diqijlvKklCQlb3apmFr2JS6JQv5iVBXKyo6kZlJnu21Xo4lztv3gBTve0Z09enLqyNFjJ063Wws1w5ykDdMYHhmcHBlpDo80BhuuL6X+OgSzkoPQlgUylGgg2CL2N0CpMr1BTKQMKCspCZUAlu/Ly1BAjbhcnfhSEEysCgiRUNSsCKUYyFfoqUjawnAOjkXEvQVsSJRY+PTx6YGBQdOXWM60Wka1fEKrgEn1hvfo/0WWdnX1l679WT7v3aTQAa97EMXKUa9yzGJjAFAJNU4Sk9UPHzzMTq++9mpjuHDRvmPHigL4+xgEHRgeVJO2Wl2lkB0Khlg1ojUFCwyngiSps+GUlOHgRETFsrOaq4CYBBAfExVRQ0FMwJdZExBcKCSpSjSfZR2Z5770qnuuJko6nWwBOXVOsczXkSRpH0Gds8RJbMEqMWnGzxmMYHwjEFHC5wEWCgZ3ASJwKG3j91cQITFJbtuOs7vffdPd77k177ZnMZVRK22mPqIdPPMg56OPs8gWjZa7apEsFr59FdkTlVk0AcN+AVMzSk7EoK9LmiXMmteMNjas2bBh7WZKmAROLamqOnEKUuU2wYFs5ogoMeRsliTsBAplqKohBokviR8bqxdAWgGtlSEWL61JfakGQFSsc05ECQLxmYKhIGr0vQq8Wwv/pwJzxMNH494nAwqleXrq0Knx9euRKkgK/6T6gMaDLZH+PXpm0c+0+IMerbLEQzvbrXpdowR3gnf8WqOKXhKTwKqYmk2O7joy0By46eabjTFLFuI7bqwogL+fQSnXTFo/dXoOqjAgESaipI8cC9dnZrITu7qvPPGynj6aLbQM583B/sFVg6vWTgyvXTWwerQ+PJAn1jWEuGuMqFUwORFSQ+xEHIjUkKgiFwKJgRMn2iXt2FYOFicuMSBhQt0FSUiWYNRVWhGEl55itLLMzlLfSdy/Mq6Q+lrUdhZ4sJXIRGKJOs01UWXpykwuQokRFaMJKzvnvBnLPphBpYBXrxfgw77RW6IC1Y0pcnG65eTDRz3vPlAoKSZWp0ogC/GGZKJwCuTCDBWfrZZDfFqDE6NqSEKRUBJVw5G8adTCMQxp4OiTKhUZDN4Q90rKezog5VAUiKhYQAarwtdbUyOSKpgoy6wIuZwN4IQgCNV8IBUVWfECCq4AKGxKSkzkqGFr2Zmss9AdXzcCY6PriKp+jKvWM5aATEvH8qjQYteMzhI1QOUeViGgs4QBSsjxbIdaNI3eyLYBGaJ81s4en92544bVE2MUAmnf0WNFAfx9DAL19w+MDq86dfqMY9RYNJM61eDkxef3f+7jX5l6dn96oHv1udvX9Q8ODQwmtT6XSef4ialH97y0MDedt3Wg3tw2sfGqc865aPPI2vFGbTBH5jS35IxvfkWSixJ7CiLZzAGwQuJckrCX9wk4ArWaWwlIDZniPS2seoEHJgqmv1aUhFRlRWFaw5OZPIKgsXiwACHFlNQ5JMTCqhCJHV/hSw9VX18lIqm8zBrBm0AEoqqYWE4aVISIFtg4EPWI/5CZWNUBSC2gQmBHjhjio7K+SZW/eS4clgp0XYSZVMWvksZcovgvQZ3Cc+1JISBmgJVJKVE2vjqQ38GJsloFpbRmYE2fzM21W23Tme92c2eFjEfzCzKsz2pWEPkKzRoA+mLZQ/0gZRCYTJonx4+dguG+4T7fSqiAxMrlKv8ptUgVVzrreI3vX58TQL1y/yybF18uf8LiAmjxxwT4CoxJZs4cne/OZtdce02tZlakP1YUwN/PIHA9qZ236ZyHd325NTPfHGv2m7EDT+
z/09/+9MGn9127ZeP92y7beGn/gKqhJK0xsZIycZqYMavkOD05Ozs913nkw198rD2fTfRf9447Lr3zimZ/ynDOWpiaiFVRUutUSQ0xGGwFbFjA6jLDCVQFjgE13iQV51mIoTcrsefvkJIQMcdKDuyhZwnQhI/ThixbKBFYqxTSgECreitM1UcxDRknAlKwFlkOQISYvQADopT35CApK6JpiQi9ht8eZHxEjigIPAUQW68pPB0IROS9Md+LXmMyhAIeOBZfNI0osHAVoACiM1jV191jQx5bSbwOZGJVYlanIKWaMSmSxBmSFFmKLG11cObM/Fx79siBY4cOHZo6M31qZmqg2RxZNbJm0/rBiYnhWidruDPzc4IUbASwVlNiYhJnQWBiEY16NkzSryPDWGtRIwNTR3pwz6HBsYGkny1Zr8CgS41yfwer/M+l8vgs6vY1H39a4l8sOlhhXSzSO+FXisco+WcFL7YSz6BqtCLQABgQX+kqodwc2n0o4eTW224lold/iL5DxooC+HsaCSXbt2z73CMfz1uSDDae/pvHP/6bnxuYqX3g6jvXJjIqmszNNurEXGOOOfzEmndYkZj6xjpvqI9dcuObz7Tx0vEjj/3eg5/5H7//vT/x/Rffdm2Wmrm8I0YoYaiqGIEQsZUQsXWixImI76xHTgFfSkFAEEfMUA/4O4i32dm/tUQcI5sOvqS9pn52GjrjAgTlkOhaCN2iG20kr4owyHCoeqDsCxSEegRAmXdGEcJZxCfU6GzErcO5zyJ/CoGgMWtAi7Qq79AAvv+thqzlwBYCUYjNhpkzIlU8TCvwtKBMPu+hCEYQVMT4Pl+JAUjVJmT6KDWWuy07dfz0K88dfOmp/a88f/DksZmFM52806HcJZoSm9yplTzLHFJggi+6css1t12zZmTjgm3Pd1u55oZBAg46C8pKzJ7QGfMk/NoDCjYMUrGuvZBNnTpz3pVbKIEWCV+I3sOiJSsC1Msv6/IYzCKN/Goa4fUK3XhjvfQvPo1zXhSrWDQFLY4BeJo1CGTZnZGp/afOWbvp0ksv9Q/9ig5YUQB/T8OQuey8Cxqozc/0vfjykd/7+d+5Yf1F111xQZNcw1rAijNiaqaRqlOFEDOYwawkDg5Ok4S0szBByZrJ/hvWXvvC9OY/+uU/fPjrz3/3T3ywPtjM8zmXUO4ECgMIqYP1bd29Te8LE4iSsoZSoQr4zlAeiukRIOI1kIF4sIIISgxw4JAQKVgDOiQSSzFwhaAoXjIGZRB76MK7FP6kBR1WYymJCN4rF4mqMWhZynw9u+TvGctDz1p+E5wC/3sV+kAQPCWAFJN3g09ErCyEPFEIWNjXvGAkzA6ASBfGKGrs6p2W/frfPPknH/nsqZeP6wKGmxgbbm7qH5zctKlhao1arV6rG2LD1HG21c2OTp0+cPLEc3+1+4UHjl1688Wbrzy/b2w4wYLVzKpYqDGkrE6FocQQ8WsbPB2PpzEBoITMiaOnWGR0cpXjQnguk0gX1/Z1OFi6zG+v42ZUdlgK/fdOofI5FRv3AHsoKJ7+WzrbEUTVMBJKkrw+fWy2M9e+4213DA40mcyK9MeKAvjWxrdgOySgi87fMoLhRz7z8vOf/fr16y68YcN5w9o1kvt2d6ZRz60YS8wgY8gpKUGEiJTJwKg4mK5zll23j9IrR8c33/Y9H3320d/4yV/4xz//j4Y3TpzJ264OdiQi3cxyQkwqAnIKkJIRhSMHb0UGl9/EtCMtnGqPfAuBSR2UIUxM6o32aN1HLEa9tlBlMgjQQqzgoAUWEymmFFsbagE1VKo/hrV9ldWtGvYROTrbKPHnReZhVZ0U2/gLA4iLbQhV+UOAb0wY1BUH5URCwiwq3pAnITVEiVIKk1Gyb9fBj/7nP9r90IF+Hblm7bZNk5P9tVrNCMEZ0YTgcsuwaoUM2MlgPZ3YsPaiDRtOzWUvHjj60mcef+EbT199+6XnXLK12azNYb5DzsZuu9GkJz
B5UzdqX1WGYU7EHNl3on9wqDnWFLaecOV1WqxMob0L8Tr4NVTQkpZBcapDl/lSF33R8+syQYf4u3+UejV6BfxB9Ucq/gGIWMmpaK2TnNx7sm5qV195XZLUvqWX+P+Hg5YkdK+MVxvB3qnyJV/3jvPt6fve8/bnX3zugnT4H157+ahbqLFNwH1pPROb9HHNoG5SA0Hiy6UREakoJaQkAuI0EUFKSF0iXdGk0R1IH5re/buPffkn/su/Gt+x6VRrhut1cRCnpOqLTyZgVaswjhRwnlYe3mCEtrpAlMlBNHAgLyoRwERMbMgQwbAEqN8ow3g4SCAGiWjReVsLymKkrQR4WrRo7xvqvsVScEpCvoRaWOOiPk1psheGe2F30hKBHqXIWR5srZaKqKSOLRGE0eOgkndYFvBWeM2ZJXCEVBlOTaJWhBKok37TZMEffeRTj33i4XW6utZOxoeHm6Y2UE+hTjRXCBRkyM9H1KmqYQOoy8VQwsQiyVSePXtgz0unTg6dM3b5nddOnjOWmfaCtETjA1g6SN55Ig/vMEyiTGfsl/78sXMu2Lr5xnNsX0cko1iwAqhCQIWeLPOYX+1RLrIGlhuExbekPOCr7bf0GMXP6PUCK1to+XPI+/M7BGSOwGoYNWmYo7Uv/eEDq2rDn/7C5zetGze0YvsCK8Xg3tBQwKlMz55pa+YEvszv69+70ei/6tJrTp9YmFi1PnWSkLJRZurmuRpSJefLgIFVDHzlIAETQ0HCiSbaBazaTNvOcZ3h8npn4caRje8479pf+8n/eOLxPaPJoMvzLuUgYmYIKVTI93dSKEgilVIhvolwcW0I1RQDVz7gASq+5KhILi63LrM2E2uhVtSKy0UEooADRJ0Tn8jq/QBVCNSBVNQSRMSKOA3SnYAi45lAKiRCqqFWQUXix7JolURfP6o2+mLsoPe3CG54VKpyzeHroiQALd03EkuooD+Rr/KppOBQ1ZWYiITJkXMNbhzdM/2hn/utp3/ngfs3X/v2i2/c2r9qvMYD9RzSATKosDKI1DEJwTG5hDXRnMQZYxIwxFkmO9YwO7dvv+eiy/uOu6/9v5/d/eVnajM6qM2UjCqU2KcKE+BJugwQkTIrwI5PHZ52zo1MjCvDiVUoc3zly0UoUite1aNCcT8QCaTVP5VtFi3ukvVcbix3+5bcgyItsQciio8HQmwqqjZVCrkTSrnMHpvN5/JLL71q7ZrVTCtyL4yVhXhjw5H81p/+rw/81D/8wtMPtJic+iSe1x4EMOj2m24xad+Bo8dzhTpRC+uE04RAIspgElLxYpoF6kgs1CmLGBFmIp/+hdR1yNokR55xK7v3/IvfsubSX/vpX5o9ams8BEsKI+J7lmimzpJCHatlDcWRFSQEV+Z/At5OD9FY/wMR+cLNnh8iArWiudPMua6VjpWOc12VTNF1YiGOoQxHcHBOnFNRVksigBpi5jpMTcg4sAOcr2sNVahwrP/sUwFUEdiXETSqjCh2SihfEcIQVfGjy0j0IC+DsqPygBRLjJbSiErN4Y+mHOArJUewRCKcoKbIyVgI+qneZ1Y99eCBD/+fH5576PAPX3X9PRu35IcPDKaZMapQB6uBSKu+4rRXsqxiFERKKqrqIJJSZqzTLIFsHhq8+4Idlw2vf+4Lz3z5jx7oHJShbKDPpXAea2MJDBkmJYhCJDFct8nhXUf7BwbGJoeJHQBfFhUaLy5ca1gIej0AUE/lNDqL4H6Vj2jJN1T5T3vlu1YmVplivHexnC4CTSvOX4sjskLFKNc43fPi3sTybbfcwiuQR2WsKIA3MBSYbc199jN/9bmvfOL7/un7/tP//OXjrdMKV6Aor7qvEvj6624cG10905q3eRaMF0AUDlADCxWBr/gvGgU2q4Q0I0B8PzBlISfOsSJBCsH8zP0XXXFFes5v/dx/wZQOykAipCJpUmMiJwokCC+ThugsVAlC7HNRQb5vjBSd3QMXSYMlBfVKQZRIiJyqqF
pAAOuQO4dEhNVCMmRCDgwhceosLCn3JQODjeHR5sRo/+RoY81QbcygrkwudCtwohIqXoun7igACaWRF6HJ1T/L+AUVqaQVo7YUCwhhiorIC95IcYDKIbU8G1EhglRYLEkCTgBmA4JRSrL0gT/76p//h49snKYP3HTntWs2LRw9vrAwp8YAREKs7Bed1OdNh9rPSr40iPpb5AvrQb0bp5B8oMbXnLf1tvMulkPzD3z0M6efOzpoGw1NIFAEJ0xFWZHAiBMSdOc7J46enFi/2gySJgqAmEWKHL4CkKma8K+N/1SAmMW2f++GZ/tYl25IYdFL6b+4BsmrjeV1DcH7ZCRd5GfszInW+OjE29/73YmhlQyAYqwsxOsd/rHdtXv/oYNHLr10w7nbR/7b//hPP/qLP/bIC093VPOzJjuGQSAmXjW26t577pvrtDPN4nMrzlPj1RE5DyarMNRHGUXJxfR9361XABJfdEbUQXPNUwG3uz94y62DLx/8/Ec/XqeBJAeIujlE60w1VdbKi8UKBrNyKglrQpKypCycKqVKJgAbREocSxr4ZKSgpZiImIRJCJJAE0M1J3DqlH1VHOfUgg2bgRpN1O2qY3uyD/37v/rAW//tB97+f//Gf/rY809NOzfRSMeHasMNTgyEYH2Qj8hXxCOGQSA9LuJs90orbx2Wlugi3IACrWkZMVGohFBbhgo5FM+w6LQa/AcF+ch4QnAER4CKuHTok3/xtS/++p9dz83vvWz7ef3JwvzsK6dPzyXIufB1iqhbkWMdj05AaBTn1YAaBTmoWLAKHEl386qRuy+9aG1S/9pffGX3N15q2lq/1IxlA/axfSdKihonxvG+XfuE3OR5a7N6brlr2NeeI4CjTq/+WxHrVLnm3hVYYvDT8lue1TVY/MGrCvqzKxEU2FXvZt4PIAoKFZRSmmrj2IGphbn2zltuGWg2eSUBoDJWIiGvdxDggGdefmW203rH7XdefsNF33zp4Cc/9bkf+8V/8MPf80+++953r+obScB81qeZABjgPW/7rs/+yR8cm5lbu7q/BgKR2NzAwMApSQlkKJwApL5XIwmr8QJIPLlZQSCxQiAlSWH7MveP7r7vZ//0ry666frzLpqwDsQs6hhKrCKBEk8ASESFyEAhEookM3txGbwZz03yYCoF9g4UapgUIiKGA+3HkQMZkFqoIWElEU41YWrML5gXH3/xS3/x5acefc4u1OtmKOHkM7sf/MQff2FkTd+VN++49fZrN5+3eXB02KHdzTo5ulZcoCGp+lw0EQ29CtArmJV6JVVvcWNUN69WTdACUNZyz2Vt4OB8+JIOPi7ig5FOhCkB+SxhK5YIjc/98Ve/8ut/ee/4mnu2XTiesltoHZjvHrcdRwnDSLwMlCWTl8Qi4tn89KIDQwKvwZUkG2rUbjhv+8DBI89+7un507Pbb76kOZR2JAOYWHM4MTCW3YIeePnQqrVr+ieaypmoLxYOw1SE3RHLeFD1giszeU3kvrLHq6FAtOhulIuw7PEXo3ZnHwHwocA9QLwWBcCGoeJy13CNQ7uO1NPGjTff2Neoh8yAlQFgRQG8oaGQvYdfpGZ3/ZZVmzZPTKweveryrQ9++cu/8Yl//YXHv/RD7/7RnRddN5A0CUjPdgClnTuvveii7XuP7Nkxur1ZYxgwGVX2lQ4cKcUW6qRKSqIGxJUMHQJKeroChkkAFZtCJurNu9dd8me/9tFf/M2frEvSZo+gWwOGkoAU3vxXJo8IW2dAyqQ+McCzyw2RjwAoR2g9WOKC0PyFidiLXAeCsBJZNkQgI32pmvnTrSe+8fIXPv3Y7kde6rON9cPnNIf76qaWEPleuKcWZh7+iye/8rFH15yz7sY7r7py5yUbN61JTLdLLasZoKEkAwgM8ikDosGcL1/2uLD+3wIBLkVr/IiCzIm7lckEheVbgBE9ToMvvhn39NsYkypE4VhdApX68Gf/5KEv/erHb1+9+U0XbJlQpq7MUX
ZsbtY6SpUcnM+7KCpBR+9lKeQSrjEqPfWVJxQAQ1lJpE585dbNI0N9jzy+a6Ezf/GdN/YN9bWplbPVBKpSw9CBfYdnFlrbrruU+yRjV5j4PvZeSGSf2YZFS9mjE6nyRTni18WRFjlfPVsu2duvxNJ0hGUdtSWb9BzOL2ClGHZIJgkhA1Vpz3RmT81uXLvlrnvuMKy0EgGujBUF8AaGQqamT1IN6UAC2IQxNlS/+803r92x+aGHXvyJX/5HV6299p1veedN19+8qm+EQClMz/sBAmGgr/HBf/CBD//8j0/n1EyozoiES4iQkihDApnDI/bkkYFI1VcpmX8CwImAwEYNnHHdu7Zf9PVP/PlzDzx37k07Msk0VVUSsQaxUE9JfxQVQUj7jXm5xDErCl4+ecKOAixEQoZVFJwY67WVIcCSMYZQM33GmNljC1/6zJNf+euHD7w03W+GNzbPadJAvZYwQ7KMUxYFq1nbnFjTXN0VN3Ns/lMf/sLn//jL2y7efMs911584wVDw6vy7kJbFzJYJ05VjWeuB2FeEjn9IMCXzax+omGzyt3r0RpUlXBBlkWSZ4WtWNjHRUiEfP02ToxKztCGNL/0yW9+5r987M7R8fsu3DZmkHRtl/iM7U45KNU1dL0pJlQmy0Wdo4WY1Phf2XZBEYuCAgQYR2AWe+7kRGqSR194+YXWI5e96ZL+Nck0SQKqS8oLtPeFfcNjo+MbV7vECYl63E4C4qPx6ovEgWUe9mJhlhPLuuj/cEmFfl4K9fRsfRYx//qk/9J9on4ubrhn1xIjFT564HQ2n19729WrV4/yCujdO1YUwOsf6tTNz8w3qG66cE4sO7hun+FLtm7cuGb1zst3PPipx37ml//xurHt737b99xz272bVm80qnUyVVzIEO54090f+dWNu04vjG8YqjtLhg2xFYLxtrSvjq8KJmGfceAplRKN+FhqB0QUyymTJTKMVQb3bD7/r//X5378yotrdXScE0HC/ijqY77E8DXFSBnK4kRACQfKPwDAV3rwBBwQwAF0Cia1BJnlyDe0FU5RmzudffULj332T788s6c7lI6eM7w1obQOTpmdZAJO6knunDIlTN3M1RKTMo83B0YGB+YW5p9/ZP9jD78wsWnwlntuufHWy8fXDxvTzpNWLg6igVQamxZQhP4XSXhEWzeWcyu0QkQeNLJ74jclw2iRS1DsVORckRrmwI8SxwrDI08+tvtT/+aPd6aD37Vjy7gRwHVTZGqmZ22WASTCsVAGSvFffa7Qe85o5Mac6ADUUIjBs3EiZICubB4dHb/sqi8/8fSjf/PEle/e2aizUTdAjT27jkwfm778zmvMEOfUJQgHt8LHPyMGVNzOZSRrL4a/5HtaXhr7r84qx6nn/zeIwyzvghRfUbkJKZRYwZJoRgf3HKubxm233sK8gv0sHisK4A0MZ13edaxkxCSgDE5YrXW1hEf7+oa3nbN+3ZpL9xx+5GvP/Oc/+A//9cP/z5tvect73v6Oay69sk51VmOIARBjfO26m9/yXY9+7KNXbhjkBB54UZBoaCfrVIjIqFENlE0q2oKAAOKicxcBxFAByDeMabC7csvWv3zg87te2LX1qnWkEsrVCEVbCcH7BzMlTMZBSK2owie4FuYnBCE0KqpMPu6sPrvAgQkOBqau9dYZ+5UvPvT5j3/t+K7Z8cG1W1aNplpLiQ2ruFzZeRNWnGOQZ6LW6ybLcwDEDOtG+gYGBgayzJ45PvNXv/WZz/7Jp6++9aLb7r5+84Wbmw3T4VbmOj4pgUt0mqBK7C+/yE4FgJB+XAiGSoGKCAAtFl9V2VGUuA5GO2lEgMipMDOTEGUNW3vxhan/9Qu/v71L77zyovX1OqntJtxN0OrKqVYbSInIhmZh4Tw9oj64FktN42CaF64H+R8oVIkCRFTY5v2JuXrb+V87vPv5Lz12+R1XNU09X7CvPL17cNXIpvPXC7o++66S2CGhGl48Q0GGLWGzJS
uzVGzqIhEcIba/k7HoNEu/iDGAynYKQDNkM2722Py5a7bcdPPNJmZKr6iBYqwogNc/1BgeHOl3hNOnZ1SMgSIh1UQEhjRJadVI/brLNl+8dd2h26588rHnP/OFP/jE5/7i+qtuf/fb3nHHjbcP1pspEkNg0Jve9PbPf/T3Zlp21WDKlMNBwRHtZYIJFqd4ES/qa+koAUi8G0DqO0h66qAIlImhItn4cN+59dGvfPqhbVe/K3HqTDxa7FXixAHGdm1n1o4MDqVpn5BTzS1y+ELNauHLYxKI4NSX7LeUkGhXjCbGkNb66s35U+4rn37mc3/5tfkjs6PNsfPHNxoYBjErkBPYsBERGAKRiDMgMkZFcidkCAo2UBBYybqmMX2DY+PNwTOd+Yc+/vwDn/nmhVfsuOvNN557xbmN/rrjOYGIqgtCWmPZagBCsfy+QomCr+BvnI9rh0zRGGONIqM3kNDjFWhF9nk5KapKDKPgPOm25c//w0f7D8+89dqL1g/WAVEDOAA812q3MquJrxJEKCZWniosb3FOf4MEpeqFomiBEJwAQNT55GSkpArJ88lVw9en5zz63IunVu+78rrrH3r8yZOzU1fecbkZpC6rkjJERZR8+W/fs9KfPFyfagURqz7xFTCnKvEJ5acUC/+9btbmGxxLpD9VBX+YZuFTkHrMkpUpTbR+dP9h17X3ve2tE6tWsfeDV6R/ZawogNc/mNmMjU3Oz3dbCwuSW7BnDzKxusyxcSlxLU1qA2Zo28aN68cvu/rC5144+PDXH/sX//cXt67b+o573/Hu+945MTZpKN164fZkeNULxw5PDpw7yEosxHAiHsYRUWM8/9/jGaqhKDOiIAieu8R6O0QgIRgF2SRNL51c84knduVzjhsswZIkB+9QkAKGkXCyd/euxx78xoYNG2646frJNaMp9+V5R2FB7NRyqDjGZFJ16jQnI6xZzdSb9Xo2q1/9m8f/5mOPnnxlbqw5vnl0nMSkMMSqYok94KCqxMwOPuvABFnM5JuhE0icGCYFjPF1Q6Vu0omB0fH+4bmsc+gbh/77I78zsXXs5jsvv+6ea0bHh3PtdLWTs3WwrL43DTP5zGmDGC4prD0NLM/IwfHgUCntF4m9ioiOixxTxISZWFmc1hIDV/vMb3/26Nf3vOOybReuXdVwuXAqhqywU3Nqtuso8eW3PXuXishE75nC3fUluReBUSXwH6YacG6CT+MmEhiCzTcOjOTrth5++tjR+qHnn949vmZ84wWbrLGhR5sSlClWcC2TsFExnHuWoYCHqJCtizaJAKQiYj7LOgrfhrHE0Vg6z7CNFnEi8pU2km7t+Csnh/uGr7/mxiRJVwigS8eKAngDI0V6zuTG8cFV33z86TvvvcUoibocYkWTWqJCVoSEOYEhHa03B8c2bt22+vLLN+966dgjDzz1S//93/yPj/z6O+9713ve+YHV4+uvu+uWz//ehy7etLneB5PnSJKk1rCqgCaqzlkG++RQaNkWVomEfEcQR94nUADwwVxVVQOIPX9yNH3qlRe+eXDz9VtF2w5i1YB8dXRHgHNiTHL55TsuOH/Hgw98/b996DfSvsbNO2+/6orLB/oHQN0k7cJaYYFaiFVQLUnYoV4fnWvrlz713IOfevToi6ebZmjT4DqjSQKYhMQ5AhEbQMVnK7CGQnIeePDV1nx/MfWlC1iDt0GAJmyIoVaIabDebJq044am957+ww995pN/+tD1t15x/d3XrNu+rmHa1rSgWS6WoOR8bQtfubTIJAtpr0ylmEUUar0FhcJPQUEWQlEpUEhJvfhwhFSY8mTf86e//McPXDLUuGbTqhoLQdlZuLRW65uabx3vZC0YB/KqOSiliEj5yRQ1mDw2p1piMdWWt8X8QlUiZlH1QWErMCBAyOLcybXJXONrX3qwJe1r33INDaZdaiuEJR6iTEFAZBr1XvuryvCq+R9npiVUVvgQ33YZq4t/C+yoimYsLiuWbAKpge
XWmc7U4dlrLrn6uuuvYQhgvt2T+99+rCiANzAYuGT7RZS5fQdPLOTdRiqAGtSVVZxAhcRbtAQiZiWiZrO2/YLNazeuufja7UeOnnj+iZf/6Au/9zt/+Ntb1lzYmWsfaddenuoOrakNGzYMkZzYiM/mN6LRnAk1dQoeSgEOBzs18NR9B1tiNcDEYN8k+OGvPbXxxgvh5hEiuCLRdlMmJ9akpj6W3n3/HVfedN03HvnG5z7z15/8q49fdsGlN153zdr1qwYH+7meUZoZVRV1WTY/p5/64vOPfunp9r75Jg2u69vEztTAYBFyog4cZK4K+eZkCoAksNpDFyxwtM8lRlkjSk8AVJCYRFRVxZi0DqwdWTMmEzNn5r7ypw8+/IXHtu7YdOVNl2y76tyhycHBpuScUTd3UCvOKqAact0841urMfhCzlbk3qKfohYo62NU3AeQaKqUDT74sWfyw/NXX7ZlXV/KImyMtaCE26onW+22wiYGkJDRvPhcXhnGu6jwpqvnslexlHjSGLUgOC2Be99VTQVIWDI3kDRm5turL984es6Ypa5jR6FStGeVBvYUFc2eF4/ivEWFkGUkuiIGJCqxlb8rF2CJ9MfiexZUUFA/PmhFysQ1Sfa8cpQ7etPOm4Yade/9rLgAi8aKAnhjY/vWLeeu3/T43q8e2T21/aLJXFqiyhzyprysVkhI9XfETGTc0GjSHB5cs67/8ku2vvX+W5584NlP/tkDu144BjQfOnho0/Bko5nUSY3COnCaQslZRQqQMlwiwmAKFZWdghypD/5S4HUH155BJOwyqdeSjQNDX33ilfaZbr3fFwpTXzUgMCFEhEgpt9IhMsPjtXvevnPnHVc8+/jTn/3Lrzz52BOrR9ffsPOWdRvHJyZW9dWxf8/Bp5564ZuPvNQ5pSP1gbX1CaaE1ZAhIZ90acQJcYixRt+kCHF6rMpEUaGxjxi02DLYbhBRJd9/i4hgiE1iEk3qQzximwut2We//MIzD74wuXXVph3nXnrdtsnzV4+PDyR1y2me5bmG9GFAySgzEQuTb//om6B5YI2qPkCpTquYSzUE4JEFNgzF/NTMS1//xtokPW/dZI0JxL4DpJB0xZ5udUPFn4g5RwcE0WIlxOZiFLsfhxOUQyvk0CIRwaMcvhUNEVRdcE3E1A4dPpH3J1uvPc/VbJdyduIngeIaKpkSFNVt9YwozheWYnmBGQG28MzHJfw7HxQXRXvuTiAFx4pASkpw7HI5svfQ5Niqd779bexL6/7dz/B/u7GiAN7YGBsZuv6q6z//2Ff37j5xwaXrnQiIPdG7qBNjKERmmTzVXhNlYzgj1RqNbxi8+/03X3z7xV/+ykuf+q3PP3/w0ONHublxw2RfkgBMZG3ORtMElhwRsW/LJRTfS1USy8SIqalKCHlSUAcYynNxoNHB4bmDu48fOr7lotEcuYiGEsRgkE9LJSEBQ8jmKnlX643k+luvvvr6aw89f/yrn37443/28fkznZSG8pbMnZqvcW3z5JqtzbEG9xlYIdd1VhMVHxsVLZoUMgW4VVUBDtnFxCEMG0xdlEIf0cMBAJiERRSqzIaYmFlEVDVJE064rzE+MjTabnVnXpl55NmnH/7zJ0fWD5978bpLb7hg68Vrx9aMga2jTIWdSKbqxHriERuupQakINEo/70O8lQrAIsMcMQYAHyira8Haujk7kOdw6cuGByaGBwwyJM0ESuGSUxyerYz37XB5VIgxN0D6BSRFoVn5PZEBHrboFFw/1AY5FTuGW88ETOYrGLe2hdOHNyw87yBjUNdbvtIeRmqrVj3iGb7ciM6cL1fL8Lhy3lWci20svG3ZyyZwzKTpsqGGoJKpATLpw/Pdea6N19z03nbzzN/t2GK/43HigJ4Y8OYZOfOGwd+Z81Tj+1983dd5UusSURiIrAc6BZsOLI5oA5JLcmsIzgRt3pt473vvWLn5es/8u/+8KuPnRysjV+3ZniIXDO1SS0RqIPRTG
sJQAiJuj4/mGNVgCD0PUgtXl5YUaewgpa4NG2Yttv9/EubLrzW19lXAiuHejAAQUSE2DDgxDLIukxEROTcS9adf/77Tx2f2vXc/uce3f3YA882GmYgSWzmuibjhOpp6iyM6bOaw0lChERI1YmEdQCph1GIVJkisq4xrTayRwK1iUJwFlU4mlRJKAZQyVpnjKn3pXXFYHN49eiEip6enj91dOaBl5994BPPT2wYetPbb9t2yfozC6ePnzx1YurUmemZjus6YxsjtQ3nrb3y8h2TE8PGhAQ3DxaQFtWpewr0VLigfor+F2OUTuw7mnTbG1av6UtrDFGnhlmEusDJhVbLF8NjUnFAlNblQ+SlUeSGVjyDOCqURvWtFQqopeihRiJiGGBWTjhNn3z2BYzVL7jhki5yJUmINJfQJRjFRVXjEMuOakGF8t/i2/CYo+DdUuX7s+iUb9MoZqKLPg0AIgGwziUJJ1zTPDn40hG27s1vfUuSmBBdWZH+S8aKAijH67APlMFXXnHthedd9MhjL50+uTAwnjpPrrEeiDQxpkYEqIA1ZKiKiHZhODWqyip5Ls5t3jrywX/5zv/28x/72p7dDbflktWDCSkSC5MQm5QYBFHJBUzKJAKCLxjss5IIECZmIYhARB2oZW07d9Nz7a6VUdPc841XbnvrdVKDkCvs3SB4VBNOnEBVIMaRJMSOISRzMs/SHlrXuHbDRVffetnb3n/Xy08deO6xZ154etcrJw8n4GZtqNno62v015JamqTEIGIRYTIBMRdPxTEetgLYN9sqIBAABZAQTGSPm3mmPxOUJZCciJCkiamlLII8F8OSq7XWZt2uI5cCg1Rrt7qHnj/wP1/+yPDqZnM8aYz0rV4/MTYxvnlyS32oMTTWbIzUGrV6bn3lIsDnQPuUuoiRxychmtzVQaECB4vOnJxKgLHBJhs2UFVWJhhqZZ25VsfBKFSljNNUkowL/YZKrBnR9FbEWE8JPlXiAAT4djwqYGMAFWZnkmOzM/vmT13+zp00bJy02edoUJEqUurUytUtsuvRK2CXeRVCMlqE/gUsIUrub22oElI5yN/d6IHt4uzUpOxUEqsyk08fOLVm1eSb73trLUnwut7u78Txna4A/DsnDoZf1wNCoP7+wQ/+gx/88X/744899MTd9+90MhOQSQXBspcT/g1RUkq81GY2uWQEp2QkkyRNyJC47raLNv3jX3zXf/vJ//7VA/uazR1ESGxGqfZzvclImRNCwhBiozF+6fl/ML4Qj7Nw0Eykk+cda+dsPjPfzbrqYNYMju5/+WjeUdQTpVxFDCUgUOhiwKpC/qU2GgpHO8eGBYKGdNSSOqolQ5vNdZvOu/z2cxYW5o7vO3Z416FnH91z8tCpY9P7bYcalDZqQ8PDY/19zYQTVTDYgNgkqs6pI0++EIUq+YhsLDhR3AUUUtKYAFkwOGEnYsgYg27X5lmn3em2OvOt1ux8ezanLieukdb6m4Nr1/Zzvb9/cuPg2qG1G1ev3joxuXF12kjq9dSkzASGKIuSCodUNkZskiu+9ZpWZCPgIxMKJWWwQlXFK6lE0JqZJWBosGnYk2ZVxVgyp2bbC7kIJz6H2kSfJpbuQCHHo6wkofLRistQkPKJQrjXgRygSqwwKpKoJsTChk1yRvKHXnlpzeXbVl+4rkttQFigKmRMLBsSDtxzfcsI6V5fodRMxYJEHD0Y/+rrJKkKgblX3Sx5mapOxbc+KoenGBUJFCsCSJGAjcXxPcezM603/8C9Q80m0TKlKVaGH9/pCuD01PT0wulV61Y30UdWmJLE0NlVAQFg8D133nbOh9Z++W+euv3+25nSXLs+RBvYcAEFUoI6dQoisCoSk/gHldn45ldGa9LFhRdt+KGfeed//ZcfffLkScjQALmaceZMp7+W9vfXm31JatgYNkwEmICZsEJFIM5lVjvWtl2+kGXzWbcj0smVJbHEjSRtn5hzjnKnlIiqeKISIBxiaaH+j3
cJPGJALrjUChAnVsWxsBEdpYGxgYF15597/bZb3ndv3ranp6aOHJw6uuvwk4++vGvP3vbheWMpTerDfcOD/QONepMYtSRREyIPHA7MEGUyGkRvNOaISNlJpkqZzfPcZjbrZN12t53n7Xa3LciQYGSsf/XFqy45/9zNO7au3zyxamy02T9AdadCnIASdpw738eMHcGCPDBl1OfxRgkL+AqnHjiO2He40TGzqUzIhZfYKgpHWaubMho1hioZn5DGjpLZVtc6KFOo3sQFmh5DCRHkKtRBaceGpKoyPhtSEPxxgoLQEGgicsJITUZ4avee+SFz4x1X5EnHqRivzZRFVFUYJrpdr/k2BIyu8ntBByq0UkgphJIKGYU6gJhMoeXikRTfLqFfTo4WXUXpskQ4SsixafGJvScbWr/zrjfV6gm8pl9RAcuN72gFIIpPfvrjP/vLvzC+cfK2m255/9u+95ILLlVNObRDWX4YpdWjI+++795f/8Pf2vP86XO311RyZfKixCpATAoWBPAbShCBsmMiVVhliCqDmJlATO6GN1375KP7vvi7X+PO1gsmJvqoS5yddinnmZlBPU0SwymzIWJf7FxFVcWps5qL7QpydRZwogIlQ7mIg+lv1NzRzsFdh9ZcvQmUxaQCDfRM/y6xQoXEfxaqjpnCIHY+ssbqiHz/KiNq2NS7tT6zYXR8w9Y15saLvvsH7806+fSpMy8+t2f/3oP7Xjg8dWTqwOmDtp07C/KNCGMZMgYDSmQ4lhdSVWJy8BUzhYnEaFpPav21gcnm6tVjk+tG1m5YO7F2bP2GyVXjQ/X+urB1iYoRJuR5l42KdQQShNyvCFV7kj2LqhIMs4p4QD+UuQgNWRRUxGo1RhyL5F0FQAwIVNRl6jI1hEZfQ6M4FuaZrj3RlTxKfWbSgnaLqHHQG7eMioGCiBbvSQYHiRRwoeIGG5/AAXEMw8xWoJw8e2D/ns709e+7iYeRkYIJvucLkS8Qstjsf61REHJR3ZNQSFBlCBRKiVDSRW5Ra9YsOSFH5cbV8a1K/4riCJpkKfwPCumGUKga1NglM6faU8enL7/88utv2OmJZCvS/2zjO1oBMGHtxGQ7z/bNH/r1v/jNP/jT3779+ts/+K5/fPM1tzf6E1/tmLWngGB8gfk973j///rox/74dz/1r3/l+9RKV62q7/ahKp5oTd6yDhALDEMUIGUVz6PzvQ8dq4Had3zgjgf/5pFH9h5tNkc2D5Kppc5Zb54uiHBMYvFSGiIAqy8fClWQkBdgLE7UiRLDgBk18PSxqbW6SbzsDaRD9dlRBB/BiJA8SpodVcxUn0GkDk5zJg4n0cyChASJJpyIwWj/wA0br7iRryFrslxac+3WTKs7lzkH282ybp51bbvTtpntdjpZlnc7mYhjNkktaTQatb5as782NjrU7Gs0h5uNgUa9WTM1wwkpOSEIWxVRsi3tCqs6FXGGGRCPdagngUtIlTNkBIICdtJo2fu7o/EjoLT2F41Yf0cBOJASk4EV1xVmpKlXYkRkHDDd6S64YB4LQs2jInaMuMT+tGWwNVJ8IrIRqV7EEW73cKIAysxkSZksFJTsO3HimeMHz731kvHzJjtZR4woREPZJih6qe+vZohrKWMXb1PsFvRCSCtnMZK4TufUkal1mzbSELyxEx6gYK6jR/T2lO57fSOWvCqeSv+hFuWqqnYMqbMuaddO7DnJubnlpjv76g1asf5fdXxHKwAAV1133bYLL9inu9/7Y++efunwy888/OP/5qubxrZff/1tt9x67yUXXTY2MGyApHg5lEBgMhs2n//B93/wV37/l19++obN29f5hlZgMpQQEcHX6xEO1D/24pUAZQ5MdBLP3HfMZGl87egP/l/v+0//5PcfPriref65q9NaLeHM5TkoMQkxuk6YCOT7RgKkvrBjAKpVALAEeaVMIpomSS2p739x78X3XikJhcz+2D41AtSlRVq+6D1EzeL103gNPg4NZo
YSmBwAVisOxgKWEqWa6W9yc2LQWE44gVPyzYUDJdtjbSG5wZczFVZRJ+qszZXUiQWyXH1GlxKzQGAEJBAYioXhfBOsHuyeovyRYL4XgdQiy7eC9dCysj+mtvqDFwa8IZAjySVlGA65ZgRywOmFBSsoqegBptbFhy2miTI0WcDmFMVnsRsRxXbJCXJNWS3gOD29kD95+NDYJWu337S9K5mwEqk6V8D0HIV1z92tTqVHOaCkPfWoiugPUGRwAaTwNOS01pif7s7V54cHhoUcfJ0iVI4akx1KSf2GRixcHlGxCrYEqpB3Q31TQ4a7fGLvqZHm0P3337/S/fE1x3e6AhgdGfnB7/mef/6r/7yepvd/8J3dzj2nj0699OTur+763Ef/60f7O33bN2674dKbrr382gu2XTg8MFDjOmAIaJj0vrfd8+/+2y888fQ3N21b1+1kVAN72yuY/ghAgxedUARaCKgQwipCUCepMaJy452XvefHj/7Bf/zrbxwZuP6cDeNJwomSOFGX52Bm5xSGPX1RJMYZEDtcaKj8ApA68VZ+I61PHZliEIGYCQ6AEhOkUvIA5AGQiFEDlUyfOKiCV0cajYDgy58JgRSOlAQ5gUQtEalRNWwBShFEaQF9R4hJYzg4dMCh4HKpiXwYf3GFRPNmpgODQMZD+R6AC3weLSzHYO9HBaG66Kp0EZOkWI5CZJbZQ0X5BBISK4aZjRcvrKQd66Y7eU7Gl1wKhutyNdZimKiYHaIaqmwS8fegCtQzi4nIqObEfKrjnjh4INk4tuOuS2yaiThwMAKKRYvRh4oOWHyVhemvPTd/maFQLudGBBEhmzSawyMTxw6dGDtnnFm0UBflxVQ9AERlerbzlPqwUCTRWkE0+6Ph7xOAQ2krKMAMtpg/Od+d6Vxy6bU7LriQ2ayY/68+vtMVgCrecttb/vtHP/QXH/2rddvWrR0b33jexsnN49fnV+TZwvFXju1+Yd/nXv6zP/n6b9fs0NrR9ZNjq7ds2DIxPrFh/frDR3atmpSubc115zu2DSFDppYIEQwnRD5BOMQGoAAHaJqLd5sZJCC1IoaNyeUHfuxtAv2Tf//pWoqrz1k3YJhJ1Tr1gSwFiZASMamEJOAQJyx6NsZ4pqgS02Bz8NjRk5RbSgDjtxXAVGVxtPFQQBSVFaoiADEGFwATJY0GliKyGrVAnX3jSg1hT795TGyKejD69BUz0evMQogCqmDiGLGN0qG06zUmXEWrO045lisozOBCnRS0kGgiR+xFK3uVjJbKdvEuUELsk4wBOOGZdredOee7WDLFCfREVKMojpGAct0puBpx08pKhuslVeMzKziZ7din9x9d6OfL77wqGXKiuXJMBSxvlwa9EnQm0dnkezDv/Y2pTGrJoPKJ8EklAMv4xMiBl3a7mU46bPKaUbg4a6VSw1Us9/JmLDpP5TGLBTuWmwhpMEaK2m8AyCgbMeSSfbv3qtWdN96aJAkVpUZWxlnGd7oCYKK1o6t/4gf++Y/90k999s+/+O4fuAfNOhp5M2Vq9I1cuWXbVRfkQLednznRmpmePXPm9ONTD8w9PZU93DJIvuuH7738tkvn8zYRnFgD6eS5ITJsmJPEmISMAZNhL+0oxGEp5j+Z8FYQHGCFbJ69+4fvnD3d+tSHv5K5fOfWzYMGxKTirPggMvne30Ag8RCK6jaKUE84JHsxUV+tfubkkXyuaxomtHcBSmdeY8YqlbKoVwVU06GCAI7gUNQDCLhHkRpUZM9GQ5qKg4XrlRKWiV8VtqX/hasziogUBW+qkNdaTqQqQUKWWXngChBPkd6J8uhxy7B91EsaPQ4/dwF7S9gwUCMwhU67Djzb6uaOAAY7raxgj0DGIpGm5Q/FapUGPHzwhSjgbKoQxVzuHt+792TirnjLrfV1NcWsC4q1ZBlVVGoUqGeT/lROrpqGXC5CcWeCXKdwOwyD4GDrg/31et/+lw6cf8151tfjAwRVhyacKK6IVs57tg
kVX8YHWxdtUXhH4teJiDRXO9udO9WaHF/znu95T8IUUxNWxlnHd7oCIFCN07fd8Y6Pf/JTn//8J7dfsvH6nZewOiEfWbWGMlbpH6TBwf4N2u+wltQkCrUKlzqbI8kzWJ8NLKoKOAWsJThmMsSGk0QMiDhJWLhPKYGyqBrYGJJVsSCDBDbLlOgHfuZ9eZJ+8Tc/30zdVRs39zETia/eA6iIq1CyQ1JrVbL7h96XLWg2mnbaTp04sXZ8fSYWvuGvC74CQgpPIWCjBRgdguKXqLEK1CiiKsEMq4o3WgKsKFB24g4wgDdJe/SNVjeIWidaw0G+hR38XApZoJG3CR8xiFopchP9XlzqFCooMhUztTLd8j9vRhd8SCKhlI1hlsBpIrQdTbWyILHjjrFCTQ8GEueL6jVT5QJjUAExBY4AGIYQhE3b0Td27zlCrUveunPg/Jp1cyAlZYUrnofiXGU4trjms9n21VEVzYUWCwrCB2SCjycChSSNfGLNyOFXDpx/4fkmMdZI7F2ki56A8uSLjYyetfdaLJo1flF7Ng/eJIyCAKfkmNTAGEqmp87MnGzfe+9bNqxfs1L7//WMlQgJiGh4oP8Xf/znBrrjH/9/v3J412lDhgDV3ACAUxXnXGY7TjLJOyy5cxmTKDJKYlw0dPAlUTgVgQo0F9d1tp115rqt+W5rtt1ayDrtvNPOux1n29bm4nI4hVqgbdtdyayydSyu+wP/7K23/YNrHzt8/LF9u1uUmqQBIhFnnfMcVVFfIEdFA+4tFSPOv0JEqNWMzXD66HRKSSGCCzubqAwJV9ajBGdAIXhdYD9FkABxX5SSp1fgBDHv/3DADXytIDBp8IdKr0j9h8w+8dl/4neP4rE4XDx8uJYyQUkr/kb1LhcfaPWT5SVE8RXFtfIgW1FrJkmJWUmVlUV5IcdCps4fmSkqZe+dabw/Xg9WJH+EsKJuKaSsAlAJOo2ZHaCUzDo8vm//QW1dcN+1YztWWV1QVmiRYV29j1FnV5Zgeemv0e/wd6H6SY9aD6Eaz4oKk1RSotzko+tHbO5O7j9pxDCZIuF78UnjjetRf4GIVaxGMR2/S29FQUB9IIdCWA2hiamqKLo4vn+6kTQvu/yqJDHL+hcrY9FYUQAgUMJ00fYdP/uBn84Ouo/9r090ZzklGCNgEjgruahTUhVLIBFREStOyQIWCl+KM9q5SiCBOBFRdSJW1TqxVnKbtbPWdD5/KmudzBamstZ01prptKcXWgt5t+PsQrebw+awoi41/EM/931v/cnbnzo59bmnn5jpZElaVzgmEYgDBBRKAgWZHF9TggfnfSnIWmIG09r+Vw6xlJQUhYcyQhkcUKDOUJSkBIRK9nGJKERAKwXGNJKiKtY/9fwprV8vSQvNQuU7TuUGiC98KSh6ty0PBipw4FKvxMlq3L5UVksAkEqhy8oJyj39n4icaAHfiTjTgOlPLFTBCs7UnG515q0DE5GDSCQiFTBFPGVY+Mqf8KWAJJTMA5RKuoBPViaTtoSe2ndk9+zsxXffOHnJOqddIk3ALAoJlLNoPUffSaPyrqzb0qe/vJ0ViVma2xV9H5kBUuh6MpST7Z/oGxpo7n95H1kmxz7wgAIyDKtass38k1qR91r8TiiouRXXqfBv420jgEmI1DdKJrBBahf45OEzo0PDb7n/zWbF/H99Y0UBAACD0zT94e//wPfe/e7nH977+b/6SoK+GmpE6siZNCUmiKoDKamIinqEhSrkiRiQIoRsJ40CShVwUBGxKl21Xc26kmcu79qsm2ddl7W6ndw6J2Ild5oriXU5p3rPD9x20wdu3Eetzz7z5JHZhbrpN2CIqIiHPSTEflW8EUUhMOBxEZAyYbAxMH3oNPvpcZwthVyFCs2ikLMF+loMDa9xj6cQwZOqvO8R11r8WqRTLRqF/4Qe21ArB9TyT6lqgnSOXyGUnlv8p3L20qoPXy1nH1LP/72eBIFIiEQl0WS45ggkCjWZo6lWJ9
NEQ8qpt4C1WC1afIriLGWgwidzqAr5RC4GEYFJCCBecO7pvXv3njl5yT071126waGbq4UQQZlJFIH+E0VuYUQrqqqvxx0oF/JspvIivREeCiWNhWih5LOeE55Yv2bm9JnudNc4f/sK36n60FTO2rvS8YdKBANa6sdCK/vHLRaiCK1SfUWnHPPHF/K5bMcFOyYmRtlnUayM1xorCiAMJmrW6j/7Yz9760V3fOoPPv+VTz6MLE1YQbkonKi6opu2gEnhRbBEYmJ82IO5E+3z4PVrACq8QgiVm8WpWHFWRFRycQKx4tSTHCFwWX9/864P3n3NB24/OpR85uXn959ZIBo2rm6cCdmtXgrBW+e+/UhhWwY7c7x/eP/LB7sLlj08oRFkCKJfKxKqB/pdMqoWfLH9khetOEblJT+r8RmVaO95qbIRUWHml0cvz/pq5u3iEy6dbQV/qKoNVA3nqG9CuimQoDbaZxnOqgjPde2ZTmahylXUvWKM98wtqF8tXZ14mQTftoGJQkcJApmkK/zUvgOvzJ/cce81629Yl5s5UZeyIVVxEtsdczXfrHctlrtHVIp+XXaDyoblj+Ho4RH3aY+kEOjYmjGn7vShqSRPIpRXqvOqs1M56rJqOLCXlrvdxVwYygIViEKJiQVk5fihk2Rx7913psZEPGtlvMZYUQAoqCTMZt3qNb/2i/95q7noL/773zz24Au2S8akFk5UTcpgdSJAIPsH67tEMQNAGR7WKtgBBFwm0GhUKFQJCF4CQmKwr1CmTkQtWVOTxsBA36V3bd/5wdv0/OEv7nrmiYP7lRpNrkvXiViBE4LAOdVgJBeyHwDAREONvvbp9tzpBYKB8Vz4KuaBimnmp15Y84VNHz0FhFTaJSKmgpxQcbTCVEfVHo9/CmlZ3X05bVHF/eOXvUpLK1tRddUXHa3nhhSOQXHuxU9Fz2fezlZAIUMTI11CO0eX+HSrPZ9nAY8I+r2ELGLago8hlB5h8X2Fn0Mq8GR/5lTAoHTe0ZN79+1pnb7wvmvX37ChjTOOMoIoOX8vJN7yyNXR6hJR6Yf03q6K2qSe1Vp+ULllz13yFyCqjeFm31D/wT0H2TKFhgXxXL32QuX/ZfT9koku1p0xyB7y6JWUoIkyLeipg2cmhtfcededafL6KjuujBUFAJSeOoMN89YtF/z6f/zQoIz//v/4y0e+utu1+vq0hkyttVA1xCQhQAf4F67Xs44MF1RkjUbz0Rt5XuhQ5NDHjYPkCulOBCUHZwcaKZPw6toNH7h3431XfmPhwBdefH4WSd0kiWNS45wvLdFjTRVKicH9tYYsZIf2HyVKVJSEWA0JkYtlXyL6X5Rm65XXFMLbATnx2iru0SNUS5+/Io6XhJmX3IC4V2/MYJntqKI2yj3j0hdOTylgChGMxY5Oj3yPEZyoxRfDJr03Wd3Y5KhlzGeuJZjqdrPgiZWyP/JmQrwyrEevTgxbhugFxZ4OagAiNUmjbfnxXUdenp275M23rr1mU5cXHOXECjjfRVkWydfKL+XlLTXul2rvINzP6gdoXKTid4oXSUSUAE0dWzs+OzPrWo7EFDiYRoO+CNFUDlk9XRkLLk6oceVLbVU2SiBEP8o45rw2O2UXZrLrbrx5zZrVK82/Xv9YUQBA5dEmcFJLr7th56/90q/Ws8E//MgnvvHYK2mWNtIGG8o0D2iQOilN3dKiLP7TYBhF6UMAqbIUIEMMqKl/jWLXD/KFngECqaiApWYx2GjOdRemdWbypnMufvcNB8dan3rukRPtLDUNsT5WS059pbJ4Su9ogEhRU65ZPrjniAusfzDAPpEntspbbD4XQU8qXP6guoqobyHWKYiCGJMu5FyJ3WORcKkojUIQxB8qpvtSNVCKESoUZ080orD/NRbWLO9y2FErWwZDspytVxnFh1H2R62izESi42tWoZGcyd1sbmfbmaVYWqkymaosKw6DYg4azVhST6oJ2cwmESIlM2/dIy+9uHv++B
X33zp5xcZM2pl0CarwydP+6ri4d4uvEoVEX06sVx7Zypav5gH0uBKlolYAIurqOrJupNPNzxybh41mRRnML7jES85ROSgtuV3as5VfouIt8zF4ERF0zMn9p/q475Ybb62lKZW+wsp4jbGiAIDKo60AA2lCt+5803/+579ippt/+ZG/fvqRfWj1JbbBlMCkhtPU1MIb46vz9FjOlSIvFAx9jv/6343CKBFYyUeKDZTUNz/xrXABFQiswrFoHzglm+eteVoYvXjtld9z2+wW88Cep493u6lpsPOBaBshqYAthYpgIo3EjDUH9718oJs5wDBCl0IlwDjAeQ4moAQh/6+XSwpSsCpBGcoKVjWB9VQ1t6vkolIVVMzsxe/iYgs8iIsiklKag1o5agF0hBdcI66CUEaitBajxb/oxBW0jVRZ1Iga1URhFCxgBQezvfBeQnGPeFkKgoyuGuX+5sGZ2enczue5hQqIo3enUGESX/M0yOhIyw1PSWREeV0s8CtsPI6U1BYcPfDS8wcwfe333Dx53UiXpoSyWpKABFCwUKBo9ay/X7tyAXp0wdKHvrJ1j0+1yJ9D7+6FVtPADVMIYMkOjDVraXLq8EnjjEeB/CoWqSW9ZZHCWxLNoOIJBEolHpRpnE1pkwSXSRwp1ZIUmTm+98SmybX33nOXoUBwWPECXs9YUQA9owAjaml63z33/5sf/7dzr9BH/vMffeMLz9CCadp+zhop99dNmlKijtTFalUVH6K0SqOlXLzupDELqrAyw7MeWv+qqm+DRRSKpymUFXVJUqQiZrbd4vG+a95xK++Y+PyLT+w7M00mdU6LM6vPR1OVkCIgJBisNU8cPNHt2FCKGUJwFNNy4eV5CfvE4UntFKVNgeYwKFQX0gpwFFmehWHe81spWhb9QWUSxZtffLko7Fz5r5hjz/2LYqdk4PS4NkCF+aRxBQrfpVfmqULFI/8UCPoqcCpucKh/eHL1/ump6U6365SYEaqQhp2DegumdeEH9ThDgbPrg0FE4NSSEU5PtrOvvfT8yTS7/r13TFwy0XKzYvKUiFT88wblwq06q5R7Tfv3NTZYpAN6dwvLFhO0WB0k6a81hwbPTM2yNexK/VmoberV+v7u9qx7sKYqofHl5oTwuIKZCGRs0jrdXZhq3XzTTRMjQ8zmta58ZZTjOz0TuDoqiAEZoK/e9/3v+YDmjX/zSz/12T/6/IF9e+qDAydn8vpIdu75w9dcdSVToqrMxdNakkIDzECFQxBFIEg98hKQ9WB3K4HK2r0BxmGQiiixkCh8RWRiMp3uQqO/ftl33/Ykf/2hbzzfPWfrprFViaq4XBVkjBeCoQ6RqDE8VGvuWTg1PX1mcGRYVX1DLAdiTSmGQsuEgFDSWIl8s0YUZquqr8CpUITmJRE+CFay94EKMzF882rLXhygMF570SJCkd9WmvfL3r2INvRAc9ojN1C5WeqvyedHKMAcrsoHVVVDklGRQEsggZCSkMH4uesefPSrl7XPN5JaEjVUNBjwWJ4HePzpGJASZClSp0mdMhwxExKlREzt0PTs4/v32mFz4zvf0txU6+YtGGeUVSyIjGEJxJ9oeSyvAsoaQJVlPtvSvbayKLfUMmQVV0l9fRLTTIbHR469fNzNZ6bBkpCqqE+WKIpreK8lviWhTXRw+CIrtpx2ufDlJ8UXCqaExWjO+3ft66s177rjzsRQebKV8TrGigcQhn+aRX2zxECrbiR05123XHXZVcf2TX/tk9/4+me/fmT3XtvOIanNHIGMMUUANR5oEeeNNMbCECweInCwl0uIwX8bPvQdpfy3qprn1pIImMX70pw5O2/sJW+9Zmzn1of27XnpyAlxtTpqCacgOIFC1ZNMnbBqf6PebnVPTZ2hBCCnJELO9xkGOHISEXMZiLw/UYnxFgYdR+ckCvYiTlBh4gfXpgxuFGtcsbL1LH+W3pnq+99zhDJDqOdX9B5t0c9hEYu4BSCRlqmqtrD3Y3d7je
6aMkCk7EidW3/5xukaDh+bIk2Nb7SQsIaFKrKU4DVC6POCkLARWraIMpiIRY2jREyy+9jxB198RsfqN7/nnoH1g7lre3VDodiqqmh0JYtHrserKJ/l12Pi09IfX2OUYKnndhEjRp6IMTw+7MTNn1owjrUSsY0TXF4JxbMvB9pTRcsVRYQK/8/nindwfN+p7efv2HnTTlqR/m9wrHgAYXgMA4DEIsezrdaXv/7FX/vQL7+w+4Wrr73krnfcvvH8tY3+OtVVTcchd+K8rUiBDgFv4Bc1bkobrTSaStc9NqAiICCmKI1gAqmKMqBW2622U0fkCMxCxCSkXelqwjveehXX6ekv7pbc7dhwTpM1t21f50aUDAfqasrGZHTm1Bmbr2F1vnl94sAkAoBIyHq/pDS3i3WpGOhlykOVZx1QgYBHF4a2f09jNR4UucQVeOH1B+oUKKCBQh1Qr+DwbkvP1BbdYvhbFIqVUbgTFTNIUVFtCPwoUHErlUjBEJE1567lUTo0NTu5bkK0a+HUGVKmcPQyiZjIZ40ENeoRf6gaKIiEmJO0w/WnX9nz8on9G6/beundN+oQd92cGN+nTaEgBknhm4Ul7o0z967W8oOW/a7qJb3GqLrJ8LY5FfBW/1CTGccPnRjdPEwNIlPg99U+M1XosThq5dclz19wwkrZHp5SFkq1Nn3iTL6Q33ffW/oaCcc67CvjdY4VBQCFisAwOejc3NzRY4cffvThL37xSy/ufXbBzW84d82/+KEfO+/8c2oDqmQtupZyKzlImIxCqGgn25PeqIoKQO3FV8D2URQ2LgQhhbqP4dUIyLsSgbs2Pz09R8LETGpI2YAAUZgcuoB8x71X9Jm+Rz/37JE8u2XL+ROadl1mDZFhgpAgBTXSWpLx3MlWu+sSypMECRIRFViBUYKBECCh+SzCW1qVrMH2il47wn8AyEtdBNkUPfboBESvPZiMhWMfwCMqqN1x917Mpnqjip8Ui4REcRoKyrf4rtQRPZvFA1bo89DY1WqJL1L0aASgIGNVRteODq8fOfD0zKXsb4qX+yRVFql3f+L6xNMRkfGN5BVGkuR0p/Ponr2HO2eufctVG687zzVdrguaeEJYBHwkPGFl4KhnEeI1R9laoCZL5P0yOoAq/y7Z5awS1Yt+QlHnXOsDadIwc9PzjJQ0BwgqUYn20JSXBWoqiFZ5eVT5LvyoxExKFgpkfGTPyYHG0O2331GrpWeb6so42/hOVwAKOMVXv/7gr/zaLx06eKDVPt3hvDleO2fLuju+94prbr22f7gfqrm0uq7j1DlxYBAHIJOjbVI83VRxksuzVBAiwJeLVwCxQD5rkWVMHmDxAVgjglMzMyfPnGE17JNeGHCWwQIBUabZbG7Pf9OlgxsmvvQXD5x87vHv2rxt48BoK5t3ImCFU2apMw1RMj81O7eQE2eNmjYMPPyvCpBzKkQG5Dt8e3YTAiGJwJGrHuYYXmB/wUHf+e7HlZeYPDjguTFKUZRR4Ql5YaiFxxETbYOnsMRIrPxYkeVFlDm6HsUMKhIwHhFUSPGKpAlmdKgwQNF9UdWic4N6/EVhVbvkSCip166+ZtuXn3js2ML8qqGaFWvIIVJZVEshFs/nU/WEmInIkXGoW01fOnrmiUOH8qHk1u+/Z/W5gx3TEu0KhNWEPPMYE6ouRNXUOMuj/SoOVsVvWORTnVXex7tNxW7FsqnHMsUoN7l/qJnN5W5ezKDxze+cWp/dHFd/GSVPPf+VdkN1MuF0RFAVARuT1hr5tDu69/i1F9+w/YJtdNbJr4yzju90BQCACXNz81977IFrbr/80uuuGN/Yv/G8jaNjI2k9yfKsg2m16uBgRK1yYlRV1JWufZBB/jWvGJVRE5SGY9wsaAQPkfs3IgaAvfT3kQJV6dh875HDC508gWGweHuZFFAmCIQktYIz3YXh80bv/JE3Pfixr//5c9+8ffP2C0ZHa+JPbclJLjTUqHMnnz7dtrxQS6m/0d
df66uZWsqq7NSpN149SoFo78OXj0CF7kqG4Jsdo4hkcHh3qSKY/NJKJBOpMsoYohccMUiuwStCCXEUwFpYw4qFGyU3VRUGYqyBFpuWpYgLWWzFbQnfc2mbxvBs7G6rnlIlEJXc2UxsZvNWN887to9q26684MtjT+2bOTHS3JCyappJqFFcKjUi8rQuPxEmCLFTdmwWcrywb883jx4avWjbbffdMDipXcw7tWBDCiAyzAIQFb0loopCKK5ikag/Kwa0zKhs+yoCNKjYqo6N90KhpBAV6qP+sb5TJ6bnp9uNyUQDh6riO2pxJZWzFatVzWNGPHKcYHAavB43pAp0eOHYjJvr3H7zjY1aQkSV52tlvK7xna4A/IN40807JzdvuPe9916xcx032q1sNrPtLtSxWCsEMgQIyACk6pQM+ZAxBZYIEMrLF6ZrxIzjm0Ph5S1afGgBoERpqsQMhYgaMgxa6HR3Hzly6NgpowmDScGhnERCPkKgIBaQOpJM0RxJ7nzfLbsfeP4rj750am782i07+iBsUU8SMa7GVMttu53NaJtI6qbbn3SajXqzhnotSTklgNlPMDJ8ijWKmA2BSA0iqzJ0vQd7PijI5zoQB/3GTGRYQMpkAg5A6h2MMncginARZYaoMJOIEpdkqmBLF0EVj7GjpCFRIZV63YaqQlJv5ROJioZuylHvAOGAqk7UORHR3Dqr4px2u3nmbCfLW3nezvNWlnW6jtu6dnjNuis2H3lw/458dX8tyR0pOSWmQJKCFuXefJd6YkMMNl2p7T126tlDu88gu+F9N268egunC23tSMKqTCqG4USIIT48pFrIwiVs+qBoCu6T/7zAg17r8Y9x6mUOutwORYG5KMKJihQMohqaw33AmenjpzduX2t9owJvQvQcP6ZqxRvrb1bwpEsNXqBAod+Dxq7crDBqqOOO7ToyMTT23ne9mwvHbWW8kfGdrgD8GBoc/K577/+ND/3mf7z0p8b7QDVS53KrUOLIxxEIE8Ona4lnt0jFEKsYrKTxDUThlkZ4Ab7iV/iKWH0VXwITBwiaWEVbWXbwxMkX9+7POhYgCu3tYiEfAcNz+lXggMQSWadJg86/6+KhDWN7v/CNo898dcfqLdtXb2RKc3RbyOsGmXMdVYXt5nlLO7WWaTD66mlfvZEmaa2WsiHj5yK+B32V0ymAR3uDKe2znKJv4zODODgLCt882XDMegrgFiLOUgpHoEcUFyCNly9V+FiLIhQSbx5T5d0PdmrVHwuEGM9JVBXAiZfw4nxDBV+424V/rXN5bvPc5lasOuc0y5wVzZzLVZ0gV3ECuyCUn1m7Y8PxZ/cdmZ9qjq41lCp1VbuKJJJrVVWJjDdOFYkYzHQ6T+3et//MqdqG5u333T5+zoSttTKxlBC8BgGs1ShcyxrRy0E6FVeg97uIzxSe6NJ9zyosl5X+lSc9BlYQTXSv6whKrtZfT+u19lyb1TdG8BSmyB0uD+YfrGUuqDi0514FXe8tJI3JwEScMS3wzIn5nVfcOL56DXNE91bGGxkrCgAEGOgH3/89H/307/32b/3Zj/7MO5v1PmOcERtQa0OkjpVYVcl3rOXwyKN8s6LEp4hLFGBpeaIQD4zkzwJBEqgEu9Yo6UK3s+vQwb0Hj8+3MgUZY0Qcw3e+C/RqYqMQZRiw+A6RKZyqVbf64g1rN6164evPPvjQgX2nZ88dG2sMNo+1un1q8hyGYaEimjvnrOsqz3WFqJOapJ4maZr01dIkMY1aYpgN+StV8tUjoFAHeLKsKhiqEpchegxRw/lgnToiYvIke4KvPaG9HkZQBqoKZg4XWGBMQRYU4hyoIDMU4ZpCxAeBFJp4BUdNRBWkTh3UiVjrnIgTiKqIWhHnnKqvtC3OOSvinCggAufK2k/B7RA0mylptvrcNVuu3777b3YP9fWv6xs2XWVSYlWIZ3opITaHTrtqXjp8+oXDhzq1bMudF553w8X14SRHJirEiUgATOBFJi
JztCJqC2O9FJyLpWivLV+mUhdqnCpbUsXboyJd6zXeluppgmMb9JNA+obqaT2Zn2mRYwYrQSVyv8rqJwGn8+crTYGeeQIotiKgCLSJAglSY+vHDpzKO/KWt31XaphphdH+rYwVBQAABNq64fx33/G+3/jEh5OafP+PvGN8aJQ0yyV31gpUHRtAKfcUfkRDRBEQiiLqp6QRoKBg9BSotKovHeG/VCWKVjyTEcApsiybmZs9eOTYoWMn51oOSA2UQa70LSKXSBQc3j8l8f0dRSGkTnMzkmy69+LxCzft+/yLX3/xlUST+Zqew43MWYEoV2iKQrkISLtZZ6ENIiRMhqmWmDRJammtliRpjQ2blA0zYisaL3YDT0UR045UVa2i2kKxgLE9PhQzeKPvX0ZQxNvLAekNEp8K61dFFQrfCE1EnYiIr4ap6sUPpOCiIlRrClBRyIsWFUCcOCfOOeeTPhSq6pwER0LDuYIQUl8OQxkMARkyAJjFSid31NCNl2xv759+7Jlnt9tt64ZGBmpNlTzRXCAOAiKr7BzlwBN7dj9zYrq2emjnXdesvWCtHUCOFnGADyXAWaQiBojUSS2kIlVM6B7sJIzlYZte4J0WfVPRBq+JF1XOU6iBQEyKGA1Qa6RJPe1Md6Qr1EdkSCJAF2JS0VNDBQtCz8yicgngV/xEI9uaRKyYbjK1f2q0f9WtO29LkqRwuFfGGxorCgAACKhx+i/+4T/7zOc/89k/eGTuJL73+9++dsvqtJYkyHMn4FzFqbJKjUgAG7NSouEKRHHmrZXwTPcSnLkQ/F7KEPvMWlZoN3cL7fbxk6cOHT0yN9vOMmVNgyuhykVlm4LsHr4hhXohouRAUGILJyCYpH/zqsvee+vcEyee+/JTThzqiRPnjIqIUaiyqJBKEOUxbJHnDgRo18P5hmDYJGmSGGPY1GtJkhhjDDMbYsMwzFBv2SsHi1U0WPUhVVQ0AO4UmC0RJephbZIi9DSICQQRcY5sGAVExMM1iBi+hp/Dono3QOJXEurmqwhERL1FLypQkdi9XH3HZaKQ2F0xdDVmaagoSJ1PEhMAMLyQZYOD/Re99YYXBp945tlX9pxubB7dODwwPNRsGiLDyMRmSqcW5p7Y/dyx7sK6q8675u6rhlb1daQdwjjKTBAVitGOsgExAkxOBZOo4gP1mv5L4qfFEXoec+3dz9PQSohNK9stfUdeTUXEnHeTJo1mX+tUe2G6VR/kwBBCgdhRIcxp0dQWzb4sC1RRGQSFMjETd+Y6M8fO3H/P21ePjxZJcSvjjY4VBRCGQjesW/8vf+Qnf/hf/fSDf/3CwZcO3fzWq2687aoNG9emjT7bmRPbdqpWIJ4ws5htUj7NkfGu6Eko8k6DFmamB0WsuMy6+XZnanrmxKlTp6fPtNuZCAPGu8wCCc93EVcuyOBRJEA8tZSVIKJERI4JbDXh4drG67Y+//Tzbq6Vu1xF1MM54uMYhrxlFpn93oWBQn3DWetyBUGok3s5y8QEYWJiGDZskDAbNmmaGGMSwwzPYyUCkYEWGlEjHBGTHorlqqiAKlYdJqQVcpWG4ZsxBF0RDHzx0p7EqRNRiCiciDhfvtWpg3cdvB+lqLSFD1UfIIG3HmMIGo5fTE7U1+4AEcCsKm3p1AbN9rdeM7Z97Z6HXnxi/4v5fuGcB+v9owODBjy10J5amG3V8mvuvn77DRe5RtbSFhKrSiSIRoCC1DswCDBjuC2lqR8nVP5effx6oZ349VK5raW0XiLSX0WMlideslF8yBWASUyj2Zdn+fyZhebmkbwamwnRGADUezil3gsrv1rC6iEiBifOHD96xi7kd915Zz1NVuT/tzxWFEAYBFbIu9797t/83d9//MWXW1P66Jee3Htg3+YdW7bvuGzblg0jfQPiWjnleQ6hRGMP8MgsDCBPJbAZwGvyeoCYNOacMvnWX90sn2u1zszOnZyaPjMzNzfXtqrMiRITjLpKW94KFKokFE1nAA
ySAmhSJVYGB5oLKUh3vbTr5LHTzYGmncvFwqVkQCRgX5YgwErsQfUImUTEgUKFOYkvqPWFhERBouoCfqGqBGbmqEkKpIfJBNaoNwaZiUOeARsTCEMU/0bHv+JJFbKuYuwHRQDvCDjnQSFxIioqqiKFyxH7pEU0IV5eATsUvwewRQsKanl+iiY2BUhKxDDBOiXuku0qDMyq8zdOnrOpNTU9fWDq5K4zJ/aceP7QcSjnNYysHnr7ffeMbh1s1Vod6UgCIjJqFNaR9RiJFhkWQPyxV+IXFeDi31J9Va6vVxgulf49mxR8hart/2rStMd/CFtKfDwZMDVOG2ytnZ+ZX0NjPdP3K9qjkopqWSUViQoDh+ITH6qLKIGMMjkDa44dPDExPnnlldcSvdp8V8arjxUFUAwFaKA58Av/6l999/vf357myY1DfX0jx49Pnzz90HPP1s7fMLl968aRoaGEG05z4a5TbxAWb0y0aiOrHZE7H3n+otDcuTyTbidbaHfOzMydnpmZnpmbn2+pMpSJjSf6M1RIiDiWY4vvSkwdKKJoPhytAXf3aQQAAJYakvq8Pv+1FwBsOndTK1/odDqUMozHw4k40FE9mh9eJYkC058lRF+9TuMiDSxEwgNp3ZvnFJtS+sOqF2zF8aJSVHgPgCOuVcqf8J9GwzsKEEUpfCK0FNRsyUoJDkGQHlR+BmivZCvcjriIGkmIVORjRwu8enD1VXkITA4hlUHgjFjOIZwwaKK+dsN5W3cOyJQ98Ozhh77yjYm1I7fedVtjtG8BMx3qKIeHw6kUUCFCSjkVmWdF0DPS5wvXqPyxsKjPitD0uqkx0lusNlWlf3GgVxlLdYxW7guBmCltJA55t50zTLwLceLlo4ug0Rcb+OGI/kUKdy0qDVK/8tQ6tTA3tfC+N79v7bo1ZwW/VsbrGCsKoBiB13nXHbfef/utH/vkJxbW1dFNBkabtkZnsvZDr7z89OH9myY3nLvpwvHBRi1JGbknimpIAoiPKQBAo3Tz5EPnRES6ebdr3Wy7PTfbOj0zNzMzNz/fViVj6v5QIVlKQBBmKFz5JsRSEai+I4iuvlc+RCKeSo8E9UZ38Nhz+2cPT2+/5IJzrzz/m0dfGDw1NdlcnTA7aEVQxlBtMN45CHBEg5TA5IO0EA15TaGVjH+PGZ7LoTHfIcri2HgACg4qTAVErNHIrjDxK8K6IpN6zVKKwHC0D4vPe9ajPNpySEI1NVujpRmQl4pILQ8YGSwU9yBHIGILAZOSEZWWdo1yYkg0E8m7+cILx1465/oNl990paPujCxk6DISn0ahLidfP0GK9uXhLleUYUEE0qqNjACiVa6nuHdaiNy4ZlUfIRZXK0CbssF6IW0rz3DPClR+Rs8Ui8fIk/gpTQ2R2sx6PoQAqD4XhZYmlFhUvH/xMIgPBhE48K5I/EPW4PqJYydlQW7eeROroxUh9rcYK2vXMxiocfJvf+EXP/nZTx85MjVyYqJ/skmGOamldc6de2Xfvv2HjkwOj5y7ccOGtRMpJ2wIRKQUemR7kiOVstKp5NZ1u1mr3Zmbn5+Za83Mzc8ttNvtXJUMpQp2ooCHU5gCbFSYar3mFuDhkcL+j8WICICIsCGGgWhdOWmbZx5+ZnjtyMU7L28PWDPfmJ2aWb9ukhIiQFl8EDna9IGsUaSfBv8mpCqAACZiYoWnZEe0IpI4K/ZhkPlVKaUhTgtCYIYgwBgBBCpcgUVJTVgij1B6JpUR9y+YuCXoEyZTzWkodg4Uxd4zRaVTuQFBUnpmu3pnK5b+s0oMkHEqpKxIjx499tAXvrp27cYLd16cp9ZJ7ljYMUMZvnk8Q13kOGmJ/cSzFm5ZZcbVeQeToDpdIDCovFQv/aDSRY3CvRpKjgd9TSvaa4kloGShoZTAxGTSBEDeySveR5Hujt5VpeJhQXETohoiKjwwgXocUeAILTpzZG5ibOKGm25MzEr1/7/VWFEAiwYRcMGOC/+PH/nRX/2fv3
b80PE1WyfSJqk4n6Bbr6UsemJ65vjMmcE9AxvXrp0cGR4aaSaGSZEkxncFceKsOutsq9Pp2s5CqzPfas8tdOfm251WnmdWIQCDSKC+zVNBx9AIgXpOOC81yCrYSI989W6AECkz0KTa7udfnj926pb33CWrkq7p9K8Zmj11SnNQzfPZlYlJXGF4RYmuRVaNrz6fBNJR0BPqaykXNM0grQpPHXF6GkmvUdqEYvvejGdP2UEUb54lGrVaOEop1Sswh9cuhQqMQqSUaYXcE0X1i0V4RymIdPESVy8iXF4U/6RqQC44AjDGWHK+njRzWqM0O73w2OcfWjMycePNN+TIbZ6rIVYmQFWduGDYk/fDJCpciFctPqdKIYXBrOVUI42gFP3FapAHSRTCEApuQnlDKNS59avFWKJBzz6qi6OLUnVjxEgBzytgw+IksIV9eKNw1sKVhgvx38S0EMTkj0IPelfP515CRBM2NVPvzNqj+078wDv+weBAcwX//1uOFQWweBDAwP/5s//yrz7913te2rv+vHVrRlal9cQa6ztsgf0mOtfqvLx3z37DA0N9gwMDQ82BvnqdiKAus1mW552s3el02912u9PtdPNupuoAMHkmWyhnAI8rVwRVJEKWkyokWPHiaZG5E/D0EAUIeVY1NWjlzzz85KrNa0c3rOlQ5lLUmukZI612e7jZYIaoi85GOGY0IyNKwMwoKPkhwC1aePGB5xTFZFWKFoiS+rL6FCRUD67NPh20gmhX+I/hXy/ICwiCer+vCOoeYVZgGdW9lso76vlx8SKjBw0JcsyrSQqZ22CCiFAgPpma1mXWPvK5rw02Bm65+U4HsciVAVElEdJYHEICqgYoQyEBHIsq0S+XqCIk1RVMKSquw8eDQqosAfA1+6CsygAKPoK3oCk8YxK0cuFcVNbpdQ2q/ESVpS6+MkyAEHO1c00RTqJi6lVPorryGlPLKfhtgAJqmEgIHZo9NkcdvebKK2s184ZmvjKWjhUFsMwgYHxs9MP/9Ve+64e+75UnXhlaUx+cHAInVnNxwfolIohYVevc/Kns5NR8ahL2jR1BTpyqOHEiIiHF1DeMIkIUiSXEE03n0tIFsEimLZ1jBevw7wuRc5oYSsk0pe/gS3taM/NX3Xldt086RkDUSGuNWmN2YWFopMbMSr5rLZcnC4m3hQ2t8Lx4T4EX9TlkfqKBs+5bJoY5lcK9sNgJWkQBUP2wCHdSpWnAMle51DwH0JMV1TsqwdLys9IpUFQmU0yoUChUbF7Ml0qJ69fIn1zAbAARZzkhw0kifSZLv/HVh6lLN7z5+rwvb1PHkRoAmov4gDGT8U2WyYmSCAPeWoaC4AMCQeGSeothyWyJyFOpmFRZSZ2AQAJfcQgxmKxckAUIos4X41DlkMdQ6u4lK3bW0eNp9X7jFVpgANdqtaiE4t0N0BQFziuqqx4UW/gq6ENP+1cHMf6aBcbp0f1Hz9m8+aYbbyKsOAB/27GiAJYdpMBtd975D7//H/4/f/Abu5/Yt2Pn+fXRAVVmEYWADdTTdGCICXCq1uUEUriAFUSuiie7MBkGizrAN/CNTnGPUVuI0MJmpUqArmKraWV/ivwaQWoShZAY08Zzjz03snVseMvqtrG++btJkkZff7fddm6YjRCXbPxoK1aMX3/QUCeAQwU4VvjEsWiboerWBzmrMQU3qoIeNntUXlpeV2HOn03lFZerSz7r3azQRKqLVIRPQ+v5sMfSDxcQoZaqDih3KMomAIAoGHBJkio5EjKu9sKj35w6duqm229qDve1peMpupT4np9QdUbFs3ahYowyITGUJEmScM3U0oQTTpjZMIuIinrtHCtEwInkzubOZZltdbu5kzy3IGKwQaKh8jYJxIkDfMMur980tJj2aX/xrvRe51lcgfjxoshMtNTj2oQV8r6oGGMUYPLFjYoFLzC/6o3QeLxo73N4g4p7qiRQSlxCbZw5NnvXfe+eXDO5Uvzhbz9WFMDyg0
FA8hM/+U8/8+XPH3rxpWMDA1uu2ZqkPgFMRYTJVwVQcc4/9YH4QhqY597XDWE5UlFSjZWHqTDlAAQ5HiVh1TSKn1QURclnjAK6QIUDx5oM+Pi+Q6cPz1/zfbdlfeqoqyBVYUZffzNrt61zdW9oKUlh/lIBMJRGcQH4BkSWIOqrXUrUAxWEN0ywGswLkiNQWKOwiYtQGveFLSoV/YdCP5TCGYW12CuwtKrACoCsYJkoqv8Vp4sRigpz1+/kvTSurAAQF0cVREwsVpH6o9T6ksaBZ/buevrZq667amLj5IK087olYSe5c8QAyBqWVNQYrdfTwUb/UKM+Ojw0PNTf39/X7OtLOanVUgYlxihgqGxRTGSYSUVMYiAQqHOaW9vOsnY3PzU9c3p27szswskzM/OdrOustxmUyRKpEyIDo2Rh1Otd8dZ41P/RYSivsBxhVcrlLjaI9kp1a3+bRMRpUkvKXL3i3lIF5+lVAPF3jZyC6o1VAowadrWpY/Oa8w07bzTkEcqV8bcaKwrgrEOh68Yn//VP/9wP/qMfPPbSyYGx5sQFkxmzhDwhUWElYSaVyI4IKGdBxY/ySxRQgaOC/aLyKqcu962ILC0N0HCWIIAJiIa6wEE47ZoXHnuxb2xwfMOaDtoK9X3giTlNE+lQZl3qb70C7OfS49dH2U0RXQIAMEQFyiEb119cIIAEIY+q9C98glJmFPBLRdx6yw+V81cFQFRDpUosuT0VG51691pkYVaEVmVJy22qxqwCJd1WQ8Z2KeZCAQtVFWYjBIFrUGPh5Pw3vvLoxg0btl2yrSWdrJZbsQShFExaS7iZmqGB/uH+5ujA4NjwyFCjPtzsq7Gp11NjyDDDF4UIXl+wCRQqLjRpUAWRg2FVIGGt14b76wLasHqVsnFCuchCuzM1P3f81Km9hw+fnpldyDqOCKy5IyZS31RONIQAohsQPNUIxZSLu8QjW/xB4SsVN0qh1kElrSWOpOpelLfI55uXO/fokFBHJB7SWx5CYp2rd82ZIzOrBscvuvASYwwKwtMKEvStjhUFcNZBIAbuv/+tP/Hkj/3m7/7m/qf3D61eZcZqqrkyjC8w6XteUcgQhU+Ois+6F2/xAVcHD2yyFmGuJS9Y7wTKoYHhF6Rq8fJEMiVUlQksbLL63Mm5gwdPXnzj5ZJqTmrgCCoqDE4Sk6UmU+kDtEzSKtBUirAVCLFsJ4Ut4DPHCODANFXxkUuo06pRF3bRCk8oXkZh/6PHQC8grx7JHangVKqLyvIUfkvlmIu0QDhNyU+qkukrC01Ld4kzoZggVp0AE4taQ5RokrKhFn35sw8mfcnVd17TqtkF0xWyBEnI1gyG6n0To4MbVw1NjA6PDg030lo9bRjyXFwyRAqfI+EZvRLdSQJEFInh4uKJyFcm9XwpZSGAWYlESfpqPJiaieGR89eMXHfhufOt7tFTJ5/ZtfvgySkGWyVRgqjxJSwQgXgUvN9y5c8mUZdsQwXdNDyJorZrSXlgeLBIAaTwvSv2oyLO5N3W6AlSSDCMN8GzzdQxUZokcHT84PH7bv2uc7duVhWEJhNnmevKeB1jRQG82iCgbpKf+al/+dhDjzz4/NcPrj286fpNXGOF5DZnNsy+5K+EvK/42EZCm9cBPj4XHlTpqUDzqkMrNlqU+xqFZMml8xR+ZgVIuc+lu59/GQnWbd+Ya0ahuA2Hog+snCQiCM3FguKAD+f6FscgxJY1ITHWiwr2kAiH1H82BCYSiMJp7rmnRTVOrxS1aJAetRaibwHfgSziPPFyJF5yoIuGS9ZCGZWrRpUfglm+ZAXDkaNOqwBcVNmgopKKHm3xzkHLNggSbgWrgg0DapwZpqFHH3rqzJHT97zrrnQ4aWlXyTLljTQdHxjasHpsw+TkWH9ztK9WT0xiEkO+oCwMiEAxCE+kqk7Zs2FBoThdsNQppmvHmYV11mAFqBhWUQtShiaJqTHVBxurBjZu33Lu4ROnHnnqyb3Hj1txjl
J1EJQwe4wzoRS7vTp1yTtRXa2SfhuCAMLSFRJOGzWB63G3lv+hks4Q7gcFkMpPgpSUSNmIyeZtPptdcdml9ZoxK2b/t2OsKIDXGAQMDgz9ws//+/d+8B0vPPnC0PqBVZvHLTuFqBIiEF4a8xE0DSacf7MLuCYQYNBrS5393ECwzXrf01LZhG1IRIgNq5GOe+W5XRObNzSGmgvaNkTirBLYkBVL4HqSKokVx4aYSCW8gkQAGKFYabS5/Q8MhLLPUbpH15sMSMCUOmu5DFmHEK8hY9WSD05qqBkTwXmORi0AiChBQSwq0CB5laq6ATG2vHSdAl7VI9rRs8ARZwrgfrFrKPET7c3Cri2Emtc6WoReiFRUVAxpirSB5qn9U88/9fylV108vmWiZVrKtgE32myet37NlsnVY0P9A/VGklAtSX02VKwBDhGBKlfcOvi0PBIoQmFV5mhiA2p88wWNl+3rVIsCSqos6ogTjvrQsBDYSGfr6uF1d9y878Sxh5569ujUGaupCFRTVRGONkqxlhUMqly9csl6Vz/0PStySAhOOwvtJE1qgzVHeaWYU/CEI04Yi55qVXEXHomoPzbBcyhISHI9dvj4YP/wBRde5Pdb0v1zZbzhsRJFee3BhJ03X/fT//xnuFN78asvdE7Nc8RHgNjcxQ+fxEPeiqHIZIP4xk5UATLOTmOsDvVkwPhiKJZjRAZoiJhBgpOHTndPtc7dca5jC/Ixai/cNSFOwEouy+atzZwvBs2qBo7VBWOMIFBfX9OjseFCw4tM4cWE73LiJSsRjKewmCRhkxiTGpMakxiqJ/XUJKkx9SRNkyRN/Z/UGGLDxvjiQgVopowYmCRQJMerhiJvyy6axm2i+4EKxbYkZKFYRF/UIhrAvYoi3kfE4hqFAA5JagpCkiQkpLmiax574PFVowOXXn2Rpl2qd5sDbts5626/5qprL7xw0+rx8cFmf2rqZEhAwr48DjkmxwyTcEpqSJnA4VkJ10ihdWaI7TMpqxCKMqgKLYoKKcWkBAOFOPU9WLyOIQMl26hjy/qJu2+++rIdGxu1rjG5UysRtonXf3ZXqvqw9a48ok3gFTQrzZ2e66v3mTorROK6hxoUXg8X9VXDI1U5XYWJpqqeO81MTIZzOrr7xI7tl+y45GLCiuz/9owVD+C1hxd2P/D9P/TQww/9xV/92a7H9227/cK0LxV1TkU5ZvRAStpdafZ4xFM9mBwkyLJjOVCopLWEiRS4EApjVhWkZDhRJ4nS3pd3JYP1iTVrcnIKZSgpOVFiBZNChwf6h4dqnFLXZs7BEEl0AQSq4vmCABlP/6dYCsifPADSIbsrpBFoKH4BmGDfA/BGbAhdAKq+ZTAAqCgZKhisAmFRhQFgC4tYq7pSGbR04ZYF/INnFDePQfJwD8rPencqXbIgVQtrODpZ8OoIzEQqRElfrX/fM/tOHjv1lrff1Ddmun0Lk2MD501u3LFh81i9wSQJgzSkiakSfHs1pZgWx5BoPGvl4ajiMVH5FDON4Xf2ljI0ZnJ471JjqhVBVADNHZTEkKaMyZG+ay/ZlhrzzRf35VZBMc+s+OfsQrXUnxWHIVRAEfjMOFK4XOZm5oZXDZtarGNY1C0qPBd/d2Jh0AIqRWQde5pVeMJVIcxqJEfekp033Nisp4bKuMjK+NuMFQXwOocO9w/84s/9/ANf+/r+Z48kQ40Lrttq0kRhRaFimRIUHcBRpNJGLnwQm0VNnSUmZ/Xn8rXwv0V8IL43hflUEHX8Nsxk29mhvYfXbdpk+usdnoUvR0SUGHaSqzCRjI6O3nPj9Vm7tfvg3uPTU5nNuk7FCSeGQI4UzKK+/VUExIt3l2KbVi+tok9PgA8QaEwL86FhACKOilLPgcoUmsl7hmNCLI5AoaEjB4M9YEn+5Exx3bT8HxUJsAhEKCWhlzxFQCFupYv2B6GIdofjaJD3FP4tMCRv8Bo11LYvf/P5Cy/aumnbGgzk4+vGLjzn/HPGJu
uidc+4IQFxaJPTw2wq8KQi1BJVeQi4EKlvChmWQQEXfZZgTaCYcnhiSv6B1wG+h67XuiIEYWgzTc85Z/OuQ6dmF9olGFZI9EX+UPkhLftdCR95/QbO5jut+YXV529CTYWdpx6gMAIQlW9BHisanxW3VSvZzRQAMbF65uhMX9o877zzjOFyYis64G83VhTA6xoe696+fcdP/Yuf/tl/9/N7n9i/avWqteeOGCWwgg3gi1yCtIDVCsQzvDnLvGRV6kUJXRcwEYDSpgtHooosK45ExGoolzPHT9tWfs6F51hjHamKGG+qS2g5rOqajWS4pmMjo1vG+uY683tPHj96ZnZ2ZqGdZdaF/q2kCKWKfAFo5RJwDUHZgLyX1jbBgxWxhJzvAiZsuILOhHAwRZfA/88ADJGE4jUiqhRKbZOgKBi0GCum6uJRlM+I32k0qatqImpXKm9Cz02pqnD4BNoizqnBmiUylBhb27fniMtmbrn1jrHJ+tD6gcsuvHDINFJR9uCfItj6DBViNhRkZWQEhCqyVEw9aiwCYlVqqKgo4HyLg4qsrjxScRHCLOMXPlSjYkAqJApHQgqSBGKg6hkJKIDgXssknmW5T6PKEn+F7EO1xuRmdnous3ZwYoCMxBxzLWpAeDpEeGQ1MA+kGvQPXbJ95F7FqUk4MTV0zeG9R4eaY9fdcG1Va6+Mv+VYUQCvd3gg6Ef+0Q9/8atf+NQX/uaVx/Yyb5zYPA6IlZx8Hf5QOodK3L5q32i0MwuNEKUUgJCWVA1kVuUdxagCiqqKoNDNC2BAYDI+vOsI6hhdO9ZBV2A5WFiqKsTghERg864xrmak2V8faKbDQ0PnO5mdnT955sz07Mzp+bl2lme5c+oxWAJIEJpNAvEygnGK4H8UHrlS6E+pPnLqaewamwwUcjd4CRSvDlAyBIEhNh78FVGvDHwfYIkQR3E/tOIY9eRKB24qBeS5yJXr0cHFUaK4LCVn4WH4T0qQBOprOKnNbbfxzSeeu/2OS865cGRs/dAF27Y1UYPNjcIwRf6UB0l89SiK9jrHmGkw1BF6fGpUjeVMvc50Kk4DO14UxEUkO5AKKGgThKsvtCWDfMFBj8cbOItTJ6bn5jq+JJAvSaiVG1NZFsRlocqHhT/qf/ahXIVRcS7t1s8cnjEm6R8bkNgDWOPjS4FpFlYSFNmu8UnyzwNRbIxG8NYDLNM8dabmLz33sqHhQeMJb8u8oyvjDY8VBfAGBgFN0/yVX/ovB773PS8e2rX7MR1o9g+saUCRi4TCKwRSX43RKGKabWm6xRc7GHqxz0p5BlSs2MLxDtY3w7c/LOUYM/vetgYwOR/fd2L1htWm3zjqIJRTAYhVkQTRpWqddQ5EqpwoDyToT9LxVcm6kaE8s/NZZ3a+NbOwMNdaaHU67dxlLu86caJWyYmA/RsfnJHoFEhocelxMD9RJYBMTC0oArMBNfLlP2M3qcgngWeAsIEyQcln+xhhEREfCi66QVYRnUKqVIg9njMDqgq0Qg3FHLeQwxcdKy3ciGL1i0w2BXy4hWtp+soz++p9ev1dV5x7wdoN69fXnWErUG8JeKNaAAYMqSEigmNfJgRcKcQj3iEQ9Sxa8bCeP63AOXVOxYe/w9R90AVFzjRF8Kf0C5WgEFZiolytqjKpYXXWtBbkpb1H5rsWXGDzpR9RaJOKJqxK2sqDquUjbNUlyglSzmonDkwNjA43RmsZtxWOKD7PYVkpAlxx5UPWffAB1Kc3ILSmY29hWJ0+NN2d7rz3ve9OmJfkH6+Mb32sKIA3MLxwOW/Tlt/6tQ/d/653zh/q7P7GwXOvWNO/pimsOTkAKo6VA1BC5XsS+6UHI7aI+BWiaQmtmShGUP0f8qI3iilvVpHAG0TsOJ/PZ6fmt158nkusSSBKDBIVXyNUnYA0cjGZYEiJRBM2AjFESZqKSQcajVXNQQdk1rba7VaWL2StuW631c3aWd7NrROxvsadihNxIqJwGjJ2QEREKh
LmCaEQP/bYvvf7FQE28JwXLaRNaeMXlqKHrggmMQwNhS5F4du+R+NeA5YQZ1HZPxw5zk18k3ef6ADAJ2GpInSn7BF4CP6TB2XY30oWQ9Yceunld33P3Rdedv7qoYGG1Ekcq7Axok697inYqX46pArygHjFvyOfiu39FEZMuwCcioNfXhE//ZgwEK/HH6UAwfyzFAhMqqSkFo4ZzvmyRZQ7vHjgyKGTp0UpAVeIlBSxHkKvyD/ru+CZnCrEwgoCTG4609n86daWi7ZwA2SIQKrEVOj20mUsckFQgHfeQ/BOUfFAsJKyyXnuxILpJlddcXWarFQA/XaOFQXwxob3l6++8tpf/aX/+s/+2f9xcvcJgVx0y/Z0iMSpQokTK05JDTMp2KNCAfUMBlU0sQo5EFNmK0ogvAEipXwDVIJQJbB3OCBgJQEZ4VMnTsNictO6HFbEkQpQMG+CXcwwEHK+sCeMz23ybWxIyad2JTVSULOWDDbqTiXXkcy5zObdLHPOtjuZFc1zm7s8E9fuZq08b3e7We4c4MSDNia0D5Mg4Dx9NKT5e4alqELJcGCSIpDtY9xXAonQ09w5Kg6/OKxQGMTQI7zkYw29awJYLxJC5kHGKVSFyQTIIchL9WFqhDBJYCp6MQoEIclKzqkxmnLN2PrRfcdWr0pvuX7H+PBILamT87m1opprLJsHJQ5tHW2Bc2hEdcgru3CvvZdE4lMjlASaa+78Ivpnp4SptMAEK0qOAAgJVHyPHwUrnKgk5MvEGiJz4PSJR198sZs7YxJRKeIMBVDpR4FIFqK2AgnF35QAX6eTGUw5pXly8MBRyWXV+klXUzVS+CZagH1EHFIQVSvZbZFp7BeDFQJSdaqWVci23clDp+64+02r10yupH99e8eKAnjDgwADes+73tGfJj/0oz9y9KWTfcPNTRdN1gfTDNaqY/ZGsHrhRyW0UETxdPExNRqp/vfI5yDfYczjNSFySSwk3rAk9c3cCZQiOXV4qtbX1xxsdrirLLG3iIJYS2INqao6IQbEM8cDB1+LNzv47QRWZk7BDU0VNZGmaLBJnXNWNBM33+m2smyh02lnWe6kmzsnzjoRJ9aKU+scxKmoOOdzvKAKUQf2lHYXQQEA5DlLgNdv7GEjUa9LFEGgR+M3MnXibfHEmQLNIRNURsDbVQAJORNBLHneIZOqEFhi/xQKUFRwI5ihqsYQwGKRWHPglVfe+923bd4wWTP1RNnfPDKIWj5o9yKRoKdcQgU7C7MrtgVAEHKi4tRJeVNQ6KLyASkUm/+A4MET/1iIigGSpOZcJqICmWllDz793JmFHEjIgQxTBciLAWRUZlYO7fnZZ+76ubFXJAlx4vjwnqP1voH+sUElpzGcGyDBiDAR4r4VFmhgBhQnUIgRNqROqUsLUy3t6rVX3pCYpJzQiib4dowVBfCtDJ8D9ea33vc7H/7IB3/0gy899DIyOf+68+r9idq2GhAHsoN41reawMmDj/oF17cUy5HbF8uyaXCTS5VQQbIVDIh6rCAe1umJ48dXrR5FyoKArYR3K6aXejmjqk5yQAADRLXj65aSZ2FC1fnMsuJ7gJWUmB1bBdSQMDulRr2eOdfpZl3nurntZrlVZ0WcE3HOiVgLa8VZZ8VaK06cdS53VhTWehKtiFBxJk9W8jKaw4UGm1li3qsPBvhdgpQODdUqUJpKNJy9E6ZEIMOIPEqCCfsGdCgKWWhsZRxcNufrPbHXT3rm9OlGnW65ZWd/o8nksSyr6vyKcwB7wlJzSVXiGPmJ5nbxTQj+BgPeibNqJQBavluQL6nhtVf0lzxk43vMAAIWiiyCUOeTxKmA2JiFXL/y5FMHjp5gSj3uRJ5khVLw+1FJAl5exFLMi9agkpQ1IaG5mdnp6ak1kxvTfmPZSvB2gHgSv28BfPpFKNKcA0gUT+/fjIRMXevHj51qJPV77r3XcAySr0j/b9NYUQDf6lCkJr3/rW/53Q//z+/9oR966eu71f
GWy9Y2x+tdsT7D1kMeHrzwdNGyeLN/SSvGnX+kvaMcodEgIdSzOUrngdj7DOJLSMLA5K1sZurMBZddhhS+xrOX9wg2sxBFNF3gRKLbH49ZvvVEitCdXhDZNR7eZ6iSCpGCffMbsDENMn1krGrX2m7dOhGnzqk6cU6cCDlHKuLEOSe5c9brAGutuFxcnjubi7POORGnnvISMX3WiOsTG1YARlUZ6mvqibD3C4LiFK1QKwEoCVWIs95mpQrsRkH6xk73wRwvTHYlH8/2yL0xSV3T3fteuebqyycmhqHWUApRYhWC86m63gKPVNcigF+dQsSDKJag8ACOCtSps+qciMBb8/5ZCGhUIaqVKkn8Qd1QdI+UiYnY9xMgTee77qFnXnhm1z5IquSbl8W+xtSjj6peaO8oTh0eQ3jM3q9LrqlLjhw65vJ8Yv0ENUmNFM9q0BTwiRRVjw3leRUFI8h/52lDsKAOdWc6wwNjG9ZvSHilA/C3eawogG91ePFH/KZ73/Ir/+FX/sk//ccvP/aS2HzbdVuTIc4ckNL/x957x2t2XuWhz1rv3t936syZqtEUTVEvVpdsWW5yA4OxwQQTk9AvJDekQMJNIMklhYSE/HLDDcXmJiQQAqEFCJCAjcG4YbnIsqxmdY1GZfqc+p3v23u/a637x3rfvfcZiXuTH5bGf5zX8pxzvrrrKs961rOMk7ICmZEKU18LAkCOerxnh7K1Rhvvu/Vws90S7jxfJ06SMRYsFEYr55brqNsv3q4QYiMzIk4dRD4osoV2jUTVK6VdvJwFN89rkaJsHNh7WYFAwfmhfuuWzMQ0YFay6QFHKaJpIxoliqcBRqamULWg6lrxGlVFTM0aiRJVREW0biRKbGppGmlijCIqJEoiakbiYJEZXEs4AEbGEGV16MMApDEFvTqm9fCXhEVTcCuIjKuTWdCMw/lRMj85igCoWuCgaiEUQcLK2Rfe9pb3lUwgNhFSE6Suhw1Scpn1ko5mgqbaRKz1Avk7DWJ+xDSF5TkXQv60VPbJmGEysebPJtdFRGDnjJKKVCg/8/Cjn/riI4YhFcHFh6CKrgFh42X9UvhKjstbmIqA3CJRcEAox8MXnjhdFsX2vduoVPOejvYjW8wOaaMT1pbr9RkOAnJOS6AALqisxvH5o8ff9fb3cvB0aNMHfDnXpgP4cy0zK0L4ju/6VtL6e//W9z3++SeowKHr9xZbi0hRAowl3S2c4IWM/uTbOaEXPRZQd0umVvjMmoABxkZqIDI1JlgATLmxpRNny8Dz27aIShKJSV1cni6oU+pgULMooi5tSaG73bPzyZm6R2yWgjHn3ECTVSMOLd0GINe2NA5MpYUhmxaFiNaxiapC6uJ5xmQgRVAzVRIR1VKhamhEVU1Uo1qMGiWKWFSNUWOUGKOqqkLcgwjUfF49DCym6tQbg9cn8u4kC0eWbTEllmdma7ZBeRvre3aABOQoJS0H4ggNgVaeX961ff7I/n0wCwGAiUkb67ZRuaVaRA/ES6e0F8TnVMTdgSZWlbSngb2RwJExtHlMetK8pyoTNX3aMIMJSU+ITM343ocf/fR995sVAiahwMhegvqXWnvdtf/Si57a+NOcp0mGQnl0ZrJ8Ym1hYWF6a1lTFFWP4i3rpLc+OWFCvXIXfDeyT0tgqSd+tVbLsZDwpje+cVCWtHGDN9eff206gD/fIgowZv7L3/Edyvw3//bffPq+pxHtytuPlDvCWCaqIDZhUeWQsFvrB97JgGbcuG/9jQxgMpAxm2avQaA0aENMApgoDDF97vi5uYUtw63DdVpzjAgZD7dUi/RQjAyIJuJK9OZk+VSj7t3hRnkuh5kpaaoVmgKhmyIAsqw3QDAGk1kwI2ajINBhCGIWEaOqwsQV3dQIUIOIqkM+QCMSzdRMoFHEbOCSbSImUUTN4X8TE9Uoomr+ehWYOWii3rxmUVU01TLUTAydwGjqMO7bZjIgCTq1fpgMRkxsDDWNSoGKsiApHz
/2wmtuv3FmyEwGggmIWSEmZmbMbve4A1TaTiq389pG7hnxIAZM1dRUk5cFmBOOlHDxXE9O8T6TMRGU0qy5tBNetcnZZG3lfU8+/dF7v1hFAnEIgdSCi8tpqqD/L2DpiSJLgLcVG3MgQQBPYfj40Sebqr7oksvCHFuoQIq2xaOdleFXVt6bnGRBfSqdU6UzJKamASFQOHPizNCGt996Wz8b2lxfrrXpAP5cKwWRZoNB+Z3f8Z1M+Gt/628+++AJVbv8lv3DXUVFKiagVEoEQXMJLKf01LoBN8JEyJMkPY1P0V/rGswACFh91jBqYGyLJ88tXLxTgyAwIMQ9pBkgZlOBQU0J5JMMrHFWfa7DdZi4fw8nvr2bHkOaUN7zXwmpR+tEOkSbgEDMTiYiLoN5hdMdgAMmVkAUYlEIUU1MxZMAjpY5/laolOr2yu1EymDU3KNIVInpI6N59dm5SuKbp2Jw3qtnBVyo80OdkuqoiqlLeRCApIXHIGIlUkZJKNgMwzC7fOrM1V/31qIMHMwsqgnDCe9KKemgpH2US6xtL20fXzcAloEamNfCU407XxgbQ3Tq+LLd5Udi6gWb3EgbDKSKRuj+o898/N77YlM4l4yV2mJPIuZvCMJ7CdGfsSxxQ42gSmRkJYUwYVqzE0+f4KLYsn/BpoUKIwUZm2kXWuTbJQ0ja6s06RpNqJfTuJKrUQoNr54ZHT50+d6D+zYbwF6OtekAvgyLiAIQCv62b//Oqen57/mr3/3co89R0Kted/Vwa6zrxkyNLJoGL8C1sVHGXbKxT4wW/1spsdwJQOpCUhCxBYCgqiykNOTBaHltdXl01e03C8w1PRM+C0sfYGACUeAigEklRYz52zkZfwC5G4c6zR+HgtJ0J4GmIDOTmPzV/RGX2po6MqaARFWyghNS4Rx9eGiOQh0ggomZd/zGKGqq3gZl0GSzTdT3LaiRmqoZRKOIKdS0IVWR2IiqmZailjrVLCcQQIwCK1qKp1e2k6nywNS5PgayAGMuClAU4kEYjE5NZFQdOXgo0dYNzGQGUXE2JBOpGcHIHH9jELLuveZ6SsfhJIJCxTTtC2XZO6YkhYwO/EsYFpSQZ6QBAaQmBim88UpMjRoaPvTMsT+5577lUcU0SPKuCQxDmwB11Wnkn46a9S1tjgwMnnSYEcOYTIhloGVohmeeX1o+u7x1546Zi+brgShJ6vBuPyDFEP1CgrWdy0itv7ljxDQQlxSssXq9WV5af/fXvc6BuD/HPbq5XnptOoAv3zIMCv6Lf/GbLr5o+3d93/cefeC54czMoev2T8+XNcXaYjf6CWoGTpQRam1DisEcaGnlhCwZY8t4RS7tAkwKY6VTL5yJwgu7tpML6FDCe7oezFRzVA7BYI3UUeOAinxb9sCftkcZGWJuqwM9XCOFgy2m1UaHHajV4tNKKfYj/1AysoS4GIO8SRjE/ilewLBgDmWLCuBjx0hUBFkLwrdPlV3zwd2DqRv9KCrRsw0VEUluBRLFjBwvShmGZySWasC+zw7PszKUzWeqlWHL1Ny5J08PaLB1ywybWKKuQlV889zOcjaZzoP0/u8O9ss9yn3+r/+HdNrazMFJuATqXpkEwFOvBhFBLHqyoQbRWKAgLr909NjH7v3C6lokngZpIHRzZPyL2mksf8bKpZEuTklvTm13xIGZVBsZ1OH5J05q1L2H9/EcCyqxiKRClB1Kei86zSbkC6d1MZblrpwTFa3gsHh6OYBvufXWshz8r92Mm+t/bm06gC/fIhCsZLzxrrt++9f/21//wb/+6Y9/eu3Myo1vunVqm8LWzcy0UUJgNjVRYeYU85Nr7wMZ6c+/BMfihbNZ9yDV0uRU01gMwtFnnpnfMV/ODoQbSygrkrJWmjhFYB/pSDE2MUaf+NIjgrthSEaQACfn54jQWnsPf3nbu2ZeMOimYKILLLkFDVpykaMV7auSBcgdUwCDzdTM0wYi0wJECjGiKLEx0TQix7RXv0gpj0
9EcTcALxSbiIh6TzVUxCTVs9t/YeRv8mMgSGO3WAkCgVIxAGxuajhaX5mbnZoqC5ioemHfPDlhZqZAPcA7kFlGeey8g5ZcauczATBB0T2QGKy5Mw3ZDiu5H6CcPjKIA0G1NlhEcfT4yT+97wuLqyPioWXGAScXn1OPzI7tX73+b3bfOd7u0TYNLv5J3sIXUAQr15eqE8+cnp6bvejQbi5M2EjJSK1XYEgn3DZ8W6I3+P7lNkfvTjQ1ER0Ww/W19dnB9B133M6b8M/LszYdwJd3EcwK5puuf9Wv/6df/oEf/Lu/+du/cfbE8qvuvG7fVbsl1DVZA4uxCVyGECxH2HBait+vzvrJzUmZx544LZl6bQYjRQhBqubUidOHr7yCZsiC5Jmy6CGv1vsL8Bl7yJhvJqMAbdDe3au+eQmJSLlBh2J0Ow2HK7Lxy2O+iJC0HFI4mS1hYr5Swpfy5yC/JsXCCUpigkkOQnP/Qm84lvNciVw3zczIBdiQRjYTQSn3zRFC3n4zA9xWBTLPh0gpOwADjFS1KIgDhlxO1ta2zM+VUwUo2fb2AKafOW2DweXxjBzdaDE2P5kJ/zH4WebAJpY61txDMlrNy77t8wvAEyBlAMRkpKIhFET8pWePf/K++88srQOFl7ETW7hXUchUgBetHkCz4cF8reQGZkMaasOllc88eWwymhy4fN/8rtlJqEQlpzOW4pBcBCd0TRHt9eV9AWxgb71270QIHKzB2tL6oYOXb906x7Q5u/BlWZuH9cu9iAjEsH179v/sT77/b/+Nv7P43Nm7/8cnn773OayGoQwLMFsbHBozMQcCGBRywxcbiKwAytzARUi6Lc5sISZisNBQps4dX67Xm/1H9lGpQj59JTNI2ve4xU32Vp1L39mA7DHcnjmdMhvm9H+D95ChQ3dsg+XecF8nkMahjdzU0MOP2uYmQs5q+lFnm1uQLzCFVFpVULK2zMzM6UggTSQjgJiIKcAImhi0DAqcvjO1VBkIxkSB4FMpC+aCqCAKxEXggrkouCy5LENBXCLMTc0sL61Oz88zEZkxc66Iph1yilOLphmZuiI2ctNycgNEBGZiYiYOPvGQKJBveS7atJV2tB+i5FoWlqJnsBEJKBpiI/zMycVP3f+lZ08vRyuMC8/LnCzKadxm5tn0V4fQvMj8917LRmTBR7dZkIKY62KyWD396NGyKPZdcQgzjJAKzdReMxtP6oa4xNJ/yPlBygTNYFwUZT2OZ08u3nrrreWg/P9AqzbXn2dtOoCXZxkTYceOhX/54z/2q7/wyxfN7P7073ziE7/92XNHV4exHIZBQO7ucrtAaO0TMTzKT7QNZ6YYszF3sbiCyQhDnXn+yZMYYutFWxtESTY36UMk50FGZETEzEQQE5GYBoxY38xnUDxTdZKRT1CJiqq149rbPc3b3f1ALvq1qI5bvewI/G3Z6HfvzFBT/zAmaXgzN6++a5RRLmIK7BJ2ydBzIGYw1KBGBhPLHPukkZa4qGZOS/J6ALodBrxrzV0gFGQIpEj1aCI2i+asT/d0DrSBvDqbN9yrC60nbMP47LySa8t0fJ/J6Qwkag1iPsBoDSUUau2hNKgamRUcTiwufeq+B46fXQZNAawxcgBnqeW+aaeNdj4TX9F+S88nWNd9mEEkK8RYCRg0xYlHnl9fGs3vWth5eGfDtSBynvWGVAynNtnYsDvtdnRuLl0TiS2rNF5sZC3edP31RZFn6WyuL/fadAAvw7JsFmEF6Ju+8b1/+MEPvftrvv74/Uc/+t8++dRnj/FKOcSg1EDGaBkxpOa0bpgyjMCu+dnybIyC5wdOOhSewnAQp45+6dglV+znGW1CFcqg5kNc2Aip65/geAgMxByImRnmsHDCYAyqUDUVb/L11/sNmlAqghOMWvA8pRY9dkbCq9smI8sFZmqdSdsVihTSWiYnJiuYCIqcEwTybZNGY1RREyPLPKJM8ifNDQmSPpYM7PMwfRdVVSTGqIKcGC
VBIUpMq7bpTtOmQ9VimkJApkwo6kmcmZlppFGIQsV5sUSUIn0vJ0Azym95oA2ykyAYkVEanNlScswH5mQP1B6ihLWg9U0ggAmBAIgLQlGU4uTS+DMPP/b0qXNN4+4WRSjYYKr+Zc5y7Q40kMgBcDt/HibkyRq1fppABlJK2hysRdCiXoxHH3meiS+79jCmJJJEabwvo3/+zvP66D1KCZ/0uMeJZMSBATVpls+sbpleuP3WW3Petrm+/GvTAbwMqzOITGA2veLyK//Tf/xP/+CH/kE823z+w1+896MPrj67HpQLDgA7pG5pAHaLnrIDNcRsBCU1SkaXWMsimFiQsHp6vHJ6+dJXXSZlQyUZlDmVDtx0E8Cc8CCDMaMsi1AU7G2syOLBORCj1nAjJwgJ6umh/4bkTnoGIsM5ObDPIWXv3k0v4+5VyIiItyclAAfuBjPwpElTyFIOkGNqpDFjOdWxzhKrKQxMzOQjdCwPVIQH8ulPsGNxqhD/V1TURMz8ZWIASYwiUldCIBGbNA0CnKHk7jY1PHQb7gGwqSvZZRKQpQPU5Vuafk3UeG+Lc6kHzUfdsqcVMiFjqLf4uiyTRDq1tP65Lz3+1LOnYsMEkCqgTLkAnSoOBtJ+DnZ+ZP4Sl7HlH2TkbhUECUasIVSDFx4/vnxufcfFu3cc3o5pRaHMlHKefC37JdV6mhevDmGEpRKAGYMGPDj3wsrh/ZcvLGzPdOXN9eVfmw7g5VzJxHEAbZ3f8iP/5B//7m/+zkVbLnrqi0/d98kHx2eVI7MV4MJAQqZMQj4WhMmUiEMgNeMklGhgspCC6cA0YH7owftmd81tu3h34+AxhBNZyBuwiDjjCmXBgUOwUIA4qCUsOQWcbdSZAs0cowJo/QmyMfN7dSNI3X6Mf2hboLUMk6N9NbITQRvdInOKNBUHcoDqmUKjjZoqkgZRYkimD3SDrtnueg2YUtVaTUViFFWLolHaITamZoqkualQIWtYYsighx8G9x9kxihCCcXayurqeBTJjMmYlNg4cTN9BEGAum90C+4ZlZJGFcvZhQLiEw9cDQkmsMYaRTSoIDq5U2Fti4AYnMvELIQaEA6ojU8sjz/zyGMPPvPsqBY1I0hwMW3VxCRuyz3WHTK8lDXunaUWujfAWIkQDGZoQHGAUE5m1k/HZx99lokO3niEtnHDY7Xaa97MQcVS1pcKx5Qvgg4i81lo7eE2Upd601otUmHF6pnRq19752BY5ktic33516YDeGUWmdmgKN75te+49wuffc9X/4UTDx3/wkfuPf3kEtc0IA4hACRpkBaImYiFNEIdxWELbD5aNhhByFhZGzv21NGDV15qhRmiKyQbGwcwcwipSAoCBVfvJCYqysDMud/Vet2plBHYVKtEa9d9ZRi7Jfi9+K5MliaTPtGa0jb/773WWsOQS8JElGYkOrcdqU8qIzaufODNDqkSYanB2Emf3g+Q6Opi5m3C3mDsmI/2TJ9lDIaUkhSpf4RDXejQLlWrq2owNRONQ1lEa4RqI/+63C9N3TFtKyupKJJYt6kfwc2fpHjfxBLWlMJnI1GpNeZ+OseBmMFEQdSimppNannuzNI9jz72xDPHx2OXLDUQU9Kd5h5xl1o3DqArv3eXZ/K1vb87z02U1GUtCMhYMKiGzz90aunU6s59Fy0c2GahbqwWVa9nmyn3cjw6/1O7MnBvO8gICjVWClxSuXpiFCJffeQqBm9WgF++tekAXqGVqpzQ3Tv3/PzP/dwHfvoDWMO9H/780tFRWC4H1bC0cmDFQAJFIBqBJaBmcSYHEwIRg3xEWMFcWnHihbOV6eXXHjFutOVHEhkl+54awtDdhQZjnxcomnk8nbH2F/iQmWTU0mNuk9HyaHJW8OLILH1oih9bT0HIJB1qtwwZV9r4Xp8Fpp3NdMdmrblX9SheNSb9n/ZBE/H2MTVTldg0sWmiuJiQphwlfW6ubPvyYrDHpZ5QSPp6mJoVpAWVZVhaPtdUsa
6TOhE4yVdHmOT/UoAPQ1vi8KYE08b/E21Ua5NKNJqD+V6cRq7GgPJ4AIWaE/BVrIlRTRHWann6zJl7H3vs8WPPjSbRDE4QIHeADKSkhNEe677nO9/8t7+86AkfUIMIVmhBcXpgs8snlo89cpQGxaHrDxZzFksJTmLLMg+cfeFLfUt7JXXPGLlH9C7GMKDByunVKYRbb7m5LYtvrpdjbfYBvJKLYMRkW2anvuu7vuPO19zxvd/3v3/+D+45ct2lOw9uXdi7pZwaGFO0WMMaExCIKRXvkOqwIYVWNF2Uj3/piT0HLx5sn5qUIzNHfJBkjinPQ8lvTmQZVVWtqyrGWBYBlBt+KX0+iEyVuPUm+RdHRGB5mq53puYgPHesWgYPsi/K2UQO+tIPb2jy5/I39U2EG0M1U5iQSS4f+9zfBG1oGjWJXHH2VCLpAkVpmtiIiIh/j9PxTVtxvISwczq6vjHJ6qX4110LawB4yMUUnzl3+vSJxcFgYboMMBAVRnBlVknUl/bD03F1f9l5uuTlLJX31dKEzhbSMqixz4aRlG25hrMRLCrWquqpF049euzZ4yeXqhpioe3lMGp3wrLR79v3vDbY0/waAiwz9V1/EFAyQENBMGEJIQ5sVBx94OnlxaVLX3XprsM741A0ie0l/0pA362n+CK3mFF+sGMQpPE/ICIjmGhoivVz413bd+3bt599LNEmBPTyrM0M4JVdifdHJfiaa675nd/67e/+5u++/0P3fPEj9z9195PrxyZYRdEMghUuMcbKjrgkUjkps4/Z5dWl9ePPnbr0mqubMgqZlwACkUMFjkeAjANRoBDIACIORVDTqpqISC/8T9uW7IDfclnRMjdrGbKiGjLu31r1zP9sgYZUwHXHkJoRqC0xWBIBAvzuJ8oWzMxVDdRcGlpFzYF7NUgWjItqUU1SppCgFcuRoiqiaFXXtYf/ampwuQZNKJJTi3LEDSjARqEVyW+9UWrO46aKq+PRTa+5ZjxZvf8Lj8RJ2URSkHsmkIlKO8xGWxFS3xH44xSNo6ExbVSjiZiISmMdSBVVG7NoiObbrI1EgSo3GpqG4prEZxdX7nny6OefePLoybOjaJEIZqwgNTJi82pRB6y9ZKyfk7jujGUqQPKNbCAEWCANBFZVMpTKU9X084+ceOaxo/MLMwevPxi2BAuS5Oy6ihF6znxDBroxI6D+RedkLCYKFLThcyeXr7n2+iJwzio218uyNh3ABVlExGy2Y/v2H/8X/+J3f/t/7J/b/+AnH/rTD/7p4/c8vX58XE7CQIuBhsJnSYKSng6ZQQOVQ5p94qGnZ+fnt120oKEGAeyNq+TFSW+EYgZIic2S4qS4HHzdNOuTyrs5nXuaQzK/JxOtvr1BUyU5wQnIFHbaCOFsMDXW/ueDfakrrdqGdyVfotl8OAKfZaK10TRBTLQthybGqsFcIk/zJ4pZVKtjUzd1E1PtV3z4jM8nE2s/x6WnFRah0ZRUSdpiNCnSDOFAPnahgNmuPXODId3/wFOjJtRqkcygYqLwgrC/w3LLln9FCu8VGjU2Ko2KmPjeaaIg5eKEk4mYmIPAKtMamJiMY1wZ16dWlp9+4cR9jzz50KNHT51ZbSL7WDQH3illaX4qOlivPZXJEHeNBqBOrcF32oMMtD7ew3I1YzJSDGVqfHz85L1PShMPX39oy4GFiipin5lgbfpBbedH3oQukMhflj/d3Q4RKMk5QRlcj2T93OjGG24oAtPG62pzfXnXJgR0IZbn3MxsOizCO772q2+68fqf+Q8/8+9//t8/ePfDLxw7ffjqSy46uHu4fUABkTVylRuBjCmwDmJNTzzy+LU3vYqmNDKpaChSTU3VnCaemf8WAszABGbMThfTZUnMVSNRjAKFPAm9j39k9AIJ2nFj7zT2Nk7MnCAj7xzgvHMJBvIMPyEzPf5JSwfPNNP2CYCgKsn6m8bExnQsiABoC2+5ZhEBTg0yMxUfFdDEWMfGjDsOZuKZEO
WirzNq8zh3EDhh/grLEJcL7ogJsXERVOLs1nD4in1PPn3i2RNnD83OzVJwNq2auJIPkbElSClBHJq+2SiXR9LW+EG0NOErVYi8I4PUbKyoGhVAVOvYrKyPT54+9/zJ0+dWJk2jpiUxQZTB7NRTaj0tpU5zIk6SUdk799mU+bi3xV5fnCVHQIBaNAusjKKMU7LCT97/pZXTS7su2XXguv02WwsLQ2HCBJ+9DGvVr9uroP26zi9Z3xtYgivVjJQIWF1cLa24+YYbeZP+8zKvTQdwIVaXibOZhcD7L9n3j//RP/3qt3ztP/1nP/qRj//J6WMn91568WXXHt57eBe2shRkHqKyGYeimH7miefHVbXnsotsqjFycWcCs9d4OaDwQnFBXNBgEIqyHJYDAuampwZFGE4No0ojEpgDccvK9G3K8Cz1AGRqAWXeSBdJT1AmreeXta/om39/SwKGstEDweNfzhQgycQYUU12U7NJajeRiLw+3Cvn1k0TGx8x2Y4zbvk4Le7sdQRKCYyfBUChqRcsKwUxsQu3GoHFBDo1Pbjlzht/8f4/fuDhJ3defOVgahqpX47E30lUsLsd+JYhDXVvyVBAqvcae6ceSM0gSTBHzCTapB5Xhog4is3qyvjs4tK5xdVzK2ujSSViMIIRiwHGCbj3GUF+SDUjK6aO5G0E4ZAr/L7B6TQ5NKlgsEHB7nC9S4VKK3hSPvelZ5959JlyZnj5zVfO7pkeFWMikEK1BX3IkgBqe27Ph6D6NYKsDETwUowpcVHqYPXcyuxgy7U3XMub5v9lXpsO4IItT7UzO4iGXL7hDa/77f/227/yS7/8f7//px555IGTx08eetXBw7cdnNs9Ewq2Wpi4lGILlY9+9t69h/YMt8xOwtjYmLiWejgYMMAWmWnAtHVuduvM1NYts3PTUxy4GJZ1XbtGvRGNYxyLBC6KgjoD3ZXqejPCiHLvldmGWzpbNgNaeTW0FV8zgHy6X88dWJ4NaJBMF/ImXkTTaNpoTIxPU1XTdoKO5I0hc8VQ59XEqGqJ4N9E0ShJ1dNMOZdytReHMrcuKlNdlYgV5vNmTS2PO1BipoKaRkomLmnSyKEjl+7Y8vnPf/pzN99+YGq6mJ4JUGNwCGxmRQg+DTEqXI/VyEgJZqTGqZHWj5Mp0Dh3l4ORRtWqiQ5emcVJo+OmWlwdnVtcXVqarK6tx5j6r8m1QpzrCSinXKrTj6bUzsFOIU0xubdZGJErryUH2IfwAgK0IA5ijaAJAYWGop6bbmZf+NILj33+6Vr0ylsO7bpiZz1U5jSfgYkUIpmtkM16Kipn3/LiLNBScpmUay2EYCCmsLI8PnLZlUyb9v9lX5sO4IKu82pjZvMzM9/zvd/z9e/7hp//uZ//uV/8D49/8dFTiyeuuuXQjl07tixsG4bZLTw89aUTZx45+5bvfBOmY1NotFiUXAxLSM1EW2eHC1vndm7Zsn1u6/z0cGZ6MCymzLQhHU1GdXQM2sy0aRrhwgrqIJt+tExJjNjhnhQU56JdCh8T64Uy+77VjXA4uEWKkLEk6jRushlOFgoaTaNJVAFIVExbBVD1soMYMoVG1UxFomrTSHTJfxPJlQwzZXQjWHKmQjkWdhxJu3qlqY/MYfjwNoWBmKKpRSqpgFpsIhdzi+eWL9q19czZxUcefXwwc6WFqYJDUUBVA4UmCplxYPVeXspyCmY+fNkABXmtWAy1ai1aNaOqaeoYo4jBRLSu67W1am00Xl4bra6M62gaU5FWocScD3AaFGFZRjA/6MW9XL93kKU9KZl/hPSRrSAIxJRJnXWAwlRlYGFQl2tPLz9xz1OrZ9YvvuyiS647yPNorFITMwsU4JKrULBrzWU2by7zZhlWSjhQyzmjlptFIHghXypZObN22zfcMRwOzssmN9eXfW06gK+QZdk8GQG75nf8Hz/wg9/+nd/2B3/wuz/3Sz/3xMe+9NzWkxcfvmRufmHPzOwf/8FH57cPdh3YFV
kNMRSi0JJ4fmqwY2HLRVtnd+3Yvn1+y9xwqgzeChaiSW0iOlCrjYwNFrVumqYYiAb4KL7O9LdNuL1oP/9l+Z7sdYn14mvrRo1Q7zlCDvPSQ0lSwjmUZiomUTVqFNftb7mSOU5XU2vZQSKioqKNSBTv7e1w9WzXEjaCxFZ1H5aFrC25O4bHz0o+A9JbHiyAYaaIKBhEFpnn5reder755V/89b/6zd904rlTf/zRP9135GBZErFNTU2TYnrAMOOCVYRQMFy3FQwnuEPAMFGjGlqpVFHGTRxP6kldNXUUJ72a1U1cG4+XV8bjtfGkajQakoiDl2u453p9B5PSq189Zpp0h1ogKkN8lrAz6k4KMiPIAB/tDEQT4ghjVh7E6fpM9cBnHzz7/OnZbdOHbj0ydWCmLsfGVihpmk7tcy28fcRTNssQXx+Aa/1xOlGJM5qTzhipKII0onW89MiRotisAL/sa9MBfIWsNheg/IftXtj1rd/8bV//7nd97I8+8d9++7//6ec++cTSY6B66eTotvfcUGwZrtuyWjTVwYB3b9tyaOeuPdsWdi/Mz00NCRSYOOOtpIBZQUVAjEaU8YcYJUokZqAnuJ559mmLMk6QDXqy/t3dS8Qpn3eTStlJdLWETPtpq4AJf3D0P0qMqtEkmuTvpmwfkiynqIhBRJoYY4xpLLDl7lnkDMV8tGMaMpu/2qfn+GYnSKItc3gZVn0ImcvzBBcIRUElm2nUYnpqNKrf/89/9sbrr3j7m2598slnfuX3/+hzn3ni5tfsX5ifFipJbVILAeX00MQGIbBFJgQjJEW4YBAxqUXWYzOu66qpqiZWtUZRjVCzKDKe1Cuj8fp4MhlHbdS7QUKK0M28yNOdhQ2XT5KTcoZPYu6aeeE85Tqc9hxegjBCmlmQJQTZRIkMQQtQaVPVaXv8M0+dPbbERbjsxsN7rtrTTDXKPjeTGNxnA7OxwXX5vI/Nk4+UAeYKeAuC5Y3yH4yiKMh4tLI+Pz1z6JKDjPDnvq021//P2nQAX7GLAoi43Daz8xve9Z6v+er3rC2f/Q+/+O//3g/+g22Xbr32+mvHMp4UYxQ6N1Uc2bPv0n0H9y9sGbBOD4d5pgxBjZgNVDArghQamCHwOVhRpI4xCicByQST9KThgFy0c3OcrSgA14OwhDJ0kVpLEspYgFuC9un0BMRSx6wjPyamUR29MedCKgxsaqaiUbVuGlETiTFKbu01uCtrIWd0/VRJujQZm7TVaq3Zd5YLFMpscGFSJSgTwaBNE0tmABoHszRcPMPv/5n/cMm22X/2Q98zM1Xvv2TnW9/22j/444/v3Ps1dHCqiiPRZnV9PJia2oKFOS4DxEzVSFOJWyVYZaibetJUk0lV1bU0IqqaLCA3dbM2WV9ZXR+N6liLCZmkagwZkXoHm/Vx8eyAkTKulmSUmjSSu005QmJncq4TmB8kTvE6G1GjQlwTUyE8jFN2durhTz9+9JHnKNq+6/cfue1ImNWGIzFYs5Jdl+2lgY/EnmKwmZIROHN7narUK/0mTAhQ0wKhLIcW+ezi8vaFHde+6mpvz/jz3USb6/9nbTqAr+jllAyBFINyZmHmvgceRok3fO2dYQFjrPBU2Ltj+7WHLju0fddcUcyEdrAuEYiJXTnZodWSy8jKxAwSVROLLHVs6iYwUZo53yYg1hHGN3D2O0h2g0JjUmvoQUZtVKjoJoJlGMCRCouev0AFJm7bXB/boR6zKBJFY2yiaF03jvVIL+a3jFhZxnXIq67eWWspi8i+wlpIyVlEOcVQImiaHaZqKk3jAjiN1HM2u7TS/OxP/Fqs5cd/5vu3zljdNHPzU3/hm97+J5/43B/990999TvfVA7jWhzXdcMc5uaWD158cV3WM8NBoGAhimkjsRGtEKvYVHXdxEZEKabNi43WzWRlbX15db0e1yogZWiiRZmZ5j7odBB7R7krsOa5M+moZwIvKA2Xd3DOfM6m64WYcHLZMAoKtaBMWh
oGzdBO0TP3nXjqi8dkUm/ds+2K264Y7gzrNFGVkOoa6SzQhuwwaW6kzvC0fUQE4+DXTYL+OCFYAJjJCKbKKJbOLF+568ZBOeBN6//yr00H8JW+CBzAVaz+9Y/9+K/88i9f/vpLdt+wcBrn5raWVx85/KpLrtgSBqUp+SwwIlVzZmISKTP4hCwxZaJQMAtrQ06imUSZEgkaiBAIbA4woBOQQJ4fDjippL9l1iYERJr4/h3ygmSNEuzugj5qyMgPxFTUG3HJjJ3Kb2ai1qg2GiVKVTdRYuOID6C5ruvItf9FmlioLVxlSK1lbqMcBHdEPEFLLck1izSYCTNEGzJBkFAMVcvZMHv2TPPvf/aXqtWln/2pf7Bv3zZtzEJBxgcWtvz1b3/Xv/mZX73v8w/O7Ni+3qwUXASz6dnl5bPjhS3zu/fsmJ4tDU3JQZqalKM2US26QwYbQ6M0Tb2+Pl5cXK0mTWzIlFgDMYQikboQrAaKlsx7S7xFatZO8H2yxt453uu7S3F/IukbyNudFRbI+wYp/Y2iYUJh5bRM65mph+9++MkvPiNjmdpZ3HTX1dsOTE9CjaGQQEVC1zWST3TL/QTYr4e2luRcN7+yvO2BXLCboZZdkAaGTrB8cvmqt10xGBZ9GvLmepnWpgP4Sl8CQPDrv/IbP/HT//aiK7ff/s6bFoeLOy5eePXVV18yv3tOA0tkKABhBRMJIykAs8PwCahVkLlKF7vpVBXEWMUmBG8/zZIyIMv9921zFygT6/PKEEOLQzBaKmJ+WSdw4+aHEv4kpiBK41YoyTn4vJcYpW6kik3C+hvJwm0tecTFMfyLjIg0WfgM7nveI+kFG9uvjKgtUSREyryEgVpBAZEMJMEU82F6soz/9FO/sXZm5QP/5geuu+wiKKgAGavoNA2+6q47P/vZB//kngdvesNdMzNbiGESo9jxEyeeO/7C4Lly69b5hS2zC/NbS7apYqgQM7JABjKl2MRqUi8uLq+PR3Vj1sCMvEJvAoMaLCAwZXlRAjzH60i3ROZn19pWDmp3C8RATGbavOWWUmaQRvEQwMQKgJUIQ4RQ0/iF+rG7jz59/9OyFsuFqStuvHzfNXvqaREWUyMVIxZvlU6tgQn0S1lXW69IZ95HHXuJyBLW2NWICQAHghlbaNa4Wdc9F+3ZtP6vzNp0AF/pywx3333PD//ID2OKrr7j4EVXbDt07a5L9++f1yHXEpQAUYACKZmpBYDMgWxKDH0yB2Ypz6F16QARY9XRpHLjISiGzOyixuST2L2AmAjePlayzfMTERRZej9RetDBwtkWuOi0wUTN9TtjovnACK700MQYRWqJdRPrOjYSfR6LJWudWEjUWsMcbKoTT4w8C1HJkj/OlUf2DrnAbqqZMQMzYyaLRgEWEOsabMwDyPQMQjMe/JcP/Mbisef/xY/+jVuvP2hRqBjGpkGgEMxQz5RT3/c93/L40/9kcvb5vQcuq21cBZXYFAUVXJjo4snFlTPLz/HJ4VRRhmI4HJZlCSaD1nVdV3Xd1KqkUbyBj2CphQDm4h4EtlRpYcfQRTWlP5nIn2azUU9y38lBqi4Km/XdLCBJhQQq1OfFB6mtCgWYmZvBsJpdfmzpobsfPvn0GdLAW/mqOw5feueRuNWUG4b5DDQhNSKmYGYKTRuZdfA6npilhr/kvpP/JoMqGWDsU49UTcFMZrZ0etUauvnm6ymfr831sq5NB/CVvEwsLi3X//Cf/N2VevXa2w9/1XvecsUt+7dsGUwph+hjT4woKKmZFwx6aKxllouj35RmZrnSjxGpaiUR/giTESiEgpktDRFAUqGkhOUSVLWtFAAdwu4vSGYe3h+alpoakNT2TVxuJ6qqWRNFTJoodYx108QoUaWJrszcmn4QkagRcw502zQk8Yha7CdNoVHzoqPbxha1at9oamCvcZCouvCOiRFBScXiQAaD4dbf+YX/8cT9j/3kv/j+17/hOqGaGCAhIiaDz1vRat+uhb
/9Pd/6Yx/4hW3bd9JMUZSsg2C1kIGNmEszU+HxWrOuFfHYKy0ed4P9+BEQCAbNlHhib97lbEZ91CV7yTs3ZSdVi1ToBSUKvhdclUDErK1eBhQZFgoEUTOIOwQKBLOhFHYunHj4ucfvObr4/FLBpRLv3L/j0puPFLuGFU9cG1BUOTDDBIkC4C656/2ynvVP5fa20ttua3v+TGEc2IIpGcewdmY00MHhSw7wZhPYK7I2HcBX7lIAxP/mJ//l5x76/CVX7Pym73jHLXdcQUU1JGqaRpWDU8Ezq9vSYHHJBboUt/udmNGd1BGlaWiXTsS0sjR8qiwHHMrAgZkdHMj9poR2bG0mFQKZedJC/xnqSao2ruhgCo2iYtqoeddu1Chmk7r22L9poku2SY4x0yIApB2u4NC+f1OiPLYgjyF3CqQ6s6Wyp+92t7GUwQeDW0kDhAo2H802PZgaljv/x3/92Bc/f98P/d1vfeObrxWqUTAIGoUoaycxlGIp+tpbbvq6Nzz+kfs+f8l118eGi8CNNaZgKl0ciA1KzGWh0DTexBMYJQabKShF9fBGr0T1Mfe//tqQJIbyiSQj7zPOuVYCXLKRTU+SqzvAKZVKSSaIiCioAgHEWhRNsCU+/cC5x+95euXMChFJGcNseeObr5/aM9OEWn1yaB7aRUDRayawrqKSUhFY0pTtmALtBDkvAnhgkhnDRWCNkaKOV9YP7ju4deuWTfP/yqxNB/AVutyS3v/Q53/6//mpuT3D9/2Vd938+supaEJQUWLjRJjxkehgj8IInaXPUZdXOFtcJulEu72MIiJSk9YxVk1TlYOpshyW5bAsgloISWW6SO4ldRl5XThX/Frb78oHqjBRE0BNvOYZVWOMjUhMYm3SNLFRiTG6EnIe6EgZ5edkrEHZXPqw3WT981QABUi1LfFm9qMvcrfghrUtSbfRJxnU6UMEYvZRCTTkmaHMP3jPg5/9xJ/8ze99z3vf9/ZGRqEAFWyiFBgwg2ROjgkbrPmOb/0L9z/x8LFjTx44dI1oRUVpQqqpDZiJjaGqTJwyszZvUqFec3SGxZKOm0I0k7MIZKapWts7x22t13cwk4LISJXFgIJCujCQJpeZKRXBCMHAiqEVtsRP3fPUsw+cHC/VhEK0jlH2XrZ/+pKFOBSFuaZdOhE+mQ3UYj6AdduUjm7+f9qxRCrzjmFHJluumRHEjIghNDq3ctuNNzFv2qVXaG0e6K/QJYi1Vn/vh/+hDeI3f9fXve6rXkVTNUFUjS042oOs+25ppoY5y86ZdTlUTNGuWb7zMobjXbUNUEk9rm29rodlPT0YTA+HQymnBoNCqeBQENx0hVRVBoDMKQQAlwJ2nreoRtMo2qioSCPSiDUSE5Mn9+56QmBmmox+m09kXCNDB2YZVW5phznqd0hBu9onLDfEutwnZXOYPjxVEpJf4ewInfgfmILpkLadeHb0h//1v/2d7/0L7/sLd9VxdTAMHEKsYyBSR8Q80nUCfyCTpmD9R3//+7/7b/3IqfKZXfsOiAkHayBQFFw2ojALqRRC/l6vZ7Q9DF5qybAOURooRnlrFQxNz7T4T66OZHvqYT8lzM6r/ShSygMFg9X9lpEFGBNzXdoqPf25Z48+8Nz60mQQphoyEAWynRdtCVPccE2k7vAUapbI+ZlRyuf3HrRuoE2zcq0ibWX2UvBrxz1bRABrjWbUXHvddWVZ8GYB4BVZmw7gK3EpANB/+vn//Cd3f/wbv+PN7/zG15Zz0bQxRQBHkUEIaewqEuiew2VNesQOh6ebyABy6TTbECGrqEQgRhFR1E0ZqrWimBlODYfF9GA4VRSDsigDT3EomArvSHX0PBtlH8YrKmoQaCOxjrFppJEmRolRKhdri6Jp6G6i+SdblpmbHlRqG+OnzSfz9gBzAqhlZ9Zy+k01YeVebkTixrSuqguPU8jpfVHpee+N1cDMylM0XFkZ/eZ/+aUf+L5vffdX38nluDCAEGOktlkO4o
1tTAxYU8cyFIBsm5395z/8w3/17/yD6eHWLTt2NDpiDoCpRgI7udYhNCMlN+dtMRrZowJtiYVaFnyLqni8bzlv8eeS/ma7eUgNFVl2yb2ekc+7VGMEUACVEoomTE7pE/c9ffT+Y/Wo2bZ924FDBx595HFZk1CUoSisICVhRDBICEoG4fZAZ6y/d+W2DtbawQLUnopuuzNjCwCnowGh0blxKXzVkSsDbVr/V2htOoCvxNVIPH7y1D/9l//qmluu+Kbvevv0PJM1liS3nC+RwNwU+meKhd9UQkixXv4fqG3dMU1IuapZFK1F1cetwBq1IFI1cToORuuT4aCcKgeDshgWoeAQOBS8cX6XQs1EJJpE1TrGGJtGJEaNKipR1KIrWOYmJTM4YMXESX/fyYNmfX3QDDkoYNGrlz0eaAJKLKu75WREiSmDDIY0Hgv9wrivZIkYBFUtgxWBS5sPNv17v/xL3/zOt77zHbcp1huVoiw0wvuDsxHzbohAhKhiBZpgJkqIV1560Y/98Pf//R/9yWtuu23H3ovVxkaNmCgAZTPOAtUMuHkWuBZ0Bwy5JzO07Re+7+m3/Eokzo8XAvKm5YoAQQyWa8ySkgQzM2JjhMDMGng0WHt+/akvHjv26LFqPN558cKR6/fvPbz35NILJx9bKyTwxAjRILnMwkTECJY1nSydp3xQe1BU55/ShqVHkTwdZXqB5EEPzFqsL48K4Vtuvqnz2pvrZV6bDuArbinMEH7kH//IWOp3f/M7du2dK0KjoGBEzBotKbZ39Aqn3rSquwxAUxWQFfBE2zQh7c6LhPsCkaZuoogklpCZ1gVxVVeDoijGoSyKYRHKInBgJgouM0mmRpYkOVXUXMotjWZ3sNnpfo7KJL0x8oougU1FLI3m8vzEAX7LtJZc3DSDgRPM3KY01ivspuJpxqeTNMWLKIReYmjBpECBQGrCxEwoag6Y/rVf/I1brrzqW7/xXYGWmGEcYoxFKMkKM8m9bs5rtNSpaySiDAvMbPbG1976ne/9xp/6xd987ZvetGP3bCSxorRIpr5zxokU70Y8zTukXD/NqZylgo2RpoYM86YtRhq/m48RU/ozOUICCRkxWTr6xiQ+/4ARCrWAIkiBNTvzyJlHPv/42ecXwXTRkT1X3HHZ1r3buRhuv2TXqadPkoV6HEXUexbMlAwgbqvnLR3Lzu8PyacjP9i9EGTtqfJnErIkoShpjeJaNTc7v7CwbbMH+BVbmw7gK24Z9I8+8qHf/dDvvuGrX3fnXdeXYV1hHioppa4uNfMR8DDNmH7WTwPDlNw0kndZiRoULGbis0g8vo5WN1FNo5nFXGIlrsyqOlakw7IclGiCBtZciBRTcdWclJEYDKTQXN3raKbB0eeEmLsenbZ0zWT+nYoEQM1cNCZH8NSWHNJQd294SvhIqkTDWlwnI0MgELhVH+IMkntQbaylERM0SiwGysyFzc4OtvzGz//eQjn9fX/lm2HLRKAQTBGMO94RUvuSqYfqCgOYgoG58M5axOq7vu0bB8PB+z/wyze95XVbdu8mNIFrjSoEZjJ1nJ7JzA9qAnASgyYVWtPuZM6OkU/2DACR93ukGlCHdRFAxgrzsZJQIGoAcfDtZkJZTsqiKepzzdGHn3zi/seqUT2Ymz547cG9N1xSbi2aIhqqPVfuefyeh23V1s6NueEwCMTiJl/TtB7KbloZnBWvM/yfO0Haq9l/dLlAC0smz+0elS1i8fTKq+98U1GGzS6wV2xtOoCvrNVYXFo89fd/+O/uObjwF7/z7bNbGk2gKZDFLmFw22+pJbat9WoCfFOYC4+FXRdAfHJirg4AsCypr9qGcGSW7JuYTeomig6Hg0FJRSi8lwySOgecwEGAGWv68BwepuKkw00QVbKeOKQmBAomlrFqsqwQnQCEXDnMfBKYUvZ0QK4QWz9fQJKeyaASJazHwGRmbI6OAWTKGko2FTJjmfrEH336zNHjP/ajPzQ7HYMfJVGAmJ
0Bm7wsgb0BwMxCHiUPZjOhQCIxBApc/+X3fg019H//3C/edOebd128W4OYzzJIEwgoad4lFCe5bUpwSk4GWuA8o/lIChJkZIwcPSefAHOHzK7oaTAumZmCyyqxFgOb0iV77skXnrr/ybMnTjNj9/6L9t9wePuRHTQLK1SImZqZPYP5PVsWl8drSxXGMcxAWi+bcrjkT7tt7NbGEL9bXRNYerNZbjcnA0yM6mC13nrzLUXgTQjoFVubDuAraBmspOJf/8RPHF96/i9/y7svPjjFQyPOiEBCX8iynWj7rlqo2wBNgFAyHa6044Qf9RYrR2lg0TQmabVWYSZLOvikPxUj01rFilKlLIrAKMDkpD3AI1lRC2DzQS5p2K6aQjXPX0eu2WrL58nQRkuqSaWMXCRMQWCqZlJSr3SLQa3yfVv38OOQ4GlKTBqylENIchrkcghRIkIkJhYe0uzRh05+7IOf/zt/432HD88RReVE8nTJBiDRaAgw0mzICNCAYEYEVphF44BoMZhwKP7yX3rH3v07/v7/+RNX3HDLocuuLAZ1FcYqda7UKwHEHCHpvJmko+/7bOwlDs0dHESqpEwMQZJM9RNsILVA7A0fSJgZUWBp1JRCMxgiVOeap5949vlHn148dbagsGvfjgNXHd5xeAdvL3QQwU4RUlev23PZnuWnjk2WJ/VSNbVtEEPIpFzK8FPL5W89ft9oZ6rniy9wPzOdP6BEEDVrKp0qp1/zmtvbJsTN9QqsTQfwFbTM9KFHHvvV3/ivV1938PVvuWYwh9Rjb20dLb2S8r8pdqbu2axx2fFgjCCKaBpVo/fdA4B5AS4Hm8mSti28BDaQRjOJ0sSamJkZ5nVg5kCB2DFoJVUDWVRVqCHNcXfBYFEP+x2wcUtKAMzryZxVioFEWKEXDQK3rtmNWpTE99VfTtkaZmiGkOJpR1AsNQwzFErRq8kiMtSpuGYf/rWPven2N77jba8uB6sKZhAxkwWDUJazRFf6dOAdSG25DLHAAQRTDWwulRxC87a33l6P/tqP/8QvLp9Yvvbma2d2TFUFVEQsTTYjnz5JyffBkPyqzybzDts8z9GgRKQwL22TEbdWOQ1MY3Xup3KgQBZMUK/Vy6dXnnn0+dPHTlRrk9np8pLDB/Zcsmfhkt28tdTpaCRgteB9XjASULH34J4np1+QSVw9szR7cE/SmSMzjQoF2BtCOtebLpj+aesxlbIj9+YRbxlPowvccTMVCJOxNBPdv2/Ppu1/JdemA/hKWQYY2f/1E/9KEO96x+2792/jEDXl35YrfEBKpz0GRroJe233SbIz60AYoEbi1l/F671eFhCXYjOcdy+7Mm9qmGVASVU1jWrP5D72Ts8E+ydZsBwSWiIHdtGhfyzBiChYagrygR8+GD3ViYHsIjokoQWB/K82akxeMR+VfBQpzXdRnySjSfWHTdSMyKgmJoo0xVPcbPnwb32MG/nub/u6opxQoEFZxkaJiaCZVJlD8Cwk6gbZaZwOyAQk0QWYA/FKVlmsv/qdr92xY+FH/uFPfuZjZ6+84fZt+3aVU6sTnXiBXFQTqua9tGqcavltNmTZH1NA8JyOWdRIhYzzyScBJ4WfoimoLkJNo+Xx6ROnn3vy2NLpRRgWdizsvfLSbQd3Tu+c5pkilqohNtYEJmZSNVeVUwTVOLdrbuvO2eVnV86eXrpILobmInw6wrm7EOg1g7Wdgd35xobTmH7Nr/dCMTGYFGhQj0VrjlWz6QBeybXpAL5SVtT4wP33/sGHP3jTG1910+uvGgxUGSZGrf2HY7/ZTMP/bUFwo9ZFdL03ZDAxjSaNxCgi5nx8aVSiiGm+sRPunswwUdGmEklhKGu8a7L1jvJQpusbzGvPKe5Lsa1vJJGHtHAl0l6DVgY4kPYQyLIClDOCtK8JKQJau5z2PzckADmOVsegiNrDR6YwNoqkgJgZKxc2//Bnnn7wc4/82N/73w8emlJaMUMToxNmWj3jVsmY0L
N25mkCAeAEPRlxUBhEODCIuAyw5jV3XveB9//oj/yjf3v3hz964LLLL73mwNaL5mOIE63AkmUr3CQaLHDCfMxg3lYNEJubZ4UqAgGBDCZExkYobEgSOMIaHq/WZ184e+rp588cf6Ge1Fu2zl961eHdBy6e3b01zIVmUNeFGtVEQoSCSVSg7dx4MlOmEAratW/bmWfPrJxdQyQrU9EpZXBdhaIP96cIogv602VlWSwon67UtWDp0PmeClYW1y/au2/Hrh1fnttpc/3PrU0H8BWxvOD40+//WczVb/iqa3dcPJBSQgic5155KN1133YxsaXwLINAbeJNxmriMsu11HVsGkk6ayJW1dEpQLDACWOhHhc9FxUy49L1RY2zIIRRvpOBth03dzYRoeDUI+o2tIV7AW+hTaCOqkHNRUpTMVgzJNKCRabsHaeZUoSU9KRSrbNJU11VE4pFasoQzgVUMzOLVSwGVpYD1ukzZ0af+NBH3v66G17/+suLciJlQawaNTB3ovVm3lmQgKpk+5W99EIUkp9MRVkigNhEjUmgTADksisuev9P/pN/9W9+/qOfuuezJ09fcd0VF+/fMZiZKaaaJlYITEkVO3lDJShQAJ5bAOz0pcBEYDVix4GESilN2SZYXRmdO3Hy1HPHV86dsxi37Nhy5U0Hdx65eLB9xoaFFlaF2siYTFU8t1MxGFxzzbLPDsxQtUALu7cZ62h5PU6Epnzuo9dX2s5f6nF1LKdK3bWTL0J3aBmjtMxNa727gUCDoqxH9c7tu5FHxGyuV2ZtOoCviGWmDz/06Ic/8pFbXnft9a8+RLOlWpNql2QwdoZPC4q090g2uO1Tqcsm5AqpwB2ANFEaFeejS5SmiVHFBYOt/eR8R6vBxYk5aeU7KORf2DX/ZBYOshCFb0uSifAZVBnRT+hAauBtIRxKb1VLrUvq1KJU+fV99PkB6EoVLYvJVKWPSACgoOybISZiZlAWgwCmZSi0abRRbmY/84f3bCnDd33n108vBKWGwCZIfW4GhbmVB5jTOJX2GCeWKgwKyeeFkxt0JQVHn8hBnXrr7sGP/Iu/dtdHvvCbv/LBL973hccfLvZefmDfpfumt85IYRIrEmN3XCagAJcq8iqwEYgYKKUMMnCsqKqq9ZXR0umVc8eXzp0411TjqZly78UXXXH9ke27dhQzQQZWc9MMVWxiwczFYs0YRMYw8fPb7+MlMjUNCAbMzc9wQaPV9XrcDHaECPFrzc8x1Hm2Lez/Ypu9oQZMieCVKiqdGwfSxdsYIm684UZqXen/5J2zuf58a9MBXPhlgJD+m5/5v8I83vr1b9yyaw5UE7OOlZi0w8TJJ3Gn6Df7AR+A0qKtIeOwRhCgMRlLXI9NrVHEiEgNtUkldZQ2mAOyME0mj3Y+JjFqmGAEZo0NZRKqG35C97/cCZAcU6ardmEmCDkaTNF9quRKxn9cwZhSddqQOK/tTKvsfpIWEHIm0SL0fnQUJgHRx6q4YIWpWQC4lMET9z3z1INP/42/8nUHLtshoaZAEJCBQsKuua1ImKZaPDqrRdmmEVMuZfrWam4wNiY2IYDACjQDq+968zVvf+stjz/29H/8L7/7sU999tgTR6+46fpte7dPzc5pUWtTU2FkJipFIAaIgocADDKhteXJyplzS6eWF5eX1ydrYnF6y/TOHdtuuOa6LdvnirmhMMA2ZlOKCmWYeQ+zpnQunwNttdw6U5uUOcj9wPTsdFkWdR3Ho/EQM87Ccg0+UO7Mc0CyLX5kCKglc7XFG83XmFl2pZTKWkwM0WYisZZXvepVbV/f5npl1qYDuPAranz4kfs/9umPXHfHlZfdeDCwOeBLCSzNfVEtDn4eSy71J8FVvhJ9ECRqjclE4iTGuomipmpEFEXHk2pSNaZELiqXSJo52CfKUJN1KTmRqGkUEKGdSZI3JbufzNQnO+/ZXCHuq/TkhML3iTKEn0P81thqMr5JMtQ3OKkJufU1j8dTmUI0eh
uZigolUSQfgts0EnSIsd738S/ceOWV7/jq1wSqxECmjJAIP5yNeaYWtcg/uhpmrlVYt39Z5AC5E5lgRmwKNhOCgKNwPHLl7n/8z77/qSdP/t5vfuT3P/xHTz4qV914/dbdu8JwqsaYApHPfnHpaCYwlGh5eemZx54alvM79u06eN3hqdmpwSBoIUaqokYSrdLgNN4u/YKi26p0oPOZ6HgE/ZIKzGAq09MzxTCsr1bVuGbME+rs8XNq2aWMKetpwccN1r/73HRhUaYROLDIxADWVtchdnD/gc0e4Fd4bTqAC7wMKDj86m/+RgyTO++6ccu20kIkI21iFnKwRI/0cmAnstV3AylwZ8BYCSTGFcVxbNabelw3MaoqABZYDWliFFEYMwcf2UId+5KgagZm55d6XEgAClYC12n8YuKfIuHCidzdblOy7j2dmi5peTHKm0yr6xqkt1iqD7qUkCWVCII3EQOtL3ATpMk5pN40U7BBmIgAgSmEAwZhiqvZh+55tBkvf8v7/tL8loFyDEVQER+PqznlsERWRGpJTu7AMtrdOYKekfP9Sn1MuYYOU3XpaUDMIsCI1VWXzl/1/d/4vve+7td+63c+fPdnlpePHLzmWrBpbIoA00gBRhBTY2Zgy/btt73pIhDXSDN1aquVVEkpwMzYCM6uTEc084f892ygLfc2tCmZv5p6HSYGCmUoQhBRqSMTm1Gf+9/ubK4DtBlAxiT7ykBtOSeBlaCWt6wWTaaKQVOtz07NHD5yYBP+eYXXpgO4wEtMzi4tfujDH778yn3X33aIpxQMFeNQwHuSbENS3PF8suWhtiDnubYBZA1kYs261OtNXcdoAjZmhpo2MVZ1LdGIoJZzcXcgKQInECXmvm8Ag0DsPE4h4qCdDgTQqwajRyWlVn25d0t3Br5/mxNZykLc0LvQUDLk6SlzIXp0vWRtRQEJ5cj1BYBSwTyBQmQUYEy6jvVT8ZF7Hnvra2+49dYrFBUFAMYc2uSK2oyhzVdycTyjJCkla3Md32NLBjGbvwwaEShLeyppYbBhCdUxKFxyYPsP/sB3v+cb3/0rv/mHdz/46b2XHhnMzrnOh7FnVIxAJIrAE1MgCgvYRL2UmqlTRjC4DqzllCfb0uSz8qlFr5byUovMTIkRipAmtyFdVcwMUtHUG9Ev1HccLaQNa9OB/FtuV+l9rROEXOZ16+wWi8LDTYv0iq7Nw30hl0ebv/5ff/3s0un3/tVv2LqjDMHLlmBmgeZguJdx++IOJrE2HTAAaer6ROOoqdaqatI0ospGTjBU1bqJVRWbKGlCPOVwLo/t9m8iI1ElYuJkZOtJtXxuZX5hexiE3k0NZAihpYIjWYFEjk8bngCEXBJQZJIR+RgwE7iGQ4J6svXPJshy0Tm7Acv73nqUVl+fciuDAmTGyiBIYJRf/PR9U4y/+BffOTUFKYxDMFGfmQJiSh4F3T8bAuV8NpLDTDvXmjzrkqJUI2gVqk0jeW+xaNP4ZGYzUmuqSw/s/KHv//bHX3jhp//9f16rl7Zu36NK4qrOBIsRRBxio7kvOJ/5tvifbDC1elCGnhfqZ4o50+r9nU5QmzpkjJ8AGDPnRo1ECm6loKnnzbuLwdB/LF1L7fWFNlxwdxTKMmiFc6cW3/yqO8syYHO9smvTAVzIJdA6rv/qb/zGxUf23HzHVcNCGxERC1aoSr7BKJtWApCbKD3mA3CepSKB1RpHcbLWVOOmjj5UigkEn/s4qetJVVeThoMRcZmxerI0QybbdgNItVYyZrJamnH1+INP7dsb9xzeYyUAJWU3M9oOX8ySAW2gvhGosvzR+Rfzf320uyM9aTdzKtB/NfqmpTV9yIF/SjhcQdmdF7R1JWXNy+dWjj32xLe+750HLtutGPsLc4SeauEdvJZqAEY56kci5xjlyVzWNil3BtThFM05QoLnCg6AmTEFFvO5WiSQsghNHKOYHNm/8I/+3t/873/ykU/de//s3E4pgqThbQxwlJ
x1mHFy2SllArVz2fNWehrTP/y9A0fZ3G/IAqh7lau9qUQKPBgMiJjaun/qNkZqSc4pXuZlZZSpzU/bM0/IdIHklAwwVVgY8GD51PLFr93Dm4OAX/G16QAu5DLTuz/7mcefe+xb/trXbFkwKiQgzds2TVFlDtDbYN9Nokv8OjrhK/V81aqjGEd1M6nqJsaUeZO5MH0UbSptxhLH4lWFJgPZXQtASi6S7otoDGCORFLM2Pyj9z6xa9sezGulDRTMUIWRs4/QId/tPiZmSNcojM4AtSara+RKMb31tqPdJjrvY9O84/bfTColUyUqoGSmYsKkJYYDbPn8p+/etTD91W+/LQxiJC04QNXpTb7HuV8t5xAMV4NIwSu5OW63NVu2FgbqYv8WI3EJaEdWkriGpW5tM2ijDDZCg2jbp4fv+5q37Nu548Of+RzRnAgCERdcq5oph1JMAuCyEBndyocjfUcHvRgS8GRtyI7WjZ4HxqRz5CkaDKomgqIsZ+Zm1FRT96/1St0pmUgNvx3da0P4nw4LaSo8WFspafv5VGpp1upD+w+GsFkBfqXXpgO4YMsAVfzmb/3W1u0zV193RTldmk4ACuRSvyGFwVkSwBIUlHAUc8OXxoYowGISTcfSrDXVqJpUVaNeQWRSaC0KUJSoUbQxqVUt80B7mG3aMoMRFMrEgIoRGQ8w2H/wkqOPHz/37NmtR+Yai4hiZEYM8kIhp7D8fDuQrEWOEnP0mu2I5S/tLephW10OkJ7Lzipr0uXuIpcpcOPmfoktFGRKPCkni/L8E8e/+1vevO/ggpFwQUQg5Ra4z71ouUpNRL22pGzjkNE3S3azowl1JjZhM14K6RWQW5zE94qZxRjQIGUgkqYpC7rr1TctbJ3/tQ99zGjIKBpRBCJlJsqS2amWa3kcTnvaNgByG44ZNqZQLaEnn6oEXpFBjaHRmqouy3I4PfBGPNfVCOxC2z5rxzIBrUtQyRi55pyvrXY7KMcyyU0SoQjFZLVpRs3s1Mxm+P/Kr00HcMGWAUurS3/8Rx+99vZDV1y1DzYhLi3GpBpsRK6b7yiCx6CqmjFUUQXUx2xTMDGpNE5MViaT9VjXjWgEGxcUACjEk4D18Xh9NG4aUYUknkui1vemsCawIQWERg4CNFwPZoe7L9r1wBfuf+PeOwK4CQCZT6k3kAv7tyXZjavXxHz+T3ThfTYrqTiQkagUlyYr3L3RgR6kakMGhcwJTGYkLk7BMixs6v7PP7Rzy9TXfv1d5YCoAIg0WpFxa/QYr0BrtRKKTwT4XPp87jQBHZTNPNgtbFst9mYxAD4MksjAbp0TbJbxGxCrgMw4lGTgZnLzZUfCYOZXPvghES14GCPUfDi7qVlgR6C6QN7bonsl6d6x7Y5VPsD57zbhan2zQYkYhqpqYqPTW2eHc4NoE0tFevdtnNNTZI5ur4aPHoKX9zBN/ULuOMwTygxKas1Ig5aHDh/ItLLN9cqtzZzrgi2D/fFHPr7erF33qsu2zhM4iMsXk6WRKig81BIyhaqpmKlaVImqjseqmZE1qhON69Ks1dV6U1dNbBoVJWIGkRqiYFI1K6vjpeXR2npVR1UDKbN5BBvYC31tJTF5IJ9CxcGHvBAry94jFy8tr547tjiQoDAlC4xcPm7ZJj423DUAuDWU1CFa/f8SjmKGpE2UGaAp6fEsiFIFtP0vF5BzxTjbPMpImZoKxcZq1qCj8uSTz9/1xtfOLZSGiVgUM+ptAnIy0WEo+QsBKJl3M2XL5QxVtSR51EsUOiSt9VXqrRN5gg0SmOTvs1xeJTKwGohCgF59yZ53vfFO5rHahMmKIiTCFmemlO+oi6wiW/NsW9ORQ6qIoGXIbkB+0pVouW5sWYSjXm+0trm5OQouDJ4SnnavX7z6H5dzEeQjkDZGYZbRJGIQYI3Wq7Iwt/3A/ouZz8sbN9fLvjYzgAu2FPJrv/1rxSzd/NprYLXbLAZ5/G8wSTYQ6hLO3Z3m4ZSKqKcAlW
Es9Xpdr1eTupZoYgQK1JjWtSksSlwfTZZX15dX1tfHtSgAcIaVOrEuZCQY3vqbvspJnxAio607tuy5aNdD9zzw+p1vGA54PQr7lGIv2WYsHF0xubUGXayNNrRPZWB/NMld9HApy7x+AOhbmbyxoDYKzXmDZwpK1FhDQFEMpmzL04+eLK2+6203hWloQQzXIGqlqPsf2C0vd6bIltKQxcwB8mGXXTEeKfTvEJlkVQ0gdZ+RFTyTozBLgmiJsqPKDGWGYQrxNVdfubK29rEv3FPVVIQpjaacvCFaIMWyEl93uDekTu2mbUCAzsfb/LVETFBlo9W1dVPbsmMbCoiJ15zyddt+YvbcyLXdjHzlU9x9myX/124gkxHDIDpaHh05clkIAS/hnDbXy7s2M4ALsDwkq5q1z33hc9sObZnbU3JhUDb1WDCrdao0IlGiREmzXMwkZe4kZg20gTjov1pPRlU1qWs1AYGYFFLFZqLN6mR85tzKydNLZ8+urK1OmiZpKGgXlydkIkMKlFr1Qdabu5iCN8LVN157+tzKsSeeC1oysbIp1A2h5W4izwB68IO1lB+0WI3/m9CHDEi0x6g9UunfNvDuf1z3xhYS8UfVhAOBYTWmZO7Yw08e3LP9yGX7eFAggEA+FhjwHIjacfft16at84PRzjjI1j9wknLesNn5GHaHNX0a57owkHvu2kObWEzOWgJDmUAFFWWMb7r55rtuu01lNQThfLN2jjAH/T09p+6fPgyUDlM+9m2KkE9V3m1NRd71lQk0zC3MgtXjgYxrcUby291tOwjbiym7/A2HpTX9eQvNNYXCyuLaRbv2bVKALsjadAAXYDne8dSjTy8trV7xqivKWatRRa0UTWNNVIkmjcWooqZpfLqYWh4IzmSswiaEsdhq3axNqknVSKOsoeCiBEOsrmLVxPG4Wl4enTm7vLyyPh7HGA2KdqirtRCC+v2fYI5E8m6Z7m4bmaxAzZOZ3TP7L9n/mbu/GJdkKg7IyGfEU3vf52ZTahuEetBN95L0jR1E0dl9mKmZ5vnB2fZa9gZooQ70DU2LRqQp6og8U8w8f/TZ088/+7avvWtmroCJqZizY9POAd2cW/Q/UZEG3RtSOSYBN8xMzAn7yV6zd4bzcfNjSSAmYwYxegbSUZmeiCb5FB73ooqCeAb06qtedcct1zXNmqoSsQ/xsRSJd0iLok1G3LimNGsDOtMCUNR/JLvQlKUAkdZW1pnD1h0LQprqUG3p/UV5WMq5cn2jh4b1HUViNHfHyIgpqNJ4Uh289HDYtP8XYm06gAuziOz+++9vtNpz8XYtZNyMG60baaoYG2mi1qIiSAIIbvXd8BNYRSqRcYxr0izVk5V6MpY6mhhAYI00mcS11cloNFlZHZ05vXL29MpoeVKNVSORBjKGUg+p9TAyhfvGZmRKCu//z4W5nPirBpqguvyWK2D04CceCiMOysqszuNPL0pG2+17a7uTC+hiaTNYGhtppim9sWT93X5l0eDWrrSpQjb+vWQAmV7jBk0NwEyYeuL+x2am6Jbbri/YSDUDIoq+58guxbIZVC8jePux5vg1uUImr4pQL3HABrOKZCr93JFzSVPHhbd4eALRt4qJi0tkxOYv1KkS115+eOvclJGaCiWkXntHIuHsOTl5SaBng9W283733C37ZBJaObMynJqa3jIAfH5ZukbO2830ARu9X/7wHhwG9DLNtnGPVC1Wwo0c2r+fabML7AKszRrAhVmE8NjTTwiNZuYVFNfryaBElKiiTFBXtEeeFsWJjy5mqqhUa5L12IyaSS0qGX0X0bqq1sdNVVVVLVVVj9fH4/VaxCTCI9AWJkmhWpahTw919zEnTIGyiIzbSwUTKWOwc+r6W6/+4kcfPHzgkq3XztWWdHNMBblS6mVNaqmSlEqVHT+RMmuE8pdSBrRdCW8jIpRt8J99UBMJJ70wUKESY62nT77wqusvn98xo4hEZirMAb2agbN4MpZkLcrSUh
y9Ouy2H275TbKwTaK6dLgMoT0EIEuwiWl2p270c7bE5ONqumYz8sfMwJU0Y6kZumvbjsW1JUWWwAMs8Y5MAJBp6qBLn0I9ROw8698e/O4vMgMYHEAao03iaHVlftvsYGHQ8MQgCfBvPdqf4Vd6+9XD4zacwlwHsuQ+q4lapIMHDgTeDEYvwNp0ABdgeUD58CMP7tw92H3RrKpRMahN1JQD5Tm9TvxkkKmxqSnQqFYSx9KMmskkxmhpnoqqTOp6PKomk7VaIYLJeDJeq5oqSmPk7ArLDcR+K1MvdOuMna9O2cXNdQJzzMBkajEKc3Hw2iOP3fvMZz5+75t23zHcgXrgg39VRdRHYhkb1OM601avaMOiDeYBie+DHvoEtPhKX7Ny46e86Fc3pIapYnrpxNpkZeXyKw8PZkO0CQEMpqxZlng/uam1jfNzJN3B25waYnMYm71NGxdTKmBzch/dATWDT1Zot8w7DBKlP81J85BZATYjUrKJ1CONZ0bLSyuTgqcCFaJi6Mosbkxb6Z0+9gLqjm3eoZ7j9Tf23uXwTwAFK6uVuqkm2w/swZQpRUuNxvkMnXfEzzsD2QN1Z6onFZo611zDwnlRtVktUsUXXxub6xVYmw7gAiw3GxOprr3+0m3bt1cxoABpY+zjRQgwFTG1JqZprSqxaqSSWElsRIRMBQo0sRmPJ+OqrutaIlRtNKkmo6qaNFILlMiCqPNrnOfXj2wp+wI3+F6bADlKQpzAIcdvPaw0EBCYFaZD3Pa22/74Vz70yCfuu/mtdzRzkzpoQUTMUDATCyEPiUwGsXUqPccDOCi9QbWgpZVgw+va3639oxWwcyQfLTsfJoppKk6ePsOkV157uJjhCBmQH1G4xJ2DKZ3HA9pgnrwsSckbhRBA8LZeVSMKqVaSX97bKJjlmQq5mB6SmWYjU3Crz8lerlBOGRMDZLU1EViNkzOra2eWR+dGzUSIKLBFkBhx6lDLEF5r1bvj1Ttm1ulVbDDc+cUGELFBxYSKplxcXI0ad16yzQaKpEibWbndOerFCy9KCLKLbr1Ge6786rO2paFar63G3OzcZgXggqxNB3ABlpqeOnvuuVPHr33jxYPZclQ3MBpILSYgBpGoikQViVFF0IiqSjREk0YsileKtW7qOjaiGtVEqRlX41E9GlciZsIBBZLKASWGZQsZ95CSXuzWhq208dlcFm4RG4OSSaHz++auuuWqxz7zyO49zy7ctCvMQqkhJRCbNWbgwOzibhso6K2JzQjLS/A7s5JO97aNeFD7Rw/m8H5gawVUmRnlqedPL2yd233RNiUVE296yG0FyRG2tW6k0Dl3eSVmC9hRf6R/QZrl1nIWsXHL+rCaM0lzMOydVN44oUQEH0MGZgrKoqQRUpksjqul9dHi6nhxXJ9ZXl9ZHfs4IMtONBdH4JOi2+J3/6TaiwCZ3hFuz7pnLkaBJMowTi2fWmUKCzu2gNRy6uBhwPnJV+fR0WFv/VP14sA+552mYNBoeZ1puGPb5ijgC7M2HcAFWEy8Pl45d+7M6vpMY1aNV0ouzDSqRjNTi6J108QoGk0MUVREBTAl1/OpoxeLY6MqtUkjsWmqyaSuG4nQCNJWKdIod+Yn6aBUiATgwVg/lkMPuWjjdENLAmznPxkphEu99jXXnHnqxOc+/YW79r11avewHqqxF0+NmdpIOHHps02njD+4R7JO1qCH8mTAOW9JTlRA7Wf5kxsAo4y7u/Eteerc6XN7dy9s2zFnqNWUUHThPrkKECg7vnRkmOCofK7w5jqv7wzaVtwUPrffnDAVzc7I+wxcr41So0TuuPIzEpjNSEmNTNhqs3EtK5PJ6dF4ZTxZHVUrq+Ols2vjtXVY0x2o3motf8uzt04RiNAdxpdYG3pzAUYRZHj2+PL0/Oz8jtnGJpbFjja0Z/SuF+ujTef5h43tJWg9tCObZkS8vra+dXZux45t1kYJm+sVXJsO4AIsg00mlTKaQKNm0mBST7hqtI
4iqiLqMj0uxW4UoqARETUVi2qxVhGJkghDUklT1VEsNpDIBApIoaIz+zL6A6BvPDvT2vMBvXygi9f9ldSaOjNjgIgjaZgdvOFdb/i9X/qDez9y76u/6tXlReWEmmDMDA9WFWlaDKWYOUPluZiJLl7euAVt/pH+8F2h3jZhY6ngRYGuwSysj9Z2XnrpcDYAGBQDF9jMNV2AjIk5zSFm8g64ziUg5QHJhfa2ZiPM3x6bbBIp51KJ/mheZ/WDaAogeC3AAGYzbUjH2qzV9fL6eHF9vLjerE3q1dW1pXNr55ZWqthYizadv7v5zFDnhbpXtJvePbJxEci8LgIWljUbLY127t3GM2RsqkpdorbR1KfyUM/B9DOE/HCPwAUAZgpiAxEH1BQn1b6LLhkOBrxp/S/E2nQAF2ApbDKZKDCY2bIykSauBWEaUyXRxYddhI2ZwWhEnRbaiDaNNI3GRjSKxBhrbapoqqaomggNjMK006rsEfpbQ/liA3Ceveib+g0RHeWMnthETc2YQsN1uTO8/mvu+Mh//fhDdz989Vuv46mCyybC1OKgKNrm4Iy3tCH7S0D7rXndEOSmQcgZjOhtdGuQ20SGuu1nAjeVVJN6246tU7NDMylCwSbEbeWX3e4USfLe86O0nZSRKCKfspihHp+lCU7HmXobbhkX2uCJqC0O54oCEyixjihGaIU4kmZpfbyyXi1N6tVxszySlfW1xaVzo5WVphYx52m+OPw//8zlP6kFnqi1wl0QkGP39GL2nSy0WDy9NBlNLr5knw3MSIiSklH+NstmvTtzmVDQfUGXdyRSl+djaP8goOBQrdVxEm+96zbe1AG9QGvTAVyARWARGgwGCFNN1OXR8tzWmZKnXUnRlNhLctEakXEjTYOqliZqVTciolGkkRgFyt6lz4bSggIKMUpSEm5zDQAU3KmXJbP50rTuLg3PQmbUhnDJpVA7gx4mEg02tIUrd15x+zWP3f3wcDh32auP2Lw0A5JgjUVCGdpPc80ykBr1Q9YubM0KN+cdL2t/68MKaYu7B6i1dQYKYGZpDA1mZqeKMrVZExCS3U+je5mYwVkJwTq8J+03t9WRvOOkKm1xuIdstduAzPXsNpC8vEsOAxG8+guqUa/reGkyWRyvr4yqlUmzOmnWRs1oXZYXl1dHq01VJ7fEoNSOYF3mkw5Jbr3rAvQc+mcfYVkA1H1kYp22PcSsIApCZ4+fK8Jg2+7tqdyQexe6QD9/S4sO5Rfm3e0lAb1OvYQG+kapKZibcVOPqquvumqTA3qh1qYDuACLgPnZmcHUcG29Xp8ooQxcBCuIKCAwQWFNHSexWVkbjycyqbSqRAQiAgJiA0I0mAhHZiNVTSr1bB0LsW9aU+jZycd31t96JrVraMo88lRzzCJpBKiADEwGIBCBlUyC3fCm69aXVr/0mS9u2zKz+5q9xUzg6RokcB0h76piAkwNPg0ra+f3Y0rr2yw3pHlU7fmcUXTWv7VR7fvMoIzQ1IDR1PSACyYWzyPIpU/ZR6IRg0IrAtfFyP6T89FyPMt8giGItIOAel/b7Yd3WLcfl2c8+uxiVlWo8kRlNdbnJqNz6+srk2o0juOqGVXj0erK2uJ4sl6LENNAEI1AAnTTYNCWzsmoPU8dJtQF4/1jTBny7/mKxDwiqFnUk8dP79i5fWbrjGLi7WZkMHDu8DVLPsTSpNAWRuyfGesfly7bSL2A7E5U66oOxgcP7A+bOhAXaG06gAuwCChCAaPTi+dW1ic7d8/Xk3ERpgfDaYta1fWkrtfW1tdG49Fo0iiLsEQn2YMIPgzGldzM+SgZTielNCssi291KKwbLEujflMJ0ldqENiYCmxEb1MqbyAQgzVPEwaZRhBbU8bb33LLn5xd+9MP331LfdslNx6oB9YwmAwqqhoCRdPATGA1133r4xnJyJ4XTKcw27eBehvXblgGuDzCzEbHmCwQx0ahVJRBJBYBDGUmjzcDBaLECO32nDZ8Q0vxTC
Fs222LdkN7QXF3rPOmUddUZqQAKVQJjUotUk3i0vpocby+WK2vNNV6FatJnIyqpZXVyWRNGos+2b0nrs3IbNc+rt792Tmu8/xqQqY6WCg9TeZk39R+MV6djJbXLrnqSJgLTaGJsw8DlBBarMeyu+lCixwyoD1A3QZuSOgs52GkqEb1oJzeuXMnNtcFWpsO4AIsgy1smd+6sOX46UUIYxKLYqjg1dXxeFKvjsaro/F4fVJHI01aONCcuqd4PU2NMRKjFKRl6DnbsBQmpvhMNpj3NFOekOBzjyIzRpvVnTcCLqlZyiulBnb7qEbECpGAsHXwune/4aP//aOf/+jniGXvrfuqEGjKjJMSJHOwxgrjwOxQiCtMpDZWOm8i70b0Kf30Z63/KBFpu5Xt7huZIBhDQRS4CETRQgCTAgFeKScfrtz1uHq+0kpkZOm9bPXTX543pIlsBN3gmjiBUHACrKopBaslglhUK8jqpF5er1bW1lcnk+UmThqpK5mMq9FotLY2mVRSNQpn3rdSpwDIekqcnYnNdYguzO/SKWQRzozapINKZJSqCmAYiSkXMjh18rRG27Vvtw5UgsKVQZJeFLdTB7IP6LCl7CoNXdbWc+XdNoNApAQjNq5GNVkYlOVm+H+h1qYDuACLgIWdC7u373z6+KnxanVq/Vw5PYhxPKon1aSq6thEUzFjYoRkdKy7kz0Ga5s7k45ZCzjnr0BnGtq14eket9PtSYq2NzBEWzp+i/z298NTBwNgShYHsdw39dp33PHR3/74PR+79xbBgZv2r4dxM7AiEEQ1WgCCq9kZqY+OcrpkNllEvW/KwSrhvJA31Xlb60x5exyuyg3MxgCMRMVMDcac4H923Dw5HMrl2c6KmSF3QCHH/SBY7u5K30kZe89bRUAAKZEQ2ExhIFaQKmsVm9H6ZGl9tDQar1Rx1DTrUapo1ViqtaZaHY8m46qO0QgICk2ZHDqvD2sHwZ8fYG8s8reJk3kM39VhOz+W25kBYiBiMBmcfubs3Pzc9NZhbbGSBmRMME1Rf98TJmSpd/20oGJLjbV+RSe5sO5iIkU9FrJik/9zAdemA7ggixh86SWHP/qle868cC7yok0p0dAIiB6fMRGbQVQZxlkhp29+LeuzdSaoC/u6L9rwA+c/1TZ2brgHe6Bx77VOULKNwbffymLGBG1YxWT+kq1vfM8bP/k7n/jCx++d3zq15crtZGJBFC4bCgmmVoOZvAvXgYhEVqVscttdavlImvfWs4O84yn4zRmNJUNHxDBjMkAn1aSJysHxayPA2PcHGaMhMwVBU5BNWVouyy7krgT2IoKBIF48dTYRsjl0tyQm5lQpmKpO6moc5czK2vLa+sqoGtVNbdqIVlHGk6Zar8ejST1uYlRNn4pku1tCUbLYL6GnsaF60X/YWi+e87k0Aa7N/mBkQhqMg03LCp97fnnP3n1T24txWPVpQ64fZYAhKkNA1NU3vO2ujRDaikd7VfWrI23bhKUjLxorMwlkmw7ggq3N4vsFWabQ177m9c0KrZxZpzCIgkasiSbGRsGMVAAlNnLsNXertv+lz0lOoecaNlhygEBsSXlm43/nvd3gqH4vnejJMwBk/ZZ/f3d3O2cTo0CFZuuhhbe8543zO2c/+sFPLT6yODsZBmME5hAU2ogoLEaNChWoQkUtmolBzMRMQAKIIf9LYiTkgTUpkl6S99UakfrwsrSnMJCjTgYzAWwyETWIqcB8XkLae6fWExLSlv8TVTHN2+KTANLSLFPtKULSSm4jaoOYCGKjsYE0JBOT5WpyYnn5iROnjp46+8K51TNr1ajSca1rVbM6qpeX1laWR2vjZhy1NoqAY2H91GLDGX/pC6p9UXb51j3ef03XLuwHEBYAGEJTrJ+q6zVduHirzlZaCHycvbpEHwAzEqM0psh6+WCXquUEwXKS0b8ccxLlEyIpINTjqNGaGDc9wIVamxnAhVkMvvbKV22Znls6uzK/dxdIRFCwkz
HEUwSzPLKrfztZe1+9KOA/H6LpHu0Qkg0BvJnlinGbwL/E23sI+IZIOxscj6RNjIKRKTdSNzM7Z9/y3rd87Pc//qcf+tM77fXbrt3WRKEyqlGEFBRAIBMCk/bKk6Y9tmLXpWxA6iFLSUHiJCksC/ZQlwwApG6LbUhEIYzXqqayYgoGqELJAFUDweEHtRyeJkzJ1Hp2rQdyE8HghQMQjLOIkeVTYmpNjFGgkVA1srxenV1eOb20sjJu6gaqENIIaZo4GU/W16t60jSVRoWBFMoEU+pmtKXD3vnhlzo//Z/pGmlBu7ZQ0D/Flg8nGbEEs1DEcPzYc6Eotl+yoy5roAkwWLunRtBCusBgw3d7pmE5TVJPLimr1qXrp+tFJB9yR1KjsGI4KM/fqc31Sq1NB3BhVkC4ePe+Sy4+fPb06b2yjQim0RDIm4Q0W/zOpiVgBEAfvEV6tM8732A2kbjnPaJgu9p7cgOe3FqMLrDrIUmWepFac+KbSt7cr+YcvzJUXJcXh9e/+7Wf+r3Pf/KP/vT2+tUXX7m7mq7jAMSmEplLgrmmTvp0dZ2djVuaYJWUpSTb5WiRg9yuV+8BM7cgtwEEpqmZQRjw6sr6uKoL1SkMYD63gAkgE+RiNzKTKlN9vEaQ8CbH3rNyhh+AYDADG4RgBlPTaKIWxaSxONa4uDo+tbhyZnFtZbU2HqiSQdWskaoaV+P1ajKuBKTiBto5pklVdKOl/zMCf2Rz3ntR+9L2AG7cmQ3+xEAKBA2Y0MlnX9iysGVqYaA0rqUhAyMQHBLUXI5o08LeNUZt21dC+ruUIB+xhLERvOXOoNWkJnA5mArF5iSAC7Y2IaALsgjA/Mz8W1735sVzKyALlIF+UjMY96koyPdsG+F1VtlaW9wBQS1MxN5xmnT287wPUEZk4WQWAihhJy2IkpFngHqfb7l3qvd1BgIzyPvRAAGZsdShHlvF26ff9J437j148X1/cs+5x8+Wk7mpem5KysJCHrLOHq76viu10bYP4co2P3NOydx+UCbaONHSYR9KM3uRUGpVEW2KQbG0srY+rpo6appGkw6FQsVUzQTmCqh5SxLfyg8XAa1yRN5WUzUxiGpUraNOmnocx+NmMmmacaVLa/r8qfXHjp16+vkzZ5eaSstaUGmcaD1aH60sjpaXRuNJoyii02qMkPUYPH/xKBpo29deCibJCWJ3VVnvDRm0y2Ta7ikzSjAjASFYE1aW1pZWT++7fAcPNZqmIW9mSiokyupfr+TzgqAwJROCEASkIAFFRiQ0jIY1simTwhQ+GEjBZlBThVlBoaAQq3qqKANvOoALtjYdwAVbDHv97a9ePzNePbFGDZUhEBmbAzOZf5iQ6v7KcVwPpOivbMN6OK91tgDpyRz35m9Bn0uY4lH07F16tN22NMQrjQw2UzCYQQWCiaooEVFZVBSbWX3TO1+3a9/Oj3/ok8/c+8zUapjSmUAlQVSS8SYzh9PVB4mRgbjzbTmY7LYfAAiWB85Yh1AhR6uqJhqp1Jn52brWGFU0Nk0dzYySYRKopLmPoiTaTn2EE0RT3UXNDFCQ+kweQ1RtTCuJtTRVrNebyagZj5rJJFZrdXV6ZfTEc6cfffr4C6dWVtesaUgaq5tYVZO1pbXR2nhcNU1ELVaLePEB3pyr6PiV+Xy9FOiz4Wf3xJ+dJ3SflaAsg9NwyUwxsMGp508ZxZ0Ht0nhBSlxpwp3zPlycwBfjZTSCIH2GmMDqwcUhCRz1OpnGHIhCgw19bpKjM3c7DRvToO8cGsTArpgqwS9+sYbDsxsO/aFp29+802T0Kg1ZAjB4dnMgDc638jnfBvozHp/tTDIBtDn/I9ouT5d2dBfb6DcqJVem9MNAKkLKf9pBjgzhondJITAYgohJbOgZBanp9/03rf86Qc/8cVP3Lty6szVd95SXDSHcllJCCALEcoczIgN7EPHyIgcm89iRrmDqMt5cgdSWxRwx5ACYQaITW1+65ZxPanrSBbG48
nUzACCwJJzGU39bF7WNfigRlXLHVjWHh4zn1rJamqoozYao5oaaVSJigZ2ZqU6fmblxKmV0XgM5oKHBayJ44k2Vax95mcUNXEeDbnSjkNQLYryEmd1AxD3Eu6hrRq0Z6Y9JIaNT/hRMjCIYwhS8iQce/TY/MLU/NZBY9HUiNny1eUgThuRAL1rI22DkbeGU6pbgciS3Id5SzYZGdTzCApM4Mmkrpt6157ds/NzL97dzfXKrE0HcCHXtoUt7/nqd/7UL73/xteCpmEMUvEJwKnJCMBLmG7gvMDQNv7a2a2XsCTp3Z3xt57xz5x+YGNA2en/9L7X2lZdIqhap9tgAKkJERsCjQe1DPTVX3VbYH3iC8cmy3rtHVctXDoczyCaiBIFNpiIEAU3NmlbiPOxQFdkPK8NDL1pMq36kT+vRgGzW2ZfePZ0jBbrenp2WDWqhEFOfs3MZzJ21tK0Bf6NkoMwqDe9RVOBiIipKImIGjRKFONxFc+urh09sXhmeb2qNFA5CAXU1mM9qceVNFFExCCtD/NGuAT2bCRNbhAZaqskLz7dG894/mDb8Lp8mvOfyZqTC+ZRw6Mz9bnnzl15016eMeEa3rZs5t3LKcHK1ZpcacpnP32T5T7B/P3BO83ymYMZ5wjDrGA2mDS6dX7LS16lm+uVWZsO4EIuBr33G97z/n/3s8888dzem/cMQ2kUPelOJqgXeqf14mz5vNsn2e0X5Q1o0aMNL865woZovx9OZjZOhqGt3YockDtb3dF4kKmmVlGKMI4mYBFtZmanbn77q0saPHnvU/d9dPXq1et3XL632LJWh0YjG0sIgYDIpinm12AEg/bYqS1nPHstF75uDae1T7ExWaitmp4erK6Nz51b2r5zanpGEGujoIqCudXrz2XfhDUl1qebftP89VAjMY1iZpFMo0UX7o6GlfXJqcXlZ0+cXl6vG9HAA4LGuhGL43rSVE3imZq14JwfJXehmneMurOeMf3zTuSfYf37579NhixfMtb9pNRpAlZDtFgiHnvyeKGD3fMHZwcLTVgUjQZzpAeAQ2YEStUKpa5Ona+1Fq/rWrKV0J41EuTCEmXUMMZIoJ07d/JLljc21yuyNh3AhVwMXHfNtdcduvGej3zuwFX7p7baSEUojYRvB1u1so7pbRtv6w0rsSfbJ6hnL/ES6UGn6+KGyPrf04Z41L27jSVzcpJ7Ya01LkoENijnxk9jgENlOpgvb3vHnTOD4f2fefjTf/ipS5+75Oo7r5zfPbOOptJoHA1w1Uw1ZU46CE4JzYfivL3eqCZkOQb1wrJEFMWO3QuyvH70yROHLj28NlqfniJlM7CqM4FSk6tj722e0VL+TaGpIuPwt5lBVX04mxFX0c6srD13+tzJxeX19WgKJg5MqnFNq1qbRs2i9s2cZQWjVoGps/75JTkD7J3MlzrdG9eL2sQyp7Z39h15Yj/TQkJVdfroyRBnV8/FledXw85iJhQa1NiE1BIY5jNyfBNDzrMsdfKZ5nI5EWkfu1TP5dytefdd2lKejCdMdsn+/UXYpIFesLXpAC7kYvCwnP7n//RHv+pbvu6JTz124xuvG07pOirrbIGl6ucGq94LEc+LDan1Aen59jl7Ufif3EprzXvk8fTuPqicuffZZBmMsjxxphhacI2gZKmTHIAyBYUpi5mgsGvedtPufbs/90efe/yBp08eP3ndG27efcVODLnWiZK45J0RCGwgM3YaYtjgg6ivwZD6mFscyzdWlYLVPNmycwagE8+ebapLo46JhqLQIgQ2CuoNwyYKQOFkoFyDN1LkOndCQdTrAKJRoY3R2rg6vTJ65rnFxdW1SRSiIsDYoFabxVqbRiMZBypgguwwkz/r12hSpO82mjacupd09f9fziCdastGuj0o+QyTqjFZwcxSrJxeXz9Xb5/eeezhk4898RTN0fzu6fkt08O5qfmtc9OzU8OZYSgKLpUDUQATG7vGBYiUCGaMYGDX2WYFpEhtBh
pqFRNol70RARw0TNbGbLZz5wKdf2FurldubTqAC77sjW+64823vf4jd39w/xUHFg7NMDcC8zJsH6TvDEGHFmw0A6mE2ALAG4JjtLag/eLuuRZkboFna4u9+eNalIiS6+iABkp6NYnzYW3vAYjZYKpQRgBCUYUopjtu3HvXzrse+uiDTzx69FMf/NNDJy658rarp3dMj7kRRLOamKAAM0iVjRRB/bs5fy+nWjlIW4Z7K+pDrgiktdWmgadnzz2/Ml6LM1tCFaVRUSm4CByJieFbyKxI8a63iaXmApAZBKoKdehfoCoT0+X1yfHT554/fmZtbGLMBUtsQmCYVfUEwYyIrWAzNhXLkznbbMYPq5+1FmbLJ7vLCl7iksFGD5CbPM4rGfhLKRczHKUhg4KYjEQVkOLUs2esqn/qp/+1lfbkU0898ewTJ5ZOPn/i2bOPnTlVnY4xej96URIPOJTMAw6DEMpQDsowDMOpQSjDoBxQMDGtVieTURUCLeyZ33rRbLF1Og5YGGoaDFkpD4NQro/GzGFh6za8aKM31yu2Nh3ABV6BSFR+4Vd+4ZJDRz76B59693e+fTgsqiACVUPBLfjTWv/2rX0L38I37Z/IiHjbGGada8ivMTN4yN4LR93ZdFBy+0RuhTWPUr2T1hLB3AgEhauHAZpeZwRidn4rGoswCFPDOrV3+qavu+OiK/bf9+n7nrz3mcXR2lW3XTu/e0c5qAIrIpxJxMSiGpjVKE+VQZoQZgSzkCb4Zul6Qq6sMhE3avPTw4WZLY899KQ0bzFwhJEYm7FE3wkyEIw4GIFcKMLBH3dqxmIa1TSqiohEFRtHOTUeHX3+5NK5NbgTH00AACmwSURBVDWzEMyMtGFTiSRmVEA1sZg0oecuemPUOwddrpQx9Pa0IdckUvXFzrsOetlZ+17rPd5GCY6K5fPXgoFBC7bheBlPP3L8LW+89bKrd6+Oli/Zf8Nd8YZaRaKKNqvLaytLy4tLKyvr68uryyurayujldX18fpktLq2Nj43WRuPztSrdVPXVT2pa2ni1GBw+aEjb3n961ab5Y/+4SfmDu04dPNhmquijYkIEBgZQ0mXRism5baFXbxp/S/c2nQAF26lmI8UfObZ5duvueOzj33qyU9/6crXXyoFS8rflSjAOuNNG6xAa3n9Lu969Lvsv7UMnTxka827GoAlkAPUmpIcOPZRZH+lAkq5vEfGbm1SJi/p492saRL6NCR1NmMQAgI3hLDAB2+77OJ9ex74zAMPPPToZ07efeXNV++7Ync5PyA2ZZHcP2oKzhqfRqYp9mcgbST365JJUUKVjRBixI5dOx6/72GpUTdqGtn7X9k7h4m8BUw9MFYXE1VTVW1EiIOo1o3EKFHEVJsmnl5afeqFM+tVbUIAMxkjmpp3IivUhLIv9ePGKUUxo6yV0J6b7BISjt7L4c6/Xl78YFun6bn/9gJJV4dlvpb7ZvIWQaNyHJ564Fw8x1/1trfFuKbNpGpqiVKEgkyHIQwXtu7Zsd3AoODe3AiqJqYiFiXGGOuqGk/GdVXFqE1TX7zn4h3btgbC3Ja5I7uO/MMP/Muq0StvOzg9N5QQI0AKlw9cH42msO2pR547dGDftm0zGzLRzfVKrU0HcOFWum9psjJ+8oGHfuB7/rcf/8nTX7r73kPXHpy6aFBr5CKQqqgQOCAgSWdmc5/KAN0EX4/1e1gC9WPJ7n3IkHaC88VpL23HK/exY2qdAdhFlNGLw30+lRH5NEpTQBnEHDS/1J0NUQCMGcRsqjCg8EJxcdGhg4cPXHXHa45/6IN/8Pjd9588unDw2st2HdgZpgcoVSVyEaCqwcwJKOwlB5cR8mZWiqo99weFGkMJxKFR3nNg7+Ofe+Cxh5685U1XWlQKpAGqKo2ScTAGALZENzISFwoCi2JSj0RUjdSsAa2N6xMnzpw6fU4lmDKl0gQZwCCfk4PUcpub1NJ57gfu2di1rH/a8PsGEMd6b7SXfAnQ5g
DtIUibRT3H75mEghAsFHFQrcjRh5695rLrrrvqusl4wiAT81FFxCSmxFpTVAgAUhCxRQMQQAXTkImGbDNTxDNMgZmYuK4a4Wp9XNdx8sbXvvF9jz3+y3/y65PVxZted+NwIVAZDRLAk3E9GdWHdu2B2Cf/8CP7D+y+8rprZ7bM/y/cPpvry7E2HcAFXAkjefi++7fPzIUtcz/41//6D/zo//Gx3/3km77lzVNzg4k2akIcCFARyqB/ryaYm2LzT+o93OL62RAArtzTPZ70jJXSvFx/JEM+BPQZli0WZT6sJBksa79fCcYcDKZpSpkSOc/SI2MmIigVxhAulAk0qdcm6+Nzx5aOP/JsXK6mVvnc0tLiC/ctXLLzwBUH9l6+Zzg9FJhwBCQpNasRkakahIsgohQISTE77wUxYBojEYxl654tYDx8/9O3vvmGOq6oUjBQAaaCrQiWvKEiRhFVFZWqljrGKJ4JmCqiyOJofOrs4upKFYUKTr2unDF76ersbZGkrbQnC59ncqZHepdCrzDQK+SkU7FhbZDubqG5DS/unGG3HarGIA4MiIGCDI8/8mxzdv1r3vu2mUEpk4qo8JpPJyBhBkEAO4PHgJ52vzNEBQZTEtRRiLyxQVCWBNao9Xd/27cXZfGLv/ufR8srN77++p0H5ymQNWF1VNcr8fLbD+/duy1OyqWl1U9/6lOXHDx45LJLuSw2U4FXbG06gAu4yGBnz5ytJuMCYMW+Xfv/8jd860/8yr999AuPXXvntcZWaVTSkMaVt41XeLEV6QTEWieQOXf55d6an/IGv9FdTIEsEEDGQNJ46cmGZbGQrLPsnD8Gez3RbQXnad/e/+mIkAMeasoloCg5QEIgFBJCxOri6MSJxReeev70seOTlfGWcv7G62541TdcX0/qzz/wxfueeuSLzz107tjqJZcf3Hbx9mI6xHJE1KgZcuBtphoVYI1GUFDSgjMznzlcIBBLxOrM3Mxg1/SjDx8bryhvYyWTOkKYGU6HJOUIrTXWdSOidayjmYrFKKYQU1EdTSYnTy2O1xq1ELgwCHE31TijbhnV70XsyAF7j8ll2HgakT6ipbBiA5//vNe1fi77ij6I1H1tfrt76sBkaqoWmAoajs/R0QeOX7Z/312vvSXBX2hDjJxUUmoAzBeeZb+QKwnoBtSQu3ywmTruJFaVYfi/ve/bt81t+cD/+MAnfv+TN7/2xkOHD09NDUanj2Eihw/unZ7BWGmAKaX47LNHV1eWr7jmmtm5uY1ubnO9XGvTAVzIRaCVpRVWZWOwIurrX3Pnb/7R7z1yzxMHrzowfXE5cYl8H8DtyzIm008F+lw/tyOJgd8nS/qd3GUNPbQgq6C14WWL/HTfkIoETI7mUyJL5i0z1XZDQmAn7xOHgJJBzERCFGVtafz8Ey8sPX9u8fgyVdi+devbbn3L13zVO6+47EoWmAk4fMt7/9JjTx39z7/+a5/7wmePP/zcjn07dx/etfvwji27ZrlktSimjQqYVY0Dw4zYtW1y8mNw/U41KJo4qPcc2n3soWOnTq5unx5oOSk4qIpprZoOqf8RozRe6hUxZQduqiaePrW4NhqpsFkRKLgWktvFvs1Ox67LsqiLznM9pTv5L7oYuj/y+aQNL2jPMVH7fNYptSTxY+lre9vgp9gpVQgC4qEOjj50eny6/pp3v3371rlqMgrEsOTFqd0C69UkEgZo+UG/iHpEtPZaIDaAAaOgGoPhfe997y133vxTv/BTn/vDz586cOaWW284++hiWZdXXXZEmgpmUCO2qXK4srp6/333Xn3VtQs7trsCHzbXy7n6V8rmekWXW4YHvvjwmaeeLSkoq0Irot/5yO9/4Jfff/mbLrv+LZeuYz1qAWW2JJUM61llAO2DvT97xqgFhfP8qw2lws4bJEw/Iwob7Q55KTGDCcikwlSYJUOqapKRIXARlAK4NLZIQXl9HE8+d+r00eNnnz3TjKrZ6flDey+57ZY7rr
j08isvvWJ+yyyiwbRp6iKEGKUoC0KopX7+5ImPfeKjv/8Hf3h25Uy5dWrbvh0HL9u/46Idw63TUsSaG4EoKWBRhZ0mlCRpxJSE1cjUbBB4+Zlzn/mte979Pe9607teXYe1kkUV0mgTY2xUVEUtKkRNIEZg5iZqI7K8Mj57bkka05gVcojYzCy2EqHJ8G88cP3YvH8su5PzEgT41pnYRoO/4ZOoM8td417u6d7gRjxiJwQAokKlBKJgQ17ij/7HBw/M7/+XP/pD27dNx6byyV/UFfzbC2WDfaDz/u4uofOXIVNbjWHBmNdp/Mef/vC/+3fvHw5nltZ0x8K2//wzH5gfzDQxcgQKNhMlYuamaa685uqL9u2HKnhTsPJlXJsZwAVbKdRmCLSgwgA1IuD2G2/77Q/vf+aRk1fdfvFgq0U0qiVblvLthXbAhmpAztKzbUD7VMrW3ZRYCpE7Q26UxFo2eIMNm2pIAjlG6E1gcftiSmAiBnNQIwEaqarqxIlz5547d/rYmdVzo1LDZXsPvfn133DNZdccOnTZ/JZ5ChgOB7FppKmlakIITAS1sggqItYMirB/z8Xf8a3f/k1f/01f/MIXP/bJT3zqi5/99CN3F/ODXft27Tm8Z/v+bdNbpmkYRCUEJW8ISNNduEEkEGlRGJFg2/wOmho89aWjb/ra28ViRAOlGMn1oMUQASEgBEgQadYmk8XllXOLSxKJqQxUUjAz6Y42Uc6bekjNRjR+g32nFz1sOM+un39yk6XvI3sbVnbuKTujNmBPjxgbt317VMBIQDzE8KmHj1Yrq1/1nrfu3LlQT1Zf7Kn6H3/eQxvaT9Lf/SiSuh8+RNQM0Ng001ODd731a++48faf/MAHPvapu6+75s752fk4HpuCQvAGDGIzEWJ+5EsPF0W5Y/eul3Cjm+vLtzYzgAu5DHjmyacfvfehqcG0kSqkMRXwf/ngr/7yH/6XG77qyFW3Xb6mVaNEESxIcXgCdvqpQG6xz+yTXmgIuGFps3VrjX7eiJ6NIcrGPYVwyRWkCBFERj5pPRAXxgCzsRhP1nVtPS6fWj7z7Mnlk2dHSyssdtH2nTddd8Orrrz22suv2bl9x3Q5VQQy02iRSopR3KwCxCARCUxGMDUuYLBoRqFgQcEDET5z9uQXHnzo/23v3GIsu47zXFVr77PPpe/dMyI5F3J4E6WhJFK0SVOyLSdwEAQJEARKgCAwAgMBkofAeTD8kBe/JYjzmrznLUCCGAiQy0NgA4kB2nJIiheJoShRQ4rk3DnT09OXc9l7rao81Fpr79PdQw4hUjPWqW8G3X1u++zLOVWr/lVV689ffun1H751MB1jRdXK6MTpzdWT6+sn14seUd9BwUIgGIhAMEBwxM5BUdXlS//zL0N/8gf/9nd9NQvk2QMIMYMIBQ6NBM8yq8P+eLp9e/fmrW0OUpV9IqenhiEwBydY6grASJyXZYb2ZLe9HTryfVLKjzNkmEWVQ9/EVFXXlfpU4cm+QgBi9VuSmKJGo87cEVPs/EYYikAQ+lSFj/v/+z/92RPLZ//oD/9oebkMwWPq2vapEcAdHVF3ZJKfKLE7KBIhSB08OUeubIL/2cXLG5ubD64t+aZBIt150vbeIlKQZnw9+9xzw9EyWLvoLwyLAO4lCLC+uuo5BGSVYJ0IIb3wtWf+65/+8fUPtp97fnnUW556ZueFmT1zENY1rdJ6kVG61zZmWbqNNim3kxEt3Up/JdGIIWd/aohBUVFuBQ1M+j8BAWLwgQM3dTOehYPdyf7tya0bO3u3Dia3JmHKVVmc3jr9zFd+4/yTX336qa9sbm5UDsvSIWs5VAjAGFe+lAJIc+d15xxpylA0tyjoSOetMUjjA5/c2vqbv/Xbv/2dv35wMH777R+/+uZrr7z5+qU3PrrAFwC4NyoHa4Ph2rA/qkYrw+XlAQMzQwjMDZa1C5Mw3h9vX91feahqnKQKNmLhAK6u693x/rXrt/Z292dNIOoNqw
ELIgeWAACE5FSNYdGM1Gjk8uxoMvm50Ta0A/P2iif5J90TXbEc9wTuxHLpvbpD7ORvUFJbOcitK0B7MAF6RAQMKOyoGPjqnTeuTq/P/v7v/O3N9Wo684SEyMLQUbOiqARJd/yEYbjkTxu2d6WfuhQ1BmZAKVwh6Jq6HlTVlx9+BJwE74lIQOsnUCBWfUBgRJzV/t133/36176B6Gwy4AvCHMA9Zri80qsqz1yQoKBDApRHTz3yzCPn/+K1l//4yp9srK8Phivl0JX9qlcUrixcSa4kVzhySM4hiCsIkQpXCDMSAgIhMosDUl08BNbMjhAYEJgZECUwoSN0iChemEE8M0vdNL72fubrWWjqenIwbWazycFkOp5NDyazWe2bpmk8sLiiGA0GJ0+eePyBs+d//cnHHjl37uwjq2urVTHAOPgLCMDcAKVuRIIimm/UKiFxBiEtLEtIap5dQGFBQhYpeijsRRpCWl3qfevFZ7794rNNaG7evPnu+z996513Ll659OHFD2+8d3M31ADQNI0P2tknAEi/V/HU1TT58RsfPbv5FFdhrz6Y1mFWcz2rJ+PpdH86mU28FxDsl5UjB6FxTjyAc47jai3qYbXdXdBQSdp2eno4OhubXHA02Wons1/oWNM2cSgSHUs+QUnSix3VYv+16BWw9esEBALsAAMACCITEolDDrUDKqFHTX9vmy+8/rPnv/orv/7Ci+yDIxEGSNM5qYn/oV08fKPdd4E81khHE482dTFBQUCHKjAi+6pwHAJzQ+wEczu+NF+us8jAEGDQK29v3/z4+scnH3rgrr5LxmfHHMA9puwV65sbN27eiGnygCJQlYN/+g//2fnXf+WDDy+ND/bHO7W/We81ByE0gYMIBxFB34QGCJi1pQEjADM756ggCSkjU0BEgg/A4IroNjSOEC+iK2NpnwPPUSli/c66wpW9XlW6st/vr1Xrq1sro1PD9Y31jdX19dXVk1tbG5vrq6vL1aBXlk5z8VEkgJdmAk7LxXSWGGNr/dw5utNTLhuMpEvlRyXWJbMAAAa1niIQGBik4cDOFZtbaye/9MK3fu3XmGUyme4fjMfjg8lksr+3T+R8CFVVjZaXitJVxfK//w//7k//x5+cefY0r9Xb451b++N6FsK05gacdyKOSDTZPc+5YFzlRJctjAP6TvFdK9fP5eHm6tuuNJcmiyF16cjb7LiB1Fg1zs/Ez0mq5NAgI5YToLarQ46d1pAhrtOl54rV2xMhAKNw6Yc/fvMCTfG7f/e7g/4gzGYMpPV9eX8Td6UMd+oNoD0Lca6pfQJm0w7IWrlNRSe6SJ1GEVKhBAIyBC6wuHL50ubJLVdax9AvBHMA957Hn3zsyvcuF9gnNTkEEOTx0489ee5pdEIYmLUXmQQOQYRD8IEb3/jQzJp6PJ1N6/rmre3d/f29/f0b2zf29vZ2bt3aPziYzSYhcAieRQaDwcrKyubG5mhpVPX6hOjQVWWv6pVE1CtKVxRLw+GgPxgNRv3BYDQYDgb9sihc4QpXiDASEhEiEmGsEgIRYUARDogYOJC2RaBYW9yxcZ3GFHl6Id2RtYeczSQdwxmH1OnPqBYEICTNPWUfdHvDQTkarCFuCIBD7TtGmFabFIB//A+++/K/+t7PfnzxzDcf8DOZhhBIHLqyJAQn7EFYSyNYxBGhrnQIIOBjf7nYUC3tXBqlS/sTIOVmtlMo0OkKka1sekaeuoUjT83zMtk6prfWNwxp0kcAG/WgqNmcCIIBJQB4QGGBqlftfjT96Rsf/o0Xnn/x+Webep8AHVKnHZ109uoz0hG5MMuMem1j1kEa4Mv8AaabKoVpWgIBIjlmKJAO9vbG4/Hy6sqdohDj58EcwL1n7eTG2vLq9GAqojExEGJgbqYHQJ4cYxo+g6DKuoVzFRFAH0ckgkJE54gBiUgAONV7URqRIunssQTPAFi4In//MI5zRQSEAwBA6oasBpmFEYOmgIN4iKXAufGntgNWTd
8xpz6geV4asrmK4/rU+nJeV77Tt7ttLZpt4ZxSrW9DMeNE3ywAQACvXeGi5ILgqHzqyfNnNs/+v7/40aPPnKlchcVE2EMBUgsUIoE4MGEa+kuaI0ndqeeMbwpvWht/aMfbXZX2riMNXDsH1H2o83dMtRUmbaos2U/Gwg4hlqBXHkRn67XJnwACkiAEhKII5dvf+9FGb/Wf/KPfKcR7IL3SWcPvBivS3a/DHK8NtSIQdu9s56ZjpJfL1dsuRSLdpyNx8I6cc8QsLDKdzpZXj9sR4+fGHMC9BgEAv/bsM//3pZcCkgoNDgi4QadfUWTpmLEoJ0dtRU0UIaKgQxEfPYQ+ORtazdlkkco5FkZpQNq6Kd0N6UwdIwkCqbhNaezepiTGSVBMM8txhKd2KRUJZAmg1TOSqpGHvodnRDM5n71Thzp/2lqz0Wlol6xWTICXqNGolsTCRdV/4bnn/+P/+S+zndlgMBzK/tjXcZcFIMpwmuIJLIJO2ye3dRZZ5IlqXdaykobRivzxJGQ3J9Kx/njkj7nPRNeKSjSRyCSdGCm32RMhh8gcAFK9tl5MBCR0UpAUJa/+7AdXr793+fd/95+fPf3QeLznXJq9zu+Szqm0l+awm74LWs/RLSLTh1K8J9ANmjAtjBMn1INzJCAcYhxlSUBfHFZkcV+wtrp25uGHa54GQIA0SGIBQQlIQsCE4pARmUiQmFCQGJEJGSGAprKTjsxBEASBEUX/qyVyCCweQYSDMAMLas9jEQkCrP8BGTAgsKCuzMKIgsAQ/8fAIBWOQuzDE32I5KQXTEYwOa22FcJhs3/MOBMhjcMP3Y3RIwjmvwkJYiKTHrruU7S+DAIocQY38F/79ndoCt//szddXa6Xy6PeUoE9JIcIFA8NJa4EjNr/EiieUEEAEqE86xs9Jkoa8IueIuzI95ia68w5jxREQIq10ok9dC5aYUiS8h+3JRq0qUAFuoCjXikkcA7IMRUN4QxhQjuXDt586e3vfPP5v/d3/tZ4Osa2oUMnKJFWmusGIMd9YI8Bj7nVkf3i5TkUPsTTJCiMEPPaKC2Jh8IgVLrllZVuqaPxOWIRwP0BwpfPf3Vvd/fW9rYrq6CNhQkBGRg4RvqQbIwGAvmrqyVJnThaf0pKCtfbInNj9jTozzsQXxnl65S00w7l4rNE8tPTYD7fkzqywdwIPb/DoTvmbkraeM5El5TW+qlf/Vw7276Vxj3SmuV4cEEefuTcN89/4+X/9ertunnhN58bOedgOuUJQ3CIAUC0EyhzbG6DIFrwjCmcAgEUSlV0ksOZrk9ozxEnMQagNbA5csrTqIdVGIkCltrIuD3qvKX6Rw1CGFDAaU0FOgcNOsQKkIKrD8K1y9vvvPLRkyee+Je//3vQjBGZyHFgihd67vLi3Jm8O+bjhXxvzHySuVhgzgvGcUPME9APNrGOfAAKmjSzx89+uSzLzxaEGHeNFYLdJ4gwhFnz/e+/fHtnp8A+AQsRgmgbuCjUSmoX1prFfE+ru2LbB7q11rEmM04Qtq/OT2j/TqYp2x2Zfwocte93YTY6m5/Tf7rbyM/sqhNdif2Y7gnzL+reSIZfRXwSAATHlfvgo/f/8N/86/cvvAtnhmfPnzp97szqyeVq2OPAgYUlAIoAM4pIiMuN5dyfpM64OGmZ05fSeY5PglxMJ0AorS9rfXLribtHnw9VAOLsR6pZyEmfAoAoiEAoyMiCzkGcjQcPTmi8N9u+eGv74s0bH93GuvzVb/zGH/zev9gcAE8mjC5WFAilLNZWxbpDZ4dPIA1EUmbSoVr0bqCn5ybXt0XRKZ03AQ3liAoKLFNozpw9d+7co0TYNiU0PlfMAdw3iIBIPavfeO21ne2dqqiYQ0zqT98cgNaap29dO7w6bEK6t+PT0jC08/jxNh2TZzm6m3nDGPe684rjD+zYQXybCdqx2u3hSNql/MDhjjjZbsr8ZufuwlY6UAvjAJHATf3s9be//5//+3/74U/fmt
zeB5DiS8snzpw4c/bU+tbKyuZIoA4le/aBhIl1ARzWTWtTNV3apD2KNPndPebUTRM6rivVYAsm24nxWOPIX++JgZC6fJ0GihIJIwICETgXtB7bcQMFhKn3H1/dvnTh8s7FW2HCy8O1sydPf+vF33r6qa89duqR0jXIHoRFiw41JjxSwHvkFHbOd5y07dS5HXNxZf5XfIHkqfX8UtQwh4EIASjNIDFIE5q1jbVT5x7b2NpwYNb/C8QcwP2EiAiw5w/evfD+e+9R4QBYO0FDHD1GH5D+tTO3mTa3QjepktGRp6pNOmS+5x++824ebyHu9EE6uq25t4o6VatBASRvdEiS6O5+K+0ct932gVYDQgQRJNE0VVcUjmrEg2nzwQeXX3ntldd++MpbF96ajfcBAgxp66GNM19+8OwTp6rVfgMcwDMKo6ZHiUoWzHOztYLRScWDPqxfRdsnKb0KkiCU11HWlyFA7DGBcTyNGAQRWVjAiRCRA+RADoQbt/vx/s6l29feuzQ+mPRc/yvnnnrmqeeffurrD516aGk4AuDCAQYfpNbJDQAU1oZ2MBflfcrYHztX/lDMOPda1BkUxDYywhT6iSBSOgmAREB6WAiBEbCqyo2NrQfPPNQfjKAoyOGnfBaNnw9zAPcVcVDvG75x9dq7P3m7brxDB7HzcC72P6SKdH/dIRrQR4686lgRZi5y/zRPcMyfdySO7I8ZtB99o6MRAECuFYbuweTfnSwgSO+i+gLl2WcUIIoqjecA4ErqQU9E2Pu96e233/3Jn7/6l6/+4IcXL13z3Ljl/nCjGq0tn33iwfUvrY1W+oJCVdlgCDBm1sb3qd4L40BXAKK0ohGA6uCaiZtb90SFHEVAXTwixXRTRgQQZiJEYQdIIATaJqQMNc2mcuv67SsfXb119ePpzngog7NfOvvtX/3Nx889ee6RR1eXByWVBIzCQYIjaLx3+cx1ZSc9syKH/WmaIG4jlPn5p2MuYreHEAKAtsoQ5DRMofTqeLmRUZhZAMBRf9DfWN/cOnFiaXnFlVpngpai8gvAHMD9h14Rxp2bNy9c+Mn+7l6eZc3fsnQzTU4eNd0dfST9PBKzt1/iuTvvENsfG+xnteauD+6INHWHcSRAZ8gfTWd7+84KRBSyNYkVGUHncHVzuZQsWkICAKYAhMJBil4h6IAKoWJy0Nze3b967do77//k/Ysf3di5fvn6h9PZuOyXqxuro63h5qmlla3lquoBOQYQgiAsIl5YYsansMRkIgZmAgZOddDq6QWjlA8gKKJpSCIsgALMgMLCMhM/DePdyfbVm9tXb+9cPyi4tzZYefTUo8+c/+b5p77+8INnlkd9CVyWBYGwNAhBKKRGTzrpgKwpXJJ7+rcnsGvQj5F+JGlVrWvtBD0AsZVctPS6EckeGfUoU8MMzXoikqrqr66tLa+srp/YqvoVgBbtxSU/sXv5jS8McwD3KSIMARjClYtXLn304f7+vogU5FKxJKQ+vxj18fTCjqw8v8Gjd3am4NJrj3DngOLI5ju/2qM4ZMQhK1fHqkiftv27NwlRfdeXkWQX0E55ShKGABAFmFgbF4s4CEKuLKkERGFsGt80vD85eP0Hb7311o9efvW1i1cuM+wCMBZUVsXSynA4GgyXh4Ol/mBp2Ov3ykFVlEVvWBWlc4Oi6AFXjKUE5BBLqJk9S2Bi5Ia59rs7e9PbB/WsmU1m3nMzberaz2ZBamjGstxf2to6cfrEg88+/ezj55549JHHql6BUKAEEg6hAcchBEJSTS1eN0QWjiOEnAAmHUsOrR/X54tkmR/b65WuQVRk2gSt+auYihAENHFBWIRFAjM5Knu90XC0vLyysrq6tLTUH1RIDgkA43KTd/9pMD4vzAHcr+jQTIIE4BB2trevX7t269a2Lt4NiBStPuIRu9gabZzfHnYMabLBaVjcNfVHRJlj7O4hE55CAWkfTgPBz3TYhyTmY7zE8YlAh4WJfFQSiwNAYyVGiK
mayehJalCURuvaSg8kDnAZiZwAOSx9gyiunobLlz6+evXSB5c/uHTl/QvvX7h56/rOzk7dTBtpG0SjQyBdm9KxC1jxysnh1qmN5ZXlACIhOoAwDc3ESxPqST3sDZZHS2vLGw89cOrE5oOrK+trS5sFVGfPnO33q9HSsHCIIITIHLzUzB4IEDXXVKeHgYBiG4446xFLBmMs0I35uhesvVzYeg9oLwVqswYd8AtmLyHpX7z8cWFIJOf6VX84Gg1HSytrq/3BsNfvaaU6EoHECsO2MPizflKMzwNzAPc/cSjFEnwTxvt7N29u397ZmU7GwQfhoE3fEJKUrN/c45VaVaoT87pRRynKqdvzrzy6Y8f/hjSevIOxPnxs2Qp03QmkjBM47EikNeB3MhoSVQokljh7oqp0nvSMU65q83WOIAjqtCi0wRUgMManAAgKgSNEcmVVViICLA03IuibZu/g4GA83t3d3d/f29vfn81mO7s79ay+fuPm1WvXuJQHTp984IGTD558sF9VCNgv+4PBYNgfINLK8opzRb8aFK5w6ApXFOQIyaELjRcJzEEgsHgR8eKJSMfYROh945zjwLo0s3YLTR5ej5nzwB2S0Z7/BKjfT5+YpNR3PACmyxAX3MmtQgSRiFzhyl41GAyGw+FotFQNBqPRgIi0edTcp6k7WjGrf68xB/BXgfwNZRWJAVG48bPpdDqZTCbjejZr6qauZ7Pp1DcNi3AIzLnyC1LsLt15vU5e5d0pMofG5UcVhKyrZPFZZE7zObI9abcAkCSI9F7JKmGnT0Xe/ieeKgEQRBFwKmgIAkogEIgSSRK1AQGACVC03Wh8ROvlUlFx3AUEirI9MbCOgRGICIWB0HHspaxN8xwiEpFmuJRlKYIcL5/EWu3AEiV/SNmisYm2FqCx1i+7JOkwo2gDUBDRhlECAuRQV5PLVWXUGeFjbA+RPHIU47sNO9I1yDO06fSmdYb1HQWJiLAoiqrfr6p+fzDoVVV/NCx7VdWvXFEgISJhahF1t4V8xr3DHMBfZQQgfaFBIiGEEEJTN977yWQSQpjNZl5b+DeN9z54HzgEHwCEWcWR9nufE0yTwKSWOA8Msxrclf3nnUfH1LRikhz71EjH7mPrTOZcR1dwPqpvtVtqz0us/koKD+RCpe7mAHJuFTK0UyIcW1oIEAtg+zVRJQQBUyv7zpnQbBddK1I6Xyxd9IYJBMnlI1Xbmqr2NOLBVOellV/aYFWzS1HXoYmlfMIxdMFcuU0IIpQWds9BlD4DhKF78tugTQAgOh3tBygSJ2MdOVf0emVR9gaDQb8/GA2HVVX1B/2iLImIHAEAIqn3OCYl2bjvMQfwy0hq9Ja+jgiYogdWLcl77wOHpm6auvbeN3Xtg/d14733vvHecwghBBFm5iTxYjaX+jbZELd9JjDd3xn6YSviJPtz6EMnc3apuybJJ1mUzvu375gqoxlzANHZNLR+JG05+wVJ3T0BUhYnaHZOfELa1UM7n3xkKrnAztlAhtxzT4fo+frkDJ18GDlEU+udFmpROx7tcj5uYuaYg6qZNVrj3YnrYsUCahJOUm4AmLl7ZhDJOdfrlWXZK3u94WhU9nrDpaWyLHtVz7nSFS5WKjjq9HE1W//LgDmAxUZkzixrSMEAICEEEfGNZ+a6qTWqYA45kshOIoTAProKEW29LzkiySJDsjidcAAg6VMx2sizgsm8dFIUsWuy52pSOyQDq5YQonyj/mG+E3NnZ9JsqCS9vC0/a3ceclub+HpO76cOJ07EQB6mR7lFIGW1AyBwbDGk8vy80+p4SFKVT9uDC0BHx8e0dzFsSkk3KZ+psxu6m3rwIgLiiqJXlmWvqqpeFHCqftWvemVJzvV6PdDQJu/w3Fk1fgkxB2B8Iockg/kHNOWdmYWZQ2Bm770w++A5BN/4EHzwPoSg3oI5sBLSH9rxvUNOJ0nv0xm6A6Rx+dwO5VKm+QkF0qbZWafJpjvbbTWtulqmE21Cmf
ySxGR5yVn7SefQG535lUMnJdUDJyuv0wBy5JsmebIkHx6kPRUA5E4gFScPIB+lSkEoRIgEhFT2ekRUFGWhU8hErijKsuz1+4RUDfq9qlf2SgAsyiKa+Kzb2Lh+UTEHYHyRSGeqt52rSL9UctaQgTnNXbMAZD8R4g/OAQe3TxdmlsBBWMMPTpqVlmBpPALQFl5JW6qbhtII2uenAJVutLM0xb1vYxiW7hwEHIk94iahDVg6JyFuCDsPpRAnTrcgAAAR6QQyAQCBLsBGziEhEhZFWZRFURbOubJXOVf0+72yLHu9HiJSURCpKN9J3lFovuXo0T00FhVzAMYvmkO2p3sz/921V8mJtBOX8VfqkZeUq6Q/cQAADswizAwgHFgjFf3JIeh8ZxNCHbyXAMzNeAIAwTORAyRyRAjBewBiEV14J+6RJLmncxxd2USVdyLSEIBIc4N0ehjJOeecI+cKl9EBOyIQuTiPrJ4hLuqmG8/pqblQK1b55o5PrczWnjgz9cYdMQdg/FLSmXXuzBQIZLPZijSg43tdAA1RhLMw3yYn6c92emJOuml1svYGHHn4OA65vjm/F+dD0k6aITc+f8wBGAvHUWMb/cThWYQYXBzuQfRJpviTHr4b+92NgczeG1805gAMwzAWFGvAZBiGsaCYAzAMw1hQzAEYhmEsKOYADMMwFhRzAIZhGAuKOQDDMIwFxRyAYRjGgmIOwDAMY0ExB2AYhrGgmAMwDMNYUMwBGIZhLCjmAAzDMBYUcwCGYRgLijkAwzCMBcUcgGEYxoJiDsAwDGNBMQdgGIaxoJgDMAzDWFDMARiGYSwo5gAMwzAWFHMAhmEYC4o5AMMwjAXFHIBhGMaCYg7AMAxjQTEHYBiGsaCYAzAMw1hQzAEYhmEsKOYADMMwFhRzAIZhGAuKOQDDMIwFxRyAYRjGgmIOwDAMY0ExB2AYhrGgmAMwDMNYUMwBGIZhLCjmAAzDMBYUcwCGYRgLijkAwzCMBcUcgGEYxoJiDsAwDGNBMQdgGIaxoJgDMAzDWFDMARiGYSwo5gAMwzAWFGKQe70PhmEYxj2A8F7vgWEYhvGLRwD+P9+1NmicLKQOAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"A green pokemon on white background\"\n",
"image = pipe(prompt=prompt).images[0]\n",
"image"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: PixArt-alpha-ToCa/notebooks/train.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"id": "c423d2a1-475e-482e-b759-f16456fd6707",
"metadata": {},
"source": [
"# Install"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0440d6a7-78b9-49e9-98a2-9a5ed75e1a2f",
"metadata": {},
"outputs": [],
"source": [
"!git clone https://github.com/kopyl/PixArt-alpha.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0abadf51-a7e3-4091-bb02-0bdd8d28fb73",
"metadata": {},
"outputs": [],
"source": [
"%cd PixArt-alpha"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4df1af24-f439-485d-a946-966dbf16c49b",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117\n",
"!pip install -r requirements.txt\n",
"!pip install wandb"
]
},
{
"cell_type": "markdown",
"id": "d44474fd-0b92-48fc-b4cf-142b59d3917c",
"metadata": {},
"source": [
"## Download model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06b1c1c9-f8b1-4719-8564-2383eac9ff28",
"metadata": {},
"outputs": [],
"source": [
"!python tools/download.py --model_names \"PixArt-XL-2-512x512.pth\""
]
},
{
"cell_type": "markdown",
"id": "f298a89c-d2a5-4da7-8304-c1390da0ba58",
"metadata": {},
"source": [
"## Make a dataset from a Hugging Face dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e17b8883-0a5c-4fa3-a7d0-e8ee95e42027",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from tqdm.notebook import tqdm\n",
"from datasets import load_dataset\n",
"import json"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92957b2c-6765-48ee-9296-d6739066d74d",
"metadata": {},
"outputs": [],
"source": [
"dataset = load_dataset(\"lambdalabs/pokemon-blip-captions\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0095cdda-c31a-48ee-a115-076a5fc393c3",
"metadata": {},
"outputs": [],
"source": [
"root_dir = \"/workspace/pixart-pokemon\"\n",
"images_dir = \"images\"\n",
"captions_dir = \"captions\"\n",
"\n",
"images_dir_absolute = os.path.join(root_dir, images_dir)\n",
"captions_dir_absolute = os.path.join(root_dir, captions_dir)\n",
"\n",
"# Create the output folders up front; exist_ok replaces the manual existence checks.\n",
"os.makedirs(images_dir_absolute, exist_ok=True)\n",
"os.makedirs(captions_dir_absolute, exist_ok=True)\n",
"\n",
"image_format = \"png\"\n",
"json_name = \"partition/data_info.json\"\n",
"os.makedirs(os.path.join(root_dir, \"partition\"), exist_ok=True)\n",
"\n",
"absolute_json_name = os.path.join(root_dir, json_name)\n",
"data_info = []\n",
"\n",
"order = 0\n",
"for item in tqdm(dataset[\"train\"]): \n",
" image = item[\"image\"]\n",
" image.save(f\"{images_dir_absolute}/{order}.{image_format}\")\n",
" with open(f\"{captions_dir_absolute}/{order}.txt\", \"w\") as text_file:\n",
" text_file.write(item[\"text\"])\n",
" \n",
" width, height = 512, 512\n",
" ratio = 1\n",
" data_info.append({\n",
" \"height\": height,\n",
" \"width\": width,\n",
" \"ratio\": ratio,\n",
" \"path\": f\"images/{order}.{image_format}\",\n",
" \"prompt\": item[\"text\"],\n",
" })\n",
" \n",
" order += 1\n",
"\n",
"with open(absolute_json_name, \"w\") as json_file:\n",
" json.dump(data_info, json_file)"
]
},
{
"cell_type": "markdown",
"id": "25be1c03",
"metadata": {},
"source": [
"## Extract features"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f07a4f5-1873-48bf-86d0-9304942de5d3",
"metadata": {},
"outputs": [],
"source": [
"!python /workspace/PixArt-alpha/tools/extract_features.py \\\n",
" --img_size 512 \\\n",
" --json_path \"/workspace/pixart-pokemon/partition/data_info.json\" \\\n",
" --t5_save_root \"/workspace/pixart-pokemon/caption_feature_wmask\" \\\n",
" --vae_save_root \"/workspace/pixart-pokemon/img_vae_features\" \\\n",
" --pretrained_models_dir \"/workspace/PixArt-alpha/output/pretrained_models\" \\\n",
" --dataset_root \"/workspace/pixart-pokemon\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fc653d0",
"metadata": {},
"outputs": [],
"source": [
"!wandb login REPLACE_THIS_WITH_YOUR_AUTH_TOKEN_OF_WANDB"
]
},
{
"cell_type": "markdown",
"id": "2cf1fd1a",
"metadata": {},
"source": [
"## Train model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea0e9dab-17bc-45ed-9c81-b670bbb8de47",
"metadata": {},
"outputs": [],
"source": [
"!python -m torch.distributed.launch \\\n",
" train_scripts/train.py \\\n",
" /workspace/PixArt-alpha/notebooks/PixArt_xl2_img512_internal_for_pokemon_sample_training.py \\\n",
" --work-dir output/trained_model \\\n",
" --report_to=\"wandb\" \\\n",
" --loss_report_name=\"train_loss\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: PixArt-alpha-ToCa/requirements.txt
================================================
torch==2.1.1
torchaudio==2.1.1
torchvision==0.16.1
mmcv==1.7.0
git+https://github.com/huggingface/diffusers
timm==0.6.12
accelerate
tensorboard
tensorboardX
transformers
sentencepiece~=0.1.99
ftfy
beautifulsoup4
protobuf==3.20.2
gradio==4.1.1
yapf==0.40.1
opencv-python
bs4
einops
xformers
optimum
peft==0.6.2
================================================
FILE: PixArt-alpha-ToCa/scripts/infer_pixart_8_bits.py
================================================
# pip install -U accelerate transformers bitsandbytes
# pip install -U git+https://github.com/huggingface/diffusers
from transformers import T5EncoderModel
from diffusers import PixArtAlphaPipeline
import torch
import gc
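def flush():
    """Release Python garbage and cached CUDA memory between pipeline stages."""
    gc.collect()
    torch.cuda.empty_cache()


def bytes_to_giga_bytes(num_bytes):
    # Divides by 1024**3, so the result is in binary gigabytes (GiB).
    return num_bytes / 1024 / 1024 / 1024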
# Loading in 8 bits needs `bitsandbytes`.
text_encoder = T5EncoderModel.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
subfolder="text_encoder",
load_in_8bit=True,
device_map="auto",
)
pipe = PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
text_encoder=text_encoder,
transformer=None,
device_map="auto"
)
with torch.no_grad():
    prompt = "cute cat"
    # Encode once with the 8-bit text encoder; the embeddings are reused after the encoder is freed.
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
del text_encoder
del pipe
flush()
pipe = PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
text_encoder=None,
torch_dtype=torch.float16,
).to("cuda")
latents = pipe(
negative_prompt=None,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_attention_mask=negative_prompt_attention_mask,
num_images_per_prompt=1,
output_type="latent",
).images
del pipe.transformer
flush()
with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")
image[0].save("cat.png")
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
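
The script above reports peak memory by dividing a byte count by 1024 three times, which gives binary gigabytes (GiB) even though the message says "GB". A minimal sketch of a formatter that makes the unit explicit follows; the name `format_gib` and the sample byte count are illustrative assumptions, not part of the original script.

```python
def format_gib(num_bytes: int) -> str:
    """Format a byte count in binary gigabytes (1 GiB = 1024**3 bytes)."""
    return f"{num_bytes / 1024**3:.2f} GiB"


# Hypothetical peak value: 6 GiB plus 512 MiB.
peak = 6 * 1024**3 + 512 * 1024**2
print(format_gib(peak))  # prints "6.50 GiB"
```

In the script itself, the value passed in would be `torch.cuda.max_memory_allocated()`.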
================================================
FILE: PixArt-alpha-ToCa/scripts/inference.py
================================================
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import warnings
warnings.filterwarnings("ignore") # ignore warning
import re
import argparse
from datetime import datetime
from tqdm import tqdm
import torch
from torchvision.utils import save_image
from diffusers.models import AutoencoderKL
from diffusion.model.utils import prepare_prompt_ar
from diffusion import IDDPM, DPMS, SASolverSampler
from tools.download import find_model
from diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2
from diffusion.model.t5 import T5Embedder
#from diffusion.data.datasets import get_chunks, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
from diffusion.data.datasets import get_chunks, ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('--image_size', default=256, type=int)
parser.add_argument('--t5_path', default='../autodl-tmp/pretrained_models/t5_ckpts', type=str) # change to your own path
parser.add_argument('--tokenizer_path', default='../autodl-tmp/pretrained_models/sd-vae-ft-ema', type=str) # change to your own path
parser.add_argument('--txt_file', default='asset/samples.txt', type=str) # change to your own path
parser.add_argument('--model_path', default='../autodl-tmp/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str) # change to your own path
parser.add_argument('--bs', default=1, type=int)
parser.add_argument('--cfg_scale', default=4.5, type=float)
parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])
parser.add_argument('--seed', default=0, type=int)
parser.add_argument('--dataset', default='custom', type=str)
parser.add_argument('--step', default=-1, type=int)
parser.add_argument('--save_name', default='test_sample', type=str)
parser.add_argument("--fresh_ratio", type=float, default=0.30)
parser.add_argument("--cache_type", type=str, choices=['random', 'attention','similarity','norm', 'compress'], default='attention')
parser.add_argument("--ratio_scheduler", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant','linear-mode','layerwise','ToCa'])
    parser.add_argument("--force_fresh", type=str, choices=['global', 'local'], default='global',
                        help="Force-fresh strategy. global: refresh all tokens. local: refresh only tokens that reach the fresh-step threshold.")
parser.add_argument("--fresh_threshold", type=int, default=3)
parser.add_argument("--soft_fresh_weight", type=float, default=0.25,
help="soft weight for updating the stale tokens by adding extra scores.")
return parser.parse_args()
def set_env(seed=0):
torch.manual_seed(seed)
torch.set_grad_enabled(False)
for _ in range(30):
torch.randn(1, 4, args.image_size, args.image_size)
@torch.inference_mode()
def visualize(items, bs, sample_steps, cfg_scale):
for chunk in tqdm(list(get_chunks(items, bs)), unit='batch'):
prompts = []
if bs == 1:
prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False) # ar for aspect ratio
if args.image_size == 1024:
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
latent_size_h, latent_size_w = latent_size, latent_size
prompts.append(prompt_clean.strip())
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
for prompt in chunk:
prompts.append(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip())
latent_size_h, latent_size_w = latent_size, latent_size
null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]
with torch.no_grad():
caption_embs, emb_masks = t5.get_text_embeddings(prompts)
caption_embs = caption_embs.float()[:, None]
print('finish embedding')
if args.sampling_algo == 'iddpm':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)
model_kwargs = dict(y=torch.cat([caption_embs, null_y]),
cfg_scale=cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type = args.cache_type,
fresh_ratio = args.fresh_ratio,
fresh_threshold = args.fresh_threshold,
force_fresh = args.force_fresh,
ratio_scheduler = args.ratio_scheduler,
soft_fresh_weight = args.soft_fresh_weight)
diffusion = IDDPM(str(sample_steps))
# Sample images:
samples = diffusion.p_sample_loop(
model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,
device=device
)
samples, _ = samples.chunk(2, dim=0) # Remove null class samples
elif args.sampling_algo == 'dpm-solver':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type = args.cache_type,
fresh_ratio = args.fresh_ratio,
fresh_threshold = args.fresh_threshold,
force_fresh = args.force_fresh,
ratio_scheduler = args.ratio_scheduler,
soft_fresh_weight = args.soft_fresh_weight)
dpm_solver = DPMS(model.forward_with_dpmsolver,
condition=caption_embs,
uncondition=null_y,
cfg_scale=cfg_scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=sample_steps,
order=2,
skip_type="time_uniform",
method="multistep",
model_kwargs = model_kwargs,
)
elif args.sampling_algo == 'sa-solver':
# Create sampling noise:
n = len(prompts)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type = args.cache_type,
fresh_ratio = args.fresh_ratio,
fresh_threshold = args.fresh_threshold,
force_fresh = args.force_fresh,
ratio_scheduler = args.ratio_scheduler,
soft_fresh_weight = args.soft_fresh_weight)
sa_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)
samples = sa_solver.sample(
S=25,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=caption_embs,
unconditional_conditioning=null_y,
unconditional_guidance_scale=cfg_scale,
model_kwargs=model_kwargs,
)[0]
samples = vae.decode(samples / 0.18215).sample
torch.cuda.empty_cache()
# Save images:
os.umask(0o000) # file permission: 666; dir permission: 777
for i, sample in enumerate(samples):
save_path = os.path.join(save_root, f"{prompts[i][:100]}.jpg")
print("Saving path: ", save_path)
save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))
if __name__ == '__main__':
args = get_args()
# Setup PyTorch:
seed = args.seed
set_env(seed)
device = "cuda" if torch.cuda.is_available() else "cpu"
assert args.sampling_algo in ['iddpm', 'dpm-solver', 'sa-solver']
# only support fixed latent size currently
latent_size = args.image_size // 8
lewei_scale = {256: 1, 512: 1, 1024: 2} # trick for positional embedding interpolation
#lewei_scale = {512: 1, 1024: 2} # trick for positional embedding interpolation
sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}
sample_steps = args.step if args.step != -1 else sample_steps_dict[args.sampling_algo]
weight_dtype = torch.float16
print(f"Inference with {weight_dtype}")
# model setting
if args.image_size in [256, 512]:
model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
else:
model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
print(f"Generating sample from ckpt: {args.model_path}")
state_dict = find_model(args.model_path)
del state_dict['state_dict']['pos_embed']
missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)
print('Missing keys: ', missing)
print('Unexpected keys', unexpected)
model.eval()
model.to(weight_dtype)
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)
    # Use the device selected above so the CPU fallback also applies to the text encoder.
    t5 = T5Embedder(device=device, local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)
work_dir = os.path.join(*args.model_path.split('/')[:-2])
work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir
# data setting
with open(args.txt_file, 'r') as f:
items = [item.strip() for item in f.readlines()]
# img save setting
try:
epoch_name = re.search(r'.*epoch_(\d+).*.pth', args.model_path).group(1)
step_name = re.search(r'.*step_(\d+).*.pth', args.model_path).group(1)
except Exception:
epoch_name = 'unknown'
step_name = 'unknown'
img_save_dir = os.path.join(work_dir, 'vis')
os.umask(0o000) # file permission: 666; dir permission: 777
os.makedirs(img_save_dir, exist_ok=True)
save_root = os.path.join(img_save_dir, f"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_samp{args.sampling_algo}_seed{seed}")
os.makedirs(save_root, exist_ok=True)
visualize(items, args.bs, sample_steps, args.cfg_scale)
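
The output folder name above embeds an epoch and a step number parsed from the checkpoint path. In the `try`/`except` form used by the script, a path that contains only one of the two tags sets both to `'unknown'`. A minimal sketch of parsing each tag independently follows; the helper name `parse_ckpt_tag` and the example paths are hypothetical, and the simplified patterns are an assumption rather than the script's exact regular expressions.

```python
import re
from typing import Tuple


def parse_ckpt_tag(model_path: str) -> Tuple[str, str]:
    """Return (epoch, step) found in a checkpoint path, or 'unknown' for each missing tag."""

    def first_group(pattern: str) -> str:
        match = re.search(pattern, model_path)
        return match.group(1) if match else "unknown"

    return first_group(r"epoch_(\d+)"), first_group(r"step_(\d+)")


print(parse_ckpt_tag("output/run/checkpoints/epoch_10_step_2000.pth"))
print(parse_ckpt_tag("pretrained_models/PixArt-XL-2-1024x1024.pth"))
```

With this form, a path such as `ckpt/epoch_3.pth` keeps its epoch tag and only the step falls back to `'unknown'`.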
================================================
FILE: PixArt-alpha-ToCa/scripts/inference_ddp.py
================================================
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import warnings
warnings.filterwarnings("ignore") # ignore warning
import re
import argparse
from datetime import datetime
from tqdm import tqdm
import torch
from torchvision.utils import save_image
from diffusers.models import AutoencoderKL
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler
from diffusion.model.utils import prepare_prompt_ar
from diffusion import IDDPM, DPMS, SASolverSampler
from tools.download import find_model
from diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2
from diffusion.model.t5 import T5Embedder
from diffusion.data.datasets import get_chunks, ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('--image_size', default=256, type=int)
parser.add_argument('--t5_path', default='../autodl-tmp/pretrained_models/t5_ckpts', type=str) # change to your t5 path
parser.add_argument('--tokenizer_path', default='../autodl-tmp/pretrained_models/sd-vae-ft-ema', type=str) # change to your tokenizer path
parser.add_argument('--txt_file', default='asset/samples.txt', type=str) # change to your txt prompt file
parser.add_argument('--model_path', default='../autodl-tmp/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str)
parser.add_argument('--bs', default=1, type=int)
parser.add_argument('--cfg_scale', default=4.5, type=float)
parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])
parser.add_argument('--seed', default=0, type=int)
parser.add_argument('--dataset', default='custom', type=str)
parser.add_argument('--step', default=-1, type=int)
parser.add_argument('--save_name', default='test_sample', type=str)
parser.add_argument("--fresh_ratio", type=float, default=0.30)
parser.add_argument("--cache_type", type=str, choices=['random', 'attention', 'similarity', 'norm', 'compress'], default='attention')
parser.add_argument("--ratio_scheduler", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant', 'linear-mode', 'layerwise', 'ToCa'])
parser.add_argument("--force_fresh", type=str, choices=['global', 'local'], default='global')
parser.add_argument("--fresh_threshold", type=int, default=3)
parser.add_argument("--soft_fresh_weight", type=float, default=0.25)
return parser.parse_args()
def setup_ddp():
dist.init_process_group(backend='nccl')
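    # Prefer the per-node GPU index set by the launcher; fall back to the global rank (identical on a single node).
    local_rank = int(os.environ.get("LOCAL_RANK", dist.get_rank()))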
torch.cuda.set_device(local_rank)
return local_rank
def cleanup_ddp():
dist.destroy_process_group()
def set_env(seed=0, local_rank=None):
global_seed = seed + local_rank
torch.manual_seed(global_seed)
torch.cuda.manual_seed(global_seed)
#torch.cuda.manual_seed_all(global_seed)
torch.set_grad_enabled(False)
return torch.device(f'cuda:{local_rank}')
@torch.inference_mode()
def visualize(items, bs, sample_steps, cfg_scale, device):
sampler = DistributedSampler(items, shuffle=False, num_replicas=dist.get_world_size(), rank=dist.get_rank())
data_loader = DataLoader(items, batch_size=bs, sampler=sampler, drop_last=False)
pbar = tqdm(data_loader, unit='batch') if dist.get_rank() == 0 else data_loader
for chunk in pbar:
prompts = []
if bs == 1:
prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False) # ar for aspect ratio
if args.image_size == 1024:
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
latent_size_h, latent_size_w = latent_size, latent_size
prompts.append(prompt_clean.strip())
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
for prompt in chunk:
prompts.append(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip())
latent_size_h, latent_size_w = latent_size, latent_size
null_y = model.module.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]
with torch.no_grad():
caption_embs, emb_masks = t5.get_text_embeddings(prompts)
caption_embs = caption_embs.float()[:, None]
#print('finish embedding')
if args.sampling_algo == 'iddpm':
            # This path is untested and may contain bugs.
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)
model_kwargs = dict(y=torch.cat([caption_embs, null_y]),
cfg_scale=cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type=args.cache_type,
fresh_ratio=args.fresh_ratio,
fresh_threshold=args.fresh_threshold,
force_fresh=args.force_fresh,
ratio_scheduler=args.ratio_scheduler,
soft_fresh_weight=args.soft_fresh_weight)
diffusion = IDDPM(str(sample_steps))
samples = diffusion.p_sample_loop(
model.module.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,
device=device
)
samples, _ = samples.chunk(2, dim=0)
elif args.sampling_algo == 'dpm-solver':
            # Main strategy: tested and confirmed to work.
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type=args.cache_type,
fresh_ratio=args.fresh_ratio,
fresh_threshold=args.fresh_threshold,
force_fresh=args.force_fresh,
ratio_scheduler=args.ratio_scheduler,
soft_fresh_weight=args.soft_fresh_weight)
dpm_solver = DPMS(model.module.forward_with_dpmsolver,
condition=caption_embs,
uncondition=null_y,
cfg_scale=cfg_scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=sample_steps,
order=2,
skip_type="time_uniform",
method="multistep",
model_kwargs=model_kwargs,
rank = dist.get_rank()
)
# not supported now
elif args.sampling_algo == 'sa-solver':
n = len(prompts)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,
cache_type=args.cache_type,
fresh_ratio=args.fresh_ratio,
fresh_threshold=args.fresh_threshold,
force_fresh=args.force_fresh,
ratio_scheduler=args.ratio_scheduler,
soft_fresh_weight=args.soft_fresh_weight)
sa_solver = SASolverSampler(model.module.forward_with_dpmsolver, device=device)
samples = sa_solver.sample(
S=25,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=caption_embs,
unconditional_conditioning=null_y,
unconditional_guidance_scale=cfg_scale,
model_kwargs=model_kwargs,
)[0]
samples = vae.decode(samples / 0.18215).sample
torch.cuda.empty_cache()
dist.barrier()
#if dist.get_rank() == 0:
os.umask(0o000)
for i, sample in enumerate(samples):
save_path = os.path.join(save_root, f"{prompts[i][:100]}.jpg")
#print("Saving path: ", save_path)
save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))
if __name__ == '__main__':
args = get_args()
# Setup DDP
local_rank = setup_ddp()
# Setup environment
device = set_env(args.seed, local_rank)
# only support fixed latent size currently
latent_size = args.image_size // 8
lewei_scale = {256: 1, 512: 1, 1024: 2}
sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}
sample_steps = args.step if args.step != -1 else sample_steps_dict[args.sampling_algo]
weight_dtype = torch.float16
print(f"Inference with {weight_dtype}")
# model setting
if args.image_size in [256, 512]:
model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
else:
model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
print(f"Generating sample from ckpt: {args.model_path}")
state_dict = find_model(args.model_path)
del state_dict['state_dict']['pos_embed']
missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)
print('Missing keys: ', missing)
print('Unexpected keys', unexpected)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
model.module.eval()
model.module.to(weight_dtype)
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)
t5 = T5Embedder(device="cuda", local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)
work_dir = os.path.join(*args.model_path.split('/')[:-2])
work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir
with open(args.txt_file, 'r') as f:
items = [item.strip() for item in f.readlines()]
epoch_name = re.search(r'.*epoch_(\d+).*.pth', args.model_path).group(1) if re.search(r'.*epoch_(\d+).*.pth', args.model_path) else 'unknown'
step_name = re.search(r'.*step_(\d+).*.pth', args.model_path).group(1) if re.search(r'.*step_(\d+).*.pth', args.model_path) else 'unknown'
img_save_dir = os.path.join(work_dir, 'vis')
os.umask(0o000)
os.makedirs(img_save_dir, exist_ok=True)
save_root = os.path.join(img_save_dir, f"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_samp{args.sampling_algo}_seed{args.seed}")
os.makedirs(save_root, exist_ok=True)
visualize(items, args.bs, sample_steps, args.cfg_scale, device)
cleanup_ddp()
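
The DDP script above splits the prompt list across ranks with `DistributedSampler(items, shuffle=False, ...)`. A minimal pure-Python sketch of that index assignment follows, assuming the sampler's documented behavior with `shuffle=False` and `drop_last=False`: pad the index list from its start until it divides evenly, then take every `num_replicas`-th index starting at `rank`. The function name `shard_indices` is hypothetical and this is not the sampler's actual code.

```python
import math
from typing import List


def shard_indices(num_items: int, num_replicas: int, rank: int) -> List[int]:
    """Indices one rank would process under strided sharding with start-padding."""
    per_rank = math.ceil(num_items / num_replicas)
    total = per_rank * num_replicas
    indices = list(range(num_items))
    padding = total - num_items
    # Pad by repeating indices from the start so every rank gets the same count.
    indices += (indices * math.ceil(padding / max(num_items, 1)))[:padding]
    return indices[rank:total:num_replicas]


# Five prompts on two GPUs: prompt 0 is repeated to fill the last slot.
print(shard_indices(5, 2, rank=0))  # prints [0, 2, 4]
print(shard_indices(5, 2, rank=1))  # prints [1, 3, 0]
```

Under this assumption, a padded prompt is generated twice. Since the save path is derived from the prompt text, the second image would overwrite the first rather than add a file.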
================================================
FILE: PixArt-alpha-ToCa/scripts/inference_lcm.py
================================================
import os
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import warnings
warnings.filterwarnings("ignore") # ignore warning
import re
import argparse
from datetime import datetime
from tqdm import tqdm
import torch
from torchvision.utils import save_image
from diffusers.models import AutoencoderKL
from diffusion.model.utils import prepare_prompt_ar
from tools.download import find_model
from diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2
from diffusion.model.t5 import T5Embedder
from diffusion.data.datasets import get_chunks
from diffusion.lcm_scheduler import LCMScheduler
from diffusion.data.datasets import ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('--image_size', default=1024, type=int)
parser.add_argument('--t5_path', default='output/pretrained_models/t5_ckpts', type=str)
parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)
parser.add_argument('--txt_file', default='asset/samples.txt', type=str)
parser.add_argument('--model_path', default='output/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str)
parser.add_argument('--bs', default=1, type=int)
parser.add_argument('--cfg_scale', default=4.5, type=float)
parser.add_argument('--sample_steps', default=4, type=int)
parser.add_argument('--seed', default=0, type=int)
parser.add_argument('--dataset', default='custom', type=str)
parser.add_argument('--step', default=-1, type=int)
parser.add_argument('--save_name', default='test_sample', type=str)
return parser.parse_args()
def set_env(seed=0):
torch.manual_seed(seed)
torch.set_grad_enabled(False)
for _ in range(30):
torch.randn(1, 4, args.image_size, args.image_size)
@torch.inference_mode()
def visualize(items, bs, sample_steps, cfg_scale):
# 4. Prepare timesteps
scheduler.set_timesteps(sample_steps, 50)
timesteps = scheduler.timesteps
for chunk in tqdm(list(get_chunks(items, bs)), unit='batch'):
prompts = []
if bs == 1:
prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False) # ar for aspect ratio
if args.image_size == 1024:
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
latent_size_h, latent_size_w = latent_size, latent_size
prompts.append(prompt_clean.strip())
else:
hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
prompts.extend(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip() for prompt in chunk)
latent_size_h, latent_size_w = latent_size, latent_size
with torch.no_grad():
caption_embs, emb_masks = t5.get_text_embeddings(prompts)
caption_embs = caption_embs.float()[:, None]
print('finish embedding')
# Create sampling noise:
n = len(prompts)
latents = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
# 7. LCM MultiStep Sampling Loop:
for i, t in tqdm(list(enumerate(timesteps))):
ts = torch.full((n,), t, device=device, dtype=torch.long)  # use the actual chunk size; the last chunk may hold fewer than bs prompts
# model prediction (v-prediction, eps, x)
model_pred = model(latents, ts, caption_embs, **model_kwargs)[:, :4]
# compute the previous noisy sample x_t -> x_t-1
latents, denoised = scheduler.step(model_pred, i, t, latents, return_dict=False)
samples = vae.decode(denoised / 0.18215).sample
torch.cuda.empty_cache()
# Save images:
os.umask(0o000) # file permission: 666; dir permission: 777
for i, sample in enumerate(samples):
save_path = os.path.join(save_root, f"{prompts[i][:100]}.jpg")
print("Saving path: ", save_path)
save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))
if __name__ == '__main__':
args = get_args()
# Setup PyTorch:
seed = args.seed
set_env(seed)
device = "cuda" if torch.cuda.is_available() else "cpu"
# only support fixed latent size currently
latent_size = args.image_size // 8
lewei_scale = {512: 1, 1024: 2} # trick for positional embedding interpolation
sample_steps = args.sample_steps
# Initialize Scheduler:
scheduler = LCMScheduler(beta_start=0.0001, beta_end=0.02, beta_schedule="linear", prediction_type="epsilon")
# model setting
if args.image_size == 512:
model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
else:
model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
print(f"Generating sample from ckpt: {args.model_path}")
state_dict = find_model(args.model_path)
del state_dict['state_dict']['pos_embed']
missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)
print('Missing keys: ', missing)
print('Unexpected keys', unexpected)
model.eval()
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)
t5 = T5Embedder(device="cuda", local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)
work_dir = os.path.join(*args.model_path.split('/')[:-2])
work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir
# data setting
with open(args.txt_file, 'r') as f:
items = [item.strip() for item in f.readlines()]
# img save setting
try:
epoch_name = re.search(r'.*epoch_(\d+).*.pth', args.model_path).group(1)
step_name = re.search(r'.*step_(\d+).*.pth', args.model_path).group(1)
except Exception:
epoch_name = 'unknown'
step_name = 'unknown'
img_save_dir = os.path.join(work_dir, 'vis')
os.umask(0o000) # file permission: 666; dir permission: 777
os.makedirs(img_save_dir, exist_ok=True)
save_root = os.path.join(img_save_dir, f"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_sampLCM_seed{seed}")
os.makedirs(save_root, exist_ok=True)
visualize(items, args.bs, sample_steps, args.cfg_scale)
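# --- Illustrative sketch, not part of the original script ---
# The try/except above falls back to 'unknown' for BOTH names as soon as either pattern is missing
# from the checkpoint path. A minimal alternative, assuming it is acceptable to parse the two tags
# independently, is sketched below. `parse_ckpt_tag` is a hypothetical helper name introduced only
# for this example; the `.pth` suffix is matched with an escaped dot.
import re


def parse_ckpt_tag(model_path, tag):
    """Return the digits following `<tag>_` in a `.pth` path, or 'unknown' if the tag is absent."""
    match = re.search(rf'{tag}_(\d+).*\.pth$', model_path)
    return match.group(1) if match else 'unknown'


# Possible usage (hypothetical path): parse_ckpt_tag('output/run/checkpoints/epoch_12_step_3400.pth', 'step')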
================================================
FILE: PixArt-alpha-ToCa/scripts/interface.py
================================================
import argparse
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import os
import random
import torch
from torchvision.utils import save_image
from diffusion import IDDPM, DPMS, SASolverSampler
from diffusers.models import AutoencoderKL
from tools.download import find_model
from datetime import datetime
from typing import List, Union
import gradio as gr
import numpy as np
from gradio.components import Textbox, Image
from diffusion.model.utils import prepare_prompt_ar, resize_and_crop_tensor
from diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2
from diffusion.model.t5 import T5Embedder
from torchvision.utils import _log_api_usage_once, make_grid
from diffusion.data.datasets import ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
from asset.examples import examples
MAX_SEED = np.iinfo(np.int32).max
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('--image_size', default=1024, type=int)
parser.add_argument('--model_path', default='output/pretrained_models/PixArt-XL-2-1024-MS.pth', type=str)
parser.add_argument('--t5_path', default='output/pretrained_models', type=str)
parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)
parser.add_argument('--llm_model', default='t5', type=str)
parser.add_argument('--port', default=7788, type=int)
return parser.parse_args()
@torch.no_grad()
def ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs,) -> np.ndarray:
if not torch.jit.is_scripting() and not torch.jit.is_tracing():
_log_api_usage_once(save_image)
grid = make_grid(tensor, **kwargs)
# Add 0.5 after unnormalizing to [0, 255] to round to the nearest integer
return grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to("cpu", torch.uint8).numpy()
def set_env(seed=0):
torch.manual_seed(seed)
torch.set_grad_enabled(False)
for _ in range(30):
torch.randn(1, 4, args.image_size, args.image_size)
def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
if randomize_seed:
seed = random.randint(0, MAX_SEED)
return seed
@torch.inference_mode()
def generate_img(prompt, sampler, sample_steps, scale, seed=0, randomize_seed=False):
seed = int(randomize_seed_fn(seed, randomize_seed))
set_env(seed)
os.makedirs(f'output/demo/online_demo_prompts/', exist_ok=True)
save_promt_path = f'output/demo/online_demo_prompts/tested_prompts{datetime.now().date()}.txt'
with open(save_promt_path, 'a') as f:
f.write(prompt + '\n')
print(prompt)
prompt_clean, prompt_show, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device) # ar for aspect ratio
prompt_clean = prompt_clean.strip()
if isinstance(prompt_clean, str):
prompts = [prompt_clean]
caption_embs, emb_masks = llm_embed_model.get_text_embeddings(prompts)
caption_embs = caption_embs[:, None]
null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]
latent_size_h, latent_size_w = int(hw[0, 0]//8), int(hw[0, 1]//8)
# Sample images:
if sampler == 'iddpm':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)
model_kwargs = dict(y=torch.cat([caption_embs, null_y]),
cfg_scale=scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
diffusion = IDDPM(str(sample_steps))
samples = diffusion.p_sample_loop(
model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,
device=device
)
samples, _ = samples.chunk(2, dim=0) # Remove null class samples
elif sampler == 'dpm-solver':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
dpm_solver = DPMS(model.forward_with_dpmsolver,
condition=caption_embs,
uncondition=null_y,
cfg_scale=scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=sample_steps,
order=2,
skip_type="time_uniform",
method="multistep",
)
elif sampler == 'sa-solver':
# Create sampling noise:
n = len(prompts)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
sa_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)
samples = sa_solver.sample(
S=sample_steps,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=caption_embs,
unconditional_conditioning=null_y,
unconditional_guidance_scale=scale,
model_kwargs=model_kwargs,
)[0]
samples = vae.decode(samples / 0.18215).sample
torch.cuda.empty_cache()
samples = resize_and_crop_tensor(samples, custom_hw[0,1], custom_hw[0,0])
display_model_info = f'Model path: {args.model_path},\nBase image size: {args.image_size}, \nSampling Algo: {sampler}'
return ndarr_image(samples, normalize=True, value_range=(-1, 1)), prompt_show, display_model_info, seed
if __name__ == '__main__':
from diffusion.utils.logger import get_root_logger
args = get_args()
device = "cuda" if torch.cuda.is_available() else "cpu"
logger = get_root_logger()
assert args.image_size in [512, 1024], "We only provide pre-trained models for 512x512 and 1024x1024 resolutions."
lewei_scale = {512: 1, 1024: 2}
latent_size = args.image_size // 8
t5_device = {512: 'cuda', 1024: 'cuda'}
if args.image_size == 512:
model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
else:
model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)
state_dict = find_model(args.model_path)
del state_dict['state_dict']['pos_embed']
missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
model.eval()
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)
if args.llm_model == 't5':
llm_embed_model = T5Embedder(device=t5_device[args.image_size], local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)
else:
print('We support t5 only, please initialize the llm again')
sys.exit()
title = f"""
'' Unleashing your Creativity \n ''

{args.image_size}px
"""
DESCRIPTION = """# PixArt-Alpha 1024px
## If PixArt-Alpha is helpful, please help to ⭐ the [GitHub Repo](https://github.com/PixArt-alpha/PixArt) and recommend it to your friends 😊
#### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) checkpoint.
#### English prompts ONLY; 提示词仅限英文
Don't want to queue? Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).
"""
if not torch.cuda.is_available():
DESCRIPTION += "\nRunning on CPU 🥶 This demo does not work on CPU."
demo = gr.Interface(
fn=generate_img,
inputs=[Textbox(label="Note: If you want to specify an aspect ratio or a customized height and width, "
"use --ar h:w (or --aspect_ratio h:w) or --hw h:w. If no aspect ratio or hw is given, the default settings are used.",
placeholder="Please enter your prompt. \n"),
gr.Radio(
choices=["iddpm", "dpm-solver"],
label=f"Sampler",
interactive=True,
value='dpm-solver',
),
gr.Slider(
label='Sample Steps',
minimum=1,
maximum=100,
value=14,
step=1
),
gr.Slider(
label='Guidance Scale',
minimum=0.1,
maximum=30.0,
value=4.5,
step=0.1
),
gr.Slider(
label="Seed",
minimum=0,
maximum=MAX_SEED,
step=1,
value=0,
),
gr.Checkbox(label="Randomize seed", value=True),
],
outputs=[Image(type="numpy", label="Img"),
Textbox(label="clean prompt"),
Textbox(label="model info"),
gr.Slider(label='seed')],
title=title,
description=DESCRIPTION,
examples=examples,
)
demo.launch(server_name="0.0.0.0", server_port=args.port, debug=True)
================================================
FILE: PixArt-alpha-ToCa/scripts/interface_controlnet.py
================================================
import argparse
import os
from datetime import datetime
import numpy as np
import sys
from pathlib import Path
from typing import List, Union
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import gradio as gr
from gradio.components import Textbox, Image, Slider
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from torchvision.utils import _log_api_usage_once, make_grid, save_image
from diffusion import IDDPM, DPMS, SASolverSampler
from diffusion.data.datasets import *
from diffusion.model.hed import HEDdetector
from diffusion.model.nets import PixArtMS_XL_2, ControlPixArtHalf, ControlPixArtMSHalf
from diffusion.model.t5 import T5Embedder
from diffusion.model.utils import prepare_prompt_ar, resize_and_crop_tensor
from diffusion.utils.misc import read_config
from diffusers.models import AutoencoderKL
from tools.download import find_model
vae_scale = 0.18215
DESCRIPTION = """
# PixArt-Alpha 1024px + ControlNet. This is the demo for ControlNet combined with 1024px PixArt-Alpha.
# The input reference image needs to be around 1024x1024, and a descriptive prompt must also be provided.
# You may change the random seed if you are not satisfied with the results.
"""
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument("config", type=str, help="config")
parser.add_argument('--num_sampling_steps', default=14, type=int)
parser.add_argument('--cfg_scale', default=4.5, type=float)
parser.add_argument('--image_size', default=1024, type=int)
parser.add_argument('--model_path', type=str)
parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)
parser.add_argument('--llm_model', default='t5', type=str)
parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])
parser.add_argument('--port', default=7788, type=int)
parser.add_argument('--condition_strength', default=1, type=float)
return parser.parse_args()
@torch.no_grad()
def ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs, ) -> np.ndarray:
if not torch.jit.is_scripting() and not torch.jit.is_tracing():
_log_api_usage_once(save_image)
grid = make_grid(tensor, **kwargs)
ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to("cpu", torch.uint8).numpy()
return ndarr
def set_env():
torch.manual_seed(0)
torch.set_grad_enabled(False)
@torch.inference_mode()
def generate_img(prompt, given_image, seed):
torch.manual_seed(int(seed))  # the Gradio slider may deliver a float; torch.manual_seed expects an int
torch.cuda.empty_cache()
strength = args.condition_strength  # defaults to 1; previously hard-coded, which left the CLI flag unused
c_vis = given_image
save_promt_path = f'{save_prompt_path}/tested_prompts{datetime.now().date()}.txt'
with open(save_promt_path, 'a') as f:
f.write(prompt + '\n')
prompt_clean, prompt_show, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device) # ar for aspect ratio
prompt_clean = prompt_clean.strip()
if isinstance(prompt_clean, str):
prompts = [prompt_clean]
caption_embs, emb_masks = llm_embed_model.get_text_embeddings(prompts)
caption_embs = caption_embs[:, None]
null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]
# condition process
if given_image is not None:
ar = torch.tensor([given_image.size[1] / given_image.size[0]], device=device)[None]
custom_hw = torch.tensor([given_image.size[1], given_image.size[0]], device=device)[None]
closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]
hw = torch.tensor(closest_hw, device=device)[None]
condition_transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(int(min(closest_hw))),
T.CenterCrop([int(closest_hw[0]), int(closest_hw[1])]),
T.ToTensor(),
])
given_image = condition_transform(given_image).unsqueeze(0).to(device)
hed_edge = hed(given_image) * strength
hed_edge = TF.normalize(hed_edge, [.5], [.5])
hed_edge = hed_edge.repeat(1, 3, 1, 1)
posterior = vae.encode(hed_edge).latent_dist
condition = posterior.sample()
c = condition * vae_scale
c_vis = vae.decode(condition)['sample']
c_vis = torch.clamp(127.5 * c_vis + 128.0, 0, 255).permute(0, 2, 3, 1).to("cpu", dtype=torch.uint8).numpy()[0]
else:
c = None
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
# Sample images:
if args.sampling_algo == 'iddpm':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)
model_kwargs = dict(y=torch.cat([caption_embs, null_y]), cfg_scale=args.cfg_scale,
data_info={'img_hw': hw, 'aspect_ratio': ar},
mask=emb_masks, c=c)
diffusion = IDDPM(str(args.num_sampling_steps))
samples = diffusion.p_sample_loop(
model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,
device=device
)
samples, _ = samples.chunk(2, dim=0) # Remove null class samples
elif args.sampling_algo == 'dpm-solver':
# Create sampling noise:
n = len(prompts)
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks, c=c)
dpm_solver = DPMS(model.forward_with_dpmsolver,
condition=caption_embs,
uncondition=null_y,
cfg_scale=args.cfg_scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=args.num_sampling_steps,
order=2,
skip_type="time_uniform",
method="multistep",
)
elif args.sampling_algo == 'sa-solver':
# Create sampling noise:
n = len(prompts)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks, c=c)
sas_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)
samples = sas_solver.sample(
S=args.num_sampling_steps,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=caption_embs,
unconditional_conditioning=null_y,
unconditional_guidance_scale=args.cfg_scale,
model_kwargs=model_kwargs,
)[0]
samples = vae.decode(samples / vae_scale).sample
torch.cuda.empty_cache()
samples = resize_and_crop_tensor(samples, custom_hw[0, 1], custom_hw[0, 0])
return ndarr_image(samples, normalize=True, value_range=(-1, 1)), c_vis, prompt_show
if __name__ == '__main__':
args = get_args()
config = read_config(args.config)
set_env()
device = "cuda" if torch.cuda.is_available() else "cpu"
save_prompt_path = 'output/demo/online_demo_prompts/'
os.makedirs(save_prompt_path, exist_ok=True)
assert args.image_size in [512, 1024], "We only provide pre-trained models for 512x512 and 1024x1024 resolutions."
lewei_scale = {512: 1, 1024: 2}
latent_size = args.image_size // 8
weight_dtype = torch.float16
print(f"Inference with {weight_dtype}")
model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size])
if config.image_size == 512:
print('model architecture ControlPixArtHalf and image size is 512')
model = ControlPixArtHalf(model).to(device)
elif config.image_size == 1024:
print('model architecture ControlPixArtMSHalf and image size is 1024')
model = ControlPixArtMSHalf(model).to(device)
state_dict = find_model(args.model_path)['state_dict']
if 'pos_embed' in state_dict:
del state_dict['pos_embed']
elif 'base_model.pos_embed' in state_dict:
del state_dict['base_model.pos_embed']
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('Missing keys (missing pos_embed is normal): ', missing)
print('Unexpected keys', unexpected)
model.eval()
model.to(weight_dtype)
display_model_info = f'model path: {args.model_path},\n base image size: {args.image_size}'
base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')
vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)
hed = HEDdetector(False).to(device)
if args.llm_model == 't5':
print("begin load t5")
llm_embed_model = T5Embedder(device=device, local_cache=True, cache_dir='data/t5_ckpts', torch_dtype=torch.float)
print("finish load t5")
else:
print(f'We support t5 only, please initialize the llm again')
sys.exit()
# A bare gr.Markdown(...) call outside a Blocks context renders nothing, so pass the text to the Interface instead.
demo = gr.Interface(fn=generate_img,
inputs=[
Textbox(label="Enter a prompt and a reference image; the image resolution should be around 1024 x 1024",
placeholder="Please enter your prompt. \n"),
Image(type="pil", label="Condition"),
Slider(minimum=0., maximum=10000., value=0, step=2, label='seed'),
],
outputs=[Image(type="numpy", label="Img"),
Image(type="numpy", label="HED Edge Map"),
Textbox(label="clean prompt"),],
description=DESCRIPTION,
)
demo.queue(max_size=20).launch(server_name="0.0.0.0", server_port=args.port, debug=True)
================================================
FILE: PixArt-alpha-ToCa/scripts/pipeline_pixart_inpaint.py
================================================
# Copyright 2023 PixArt-Alpha Authors and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import html
import inspect
import re
import urllib.parse as ul
from typing import Callable, List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
from transformers import T5EncoderModel, T5Tokenizer
from diffusers.image_processor import PipelineImageInput, PixArtImageProcessor, VaeImageProcessor
from diffusers.models import AutoencoderKL, Transformer2DModel
from diffusers.pipelines.pipeline_utils import DiffusionPipeline, ImagePipelineOutput
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import (
BACKENDS_MAPPING,
deprecate,
is_bs4_available,
is_ftfy_available,
logging,
replace_example_docstring,
)
from diffusers.utils.torch_utils import randn_tensor
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
if is_bs4_available():
from bs4 import BeautifulSoup
if is_ftfy_available():
import ftfy
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers import PixArtAlphaInpaintPipeline
>>> # You can replace the checkpoint id with "PixArt-alpha/PixArt-XL-2-512x512" too.
>>> pipe = PixArtAlphaInpaintPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
>>> # Enable memory optimizations.
>>> pipe.enable_model_cpu_offload()
>>> prompt = ""
>>> image = Image.open('')
>>> image = pipe(prompt,
image=image,
mask_image=mask_image,
strength=1.0).images[0]
```
"""
ASPECT_RATIO_1024_BIN = {
"0.25": [512.0, 2048.0],
"0.28": [512.0, 1856.0],
"0.32": [576.0, 1792.0],
"0.33": [576.0, 1728.0],
"0.35": [576.0, 1664.0],
"0.4": [640.0, 1600.0],
"0.42": [640.0, 1536.0],
"0.48": [704.0, 1472.0],
"0.5": [704.0, 1408.0],
"0.52": [704.0, 1344.0],
"0.57": [768.0, 1344.0],
"0.6": [768.0, 1280.0],
"0.68": [832.0, 1216.0],
"0.72": [832.0, 1152.0],
"0.78": [896.0, 1152.0],
"0.82": [896.0, 1088.0],
"0.88": [960.0, 1088.0],
"0.94": [960.0, 1024.0],
"1.0": [1024.0, 1024.0],
"1.07": [1024.0, 960.0],
"1.13": [1088.0, 960.0],
"1.21": [1088.0, 896.0],
"1.29": [1152.0, 896.0],
"1.38": [1152.0, 832.0],
"1.46": [1216.0, 832.0],
"1.67": [1280.0, 768.0],
"1.75": [1344.0, 768.0],
"2.0": [1408.0, 704.0],
"2.09": [1472.0, 704.0],
"2.4": [1536.0, 640.0],
"2.5": [1600.0, 640.0],
"3.0": [1728.0, 576.0],
"4.0": [2048.0, 512.0],
}
ASPECT_RATIO_512_BIN = {
"0.25": [256.0, 1024.0],
"0.28": [256.0, 928.0],
"0.32": [288.0, 896.0],
"0.33": [288.0, 864.0],
"0.35": [288.0, 832.0],
"0.4": [320.0, 800.0],
"0.42": [320.0, 768.0],
"0.48": [352.0, 736.0],
"0.5": [352.0, 704.0],
"0.52": [352.0, 672.0],
"0.57": [384.0, 672.0],
"0.6": [384.0, 640.0],
"0.68": [416.0, 608.0],
"0.72": [416.0, 576.0],
"0.78": [448.0, 576.0],
"0.82": [448.0, 544.0],
"0.88": [480.0, 544.0],
"0.94": [480.0, 512.0],
"1.0": [512.0, 512.0],
"1.07": [512.0, 480.0],
"1.13": [544.0, 480.0],
"1.21": [544.0, 448.0],
"1.29": [576.0, 448.0],
"1.38": [576.0, 416.0],
"1.46": [608.0, 416.0],
"1.67": [640.0, 384.0],
"1.75": [672.0, 384.0],
"2.0": [704.0, 352.0],
"2.09": [736.0, 352.0],
"2.4": [768.0, 320.0],
"2.5": [800.0, 320.0],
"3.0": [864.0, 288.0],
"4.0": [1024.0, 256.0],
}
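# --- Illustrative sketch, not part of the upstream pipeline code ---
# The tables above map an aspect ratio (height / width, stored as a string key) to a [height, width] bin.
# One way to pick a bin for an arbitrary image size is a nearest-key lookup, as sketched here.
# `closest_aspect_ratio_bin` and `_TOY_BINS` are hypothetical names; `_TOY_BINS` copies three entries
# from ASPECT_RATIO_1024_BIN so that the snippet runs on its own.
_TOY_BINS = {
    "0.5": [704.0, 1408.0],
    "1.0": [1024.0, 1024.0],
    "2.0": [1408.0, 704.0],
}


def closest_aspect_ratio_bin(height, width, ratios):
    """Return the [height, width] bin whose aspect-ratio key is nearest to height / width."""
    aspect_ratio = height / width
    closest_key = min(ratios, key=lambda ratio: abs(float(ratio) - aspect_ratio))
    return ratios[closest_key]


# Possible usage: closest_aspect_ratio_bin(900, 1000, _TOY_BINS) selects the square "1.0" bin.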
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
def retrieve_timesteps(
scheduler,
num_inference_steps: Optional[int] = None,
device: Optional[Union[str, torch.device]] = None,
timesteps: Optional[List[int]] = None,
**kwargs,
):
"""
Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles
custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.
Args:
scheduler (`SchedulerMixin`):
The scheduler to get timesteps from.
num_inference_steps (`int`):
The number of diffusion steps used when generating samples with a pre-trained model. If used,
`timesteps` must be `None`.
device (`str` or `torch.device`, *optional*):
The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
timesteps (`List[int]`, *optional*):
Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default
timestep spacing strategy of the scheduler is used. If `timesteps` is passed, `num_inference_steps`
must be `None`.
Returns:
`Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the
second element is the number of inference steps.
"""
if timesteps is not None:
accepts_timesteps = "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
if not accepts_timesteps:
raise ValueError(
f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
f" timestep schedules. Please check whether you are using the correct scheduler."
)
scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
timesteps = scheduler.timesteps
num_inference_steps = len(timesteps)
else:
scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
timesteps = scheduler.timesteps
return timesteps, num_inference_steps
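# --- Illustrative sketch, not part of the upstream diffusers code ---
# `retrieve_timesteps` decides whether a scheduler supports custom timesteps by inspecting the
# signature of its `set_timesteps` method. The two toy classes below are hypothetical stand-ins
# (not real diffusers schedulers) that show what the check distinguishes.
import inspect


class _ToySchedulerWithCustomTimesteps:
    def set_timesteps(self, num_inference_steps=None, device=None, timesteps=None):
        # Use the caller's schedule if given, else a simple descending range.
        if timesteps is not None:
            self.timesteps = list(timesteps)
        else:
            self.timesteps = list(range(num_inference_steps - 1, -1, -1))


class _ToySchedulerFixedSpacing:
    def set_timesteps(self, num_inference_steps, device=None):
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))


def accepts_custom_timesteps(scheduler) -> bool:
    """Return True if `scheduler.set_timesteps` exposes a `timesteps` parameter."""
    return "timesteps" in inspect.signature(scheduler.set_timesteps).parameters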
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents
def retrieve_latents(
encoder_output: torch.Tensor, generator: Optional[torch.Generator] = None, sample_mode: str = "sample"
):
if hasattr(encoder_output, "latent_dist") and sample_mode == "sample":
return encoder_output.latent_dist.sample(generator)
elif hasattr(encoder_output, "latent_dist") and sample_mode == "argmax":
return encoder_output.latent_dist.mode()
elif hasattr(encoder_output, "latents"):
return encoder_output.latents
else:
raise AttributeError("Could not access latents of provided encoder_output")
class PixArtAlphaInpaintPipeline(DiffusionPipeline):
r"""
Pipeline for text-guided image inpainting using PixArt-Alpha.
This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
Args:
vae ([`AutoencoderKL`]):
Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
text_encoder ([`T5EncoderModel`]):
Frozen text-encoder. PixArt-Alpha uses
[T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the
[t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) variant.
tokenizer (`T5Tokenizer`):
Tokenizer of class
[T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
transformer ([`Transformer2DModel`]):
A text conditioned `Transformer2DModel` to denoise the encoded image latents.
scheduler ([`SchedulerMixin`]):
A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
"""
bad_punct_regex = re.compile(
r"["
+ "#®•©™&@·º½¾¿¡§~"
+ r"\)"
+ r"\("
+ r"\]"
+ r"\["
+ r"\}"
+ r"\{"
+ r"\|"
+ "\\"
+ r"\/"
+ r"\*"
+ r"]{1,}"
) # noqa
_optional_components = ["tokenizer", "text_encoder"]
model_cpu_offload_seq = "text_encoder->transformer->vae"
def __init__(
self,
tokenizer: T5Tokenizer,
text_encoder: T5EncoderModel,
vae: AutoencoderKL,
transformer: Transformer2DModel,
scheduler: DPMSolverMultistepScheduler,
):
super().__init__()
self.register_modules(
tokenizer=tokenizer, text_encoder=text_encoder, vae=vae, transformer=transformer, scheduler=scheduler
)
self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
self.image_processor = PixArtImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)
# Adapted from https://github.com/PixArt-alpha/PixArt-alpha/blob/master/diffusion/model/utils.py
def mask_text_embeddings(self, emb, mask):
if emb.shape[0] == 1:
keep_index = mask.sum().item()
return emb[:, :, :keep_index, :], keep_index
else:
masked_feature = emb * mask[:, None, :, None]
return masked_feature, emb.shape[2]
# Adapted from diffusers.pipelines.deepfloyd_if.pipeline_if.encode_prompt
def encode_prompt(
self,
prompt: Union[str, List[str]],
do_classifier_free_guidance: bool = True,
negative_prompt: str = "",
num_images_per_prompt: int = 1,
device: Optional[torch.device] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,
clean_caption: bool = False,
**kwargs,
):
r"""
Encodes the prompt into text encoder hidden states.
Args:
prompt (`str` or `List[str]`, *optional*):
prompt to be encoded
negative_prompt (`str` or `List[str]`, *optional*):
The prompt not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds`
instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`). For
PixArt-Alpha, this should be "".
do_classifier_free_guidance (`bool`, *optional*, defaults to `True`):
whether to use classifier free guidance or not
num_images_per_prompt (`int`, *optional*, defaults to 1):
number of images that should be generated per prompt
device: (`torch.device`, *optional*):
torch device to place the resulting embeddings on
prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha, this should be the embeddings of the `""`
string.
clean_caption (bool, defaults to `False`):
If `True`, the function will preprocess and clean the provided caption before encoding.
"""
if "mask_feature" in kwargs:
deprecation_message = "The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version."
deprecate("mask_feature", "1.0.0", deprecation_message, standard_warn=False)
if device is None:
device = self._execution_device
if prompt is not None and isinstance(prompt, str):
batch_size = 1
elif prompt is not None and isinstance(prompt, list):
batch_size = len(prompt)
else:
batch_size = prompt_embeds.shape[0]
# See Section 3.1. of the paper.
max_length = 120
if prompt_embeds is None:
prompt = self._text_preprocessing(prompt, clean_caption=clean_caption)
text_inputs = self.tokenizer(
prompt,
padding="max_length",
max_length=max_length,
truncation=True,
add_special_tokens=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
untruncated_ids = self.tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
text_input_ids, untruncated_ids
):
removed_text = self.tokenizer.batch_decode(untruncated_ids[:, max_length - 1 : -1])
logger.warning(
"The following part of your input was truncated because the text encoder only handles sequences up to"
f" {max_length} tokens: {removed_text}"
)
prompt_attention_mask = text_inputs.attention_mask
prompt_attention_mask = prompt_attention_mask.to(device)
prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=prompt_attention_mask)
prompt_embeds = prompt_embeds[0]
if self.text_encoder is not None:
dtype = self.text_encoder.dtype
elif self.transformer is not None:
dtype = self.transformer.dtype
else:
dtype = None
prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)
bs_embed, seq_len, _ = prompt_embeds.shape
# duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method
prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)
prompt_attention_mask = prompt_attention_mask.view(bs_embed, -1)
prompt_attention_mask = prompt_attention_mask.repeat(num_images_per_prompt, 1)
# get unconditional embeddings for classifier free guidance
if do_classifier_free_guidance and negative_prompt_embeds is None:
uncond_tokens = [negative_prompt] * batch_size
uncond_tokens = self._text_preprocessing(uncond_tokens, clean_caption=clean_caption)
max_length = prompt_embeds.shape[1]
uncond_input = self.tokenizer(
uncond_tokens,
padding="max_length",
max_length=max_length,
truncation=True,
return_attention_mask=True,
add_special_tokens=True,
return_tensors="pt",
)
negative_prompt_attention_mask = uncond_input.attention_mask
negative_prompt_attention_mask = negative_prompt_attention_mask.to(device)
negative_prompt_embeds = self.text_encoder(
uncond_input.input_ids.to(device), attention_mask=negative_prompt_attention_mask
)
negative_prompt_embeds = negative_prompt_embeds[0]
if do_classifier_free_guidance:
# duplicate unconditional embeddings for each generation per prompt, using mps friendly method
seq_len = negative_prompt_embeds.shape[1]
negative_prompt_embeds = negative_prompt_embeds.to(dtype=dtype, device=device)
negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
negative_prompt_attention_mask = negative_prompt_attention_mask.view(bs_embed, -1)
negative_prompt_attention_mask = negative_prompt_attention_mask.repeat(num_images_per_prompt, 1)
else:
negative_prompt_embeds = None
negative_prompt_attention_mask = None
return prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
def prepare_extra_step_kwargs(self, generator, eta):
# prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
# eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
# eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
# and should be between [0, 1]
accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
extra_step_kwargs = {}
if accepts_eta:
extra_step_kwargs["eta"] = eta
# check if the scheduler accepts generator
accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
if accepts_generator:
extra_step_kwargs["generator"] = generator
return extra_step_kwargs
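# The signature check above can be sketched in isolation. This is a minimal illustration, not part of the
# pipeline: `filter_step_kwargs`, `ddim_like_step` and `plain_step` are hypothetical names, and the two step
# functions only stand in for scheduler `step` methods with different signatures.

```python
import inspect


def filter_step_kwargs(step_fn, **candidates):
    """Keep only the keyword arguments that `step_fn` declares in its signature."""
    accepted = inspect.signature(step_fn).parameters
    return {name: value for name, value in candidates.items() if name in accepted}


# Hypothetical stand-in for a scheduler step that accepts `eta` and `generator`.
def ddim_like_step(model_output, timestep, sample, eta=0.0, generator=None):
    return sample


# Hypothetical stand-in for a scheduler step that accepts neither.
def plain_step(model_output, timestep, sample):
    return sample


ddim_kwargs = filter_step_kwargs(ddim_like_step, eta=0.0, generator=None)
plain_kwargs = filter_step_kwargs(plain_step, eta=0.0, generator=None)
```

# Filtering by signature lets one call site serve schedulers that differ in which optional arguments they take.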
def check_inputs(
self,
prompt,
height,
width,
negative_prompt,
callback_steps,
prompt_embeds=None,
negative_prompt_embeds=None,
prompt_attention_mask=None,
negative_prompt_attention_mask=None,
):
if height % 8 != 0 or width % 8 != 0:
raise ValueError(f"`height` and `width` have to be divisible by 8 but are {height} and {width}.")
if (callback_steps is None) or (
callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0)
):
raise ValueError(
f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
f" {type(callback_steps)}."
)
if prompt is not None and prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
" only forward one of the two."
)
elif prompt is None and prompt_embeds is None:
raise ValueError(
"Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
)
elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
if prompt is not None and negative_prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `prompt`: {prompt} and `negative_prompt_embeds`:"
f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
)
if negative_prompt is not None and negative_prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
)
if prompt_embeds is not None and prompt_attention_mask is None:
raise ValueError("Must provide `prompt_attention_mask` when specifying `prompt_embeds`.")
if negative_prompt_embeds is not None and negative_prompt_attention_mask is None:
raise ValueError("Must provide `negative_prompt_attention_mask` when specifying `negative_prompt_embeds`.")
if prompt_embeds is not None and negative_prompt_embeds is not None:
if prompt_embeds.shape != negative_prompt_embeds.shape:
raise ValueError(
"`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
f" {negative_prompt_embeds.shape}."
)
if prompt_attention_mask.shape != negative_prompt_attention_mask.shape:
raise ValueError(
"`prompt_attention_mask` and `negative_prompt_attention_mask` must have the same shape when passed directly, but"
f" got: `prompt_attention_mask` {prompt_attention_mask.shape} != `negative_prompt_attention_mask`"
f" {negative_prompt_attention_mask.shape}."
)
# Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._text_preprocessing
def _text_preprocessing(self, text, clean_caption=False):
if clean_caption and not is_bs4_available():
logger.warning(BACKENDS_MAPPING["bs4"][-1].format("Setting `clean_caption=True`"))
logger.warning("Setting `clean_caption` to False...")
clean_caption = False
if clean_caption and not is_ftfy_available():
logger.warning(BACKENDS_MAPPING["ftfy"][-1].format("Setting `clean_caption=True`"))
logger.warning("Setting `clean_caption` to False...")
clean_caption = False
if not isinstance(text, (tuple, list)):
text = [text]
def process(text: str):
if clean_caption:
text = self._clean_caption(text)
text = self._clean_caption(text)
else:
text = text.lower().strip()
return text
return [process(t) for t in text]
# Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption
def _clean_caption(self, caption):
caption = str(caption)
caption = ul.unquote_plus(caption)
caption = caption.strip().lower()
caption = re.sub("<person>", "person", caption)
# urls:
caption = re.sub(
r"\b((?:https?:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))",
# noqa
"",
caption,
) # regex for urls
caption = re.sub(
r"\b((?:www:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))",
# noqa
"",
caption,
) # regex for urls
# html:
caption = BeautifulSoup(caption, features="html.parser").text
# @
caption = re.sub(r"@[\w\d]+\b", "", caption)
# 31C0—31EF CJK Strokes
# 31F0—31FF Katakana Phonetic Extensions
# 3200—32FF Enclosed CJK Letters and Months
# 3300—33FF CJK Compatibility
# 3400—4DBF CJK Unified Ideographs Extension A
# 4DC0—4DFF Yijing Hexagram Symbols
# 4E00—9FFF CJK Unified Ideographs
caption = re.sub(r"[\u31c0-\u31ef]+", "", caption)
caption = re.sub(r"[\u31f0-\u31ff]+", "", caption)
caption = re.sub(r"[\u3200-\u32ff]+", "", caption)
caption = re.sub(r"[\u3300-\u33ff]+", "", caption)
caption = re.sub(r"[\u3400-\u4dbf]+", "", caption)
caption = re.sub(r"[\u4dc0-\u4dff]+", "", caption)
caption = re.sub(r"[\u4e00-\u9fff]+", "", caption)
#######################################################
# all types of dash --> "-"
caption = re.sub(
r"[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]+",
# noqa
"-",
caption,
)
# normalize quotes to one standard
caption = re.sub(r"[`´«»“”¨]", '"', caption)
caption = re.sub(r"[‘’]", "'", caption)
# &quot;
caption = re.sub(r"&quot;?", "", caption)
# &amp
caption = re.sub(r"&amp", "", caption)
# ip addresses:
caption = re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", " ", caption)
# article ids:
caption = re.sub(r"\d:\d\d\s+$", "", caption)
# \n
caption = re.sub(r"\\n", " ", caption)
# "#123"
caption = re.sub(r"#\d{1,3}\b", "", caption)
# "#12345.."
caption = re.sub(r"#\d{5,}\b", "", caption)
# "123456.."
caption = re.sub(r"\b\d{6,}\b", "", caption)
# filenames:
caption = re.sub(r"[\S]+\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)", "", caption)
#
caption = re.sub(r"[\"\']{2,}", r'"', caption) # """AUSVERKAUFT"""
caption = re.sub(r"[\.]{2,}", r" ", caption) # """AUSVERKAUFT"""
caption = re.sub(self.bad_punct_regex, r" ", caption) # ***AUSVERKAUFT***, #AUSVERKAUFT
caption = re.sub(r"\s+\.\s+", r" ", caption) # " . "
# this-is-my-cute-cat / this_is_my_cute_cat
regex2 = re.compile(r"(?:\-|\_)")
if len(re.findall(regex2, caption)) > 3:
caption = re.sub(regex2, " ", caption)
caption = ftfy.fix_text(caption)
caption = html.unescape(html.unescape(caption))
caption = re.sub(r"\b[a-zA-Z]{1,3}\d{3,15}\b", "", caption) # jc6640
caption = re.sub(r"\b[a-zA-Z]+\d+[a-zA-Z]+\b", "", caption) # jc6640vc
caption = re.sub(r"\b\d+[a-zA-Z]+\d+\b", "", caption) # 6640vc231
caption = re.sub(r"(worldwide\s+)?(free\s+)?shipping", "", caption)
caption = re.sub(r"(free\s)?download(\sfree)?", "", caption)
caption = re.sub(r"\bclick\b\s(?:for|on)\s\w+", "", caption)
caption = re.sub(r"\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\simage[s]?)?", "", caption)
caption = re.sub(r"\bpage\s+\d+\b", "", caption)
caption = re.sub(r"\b\d*[a-zA-Z]+\d+[a-zA-Z]+\d+[a-zA-Z\d]*\b", r" ", caption) # j2d1a2a...
caption = re.sub(r"\b\d+\.?\d*[xх×]\d+\.?\d*\b", "", caption)
caption = re.sub(r"\b\s+\:\s+", r": ", caption)
caption = re.sub(r"(\D[,\./])\b", r"\1 ", caption)
caption = re.sub(r"\s+", " ", caption)
caption = caption.strip()
caption = re.sub(r"^[\"\']([\w\W]+)[\"\']$", r"\1", caption)
caption = re.sub(r"^[\'\_,\-\:;]", r"", caption)
caption = re.sub(r"[\'\_,\-\:\-\+]$", r"", caption)
caption = re.sub(r"^\.\S+$", "", caption)
return caption.strip()
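# Two of the cleaning rules above, dash normalization and slug splitting, can be tried on their own. This is
# a reduced sketch under stated assumptions: `split_slug` is a hypothetical helper, and `DASHES` covers only a
# subset of the dash code points handled by `_clean_caption`.

```python
import re

# Subset of the dash-like code points normalized above (hyphen-minus, U+2010..U+2015, fullwidth hyphen).
DASHES = re.compile(r"[\u002D\u2010-\u2015\uFF0D]+")
SEPARATORS = re.compile(r"[-_]")


def split_slug(caption):
    """Normalize dashes, then turn slug-like captions (more than 3 separators) into spaced words."""
    caption = DASHES.sub("-", caption)
    if len(SEPARATORS.findall(caption)) > 3:
        caption = SEPARATORS.sub(" ", caption)
    return caption


slug = split_slug("this-is-my-cute-cat")
kept = split_slug("state-of-the-art")
normalized = split_slug("a\u2014b")
```

# Captions with three or fewer separators are left alone, so ordinary hyphenated phrases survive.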
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
def prepare_latents(
self,
batch_size,
num_channels_latents,
height,
width,
dtype,
device,
generator,
latents=None,
image=None,
timestep=None,
is_strength_max=True,
return_image_latents=True,
):
shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
if isinstance(generator, list) and len(generator) != batch_size:
raise ValueError(
f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
f" size of {batch_size}. Make sure the batch size matches the length of the generators."
)
if (image is None or timestep is None) and not is_strength_max:
raise ValueError(
"Since strength < 1, initial latents are to be initialised as a combination of Image + Noise. "
"However, either the image or the noise timestep has not been provided."
)
if return_image_latents or (latents is None and not is_strength_max):
image = image.to(device=device, dtype=dtype)
if image.shape[1] == 4:
image_latents = image
else:
image_latents = self._encode_vae_image(image=image, generator=generator)
image_latents = image_latents.repeat(batch_size // image_latents.shape[0], 1, 1, 1)
if latents is None:
noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
# if strength is 1. then initialise the latents to noise, else initial to image + noise
latents = noise if is_strength_max else self.scheduler.add_noise(image_latents, noise, timestep)
# if pure noise then scale the initial latents by the Scheduler's init sigma
latents = latents * self.scheduler.init_noise_sigma if is_strength_max else latents
else:
noise = latents.to(device)
# scale the caller-supplied noise once by the standard deviation required by the scheduler
latents = noise * self.scheduler.init_noise_sigma
return latents, noise, image_latents
def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):
if isinstance(generator, list):
image_latents = [
retrieve_latents(self.vae.encode(image[i : i + 1]), generator=generator[i])
for i in range(image.shape[0])
]
image_latents = torch.cat(image_latents, dim=0)
else:
image_latents = retrieve_latents(self.vae.encode(image), generator=generator)
image_latents = self.vae.config.scaling_factor * image_latents
return image_latents
def prepare_mask_latents(
self, mask, batch_size, height, width, dtype, device, generator, do_classifier_free_guidance
):
# resize the mask to latents shape as we concatenate the mask to the latents
# we do that before converting to dtype to avoid breaking in case we're using cpu_offload
# and half precision
mask = torch.nn.functional.interpolate(
mask, size=(height // self.vae_scale_factor, width // self.vae_scale_factor)
)
mask = mask.to(device=device, dtype=dtype)
if mask.shape[0] < batch_size:
if not batch_size % mask.shape[0] == 0:
raise ValueError(
"The passed mask and the required batch size don't match. Masks are supposed to be duplicated to"
f" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the total"
" requested batch size is divisible by the number of masks that you pass."
)
mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)
mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask
return mask
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.get_timesteps
def get_timesteps(self, num_inference_steps, strength, device):
# get the original timestep using init_timestep
init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
t_start = max(num_inference_steps - init_timestep, 0)
timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]
return timesteps, num_inference_steps - t_start
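# How `strength` shortens the schedule can be checked with plain lists. This is a standalone sketch of the
# arithmetic above: `select_timesteps` is a hypothetical name, `schedule` is an assumed 20-step descending
# schedule, and `order` stands in for `self.scheduler.order`.

```python
def select_timesteps(schedule, num_inference_steps, strength, order=1):
    """Drop the earliest (noisiest) steps so that only a `strength` fraction of the schedule is run."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return schedule[t_start * order :], num_inference_steps - t_start


# Assumed schedule for illustration: 950, 900, ..., 0 (20 descending timesteps).
schedule = list(range(950, -1, -50))

full, full_steps = select_timesteps(schedule, 20, strength=1.0)
half, half_steps = select_timesteps(schedule, 20, strength=0.5)
```

# With `strength=0.5` the run starts halfway through the schedule, so the input image is only partly re-noised.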
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
self,
prompt: Union[str, List[str]] = None,
image: PipelineImageInput = None,
mask_image: PipelineImageInput = None,
strength: float = 1.0,
negative_prompt: str = "",
num_inference_steps: int = 20,
timesteps: List[int] = None,
guidance_scale: float = 4.5,
num_images_per_prompt: Optional[int] = 1,
height: Optional[int] = None,
width: Optional[int] = None,
eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
callback_steps: int = 1,
clean_caption: bool = True,
use_resolution_binning: bool = True,
**kwargs,
) -> Union[ImagePipelineOutput, Tuple]:
"""
Function invoked when calling the pipeline for generation.
Args:
prompt (`str` or `List[str]`, *optional*):
The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
`Image`, numpy array or tensor representing an image batch to be inpainted (which parts of the image to
be masked out with `mask_image` and repainted according to `prompt`). For both numpy array and pytorch
tensor, the expected value range is between `[0, 1]`. If it's a tensor or a list of tensors, the
expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays, the
expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but
if passing latents directly it is not encoded again.
mask_image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
`Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one
color channel (L) instead of 3, so the expected shape for pytorch tensor would be `(B, 1, H, W)`, `(B,
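H, W)`, `(1, H, W)`, `(H, W)`. For a numpy array, it would be `(B, H, W, 1)`, `(B, H, W)`, `(H, W,
1)`, or `(H, W)`.
strength (`float`, *optional*, defaults to 1.0):
How strongly the masked area of `image` is transformed. At `1.0` the initial latents are pure noise
and all `num_inference_steps` are run. Lower values start from a noised version of `image` and run
proportionally fewer denoising steps.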
negative_prompt (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`).
num_inference_steps (`int`, *optional*, defaults to 20):
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
timesteps (`List[int]`, *optional*):
Custom timesteps to use for the denoising process. If not defined, equal spaced `num_inference_steps`
timesteps are used. Must be in descending order.
guidance_scale (`float`, *optional*, defaults to 4.5):
Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
`guidance_scale` is defined as `w` of equation 2. of [Imagen
Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
usually at the expense of lower image quality.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
height (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):
The height in pixels of the generated image.
width (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):
The width in pixels of the generated image.
eta (`float`, *optional*, defaults to 0.0):
Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
[`schedulers.DDIMScheduler`], will be ignored for others.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic.
latents (`torch.FloatTensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will be generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings.
negative_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha this negative prompt should be "". If not
provided, negative_prompt_embeds will be generated from `negative_prompt` input argument.
negative_prompt_attention_mask (`torch.FloatTensor`, *optional*):
Pre-generated attention mask for negative text embeddings.
output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generated image. Choose between
[PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.ImagePipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step.
clean_caption (`bool`, *optional*, defaults to `True`):
Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to
be installed. If the dependencies are not installed, the embeddings will be created from the raw
prompt.
use_resolution_binning (`bool` defaults to `True`):
If set to `True`, the requested height and width are first mapped to the closest resolutions using
`ASPECT_RATIO_1024_BIN`. After the produced latents are decoded into images, they are resized back to
the requested resolution. Useful for generating non-square images.
Examples:
Returns:
[`~pipelines.ImagePipelineOutput`] or `tuple`:
If `return_dict` is `True`, [`~pipelines.ImagePipelineOutput`] is returned, otherwise a `tuple` is
returned where the first element is a list with the generated images
"""
if "mask_feature" in kwargs:
deprecation_message = "The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version."
deprecate("mask_feature", "1.0.0", deprecation_message, standard_warn=False)
# 1. Default height and width to the transformer's sample size, then check inputs. Raise error if not correct
height = height or self.transformer.config.sample_size * self.vae_scale_factor
width = width or self.transformer.config.sample_size * self.vae_scale_factor
if use_resolution_binning:
aspect_ratio_bin = (
ASPECT_RATIO_1024_BIN if self.transformer.config.sample_size == 128 else ASPECT_RATIO_512_BIN
)
orig_height, orig_width = height, width
height, width = self.image_processor.classify_height_width_bin(height, width, ratios=aspect_ratio_bin)
self.check_inputs(
prompt,
height,
width,
negative_prompt,
callback_steps,
prompt_embeds,
negative_prompt_embeds,
prompt_attention_mask,
negative_prompt_attention_mask,
)
# 2. Determine the batch size and whether to use classifier-free guidance
if prompt is not None and isinstance(prompt, str):
batch_size = 1
elif prompt is not None and isinstance(prompt, list):
batch_size = len(prompt)
else:
batch_size = prompt_embeds.shape[0]
device = self._execution_device
# here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
# 3. Encode input prompt
(
prompt_embeds,
prompt_attention_mask,
negative_prompt_embeds,
negative_prompt_attention_mask,
) = self.encode_prompt(
prompt,
do_classifier_free_guidance,
negative_prompt=negative_prompt,
num_images_per_prompt=num_images_per_prompt,
device=device,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_attention_mask=negative_prompt_attention_mask,
clean_caption=clean_caption,
)
if do_classifier_free_guidance:
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
prompt_attention_mask = torch.cat([negative_prompt_attention_mask, prompt_attention_mask], dim=0)
# 4. Prepare timesteps
timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
timesteps, num_inference_steps = self.get_timesteps(
num_inference_steps=num_inference_steps, strength=strength, device=device
)
# at which timestep to set the initial noise (n.b. 50% if strength is 0.5)
latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)
# create a boolean to check if the strength is set to 1. if so then initialise the latents with pure noise
is_strength_max = strength == 1.0
init_image = self.image_processor.preprocess(image, height=height, width=width)
init_image = init_image.to(dtype=torch.float32)
# 5. Prepare latents.
latent_channels = self.transformer.config.in_channels
latents_outputs = self.prepare_latents(
batch_size * num_images_per_prompt,
latent_channels,
height,
width,
prompt_embeds.dtype,
device,
generator,
latents,
image=init_image,
timestep=latent_timestep,
is_strength_max=is_strength_max,
)
latents, noise, image_latents = latents_outputs
mask_condition = self.mask_processor.preprocess(mask_image, height=height, width=width)
mask = self.prepare_mask_latents(
mask_condition,
batch_size * num_images_per_prompt,
height,
width,
prompt_embeds.dtype,
device,
generator,
do_classifier_free_guidance,
)
# 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
# 6.1 Prepare micro-conditions.
added_cond_kwargs = {"resolution": None, "aspect_ratio": None}
if self.transformer.config.sample_size == 128:
resolution = torch.tensor([height, width]).repeat(batch_size * num_images_per_prompt, 1)
aspect_ratio = torch.tensor([float(height / width)]).repeat(batch_size * num_images_per_prompt, 1)
resolution = resolution.to(dtype=prompt_embeds.dtype, device=device)
aspect_ratio = aspect_ratio.to(dtype=prompt_embeds.dtype, device=device)
added_cond_kwargs = {"resolution": resolution, "aspect_ratio": aspect_ratio}
# 7. Denoising loop
num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
with self.progress_bar(total=num_inference_steps) as progress_bar:
for i, t in enumerate(timesteps):
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
current_timestep = t
if not torch.is_tensor(current_timestep):
# TODO: this requires sync between CPU and GPU. So try to pass timesteps as tensors if you can
# This would be a good case for the `match` statement (Python 3.10+)
is_mps = latent_model_input.device.type == "mps"
if isinstance(current_timestep, float):
dtype = torch.float32 if is_mps else torch.float64
else:
dtype = torch.int32 if is_mps else torch.int64
current_timestep = torch.tensor([current_timestep], dtype=dtype, device=latent_model_input.device)
elif len(current_timestep.shape) == 0:
current_timestep = current_timestep[None].to(latent_model_input.device)
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
current_timestep = current_timestep.expand(latent_model_input.shape[0])
# predict noise model_output
noise_pred = self.transformer(
latent_model_input,
encoder_hidden_states=prompt_embeds,
encoder_attention_mask=prompt_attention_mask,
timestep=current_timestep,
added_cond_kwargs=added_cond_kwargs,
return_dict=False,
)[0]
# perform guidance
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# learned sigma
if self.transformer.config.out_channels // 2 == latent_channels:
noise_pred = noise_pred.chunk(2, dim=1)[0]
else:
noise_pred = noise_pred
# compute previous image: x_t -> x_t-1
latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
init_latents_proper = image_latents
if do_classifier_free_guidance:
init_mask, _ = mask.chunk(2)
else:
init_mask = mask
if i < len(timesteps) - 1:
noise_timestep = timesteps[i + 1]
init_latents_proper = self.scheduler.add_noise(
init_latents_proper, noise, torch.tensor([noise_timestep])
)
latents = (1 - init_mask) * init_latents_proper + init_mask * latents
# call the callback, if provided
if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
progress_bar.update()
if callback is not None and i % callback_steps == 0:
step_idx = i // getattr(self.scheduler, "order", 1)
callback(step_idx, t, latents)
if not output_type == "latent":
image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
if use_resolution_binning:
image = self.image_processor.resize_and_crop_tensor(image, orig_width, orig_height)
else:
image = latents
if not output_type == "latent":
image = self.image_processor.postprocess(image, output_type=output_type)
# Offload all models
self.maybe_free_model_hooks()
if not return_dict:
return (image,)
return ImagePipelineOutput(images=image)
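# The two per-step formulas in the denoising loop above, classifier-free guidance and mask blending, can be
# checked on plain numbers. This is a scalar sketch, not the tensor code: `guided_noise` and `blend_latents`
# are hypothetical names, and Python lists stand in for latent tensors.

```python
def guided_noise(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction toward the text-conditioned one."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


def blend_latents(mask, known, generated):
    """Keep known latents where mask == 0 and generated latents where mask == 1."""
    return [(1 - m) * k + m * g for m, k, g in zip(mask, known, generated)]


guided = guided_noise(1.0, 2.0, guidance_scale=4.5)
unguided = guided_noise(1.0, 2.0, guidance_scale=1.0)
blended = blend_latents([0.0, 1.0], [3.0, 3.0], [7.0, 7.0])
```

# A guidance scale of 1.0 returns the text-conditioned prediction unchanged, which matches the pipeline
# skipping guidance unless `guidance_scale > 1`.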
================================================
FILE: PixArt-alpha-ToCa/scripts/pipeline_pixart_reference.py
================================================
# Copyright 2023 PixArt-Alpha Authors and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import html
import inspect
import re
import urllib.parse as ul
from typing import Callable, List, Optional, Tuple, Union
from PIL import Image
import torch
import torch.nn.functional as F
from transformers import T5EncoderModel, T5Tokenizer
from diffusers.image_processor import VaeImageProcessor, PipelineImageInput
from diffusers.models import AutoencoderKL, Transformer2DModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import (
BACKENDS_MAPPING,
deprecate,
is_bs4_available,
is_ftfy_available,
logging,
replace_example_docstring,
)
from diffusers.utils.torch_utils import randn_tensor
from diffusers.pipelines.pipeline_utils import DiffusionPipeline, ImagePipelineOutput
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
if is_bs4_available():
from bs4 import BeautifulSoup
if is_ftfy_available():
import ftfy
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import PIL
>>> from io import BytesIO
>>> import requests
>>> import torch
>>> from diffusers import PixArtAlphaReferencePipeline
>>> def download_image(url):
... response = requests.get(url)
... return PIL.Image.open(BytesIO(response.content)).convert("RGB")
>>> # You can replace the checkpoint id with "PixArt-alpha/PixArt-XL-2-512x512" too.
>>> pipe = PixArtAlphaReferencePipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
>>> pipe = pipe.to('cuda')
>>> img_url = "http://p1.qhimgs4.com/t01fef6f9d5e69335dd.jpg"
>>> ref_image = download_image(img_url).crop((0, 0, 2160, 2160)).resize((1024, 1024))
>>> image_out = pipe(
... prompt='',
... height=1024,
... width=1024,
... image=ref_image,
... num_inference_steps=20,
... guidance_scale=4.0,
... ).images[0]
```
"""
ASPECT_RATIO_1024_BIN = {
"0.25": [512.0, 2048.0],
"0.28": [512.0, 1856.0],
"0.32": [576.0, 1792.0],
"0.33": [576.0, 1728.0],
"0.35": [576.0, 1664.0],
"0.4": [640.0, 1600.0],
"0.42": [640.0, 1536.0],
"0.48": [704.0, 1472.0],
"0.5": [704.0, 1408.0],
"0.52": [704.0, 1344.0],
"0.57": [768.0, 1344.0],
"0.6": [768.0, 1280.0],
"0.68": [832.0, 1216.0],
"0.72": [832.0, 1152.0],
"0.78": [896.0, 1152.0],
"0.82": [896.0, 1088.0],
"0.88": [960.0, 1088.0],
"0.94": [960.0, 1024.0],
"1.0": [1024.0, 1024.0],
"1.07": [1024.0, 960.0],
"1.13": [1088.0, 960.0],
"1.21": [1088.0, 896.0],
"1.29": [1152.0, 896.0],
"1.38": [1152.0, 832.0],
"1.46": [1216.0, 832.0],
"1.67": [1280.0, 768.0],
"1.75": [1344.0, 768.0],
"2.0": [1408.0, 704.0],
"2.09": [1472.0, 704.0],
"2.4": [1536.0, 640.0],
"2.5": [1600.0, 640.0],
"3.0": [1728.0, 576.0],
"4.0": [2048.0, 512.0],
}
ASPECT_RATIO_512_BIN = {
"0.25": [256.0, 1024.0],
"0.28": [256.0, 928.0],
"0.32": [288.0, 896.0],
"0.33": [288.0, 864.0],
"0.35": [288.0, 832.0],
"0.4": [320.0, 800.0],
"0.42": [320.0, 768.0],
"0.48": [352.0, 736.0],
"0.5": [352.0, 704.0],
"0.52": [352.0, 672.0],
"0.57": [384.0, 672.0],
"0.6": [384.0, 640.0],
"0.68": [416.0, 608.0],
"0.72": [416.0, 576.0],
"0.78": [448.0, 576.0],
"0.82": [448.0, 544.0],
"0.88": [480.0, 544.0],
"0.94": [480.0, 512.0],
"1.0": [512.0, 512.0],
"1.07": [512.0, 480.0],
"1.13": [544.0, 480.0],
"1.21": [544.0, 448.0],
"1.29": [576.0, 448.0],
"1.38": [576.0, 416.0],
"1.46": [608.0, 416.0],
"1.67": [640.0, 384.0],
"1.75": [672.0, 384.0],
"2.0": [704.0, 352.0],
"2.09": [736.0, 352.0],
"2.4": [768.0, 320.0],
"2.5": [800.0, 320.0],
"3.0": [864.0, 288.0],
"4.0": [1024.0, 256.0],
}
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
def retrieve_timesteps(
scheduler,
num_inference_steps: Optional[int] = None,
device: Optional[Union[str, torch.device]] = None,
timesteps: Optional[List[int]] = None,
**kwargs,
):
"""
Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles
custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.
Args:
scheduler (`SchedulerMixin`):
The scheduler to get timesteps from.
num_inference_steps (`int`):
The number of diffusion steps used when generating samples with a pre-trained model. If used,
`timesteps` must be `None`.
device (`str` or `torch.device`, *optional*):
The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
timesteps (`List[int]`, *optional*):
Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default
timestep spacing strategy of the scheduler is used. If `timesteps` is passed, `num_inference_steps`
must be `None`.
Returns:
`Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the
second element is the number of inference steps.
"""
if timesteps is not None:
accepts_timesteps = "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
if not accepts_timesteps:
raise ValueError(
f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
f" timestep schedules. Please check whether you are using the correct scheduler."
)
scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
timesteps = scheduler.timesteps
num_inference_steps = len(timesteps)
else:
scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
timesteps = scheduler.timesteps
return timesteps, num_inference_steps
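# Illustrative sketch (not part of the pipeline): `retrieve_timesteps` decides whether a scheduler supports
# custom timesteps by probing the signature of its `set_timesteps` method. The two scheduler classes below
# are hypothetical stand-ins, not diffusers classes; only the `inspect.signature` check mirrors the code above.

```python
import inspect


class FixedSpacingScheduler:
    """Hypothetical scheduler whose set_timesteps only takes a step count."""

    def set_timesteps(self, num_inference_steps, device=None):
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))


class CustomSpacingScheduler:
    """Hypothetical scheduler that also accepts an explicit timestep list."""

    def set_timesteps(self, num_inference_steps=None, device=None, timesteps=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        else:
            self.timesteps = list(range(num_inference_steps - 1, -1, -1))


def accepts_custom_timesteps(scheduler) -> bool:
    # Same probe as retrieve_timesteps: look for a `timesteps` parameter.
    return "timesteps" in inspect.signature(scheduler.set_timesteps).parameters


print(accepts_custom_timesteps(FixedSpacingScheduler()))   # False
print(accepts_custom_timesteps(CustomSpacingScheduler()))  # True
```

# A scheduler of the first kind would trigger the ValueError raised above when `timesteps` is passed.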
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents
def retrieve_latents(
encoder_output: torch.Tensor, generator: Optional[torch.Generator] = None, sample_mode: str = "sample"
):
if hasattr(encoder_output, "latent_dist") and sample_mode == "sample":
return encoder_output.latent_dist.sample(generator)
elif hasattr(encoder_output, "latent_dist") and sample_mode == "argmax":
return encoder_output.latent_dist.mode()
elif hasattr(encoder_output, "latents"):
return encoder_output.latents
else:
raise AttributeError("Could not access latents of provided encoder_output")
class PixArtAlphaReferencePipeline(DiffusionPipeline):
r"""
Pipeline for image-to-image generation using PixArt-Alpha.
This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
Args:
vae ([`AutoencoderKL`]):
Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
text_encoder ([`T5EncoderModel`]):
Frozen text-encoder. PixArt-Alpha uses
[T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the
[t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) variant.
tokenizer (`T5Tokenizer`):
Tokenizer of class
[T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
transformer ([`Transformer2DModel`]):
A text conditioned `Transformer2DModel` to denoise the encoded image latents.
scheduler ([`SchedulerMixin`]):
A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
"""
bad_punct_regex = re.compile(
r"["
+ "#®•©™&@·º½¾¿¡§~"
+ r"\)"
+ r"\("
+ r"\]"
+ r"\["
+ r"\}"
+ r"\{"
+ r"\|"
+ "\\"
+ r"\/"
+ r"\*"
+ r"]{1,}"
) # noqa
_optional_components = ["tokenizer", "text_encoder"]
model_cpu_offload_seq = "text_encoder->transformer->vae"
def __init__(
self,
tokenizer: T5Tokenizer,
text_encoder: T5EncoderModel,
vae: AutoencoderKL,
transformer: Transformer2DModel,
scheduler: DPMSolverMultistepScheduler,
):
super().__init__()
self.register_modules(
tokenizer=tokenizer, text_encoder=text_encoder, vae=vae, transformer=transformer, scheduler=scheduler
)
self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)
# Adapted from https://github.com/PixArt-alpha/PixArt-alpha/blob/master/diffusion/model/utils.py
def mask_text_embeddings(self, emb, mask):
if emb.shape[0] == 1:
keep_index = mask.sum().item()
return emb[:, :, :keep_index, :], keep_index
else:
masked_feature = emb * mask[:, None, :, None]
return masked_feature, emb.shape[2]
# Adapted from diffusers.pipelines.deepfloyd_if.pipeline_if.encode_prompt
def encode_prompt(
self,
prompt: Union[str, List[str]],
do_classifier_free_guidance: bool = True,
negative_prompt: str = "",
num_images_per_prompt: int = 1,
device: Optional[torch.device] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,
clean_caption: bool = False,
**kwargs,
):
r"""
Encodes the prompt into text encoder hidden states.
Args:
prompt (`str` or `List[str]`, *optional*):
prompt to be encoded
negative_prompt (`str` or `List[str]`, *optional*):
The prompt not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds`
instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`). For
PixArt-Alpha, this should be "".
do_classifier_free_guidance (`bool`, *optional*, defaults to `True`):
whether to use classifier free guidance or not
num_images_per_prompt (`int`, *optional*, defaults to 1):
number of images that should be generated per prompt
device: (`torch.device`, *optional*):
torch device to place the resulting embeddings on
prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha, it should be the embeddings of the ""
string.
clean_caption (bool, defaults to `False`):
If `True`, the function will preprocess and clean the provided caption before encoding.
"""
if "mask_feature" in kwargs:
deprecation_message = "The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version."
deprecate("mask_feature", "1.0.0", deprecation_message, standard_warn=False)
if device is None:
device = self._execution_device
if prompt is not None and isinstance(prompt, str):
batch_size = 1
elif prompt is not None and isinstance(prompt, list):
batch_size = len(prompt)
else:
batch_size = prompt_embeds.shape[0]
# See Section 3.1. of the paper.
max_length = 120
if prompt_embeds is None:
prompt = self._text_preprocessing(prompt, clean_caption=clean_caption)
text_inputs = self.tokenizer(
prompt,
padding="max_length",
max_length=max_length,
truncation=True,
add_special_tokens=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
untruncated_ids = self.tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
text_input_ids, untruncated_ids
):
removed_text = self.tokenizer.batch_decode(untruncated_ids[:, max_length - 1: -1])
logger.warning(
"The following part of your input was truncated because T5 can only handle sequences up to"
f" {max_length} tokens: {removed_text}"
)
prompt_attention_mask = text_inputs.attention_mask
prompt_attention_mask = prompt_attention_mask.to(device)
prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=prompt_attention_mask)
prompt_embeds = prompt_embeds[0]
if self.text_encoder is not None:
dtype = self.text_encoder.dtype
elif self.transformer is not None:
dtype = self.transformer.dtype
else:
dtype = None
prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)
bs_embed, seq_len, _ = prompt_embeds.shape
# duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method
prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)
prompt_attention_mask = prompt_attention_mask.view(bs_embed, -1)
prompt_attention_mask = prompt_attention_mask.repeat(num_images_per_prompt, 1)
# get unconditional embeddings for classifier free guidance
if do_classifier_free_guidance and negative_prompt_embeds is None:
uncond_tokens = [negative_prompt] * batch_size
uncond_tokens = self._text_preprocessing(uncond_tokens, clean_caption=clean_caption)
max_length = prompt_embeds.shape[1]
uncond_input = self.tokenizer(
uncond_tokens,
padding="max_length",
max_length=max_length,
truncation=True,
return_attention_mask=True,
add_special_tokens=True,
return_tensors="pt",
)
negative_prompt_attention_mask = uncond_input.attention_mask
negative_prompt_attention_mask = negative_prompt_attention_mask.to(device)
negative_prompt_embeds = self.text_encoder(
uncond_input.input_ids.to(device), attention_mask=negative_prompt_attention_mask
)
negative_prompt_embeds = negative_prompt_embeds[0]
if do_classifier_free_guidance:
# duplicate unconditional embeddings for each generation per prompt, using mps friendly method
seq_len = negative_prompt_embeds.shape[1]
negative_prompt_embeds = negative_prompt_embeds.to(dtype=dtype, device=device)
negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
negative_prompt_attention_mask = negative_prompt_attention_mask.view(bs_embed, -1)
negative_prompt_attention_mask = negative_prompt_attention_mask.repeat(num_images_per_prompt, 1)
else:
negative_prompt_embeds = None
negative_prompt_attention_mask = None
return prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
def prepare_extra_step_kwargs(self, generator, eta):
# prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
# eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
# eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
# and should be between [0, 1]
accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
extra_step_kwargs = {}
if accepts_eta:
extra_step_kwargs["eta"] = eta
# check if the scheduler accepts generator
accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
if accepts_generator:
extra_step_kwargs["generator"] = generator
return extra_step_kwargs
def check_inputs(
self,
prompt,
image,
height,
width,
negative_prompt,
callback_steps,
prompt_embeds=None,
negative_prompt_embeds=None,
prompt_attention_mask=None,
negative_prompt_attention_mask=None,
):
if height % 8 != 0 or width % 8 != 0:
raise ValueError(f"`height` and `width` have to be divisible by 8 but are {height} and {width}.")
if (callback_steps is None) or (
callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0)
):
raise ValueError(
f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
f" {type(callback_steps)}."
)
if prompt is not None and prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
" only forward one of the two."
)
elif prompt is None and prompt_embeds is None:
raise ValueError(
"Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
)
elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
if prompt is not None and negative_prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `prompt`: {prompt} and `negative_prompt_embeds`:"
f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
)
if negative_prompt is not None and negative_prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
)
if prompt_embeds is not None and prompt_attention_mask is None:
raise ValueError("Must provide `prompt_attention_mask` when specifying `prompt_embeds`.")
if negative_prompt_embeds is not None and negative_prompt_attention_mask is None:
raise ValueError("Must provide `negative_prompt_attention_mask` when specifying `negative_prompt_embeds`.")
if prompt_embeds is not None and negative_prompt_embeds is not None:
if prompt_embeds.shape != negative_prompt_embeds.shape:
raise ValueError(
"`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
f" {negative_prompt_embeds.shape}."
)
if prompt_attention_mask.shape != negative_prompt_attention_mask.shape:
raise ValueError(
"`prompt_attention_mask` and `negative_prompt_attention_mask` must have the same shape when passed directly, but"
f" got: `prompt_attention_mask` {prompt_attention_mask.shape} != `negative_prompt_attention_mask`"
f" {negative_prompt_attention_mask.shape}."
)
if image is None:
raise ValueError(
"Provide `image`. Cannot leave `image` undefined."
)
# Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._text_preprocessing
def _text_preprocessing(self, text, clean_caption=False):
if clean_caption and not is_bs4_available():
logger.warning(BACKENDS_MAPPING["bs4"][-1].format("Setting `clean_caption=True`"))
logger.warning("Setting `clean_caption` to False...")
clean_caption = False
if clean_caption and not is_ftfy_available():
logger.warning(BACKENDS_MAPPING["ftfy"][-1].format("Setting `clean_caption=True`"))
logger.warning("Setting `clean_caption` to False...")
clean_caption = False
if not isinstance(text, (tuple, list)):
text = [text]
def process(text: str):
if clean_caption:
text = self._clean_caption(text)
text = self._clean_caption(text)
else:
text = text.lower().strip()
return text
return [process(t) for t in text]
# Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption
def _clean_caption(self, caption):
caption = str(caption)
caption = ul.unquote_plus(caption)
caption = caption.strip().lower()
caption = re.sub("<person>", "person", caption)
# urls:
caption = re.sub(
r"\b((?:https?:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))",
# noqa
"",
caption,
) # regex for urls
caption = re.sub(
r"\b((?:www:(?:\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\w/-]*\b\/?(?!@)))",
# noqa
"",
caption,
) # regex for urls
# html:
caption = BeautifulSoup(caption, features="html.parser").text
# @
caption = re.sub(r"@[\w\d]+\b", "", caption)
# 31C0—31EF CJK Strokes
# 31F0—31FF Katakana Phonetic Extensions
# 3200—32FF Enclosed CJK Letters and Months
# 3300—33FF CJK Compatibility
# 3400—4DBF CJK Unified Ideographs Extension A
# 4DC0—4DFF Yijing Hexagram Symbols
# 4E00—9FFF CJK Unified Ideographs
caption = re.sub(r"[\u31c0-\u31ef]+", "", caption)
caption = re.sub(r"[\u31f0-\u31ff]+", "", caption)
caption = re.sub(r"[\u3200-\u32ff]+", "", caption)
caption = re.sub(r"[\u3300-\u33ff]+", "", caption)
caption = re.sub(r"[\u3400-\u4dbf]+", "", caption)
caption = re.sub(r"[\u4dc0-\u4dff]+", "", caption)
caption = re.sub(r"[\u4e00-\u9fff]+", "", caption)
#######################################################
# all types of dash --> "-"
caption = re.sub(
r"[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]+",
# noqa
"-",
caption,
)
# normalize quotes to one standard
caption = re.sub(r"[`´«»“”¨]", '"', caption)
caption = re.sub(r"[‘’]", "'", caption)
# &quot;
caption = re.sub(r"&quot;?", "", caption)
# &amp
caption = re.sub(r"&amp", "", caption)
# ip addresses:
caption = re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", " ", caption)
# article ids:
caption = re.sub(r"\d:\d\d\s+$", "", caption)
# \n
caption = re.sub(r"\\n", " ", caption)
# "#123"
caption = re.sub(r"#\d{1,3}\b", "", caption)
# "#12345.."
caption = re.sub(r"#\d{5,}\b", "", caption)
# "123456.."
caption = re.sub(r"\b\d{6,}\b", "", caption)
# filenames:
caption = re.sub(r"[\S]+\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)", "", caption)
#
caption = re.sub(r"[\"\']{2,}", r'"', caption) # """AUSVERKAUFT"""
caption = re.sub(r"[\.]{2,}", r" ", caption) # """AUSVERKAUFT"""
caption = re.sub(self.bad_punct_regex, r" ", caption) # ***AUSVERKAUFT***, #AUSVERKAUFT
caption = re.sub(r"\s+\.\s+", r" ", caption) # " . "
# this-is-my-cute-cat / this_is_my_cute_cat
regex2 = re.compile(r"(?:\-|\_)")
if len(re.findall(regex2, caption)) > 3:
caption = re.sub(regex2, " ", caption)
caption = ftfy.fix_text(caption)
caption = html.unescape(html.unescape(caption))
caption = re.sub(r"\b[a-zA-Z]{1,3}\d{3,15}\b", "", caption) # jc6640
caption = re.sub(r"\b[a-zA-Z]+\d+[a-zA-Z]+\b", "", caption) # jc6640vc
caption = re.sub(r"\b\d+[a-zA-Z]+\d+\b", "", caption) # 6640vc231
caption = re.sub(r"(worldwide\s+)?(free\s+)?shipping", "", caption)
caption = re.sub(r"(free\s)?download(\sfree)?", "", caption)
caption = re.sub(r"\bclick\b\s(?:for|on)\s\w+", "", caption)
caption = re.sub(r"\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\simage[s]?)?", "", caption)
caption = re.sub(r"\bpage\s+\d+\b", "", caption)
caption = re.sub(r"\b\d*[a-zA-Z]+\d+[a-zA-Z]+\d+[a-zA-Z\d]*\b", r" ", caption) # j2d1a2a...
caption = re.sub(r"\b\d+\.?\d*[xх×]\d+\.?\d*\b", "", caption)
caption = re.sub(r"\b\s+\:\s+", r": ", caption)
caption = re.sub(r"(\D[,\./])\b", r"\1 ", caption)
caption = re.sub(r"\s+", " ", caption)
caption = caption.strip()
caption = re.sub(r"^[\"\']([\w\W]+)[\"\']$", r"\1", caption)
caption = re.sub(r"^[\'\_,\-\:;]", r"", caption)
caption = re.sub(r"[\'\_,\-\:\-\+]$", r"", caption)
caption = re.sub(r"^\.\S+$", "", caption)
return caption.strip()
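# Illustrative sketch (not part of the pipeline): two of the `_clean_caption` steps in isolation. Dash-like
# characters are first collapsed to "-", and separators are then turned into spaces only when a caption
# contains more than three of them. `normalize_separators` is a hypothetical helper name; the two regexes
# are copied from the method above.

```python
import re

# Every dash-like code point listed in _clean_caption, collapsed to a plain hyphen.
DASHES = re.compile(
    r"[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B\u2E40"
    r"\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]+"
)
# Hyphens and underscores used as word separators (regex2 in _clean_caption).
SEPARATORS = re.compile(r"(?:\-|\_)")


def normalize_separators(caption: str) -> str:
    caption = DASHES.sub("-", caption)
    # Only slug-like captions with more than three separators are split into words.
    if len(SEPARATORS.findall(caption)) > 3:
        caption = SEPARATORS.sub(" ", caption)
    return caption


print(normalize_separators("this-is-my-cute-cat"))  # this is my cute cat
print(normalize_separators("snake_case_name"))      # snake_case_name
print(normalize_separators("a \u2013 b"))           # a - b
```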
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None,
image=None,
timestep=None,
is_strength_max=True,
return_image_latents=True,
):
shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
if isinstance(generator, list) and len(generator) != batch_size:
raise ValueError(
f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
f" size of {batch_size}. Make sure the batch size matches the length of the generators."
)
if (image is None or timestep is None) and not is_strength_max:
raise ValueError(
"Since strength < 1. initial latents are to be initialised as a combination of Image + Noise. "
"However, either the image or the noise timestep has not been provided."
)
if return_image_latents or (latents is None and not is_strength_max):
image = image.to(device=device, dtype=dtype)
if image.shape[1] == 4:
image_latents = image
else:
image_latents = self._encode_vae_image(image=image, generator=generator)
image_latents = image_latents.repeat(batch_size // image_latents.shape[0], 1, 1, 1)
if latents is None:
noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
# if strength is 1. then initialise the latents to noise, else initial to image + noise
latents = noise if is_strength_max else self.scheduler.add_noise(image_latents, noise, timestep)
# if pure noise then scale the initial latents by the Scheduler's init sigma
latents = latents * self.scheduler.init_noise_sigma if is_strength_max else latents
else:
noise = latents.to(device)
latents = noise * self.scheduler.init_noise_sigma
# scale the initial noise by the standard deviation required by the scheduler
latents = latents * self.scheduler.init_noise_sigma
return latents, noise, image_latents
@staticmethod
def classify_height_width_bin(height: int, width: int, ratios: dict) -> Tuple[int, int]:
"""Returns binned height and width."""
ar = float(height / width)
closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
default_hw = ratios[closest_ratio]
return int(default_hw[0]), int(default_hw[1])
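# Illustrative sketch (not part of the pipeline): how `classify_height_width_bin` picks a resolution bin.
# The table below is a three-entry excerpt of ASPECT_RATIO_1024_BIN, and the function is a standalone copy
# of the static method above. Keys are height / width ratios stored as strings.

```python
from typing import Dict, List, Tuple

# Excerpt of ASPECT_RATIO_1024_BIN: ratio -> [height, width].
RATIOS_EXCERPT: Dict[str, List[float]] = {
    "0.5": [704.0, 1408.0],
    "1.0": [1024.0, 1024.0],
    "2.0": [1408.0, 704.0],
}


def classify_height_width_bin(height: int, width: int, ratios: Dict[str, List[float]]) -> Tuple[int, int]:
    ar = height / width
    # Pick the key whose numeric value is closest to the requested aspect ratio.
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    binned_height, binned_width = ratios[closest_ratio]
    return int(binned_height), int(binned_width)


# __call__ doubles the width, so a 1024x1024 request is binned as 1024x2048 (ratio 0.5).
print(classify_height_width_bin(1024, 2048, RATIOS_EXCERPT))  # (704, 1408)
print(classify_height_width_bin(1000, 1000, RATIOS_EXCERPT))  # (1024, 1024)
```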
@staticmethod
def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
orig_height, orig_width = samples.shape[2], samples.shape[3]
# Check if resizing is needed
if orig_height != new_height or orig_width != new_width:
ratio = max(new_height / orig_height, new_width / orig_width)
resized_width = int(orig_width * ratio)
resized_height = int(orig_height * ratio)
# Resize
samples = F.interpolate(
samples, size=(resized_height, resized_width), mode="bilinear", align_corners=False
)
# Center Crop
start_x = (resized_width - new_width) // 2
end_x = start_x + new_width
start_y = (resized_height - new_height) // 2
end_y = start_y + new_height
samples = samples[:, :, start_y:end_y, start_x:end_x]
return samples
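# Illustrative sketch (not part of the pipeline): the size arithmetic behind `resize_and_crop_tensor`,
# without any tensors. `plan_resize_and_crop` is a hypothetical helper that returns the intermediate
# resized shape and the center-crop box the method above would use.

```python
from typing import Tuple


def plan_resize_and_crop(
    orig_height: int, orig_width: int, new_height: int, new_width: int
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    # Scale by the larger factor so the resized image covers the target in both dimensions.
    ratio = max(new_height / orig_height, new_width / orig_width)
    resized_height = int(orig_height * ratio)
    resized_width = int(orig_width * ratio)
    # Center crop down to the target size.
    start_y = (resized_height - new_height) // 2
    start_x = (resized_width - new_width) // 2
    box = (start_y, start_y + new_height, start_x, start_x + new_width)
    return (resized_height, resized_width), box


# A 512x512 sample cropped to height 256, width 128: resize to 256x256, then trim 64 columns per side.
print(plan_resize_and_crop(512, 512, 256, 128))  # ((256, 256), (0, 256, 64, 192))
```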
def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):
if isinstance(generator, list):
image_latents = [
retrieve_latents(self.vae.encode(image[i: i + 1]), generator=generator[i])
for i in range(image.shape[0])
]
image_latents = torch.cat(image_latents, dim=0)
else:
image_latents = retrieve_latents(self.vae.encode(image), generator=generator)
image_latents = self.vae.config.scaling_factor * image_latents
return image_latents
def prepare_mask_latents(
self, mask, batch_size, height, width, dtype, device, generator, do_classifier_free_guidance
):
# resize the mask to latents shape as we concatenate the mask to the latents
# we do that before converting to dtype to avoid breaking in case we're using cpu_offload
# and half precision
mask = torch.nn.functional.interpolate(
mask, size=(height // self.vae_scale_factor, width // self.vae_scale_factor)
)
mask = mask.to(device=device, dtype=dtype)
if mask.shape[0] < batch_size:
if not batch_size % mask.shape[0] == 0:
raise ValueError(
"The passed mask and the required batch size don't match. Masks are supposed to be duplicated to"
f" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the number"
" of masks that you pass is divisible by the total requested batch size."
)
mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)
mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask
return mask
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.get_timesteps
def get_timesteps(self, num_inference_steps, strength, device):
# get the original timestep using init_timestep
init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
t_start = max(num_inference_steps - init_timestep, 0)
timesteps = self.scheduler.timesteps[t_start * self.scheduler.order:]
return timesteps, num_inference_steps - t_start
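# Illustrative sketch (not part of the pipeline): how `strength` shortens the schedule in `get_timesteps`.
# `start_index_for_strength` is a hypothetical helper that reproduces only the index arithmetic; the real
# method additionally slices `self.scheduler.timesteps` at `t_start * self.scheduler.order`.

```python
from typing import Tuple


def start_index_for_strength(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    # Number of denoising steps that are actually run for this strength.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first timestep to keep; earlier (noisier) steps are skipped.
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


print(start_index_for_strength(20, 1.0))   # (0, 20)
print(start_index_for_strength(20, 0.5))   # (10, 10)
print(start_index_for_strength(20, 0.25))  # (15, 5)
```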
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
self,
prompt: Union[str, List[str]] = None,
image: PipelineImageInput = None,
strength: float = 1.0,
negative_prompt: str = "",
num_inference_steps: int = 20,
timesteps: List[int] = None,
guidance_scale: float = 4.5,
num_images_per_prompt: Optional[int] = 1,
height: Optional[int] = None,
width: Optional[int] = None,
eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
callback_steps: int = 1,
clean_caption: bool = True,
use_resolution_binning: bool = True,
**kwargs,
) -> Union[ImagePipelineOutput, Tuple]:
"""
Function invoked when calling the pipeline for generation.
Args:
prompt (`str` or `List[str]`, *optional*):
The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
The reference image that guides the image generation.
negative_prompt (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`).
num_inference_steps (`int`, *optional*, defaults to 20):
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
timesteps (`List[int]`, *optional*):
Custom timesteps to use for the denoising process. If not defined, equally spaced `num_inference_steps`
timesteps are used. Must be in descending order.
guidance_scale (`float`, *optional*, defaults to 4.5):
Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
`guidance_scale` is defined as `w` of equation 2. of [Imagen
Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`,
usually at the expense of lower image quality.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
height (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):
The height in pixels of the generated image.
width (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):
The width in pixels of the generated image.
eta (`float`, *optional*, defaults to 0.0):
Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
[`schedulers.DDIMScheduler`], will be ignored for others.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic.
latents (`torch.FloatTensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will be generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings.
negative_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha this negative prompt should be "". If not
provided, negative_prompt_embeds will be generated from `negative_prompt` input argument.
negative_prompt_attention_mask (`torch.FloatTensor`, *optional*):
Pre-generated attention mask for negative text embeddings.
output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generated image. Choose between
[PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.ImagePipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step.
clean_caption (`bool`, *optional*, defaults to `True`):
Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to
be installed. If the dependencies are not installed, the embeddings will be created from the raw
prompt.
use_resolution_binning (`bool`, defaults to `True`):
If set to `True`, the requested height and width are first mapped to the closest resolution in
`ASPECT_RATIO_1024_BIN` (or `ASPECT_RATIO_512_BIN` when the transformer's `sample_size` is not 128). After
the produced latents are decoded into images, they are resized back to the requested resolution. Useful
for generating non-square images.
Examples:
Returns:
[`~pipelines.ImagePipelineOutput`] or `tuple`:
If `return_dict` is `True`, [`~pipelines.ImagePipelineOutput`] is returned, otherwise a `tuple` is
returned where the first element is a list with the generated images
"""
if "mask_feature" in kwargs:
deprecation_message = "The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version."
deprecate("mask_feature", "1.0.0", deprecation_message, standard_warn=False)
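# 1. Prepare and check inputs. Default height and width to the transformer's sample size
height = height or self.transformer.config.sample_size * self.vae_scale_factor
width = width or self.transformer.config.sample_size * self.vae_scale_factor
# Build a canvas twice as wide: the reference image is pasted at the top-left, the right half is generated
width *= 2
ref = image
image = Image.new("RGB", (width, height), (255, 255, 255))
image.paste(ref, (0, 0))
# Mask image: black over the left half marks the reference region, white over the right half marks the region to generate
mask_image = Image.new("RGB", (width, height), (255, 255, 255))
black_rect = Image.new("RGB", (width // 2, height), (0, 0, 0))
mask_image.paste(black_rect, (0, 0))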
if use_resolution_binning:
aspect_ratio_bin = (
ASPECT_RATIO_1024_BIN if self.transformer.config.sample_size == 128 else ASPECT_RATIO_512_BIN
)
orig_height, orig_width = height, width
height, width = self.classify_height_width_bin(height, width, ratios=aspect_ratio_bin)
self.check_inputs(
prompt,
image,
height,
width,
negative_prompt,
callback_steps,
prompt_embeds,
negative_prompt_embeds,
prompt_attention_mask,
negative_prompt_attention_mask,
)
# 2. Define call parameters (batch size, device, classifier-free guidance)
if prompt is not None and isinstance(prompt, str):
batch_size = 1
elif prompt is not None and isinstance(prompt, list):
batch_size = len(prompt)
else:
batch_size = prompt_embeds.shape[0]
device = self._execution_device
# here `guidance_scale` is defined analogously to the guidance weight `w` of equation (2)
# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
# 3. Encode input prompt
(
prompt_embeds,
prompt_attention_mask,
negative_prompt_embeds,
negative_prompt_attention_mask,
) = self.encode_prompt(
prompt,
do_classifier_free_guidance,
negative_prompt=negative_prompt,
num_images_per_prompt=num_images_per_prompt,
device=device,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_attention_mask=negative_prompt_attention_mask,
clean_caption=clean_caption,
)
if do_classifier_free_guidance:
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
prompt_attention_mask = torch.cat([negative_prompt_attention_mask, prompt_attention_mask], dim=0)
# 4. Prepare timesteps
timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
timesteps, num_inference_steps = self.get_timesteps(
num_inference_steps=num_inference_steps, strength=strength, device=device
)
# at which timestep to set the initial noise (n.b. 50% if strength is 0.5)
latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)
# create a boolean to check if the strength is set to 1. if so then initialise the latents with pure noise
is_strength_max = strength == 1.0
init_image = self.image_processor.preprocess(image, height=height, width=width)
init_image = init_image.to(dtype=torch.float32)
# 5. Prepare latents.
latent_channels = self.transformer.config.in_channels
latents_outputs = self.prepare_latents(
batch_size * num_images_per_prompt,
latent_channels,
height,
width,
prompt_embeds.dtype,
device,
generator,
latents,
image=init_image,
timestep=latent_timestep,
is_strength_max=is_strength_max,
)
latents, noise, image_latents = latents_outputs
mask_condition = self.mask_processor.preprocess(mask_image, height=height, width=width)
mask = self.prepare_mask_latents(
mask_condition,
batch_size * num_images_per_prompt,
height,
width,
prompt_embeds.dtype,
device,
generator,
do_classifier_free_guidance,
)
# 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
# 6.1 Prepare micro-conditions.
added_cond_kwargs = {"resolution": None, "aspect_ratio": None}
if self.transformer.config.sample_size == 128:
resolution = torch.tensor([height, width]).repeat(batch_size * num_images_per_prompt, 1)
aspect_ratio = torch.tensor([float(height / width)]).repeat(batch_size * num_images_per_prompt, 1)
resolution = resolution.to(dtype=prompt_embeds.dtype, device=device)
aspect_ratio = aspect_ratio.to(dtype=prompt_embeds.dtype, device=device)
added_cond_kwargs = {"resolution": resolution, "aspect_ratio": aspect_ratio}
# 7. Denoising loop
num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
with self.progress_bar(total=num_inference_steps) as progress_bar:
for i, t in enumerate(timesteps):
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
current_timestep = t
if not torch.is_tensor(current_timestep):
# TODO: this requires sync between CPU and GPU. So try to pass timesteps as tensors if you can
# This would be a good case for the `match` statement (Python 3.10+)
is_mps = latent_model_input.device.type == "mps"
if isinstance(current_timestep, float):
dtype = torch.float32 if is_mps else torch.float64
else:
dtype = torch.int32 if is_mps else torch.int64
current_timestep = torch.tensor([current_timestep], dtype=dtype, device=latent_model_input.device)
elif len(current_timestep.shape) == 0:
current_timestep = current_timestep[None].to(latent_model_input.device)
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
# predict noise model_output
noise_pred = self.transformer(
latent_model_input,
encoder_hidden_states=prompt_embeds,
encoder_attention_mask=prompt_attention_mask,
timestep=current_timestep,
added_cond_kwargs=added_cond_kwargs,
return_dict=False,
)[0]
# perform guidance
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# learned sigma
if self.transformer.config.out_channels // 2 == latent_channels:
noise_pred = noise_pred.chunk(2, dim=1)[0]
else:
noise_pred = noise_pred
# compute previous image: x_t -> x_t-1
latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
init_latents_proper = image_latents
if do_classifier_free_guidance:
init_mask, _ = mask.chunk(2)
else:
init_mask = mask
if i < len(timesteps) - 1:
noise_timestep = timesteps[i + 1]
init_latents_proper = self.scheduler.add_noise(
init_latents_proper, noise, torch.tensor([noise_timestep])
)
latents_ = latents
latents = (1 - init_mask) * init_latents_proper + init_mask * latents
latent_model_input = torch.cat([latents_] + [latents]) if do_classifier_free_guidance else latents
# call the callback, if provided
if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
progress_bar.update()
if callback is not None and i % callback_steps == 0:
step_idx = i // getattr(self.scheduler, "order", 1)
callback(step_idx, t, latents)
if output_type != "latent":
image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
if use_resolution_binning:
image = self.resize_and_crop_tensor(image, orig_width, orig_height)
else:
image = latents
image = image.chunk(2, -1)[1]
if output_type != "latent":
image = self.image_processor.postprocess(image, output_type=output_type)
# Offload all models
self.maybe_free_model_hooks()
if not return_dict:
return (image,)
return ImagePipelineOutput(images=image)
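A minimal sketch of the resolution-binning step used above, assuming `classify_height_width_bin` picks the table entry whose aspect-ratio key is closest to `height / width` (its body is not shown in this section). `EXAMPLE_BIN` and `closest_bin` are hypothetical names, and the table values are illustrative rather than copied from `ASPECT_RATIO_1024_BIN`.

```python
from typing import Dict, List, Tuple

# Hypothetical, illustrative bin table: aspect ratio (height / width) -> [height, width].
EXAMPLE_BIN: Dict[str, List[float]] = {
    "0.5": [704.0, 1408.0],
    "1.0": [1024.0, 1024.0],
    "2.0": [1408.0, 704.0],
}


def closest_bin(height: int, width: int, ratios: Dict[str, List[float]]) -> Tuple[int, int]:
    """Return the (height, width) whose aspect-ratio key is nearest to height / width."""
    aspect_ratio = height / width
    key = min(ratios, key=lambda r: abs(float(r) - aspect_ratio))
    binned_height, binned_width = ratios[key]
    return int(binned_height), int(binned_width)


# The pipeline doubles the width before binning, so a square request is binned
# as a 1:2 (height:width) canvas rather than as a square one.
requested_height, requested_width = 1024, 1024
binned = closest_bin(requested_height, requested_width * 2, EXAMPLE_BIN)
```

Because binning happens after the width is doubled, the bin is chosen for the whole side-by-side canvas, not for the half that is finally returned.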
================================================
FILE: PixArt-alpha-ToCa/timing_analysis.py
================================================
import json
import numpy as np
import matplotlib.pyplot as plt
with open('timing_info.json', 'r') as f:
    data = json.load(f)
attn_times = []
cross_attn_times = []
mlp_times = []
block_times = []
for entry in data:
    timing_info = entry['timing_info']
    attn_times.extend(timing_info['attn_time'])
    cross_attn_times.extend(timing_info['cross_attn_time'])
    mlp_times.extend(timing_info['mlp_time'])
    block_times.extend(timing_info['block_time'])
average_attn_time = np.mean(attn_times)
average_cross_attn_time = np.mean(cross_attn_times)
average_mlp_time = np.mean(mlp_times)
average_block_time = np.mean(block_times)
print(f"Average Attention Time: {average_attn_time:.4f} ms")
print(f"Average Cross Attention Time: {average_cross_attn_time:.4f} ms")
print(f"Average MLP Time: {average_mlp_time:.4f} ms")
print(f"Average Block Time: {average_block_time:.4f} ms")
labels = ['Attention', 'Cross Attention', 'MLP', 'Block']
avg_times = [average_attn_time, average_cross_attn_time, average_mlp_time, average_block_time]
plt.bar(labels, avg_times, color=['blue', 'green', 'red', 'orange'])
plt.ylabel('Average Time (ms)')
plt.title('Average Time per Module')
plt.savefig('module_average_times.png')
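The script above reports plain means, which a single slow call (for example a first, warm-up block) can skew. The sketch below is one possible complement, not part of the original script: it groups block times by denoising step and compares mean with median, using only the standard library. `records` and `block_times_by_step` are hypothetical names, and the values are made up to mimic the shape of entries in `timing_info.json`.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records shaped like timing_info.json entries (illustrative values, in ms).
records = [
    {"timing_info": {"block_time": [10.9]}, "current": {"step": 0, "layer": 0}},
    {"timing_info": {"block_time": [1.6]}, "current": {"step": 0, "layer": 1}},
    {"timing_info": {"block_time": [1.5]}, "current": {"step": 0, "layer": 2}},
    {"timing_info": {"block_time": [1.7]}, "current": {"step": 1, "layer": 0}},
    {"timing_info": {"block_time": [1.5]}, "current": {"step": 1, "layer": 1}},
]


def block_times_by_step(entries):
    """Group every recorded block time under its denoising step index."""
    grouped = defaultdict(list)
    for entry in entries:
        step = entry["current"]["step"]
        grouped[step].extend(entry["timing_info"]["block_time"])
    return dict(grouped)


by_step = block_times_by_step(records)
all_times = [t for times in by_step.values() for t in times]

# The median is far less sensitive to the one slow first call than the mean.
summary = {"mean": mean(all_times), "median": median(all_times)}
```

To run it on real measurements, `records` could be replaced with the list loaded from `timing_info.json`.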
================================================
FILE: PixArt-alpha-ToCa/timing_info.json
================================================
[{"timing_info": {"block_time": [10.906271934509277], "attn_time": [7.704576015472412], "cross_attn_time": [0.9379839897155762], "mlp_time": [2.0203518867492676]}, "current": {"num_steps": 20, "step": 0, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.602560043334961], "attn_time": [0.5560320019721985], "cross_attn_time": [0.5662720203399658], "mlp_time": [0.30105599761009216]}, "current": {"num_steps": 20, "step": 0, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4970879554748535], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 0, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 0, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4755840301513672], "attn_time": [0.4925439953804016], "cross_attn_time": [0.52019202709198], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 0, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4776320457458496], "attn_time": [0.48742398619651794], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 0, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4428160190582275], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5038080215454102], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 0, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4407680034637451], "attn_time": [0.4761599898338318], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 
20, "step": 0, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.46943998336792], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 0, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.465343952178955], "attn_time": [0.48230400681495667], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 0, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4632960557937622], "attn_time": [0.48742398619651794], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 0, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4612480401992798], "attn_time": [0.4761599898338318], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 0, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4592000246047974], "attn_time": [0.48230400681495667], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 0, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.435647964477539], "attn_time": [0.47308799624443054], "cross_attn_time": [0.506879985332489], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 0, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.46943998336792], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 0, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5421439409255981], "attn_time": [0.5089280009269714], "cross_attn_time": 
[0.5631999969482422], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 0, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.474560022354126], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5048320293426514], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 0, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4725120067596436], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 0, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4499839544296265], "attn_time": [0.48230400681495667], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 0, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4899200201034546], "attn_time": [0.506879985332489], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 0, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4551039934158325], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 0, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.46943998336792], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 0, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 0, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": 
{"block_time": [1.4776320457458496], "attn_time": [0.4853760004043579], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 0, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.457152009010315], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 0, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.5099520087242126], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 0, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.462272047996521], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 0, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4888960123062134], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.30105599761009216]}, "current": {"num_steps": 20, "step": 0, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.733855962753296], "attn_time": [0.579584002494812], "cross_attn_time": [0.567296028137207], "mlp_time": [0.3266560137271881]}, "current": {"num_steps": 20, "step": 1, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5523840188980103], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 1, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.48742398619651794], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, 
"step": 1, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 1, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 1, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5242879986763], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 1, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.481727957725525], "attn_time": [0.48844799399375916], "cross_attn_time": [0.5038080215454102], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 1, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 1, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4796799421310425], "attn_time": [0.48127999901771545], "cross_attn_time": [0.5099520087242126], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 1, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 1, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.48947200179100037], "cross_attn_time": 
[0.5242879986763], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 1, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.4843519926071167], "cross_attn_time": [0.52019202709198], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 1, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 1, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 1, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 1, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 1, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [2.950144052505493], "attn_time": [0.5017600059509277], "cross_attn_time": [1.1509759426116943], "mlp_time": [0.9451519846916199]}, "current": {"num_steps": 20, "step": 1, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29900801181793213]}, "current": {"num_steps": 20, "step": 1, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": 
{"block_time": [1.4940160512924194], "attn_time": [0.4904960095882416], "cross_attn_time": [0.506879985332489], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 1, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 1, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 1, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4899200201034546], "attn_time": [0.4864000082015991], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 1, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5099520087242126], "cross_attn_time": [0.52019202709198], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 1, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [3.0791680812835693], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5181440114974976], "mlp_time": [1.8472959995269775]}, "current": {"num_steps": 20, "step": 1, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.6936960220336914], "attn_time": [0.6215680241584778], "cross_attn_time": [0.5591040253639221], "mlp_time": [0.30003198981285095]}, "current": {"num_steps": 20, "step": 1, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5421439409255981], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, 
"step": 1, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 1, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5452159643173218], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 1, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.6070079803466797], "attn_time": [0.5406720042228699], "cross_attn_time": [0.5591040253639221], "mlp_time": [0.3092480003833771]}, "current": {"num_steps": 20, "step": 2, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.598464012145996], "attn_time": [0.5355520248413086], "cross_attn_time": [0.5427200198173523], "mlp_time": [0.30822399258613586]}, "current": {"num_steps": 20, "step": 2, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5472639799118042], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5437440276145935], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 2, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29900801181793213]}, "current": {"num_steps": 20, "step": 2, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.56876802444458], "attn_time": [0.5191680192947388], "cross_attn_time": [0.5427200198173523], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 2, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5554560422897339], "attn_time": [0.5140479803085327], "cross_attn_time": 
[0.5304319858551025], "mlp_time": [0.3041279911994934]}, "current": {"num_steps": 20, "step": 2, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5544320344924927], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5437440276145935], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 2, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 2, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5738879442214966], "attn_time": [0.5263360142707825], "cross_attn_time": [0.536575973033905], "mlp_time": [0.30105599761009216]}, "current": {"num_steps": 20, "step": 2, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.558527946472168], "attn_time": [0.5191680192947388], "cross_attn_time": [0.536575973033905], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 2, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5472639799118042], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 2, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5007359981536865], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 2, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.563647985458374], "attn_time": [0.5191680192947388], "cross_attn_time": [0.5396479964256287], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 2, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.5441919565200806], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 2, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.6773120164871216], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.3491840064525604]}, "current": {"num_steps": 20, "step": 2, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5656960010528564], "attn_time": [0.5191680192947388], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.3041279911994934]}, "current": {"num_steps": 20, "step": 2, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5493119955062866], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.30105599761009216]}, "current": {"num_steps": 20, "step": 2, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 2, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.546239972114563], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 2, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 2, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 2, 
"layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.572864055633545], "attn_time": [0.5283839702606201], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 2, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.558527946472168], "attn_time": [0.5191680192947388], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 2, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 2, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.504256010055542], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5109760165214539], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 2, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.506879985332489], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 2, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5109760165214539], "cross_attn_time": [0.532480001449585], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 2, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 2, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5578240156173706], "attn_time": [0.5150719881057739], "cross_attn_time": 
[0.5457919836044312], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 3, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4915199875831604], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 3, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 3, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 3, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 3, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.491968035697937], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 3, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 3, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 3, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.5452159643173218], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2979840040206909]}, "current": {"num_steps": 20, "step": 3, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4970879554748535], "attn_time": [0.4853760004043579], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 3, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 3, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 3, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.48742398619651794], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 3, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [2.6859519481658936], "attn_time": [1.0670080184936523], "cross_attn_time": [0.8294399976730347], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 3, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 3, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5109760165214539], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 3, "layer": 
15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 3, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 3, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 3, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 3, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.48742398619651794], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 3, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 3, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.506879985332489], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 3, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.4843519926071167], "cross_attn_time": [0.5273600220680237], 
"mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 3, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4878720045089722], "attn_time": [0.4853760004043579], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 3, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 3, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.506879985332489], "cross_attn_time": [0.52019202709198], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 3, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 3, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.547327995300293], "attn_time": [0.5232639908790588], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.30105599761009216]}, "current": {"num_steps": 20, "step": 4, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 4, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 4, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.5329279899597168], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 4, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4940160512924194], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5079039931297302], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 4, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 4, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 4, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.4833280146121979], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 4, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 4, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 4, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5400960445404053], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29900801181793213]}, "current": {"num_steps": 20, "step": 4, 
"layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 4, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 4, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 4, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 4, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 4, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2979840040206909]}, "current": {"num_steps": 20, "step": 4, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.499135971069336], "attn_time": [0.4853760004043579], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 4, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5079039931297302], "cross_attn_time": 
[0.5160959959030151], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 4, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 4, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 4, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 4, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 4, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.4864000082015991], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 4, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.4833280146121979], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 4, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 4, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": 
{"block_time": [1.533951997756958], "attn_time": [0.506879985332489], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 4, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 4, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.53711998462677], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5437440276145935], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 5, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5089280009269714], "cross_attn_time": [0.532480001449585], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 5, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 5, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 5, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5089280009269714], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 5, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 5, 
"layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5411200523376465], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.3020800054073334]}, "current": {"num_steps": 20, "step": 5, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 5, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 5, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 5, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 5, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 5, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 5, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5191680192947388], 
"mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 5, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 5, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.48844799399375916], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 5, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 5, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 5, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 5, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 5, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 5, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.491968035697937], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 5, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 5, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 5, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.5058559775352478], "cross_attn_time": [0.52019202709198], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 5, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 5, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.481727957725525], "attn_time": [0.48230400681495667], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 5, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 5, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5565439462661743], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5416960120201111], "mlp_time": [0.3041279911994934]}, "current": {"num_steps": 20, "step": 6, 
"layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.48742398619651794], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 6, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 6, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5011839866638184], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 6, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 6, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.506879985332489], "cross_attn_time": [0.52019202709198], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 6, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 6, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5181440114974976], 
"mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 6, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 6, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.5007359981536865], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 6, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 6, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 6, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4970879554748535], "attn_time": [0.48844799399375916], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 6, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4735360145568848], "attn_time": [0.48127999901771545], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 6, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 6, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], 
"attn_time": [0.49459201097488403], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 6, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 6, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 6, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 6, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 6, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 6, "layer": 23, 
"is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5472639799118042], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 6, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 6, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.535904049873352], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 7, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 7, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5079039931297302], "cross_attn_time": [0.52019202709198], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 7, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.506879985332489], "cross_attn_time": [0.5171200037002563], 
"mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 7, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 7, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5130239725112915], "cross_attn_time": [0.52019202709198], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 7, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5452159643173218], "attn_time": [0.502784013748169], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 7, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 7, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 7, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 7, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 7, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], 
"attn_time": [0.5007359981536865], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 7, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 7, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.506879985332489], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 7, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 7, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 7, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5441919565200806], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 7, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 7, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5038080215454102], "cross_attn_time": [0.532480001449585], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 7, "layer": 18, "is_force_fresh": 
true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 7, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 7, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5089280009269714], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 7, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 7, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.502784013748169], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 7, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5534080266952515], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2969599962234497]}, "current": {"num_steps": 20, "step": 7, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5109760165214539], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 7, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.506879985332489], "cross_attn_time": [0.5181440114974976], "mlp_time": 
[0.2908160090446472]}, "current": {"num_steps": 20, "step": 7, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5411200523376465], "attn_time": [0.5160959959030151], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 7, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.546463966369629], "attn_time": [0.5181440114974976], "cross_attn_time": [0.536575973033905], "mlp_time": [0.3031040132045746]}, "current": {"num_steps": 20, "step": 8, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5017600059509277], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 8, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [3.84716796875], "attn_time": [0.749567985534668], "cross_attn_time": [0.5457919836044312], "mlp_time": [0.30720001459121704]}, "current": {"num_steps": 20, "step": 8, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 8, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 8, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 8, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": 
[0.49663999676704407], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 8, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 8, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.506879985332489], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 8, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 8, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 8, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 8, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.502784013748169], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 8, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 8, "layer": 13, "is_force_fresh": true, "module": 
"mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 8, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.506879985332489], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 8, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 8, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 8, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5298559665679932], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 8, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 8, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 8, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.289792001247406]}, 
"current": {"num_steps": 20, "step": 8, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 8, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.506879985332489], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 8, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 8, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.506879985332489], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 8, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 8, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 8, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5425920486450195], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5375999808311462], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 9, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": 
[0.5099520087242126], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 9, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.502784013748169], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 9, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 9, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 9, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 9, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 9, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 9, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 9, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, 
{"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.49459201097488403], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2744320034980774]}, "current": {"num_steps": 20, "step": 9, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 9, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.506879985332489], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 9, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 9, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.2908160090446472]}, "current": 
{"num_steps": 20, "step": 9, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 9, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 9, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 9, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 9, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 9, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.506879985332489], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 9, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.4986880123615265], 
"cross_attn_time": [0.5191680192947388], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 9, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5109760165214539], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 9, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.499135971069336], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 9, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5166079998016357], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 10, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.502784013748169], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 10, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 10, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 10, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, 
{"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 10, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 10, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 10, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5109760165214539], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 10, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.502784013748169], "cross_attn_time": [0.5130239725112915], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 10, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.502784013748169], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 10, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 10, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27955201268196106]}, "current": 
{"num_steps": 20, "step": 10, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 10, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 10, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.506879985332489], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 10, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.506879985332489], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 10, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 10, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 10, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5820800065994263], "attn_time": [0.536575973033905], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 10, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.6005120277404785], "attn_time": [0.5335040092468262], 
"cross_attn_time": [0.5478399991989136], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 10, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.551360011100769], "attn_time": [0.5120000243186951], "cross_attn_time": [0.536575973033905], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 10, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 10, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.6680959463119507], "attn_time": [0.5765119791030884], "cross_attn_time": [0.5488640069961548], "mlp_time": [0.30003198981285095]}, "current": {"num_steps": 20, "step": 10, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5718400478363037], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5437440276145935], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 10, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5523840188980103], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 10, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 10, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 10, "layer": 26, "is_force_fresh": true, "module": 
"mlp"}}, {"timing_info": {"block_time": [1.546239972114563], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 10, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5750720500946045], "attn_time": [0.5294079780578613], "cross_attn_time": [0.5529599785804749], "mlp_time": [0.29900801181793213]}, "current": {"num_steps": 20, "step": 11, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 11, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.502784013748169], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 11, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5544320344924927], "attn_time": [0.5222399830818176], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 11, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5089280009269714], "cross_attn_time": [0.532480001449585], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 11, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5171200037002563], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 11, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5375999808311462], "mlp_time": [0.28467199206352234]}, 
"current": {"num_steps": 20, "step": 11, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 11, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 11, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 11, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5646719932556152], "attn_time": [0.5345280170440674], "cross_attn_time": [0.5242879986763], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 11, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5400960445404053], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 11, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.506879985332489], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 11, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5242879986763], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 11, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5411200523376465], "attn_time": 
[0.5140479803085327], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 11, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5534080266952515], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.29900801181793213]}, "current": {"num_steps": 20, "step": 11, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.506879985332489], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 11, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [2.9480960369110107], "attn_time": [0.5120000243186951], "cross_attn_time": [0.8601599931716919], "mlp_time": [1.1284480094909668]}, "current": {"num_steps": 20, "step": 11, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5503360033035278], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5406720042228699], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 11, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 11, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.4925439953804016], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 11, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 11, "layer": 21, "is_force_fresh": 
true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.506879985332489], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 11, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 11, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 11, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 11, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 11, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5493119955062866], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 11, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5667200088500977], "attn_time": [0.5396479964256287], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.30617600679397583]}, "current": {"num_steps": 20, "step": 12, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5605759620666504], "attn_time": [0.5222399830818176], "cross_attn_time": [0.5294079780578613], "mlp_time": 
[0.2959359884262085]}, "current": {"num_steps": 20, "step": 12, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5482879877090454], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 12, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.502784013748169], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 12, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.5048320293426514], "cross_attn_time": [0.52019202709198], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 12, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5421439409255981], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 12, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 12, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5038080215454102], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 12, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 12, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": 
[0.5058559775352478], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 12, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5349760055541992], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 12, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 12, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 12, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 12, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.8565119504928589], "attn_time": [0.6768640279769897], "cross_attn_time": [0.5652480125427246], "mlp_time": [0.317440003156662]}, "current": {"num_steps": 20, "step": 12, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5615999698638916], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5447679758071899], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 12, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5534080266952515], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 12, "layer": 16, "is_force_fresh": 
true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5523840188980103], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 12, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5441919565200806], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 12, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5472639799118042], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 12, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.551360011100769], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 12, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5431679487228394], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 12, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.502784013748169], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 12, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.546239972114563], "attn_time": [0.506879985332489], "cross_attn_time": [0.536575973033905], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 12, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5441919565200806], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5345280170440674], "mlp_time": 
[0.28569599986076355]}, "current": {"num_steps": 20, "step": 12, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.558527946472168], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 12, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.551360011100769], "attn_time": [0.5140479803085327], "cross_attn_time": [0.532480001449585], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 12, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5534080266952515], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 12, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.593727946281433], "attn_time": [0.5375999808311462], "cross_attn_time": [0.5550079941749573], "mlp_time": [0.3051519989967346]}, "current": {"num_steps": 20, "step": 13, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5708160400390625], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5447679758071899], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 13, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5575040578842163], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5396479964256287], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 13, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5677440166473389], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5406720042228699], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 13, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5575040578842163], 
"attn_time": [0.5160959959030151], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 13, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5595519542694092], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5396479964256287], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 13, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5503360033035278], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 13, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5779839754104614], "attn_time": [0.5242879986763], "cross_attn_time": [0.536575973033905], "mlp_time": [0.3031040132045746]}, "current": {"num_steps": 20, "step": 13, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 13, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5411200523376465], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 13, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 13, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5411200523376465], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5355520248413086], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 13, "layer": 11, 
"is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5605759620666504], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5375999808311462], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 13, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 13, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4940160512924194], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 13, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 13, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 13, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 13, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5298559665679932], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 13, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5283839702606201], 
"mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 13, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 13, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 13, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 13, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.502784013748169], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 13, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 13, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 13, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5421439409255981], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 13, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.504256010055542], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2764799892902374]}, "current": {"num_steps": 20, "step": 13, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5440959930419922], "attn_time": [0.5212159752845764], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.30003198981285095]}, "current": {"num_steps": 20, "step": 14, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5073280334472656], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5109760165214539], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 14, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 14, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.504256010055542], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 14, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 14, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5089280009269714], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 14, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 14, 
"layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5130239725112915], "cross_attn_time": [0.52019202709198], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 14, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5349760055541992], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 14, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 14, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.504256010055542], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 14, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 14, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5022079944610596], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 14, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5099520087242126], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 14, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5089280009269714], "cross_attn_time": 
[0.5171200037002563], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 14, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [5.377024173736572], "attn_time": [1.6383999586105347], "cross_attn_time": [1.7756160497665405], "mlp_time": [1.4632960557937622]}, "current": {"num_steps": 20, "step": 14, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5130239725112915], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 14, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.539072036743164], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 14, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 14, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 14, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5349760055541992], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 14, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 14, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": 
{"block_time": [1.521664023399353], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 14, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5421439409255981], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 14, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 14, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5400960445404053], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 14, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 14, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.502784013748169], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 14, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.540992021560669], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.3031040132045746]}, "current": {"num_steps": 20, "step": 15, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, 
"step": 15, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5595519542694092], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.3031040132045746]}, "current": {"num_steps": 20, "step": 15, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5375999808311462], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 15, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 15, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 15, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 15, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5400960445404053], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 15, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2979840040206909]}, "current": {"num_steps": 20, "step": 15, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.4997119903564453], "cross_attn_time": 
[0.5130239725112915], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 15, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 15, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 15, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 15, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5441919565200806], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 15, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 15, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 15, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 15, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": 
{"block_time": [1.516543984413147], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 15, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 15, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 15, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 15, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 15, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 15, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 15, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 
20, "step": 15, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 15, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 15, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 15, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.540287971496582], "attn_time": [0.5181440114974976], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2979840040206909]}, "current": {"num_steps": 20, "step": 16, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 16, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 16, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 16, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5130239725112915], 
"cross_attn_time": [0.5171200037002563], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 16, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 16, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 16, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5130239725112915], "cross_attn_time": [0.5242879986763], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 16, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 16, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 16, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5001599788665771], "attn_time": [0.49663999676704407], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 16, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 16, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, 
{"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.502784013748169], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 16, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 16, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 16, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4970879554748535], "attn_time": [0.48947200179100037], "cross_attn_time": [0.52019202709198], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 16, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 16, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 16, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 16, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5185920000076294], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2815999984741211]}, 
"current": {"num_steps": 20, "step": 16, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.502784013748169], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 16, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 16, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.506879985332489], "cross_attn_time": [0.5120000243186951], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 16, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.506879985332489], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 16, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 16, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 16, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 16, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [2.0336639881134033], "attn_time": 
[0.7659519910812378], "cross_attn_time": [0.6256639957427979], "mlp_time": [0.3164159953594208]}, "current": {"num_steps": 20, "step": 16, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.549888014793396], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5447679758071899], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 17, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.506879985332489], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 17, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.546239972114563], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5386239886283875], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 17, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.4935680031776428], "cross_attn_time": [0.532480001449585], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 17, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.506879985332489], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 17, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.502784013748169], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 17, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5120000243186951], "cross_attn_time": [0.52019202709198], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 17, "layer": 6, "is_force_fresh": true, 
"module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5058559775352478], "cross_attn_time": [0.532480001449585], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 17, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 17, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5482879877090454], "attn_time": [0.5160959959030151], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 17, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.546239972114563], "attn_time": [0.5171200037002563], "cross_attn_time": [0.5242879986763], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 17, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 17, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4981119632720947], "attn_time": [0.4904960095882416], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 17, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 17, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.509376049041748], "attn_time": [0.4864000082015991], "cross_attn_time": [0.5314559936523438], "mlp_time": 
[0.2836480140686035]}, "current": {"num_steps": 20, "step": 17, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 17, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 17, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5370240211486816], "attn_time": [0.506879985332489], "cross_attn_time": [0.5335040092468262], "mlp_time": [0.28569599986076355]}, "current": {"num_steps": 20, "step": 17, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5007359981536865], "cross_attn_time": [0.532480001449585], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 17, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 17, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 17, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5298559665679932], "attn_time": [0.5089280009269714], "cross_attn_time": [0.52019202709198], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 17, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], 
"attn_time": [0.502784013748169], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 17, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 17, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 17, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5196160078048706], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 17, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5308799743652344], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5283839702606201], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 17, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5360000133514404], "attn_time": [0.5160959959030151], "cross_attn_time": [0.52019202709198], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 17, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5399680137634277], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5427200198173523], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 18, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 18, "layer": 1, 
"is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5400960445404053], "attn_time": [0.5150719881057739], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 18, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.528831958770752], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 18, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5298559665679932], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 18, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.4976640045642853], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 18, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.4935680031776428], "cross_attn_time": [0.5304319858551025], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 18, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5237120389938354], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5345280170440674], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 18, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5242879986763], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 18, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.502784013748169], "cross_attn_time": [0.5273600220680237], "mlp_time": 
[0.2826240062713623]}, "current": {"num_steps": 20, "step": 18, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5206400156021118], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 18, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.491968035697937], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.27750399708747864]}, "current": {"num_steps": 20, "step": 18, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5017600059509277], "cross_attn_time": [0.532480001449585], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 18, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5083520412445068], "attn_time": [0.502784013748169], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 18, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 18, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.28672000765800476]}, "current": {"num_steps": 20, "step": 18, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 18, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], 
"attn_time": [0.5017600059509277], "cross_attn_time": [0.5273600220680237], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 18, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5155199766159058], "attn_time": [0.502784013748169], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 18, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5319039821624756], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 18, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.499135971069336], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 18, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [14.97599983215332], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5130239725112915], "mlp_time": [13.750271797180176]}, "current": {"num_steps": 20, "step": 18, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 18, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5124479532241821], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5109760165214539], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 18, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5104000568389893], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5232639908790588], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 18, "layer": 24, 
"is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5267839431762695], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 18, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.502784013748169], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 18, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5380480289459229], "attn_time": [0.5109760165214539], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.29388800263404846]}, "current": {"num_steps": 20, "step": 18, "layer": 27, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5278079509735107], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.2949120104312897]}, "current": {"num_steps": 20, "step": 19, "layer": 0, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.4960639476776123], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 19, "layer": 1, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5038080215454102], "cross_attn_time": [0.5314559936523438], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 2, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5032320022583008], "attn_time": [0.49459201097488403], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2877439856529236]}, "current": {"num_steps": 20, "step": 19, "layer": 3, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.481727957725525], "attn_time": [0.48025599122047424], "cross_attn_time": [0.5150719881057739], 
"mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 4, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.511423945426941], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 19, "layer": 5, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.499135971069336], "attn_time": [0.4997119903564453], "cross_attn_time": [0.5140479803085327], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 19, "layer": 6, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5063040256500244], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5222399830818176], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 19, "layer": 7, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5247360467910767], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5242879986763], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 8, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5329279899597168], "attn_time": [0.5089280009269714], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 19, "layer": 9, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.521664023399353], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5263360142707825], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 10, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.533951997756958], "attn_time": [0.5007359981536865], "cross_attn_time": [0.532480001449585], "mlp_time": [0.2836480140686035]}, "current": {"num_steps": 20, "step": 19, "layer": 11, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": 
[1.4950400590896606], "attn_time": [0.48947200179100037], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.27955201268196106]}, "current": {"num_steps": 20, "step": 19, "layer": 12, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.5048320293426514], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 13, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5134719610214233], "attn_time": [0.5007359981536865], "cross_attn_time": [0.5160959959030151], "mlp_time": [0.28467199206352234]}, "current": {"num_steps": 20, "step": 19, "layer": 14, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5298559665679932], "attn_time": [0.5120000243186951], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 19, "layer": 15, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5099520087242126], "cross_attn_time": [0.5181440114974976], "mlp_time": [0.289792001247406]}, "current": {"num_steps": 20, "step": 19, "layer": 16, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5226880311965942], "attn_time": [0.506879985332489], "cross_attn_time": [0.5171200037002563], "mlp_time": [0.2826240062713623]}, "current": {"num_steps": 20, "step": 19, "layer": 17, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5431679487228394], "attn_time": [0.5171200037002563], "cross_attn_time": [0.52019202709198], "mlp_time": [0.29183998703956604]}, "current": {"num_steps": 20, "step": 19, "layer": 18, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5431679487228394], "attn_time": [0.5140479803085327], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.2959359884262085]}, "current": {"num_steps": 20, "step": 
19, "layer": 19, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.5017600059509277], "cross_attn_time": [0.5058559775352478], "mlp_time": [0.2887679934501648]}, "current": {"num_steps": 20, "step": 19, "layer": 20, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5144959688186646], "attn_time": [0.49561598896980286], "cross_attn_time": [0.5253120064735413], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 19, "layer": 21, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5175679922103882], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5294079780578613], "mlp_time": [0.2805759906768799]}, "current": {"num_steps": 20, "step": 19, "layer": 22, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5257600545883179], "attn_time": [0.5058559775352478], "cross_attn_time": [0.5191680192947388], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 19, "layer": 23, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5349760055541992], "attn_time": [0.5079039931297302], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.29286399483680725]}, "current": {"num_steps": 20, "step": 19, "layer": 24, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.5052800178527832], "attn_time": [0.4915199875831604], "cross_attn_time": [0.5150719881057739], "mlp_time": [0.2908160090446472]}, "current": {"num_steps": 20, "step": 19, "layer": 25, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.516543984413147], "attn_time": [0.4986880123615265], "cross_attn_time": [0.5212159752845764], "mlp_time": [0.2815999984741211]}, "current": {"num_steps": 20, "step": 19, "layer": 26, "is_force_fresh": true, "module": "mlp"}}, {"timing_info": {"block_time": [1.504256010055542], "attn_time": [0.49663999676704407], "cross_attn_time": 
[0.5212159752845764], "mlp_time": [0.27852800488471985]}, "current": {"num_steps": 20, "step": 19, "layer": 27, "is_force_fresh": true, "module": "mlp"}}]
================================================
FILE: PixArt-alpha-ToCa/tools/VLM_caption_lightning.py
================================================
# {'model': 'LLaVA-7B-v0', 'prompt': 'You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.Follow the instructions carefully and explain your answers in detail.###Human: Hi!###Assistant: Hi there! How can I help you today?\n###Human: ?\n###Assistant:', 'temperature': 0.2, 'max_new_tokens': 512, 'stop': '###', 'images': "List of 1 images: ['793f00027d3dc5bd69445a388a2f289c']"}
import sys
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import argparse
import torch
from transformers import AutoTokenizer, CLIPImageProcessor, CLIPVisionModel, AutoConfig
from diffusion.model.llava import LlavaMPTForCausalLM
from PIL import Image
from tqdm import tqdm
from os import path, makedirs
from torch.utils.data import Dataset, DataLoader
import json
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"
def expand2square(pil_img, background_color=(122, 116, 104)):
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
def pad2square(image):
    max_hw, min_hw = max(image.size), min(image.size)
    aspect_ratio = max_hw / min_hw
    max_len, min_len = 800, 400
    shortest_edge = int(min(max_len / aspect_ratio, min_len, min_hw))
    longest_edge = int(shortest_edge * aspect_ratio)
    W, H = image.size
    if H > W:
        H, W = longest_edge, shortest_edge
    else:
        H, W = shortest_edge, longest_edge
    image = image.resize((W, H))
    return image
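# Illustrative sketch, not part of the original script: the target-size arithmetic
# of `pad2square` above, isolated so it can be checked without loading an image.
# `pad2square_target_size` is a hypothetical helper name introduced here. Note that,
# despite its name, `pad2square` resizes while keeping the aspect ratio; it does not pad.
def pad2square_target_size(width, height, max_len=800, min_len=400):
    """Return the (W, H) tuple that pad2square would pass to image.resize."""
    max_hw, min_hw = max(width, height), min(width, height)
    aspect_ratio = max_hw / min_hw
    # The short edge is capped by min_len, by the original short edge, and by
    # whatever keeps the long edge within max_len.
    shortest_edge = int(min(max_len / aspect_ratio, min_len, min_hw))
    longest_edge = int(shortest_edge * aspect_ratio)
    if height > width:
        return shortest_edge, longest_edge
    return longest_edge, shortest_edge
# Example: a 1600x800 image maps to (800, 400); a 500x2000 image maps to (200, 800).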
def load_model(model_path):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = LlavaMPTForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
    mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
    tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
    if mm_use_im_start_end:
        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    vision_tower = model.get_model().vision_tower[0]
    if vision_tower.device.type == 'meta':
        vision_tower = CLIPVisionModel.from_pretrained(
            vision_tower.config._name_or_path, torch_dtype=torch.float16, low_cpu_mem_usage=True).cuda()
        model.get_model().vision_tower[0] = vision_tower
    else:
        vision_tower.to(device='cuda', dtype=torch.float16)
    vision_config = vision_tower.config
    vision_config.im_patch_token = tokenizer.convert_tokens_to_ids(
        [DEFAULT_IMAGE_PATCH_TOKEN])[0]
    vision_config.use_im_start_end = mm_use_im_start_end
    if mm_use_im_start_end:
        vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert_tokens_to_ids(
            [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])
    model.cuda()
    if hasattr(model.config, "max_sequence_length"):
        context_len = model.config.max_sequence_length
    else:
        context_len = 2048
    return tokenizer, model, context_len
class SanitizedLaion(Dataset):
    def __init__(self, root_dir, index_file, prompt, config, img_extension='.jpg', caption=True) -> None:
        super().__init__()
        self.root_dir = root_dir
        self.image_processor = CLIPImageProcessor.from_pretrained(AutoConfig.from_pretrained(config).mm_vision_tower, torch_dtype=torch.float16)
        self.prompt = prompt
        self.img_extension = img_extension
        self.caption = caption
        if '.txt' in index_file:
            with open(index_file, 'r') as f:
                self.lines = f.readlines()
        elif '.json' in index_file:
            with open(index_file, 'r') as f:
                self.lines = json.load(f)
        else:
            raise ValueError(f'{index_file} format not supported')
    def __len__(self):
        return len(self.lines)
    def __getitem__(self, idx):
        item = self.lines[idx]
        caption = item['prompt'].strip()
        prompt = self.prompt.format(caption) if self.caption else self.prompt
        with open(path.join(self.root_dir, item['path']), 'rb') as f:
            img = pad2square(Image.open(f).convert('RGB'))
        return self.image_processor(img, return_tensors='pt')['pixel_values'].squeeze(), prompt, item['path'].split(self.img_extension)[0]
@torch.no_grad()
def caption(tokenizer, model, context_len, images, prompt, prefix):
    images = images.to(model.device, dtype=torch.float16)
    # HACK: 256 is the max image token length hacked
    replace_token = DEFAULT_IMAGE_PATCH_TOKEN * 256
    if getattr(model.config, 'mm_use_im_start_end', False):
        replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN
    prompt = list(map(lambda p: p.replace(DEFAULT_IMAGE_TOKEN, replace_token), prompt))
    temperature = 0.2
    max_new_tokens = 1024
    stop_str = '<|im_end|>'
    max_src_len = context_len - max_new_tokens - 8
    input_ids = tokenizer(prompt).input_ids
    input_ids = list(map(lambda input_id: input_id[-max_src_len:], input_ids))
    lens = list(map(lambda x: len(x), input_ids))
    longest = max(lens)
    # Left-pad every sequence to the longest one so the batch forms a rectangular tensor.
    input_ids = list(map(lambda x: x if len(x) == longest else [tokenizer.pad_token_id] * (longest - len(x)) + x, input_ids))
    pred_ids = torch.zeros([images.shape[0], 0], device=model.device, dtype=torch.long)
    past_key_values = None
    finish = [False] * images.shape[0]
    for i in tqdm(range(max_new_tokens), leave=False):
        if i == 0:
            out = model(
                torch.as_tensor(input_ids).cuda(),
                use_cache=True,
                images=images)
            del images
        else:
            attention_mask = torch.ones(1, past_key_values[0][0].shape[-2] + 1, device="cuda")
            out = model(input_ids=token,
                        use_cache=True,
                        attention_mask=attention_mask,
                        past_key_values=past_key_values)
        past_key_values = out.past_key_values
        logits = out.logits
        last_token_logits = logits[:, -1]
        if temperature < 1e-4:
            # Greedy decoding: keep the batch dimension so `token` has shape [batch, 1].
            token = torch.argmax(last_token_logits, dim=-1, keepdim=True)
        else:
            probs = torch.softmax(last_token_logits / temperature, dim=-1)
            token = torch.multinomial(probs, num_samples=1)
        pred_ids = torch.concatenate([pred_ids, token], dim=1)
        for ii in torch.nonzero(token.cpu() == tokenizer.eos_token_id, as_tuple=True)[0]:
            if finish[ii]:
                continue
            ii = int(ii)
            output = tokenizer.decode(pred_ids[ii][:-1]).removesuffix(stop_str)
            finish[ii] = True
            yield output, prefix[ii]
        if all(finish):
            break
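# Illustrative sketch, not part of the original script: the left-padding step used in
# `caption` above, isolated as a pure-Python helper. `left_pad_batch` is a hypothetical
# name introduced here. Padding goes on the left so that the last position of every row
# is a real prompt token, which is the position the decoding loop reads logits from.
def left_pad_batch(sequences, pad_id):
    """Left-pad lists of token ids to the length of the longest list."""
    longest = max(len(seq) for seq in sequences)
    return [[pad_id] * (longest - len(seq)) + list(seq) for seq in sequences]
# Example: left_pad_batch([[1, 2, 3], [4]], pad_id=0) gives [[1, 2, 3], [0, 0, 4]].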
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, default="liuhaotian/LLaVA-Lightning-MPT-7B-preview")
    parser.add_argument("--data-root", type=str, required=True)
    parser.add_argument('--index', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
    args = parser.parse_args()
    prompt = """<|im_start|>system
- You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.
- You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.
- You should follow the instructions carefully and explain your answers in detail.<|im_end|><|im_start|>user
Given the caption of this image "{}", describe this image in a very detailed manner
<|im_end|><|im_start|>assistant\n"""
    prompt_nocap = """<|im_start|>system
- You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.
- You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.
- You should follow the instructions carefully and explain your answers in detail.<|im_end|><|im_start|>user
Describe this image in a very detailed manner
<|im_end|><|im_start|>assistant\n"""
    d = SanitizedLaion(args.data_root, args.index, prompt, args.model_path, img_extension='.png')
    l = DataLoader(d, batch_size=32, pin_memory=True, num_workers=10)
    tokenizer, model, context_len = load_model(args.model_path)
    # model = torch.compile(model)
    for b in tqdm(l):
        for c, p in caption(tokenizer, model, context_len, *b):
            o = path.join(args.output, f'{p}.txt')
            makedirs(path.dirname(o), exist_ok=True, mode=0o755)
            with open(o, 'w') as k:
                k.write(c)
================================================
FILE: PixArt-alpha-ToCa/tools/convert_pixart_alpha_to_diffusers.py
================================================
import argparse
import os
import torch
from transformers import T5EncoderModel, T5Tokenizer
from diffusers import AutoencoderKL, DPMSolverMultistepScheduler, PixArtAlphaPipeline, Transformer2DModel
ckpt_id = "PixArt-alpha/PixArt-alpha"
# https://github.com/PixArt-alpha/PixArt-alpha/blob/0f55e922376d8b797edd44d25d0e7464b260dcab/scripts/inference.py#L125
interpolation_scale = {256: 0.5, 512: 1, 1024: 2}
def main(args):
    all_state_dict = torch.load(args.orig_ckpt_path, map_location='cpu')
    state_dict = all_state_dict.pop("state_dict")
    converted_state_dict = {}
    # Patch embeddings.
    converted_state_dict["pos_embed.proj.weight"] = state_dict.pop("x_embedder.proj.weight")
    converted_state_dict["pos_embed.proj.bias"] = state_dict.pop("x_embedder.proj.bias")
    # Caption projection.
    converted_state_dict["caption_projection.linear_1.weight"] = state_dict.pop("y_embedder.y_proj.fc1.weight")
    converted_state_dict["caption_projection.linear_1.bias"] = state_dict.pop("y_embedder.y_proj.fc1.bias")
    converted_state_dict["caption_projection.linear_2.weight"] = state_dict.pop("y_embedder.y_proj.fc2.weight")
    converted_state_dict["caption_projection.linear_2.bias"] = state_dict.pop("y_embedder.y_proj.fc2.bias")
    # AdaLN-single LN
    converted_state_dict["adaln_single.emb.timestep_embedder.linear_1.weight"] = state_dict.pop(
        "t_embedder.mlp.0.weight"
    )
    converted_state_dict["adaln_single.emb.timestep_embedder.linear_1.bias"] = state_dict.pop("t_embedder.mlp.0.bias")
    converted_state_dict["adaln_single.emb.timestep_embedder.linear_2.weight"] = state_dict.pop(
        "t_embedder.mlp.2.weight"
    )
    converted_state_dict["adaln_single.emb.timestep_embedder.linear_2.bias"] = state_dict.pop("t_embedder.mlp.2.bias")
    if args.image_size == 1024 and args.multi_scale_train:
        # Resolution.
        converted_state_dict["adaln_single.emb.resolution_embedder.linear_1.weight"] = state_dict.pop(
            "csize_embedder.mlp.0.weight"
        )
        converted_state_dict["adaln_single.emb.resolution_embedder.linear_1.bias"] = state_dict.pop(
            "csize_embedder.mlp.0.bias"
        )
        converted_state_dict["adaln_single.emb.resolution_embedder.linear_2.weight"] = state_dict.pop(
            "csize_embedder.mlp.2.weight"
        )
        converted_state_dict["adaln_single.emb.resolution_embedder.linear_2.bias"] = state_dict.pop(
            "csize_embedder.mlp.2.bias"
        )
        # Aspect ratio.
        converted_state_dict["adaln_single.emb.aspect_ratio_embedder.linear_1.weight"] = state_dict.pop(
            "ar_embedder.mlp.0.weight"
        )
        converted_state_dict["adaln_single.emb.aspect_ratio_embedder.linear_1.bias"] = state_dict.pop(
            "ar_embedder.mlp.0.bias"
        )
        converted_state_dict["adaln_single.emb.aspect_ratio_embedder.linear_2.weight"] = state_dict.pop(
            "ar_embedder.mlp.2.weight"
        )
        converted_state_dict["adaln_single.emb.aspect_ratio_embedder.linear_2.bias"] = state_dict.pop(
            "ar_embedder.mlp.2.bias"
        )
    # Shared norm.
    converted_state_dict["adaln_single.linear.weight"] = state_dict.pop("t_block.1.weight")
    converted_state_dict["adaln_single.linear.bias"] = state_dict.pop("t_block.1.bias")
    for depth in range(28):
        # Transformer blocks.
        converted_state_dict[f"transformer_blocks.{depth}.scale_shift_table"] = state_dict.pop(
            f"blocks.{depth}.scale_shift_table"
        )
        # Attention is all you need 🤘
        # Self attention.
        q, k, v = torch.chunk(state_dict.pop(f"blocks.{depth}.attn.qkv.weight"), 3, dim=0)
        q_bias, k_bias, v_bias = torch.chunk(state_dict.pop(f"blocks.{depth}.attn.qkv.bias"), 3, dim=0)
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_q.weight"] = q
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_q.bias"] = q_bias
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_k.weight"] = k
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_k.bias"] = k_bias
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_v.weight"] = v
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_v.bias"] = v_bias
        # Projection.
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_out.0.weight"] = state_dict.pop(
            f"blocks.{depth}.attn.proj.weight"
        )
        converted_state_dict[f"transformer_blocks.{depth}.attn1.to_out.0.bias"] = state_dict.pop(
            f"blocks.{depth}.attn.proj.bias"
        )
        # Feed-forward.
        converted_state_dict[f"transformer_blocks.{depth}.ff.net.0.proj.weight"] = state_dict.pop(
            f"blocks.{depth}.mlp.fc1.weight"
        )
        converted_state_dict[f"transformer_blocks.{depth}.ff.net.0.proj.bias"] = state_dict.pop(
            f"blocks.{depth}.mlp.fc1.bias"
        )
        converted_state_dict[f"transformer_blocks.{depth}.ff.net.2.weight"] = state_dict.pop(
            f"blocks.{depth}.mlp.fc2.weight"
        )
        converted_state_dict[f"transformer_blocks.{depth}.ff.net.2.bias"] = state_dict.pop(
            f"blocks.{depth}.mlp.fc2.bias"
        )
        # Cross-attention.
        q = state_dict.pop(f"blocks.{depth}.cross_attn.q_linear.weight")
        q_bias = state_dict.pop(f"blocks.{depth}.cross_attn.q_linear.bias")
        k, v = torch.chunk(state_dict.pop(f"blocks.{depth}.cross_attn.kv_linear.weight"), 2, dim=0)
        k_bias, v_bias = torch.chunk(state_dict.pop(f"blocks.{depth}.cross_attn.kv_linear.bias"), 2, dim=0)
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_q.weight"] = q
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_q.bias"] = q_bias
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_k.weight"] = k
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_k.bias"] = k_bias
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_v.weight"] = v
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_v.bias"] = v_bias
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_out.0.weight"] = state_dict.pop(
            f"blocks.{depth}.cross_attn.proj.weight"
        )
        converted_state_dict[f"transformer_blocks.{depth}.attn2.to_out.0.bias"] = state_dict.pop(
            f"blocks.{depth}.cross_attn.proj.bias"
        )
    # Final block.
    converted_state_dict["proj_out.weight"] = state_dict.pop("final_layer.linear.weight")
converted_state_dict["proj_out.bias"] = state_dict.pop("final_layer.linear.bias")
converted_state_dict["scale_shift_table"] = state_dict.pop("final_layer.scale_shift_table")
# DiT XL/2
transformer = Transformer2DModel(
sample_size=args.image_size // 8,
num_layers=28,
attention_head_dim=72,
in_channels=4,
out_channels=8,
patch_size=2,
attention_bias=True,
num_attention_heads=16,
cross_attention_dim=1152,
activation_fn="gelu-approximate",
num_embeds_ada_norm=1000,
norm_type="ada_norm_single",
norm_elementwise_affine=False,
norm_eps=1e-6,
caption_channels=4096,
)
transformer.load_state_dict(converted_state_dict, strict=True)
assert transformer.pos_embed.pos_embed is not None
state_dict.pop("pos_embed")
state_dict.pop("y_embedder.y_embedding")
assert len(state_dict) == 0, f"State dict is not empty, {state_dict.keys()}"
num_model_params = sum(p.numel() for p in transformer.parameters())
print(f"Total number of transformer parameters: {num_model_params}")
if args.only_transformer:
transformer.save_pretrained(os.path.join(args.dump_path, "transformer"))
else:
scheduler = DPMSolverMultistepScheduler()
vae = AutoencoderKL.from_pretrained(ckpt_id, subfolder="sd-vae-ft-ema")
tokenizer = T5Tokenizer.from_pretrained(ckpt_id, subfolder="t5-v1_1-xxl")
text_encoder = T5EncoderModel.from_pretrained(ckpt_id, subfolder="t5-v1_1-xxl")
pipeline = PixArtAlphaPipeline(
tokenizer=tokenizer, text_encoder=text_encoder, transformer=transformer, vae=vae, scheduler=scheduler
)
pipeline.save_pretrained(args.dump_path)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# set multi_scale_train=True if using PixArtMS structure during training else set it to False
parser.add_argument("--multi_scale_train", default=True, type=lambda s: s.lower() in ("true", "1", "yes"), required=True, help="Whether the Multi-Scale PixArtMS structure was used during training (true/false).")
parser.add_argument("--orig_ckpt_path", default=None, type=str, required=False, help="Path to the checkpoint to convert.")
parser.add_argument(
"--image_size",
default=1024,
type=int,
choices=[256, 512, 1024],
required=False,
help="Image size of the pretrained model: 256, 512, or 1024.",
)
parser.add_argument("--dump_path", default=None, type=str, required=True, help="Path to the output pipeline.")
parser.add_argument("--only_transformer", default=True, type=lambda s: s.lower() in ("true", "1", "yes"), required=True)
args = parser.parse_args()
main(args)
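The `--multi_scale_train` and `--only_transformer` options above are meant to be booleans, but argparse applies `type` to the raw command-line string, and `bool("False")` evaluates to `True`. A minimal sketch of one common workaround follows. The `str2bool` helper and `build_parser` function are hypothetical names introduced for illustration; they are not part of this repository.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: map common textual spellings onto a real bool."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "t", "yes", "y"}:
        return True
    if lowered in {"0", "false", "f", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the two boolean options of the conversion script (defaults assumed).
    parser = argparse.ArgumentParser()
    parser.add_argument("--multi_scale_train", type=str2bool, default=True)
    parser.add_argument("--only_transformer", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--multi_scale_train", "false"])
    print(args.multi_scale_train, args.only_transformer)
```

With a converter like this, passing `--multi_scale_train false` yields a real `False` instead of a non-empty (and therefore truthy) string.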
================================================
FILE: PixArt-alpha-ToCa/tools/download.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""
Functions for downloading pre-trained PixArt models
"""
from torchvision.datasets.utils import download_url
import torch
import os
import argparse
pretrained_models = {'PixArt-XL-2-512x512.pth', 'PixArt-XL-2-1024-MS.pth'}
vae_models = {
'sd-vae-ft-ema/config.json',
'sd-vae-ft-ema/diffusion_pytorch_model.bin'
}
t5_models = {
't5-v1_1-xxl/config.json', 't5-v1_1-xxl/pytorch_model-00001-of-00002.bin',
't5-v1_1-xxl/pytorch_model-00002-of-00002.bin', 't5-v1_1-xxl/pytorch_model.bin.index.json',
't5-v1_1-xxl/special_tokens_map.json', 't5-v1_1-xxl/spiece.model',
't5-v1_1-xxl/tokenizer_config.json',
}
def find_model(model_name):
"""
Finds a pre-trained PixArt model, downloading it if necessary. Alternatively, loads a model from a local path.
"""
if model_name in pretrained_models:
return download_model(model_name)
assert os.path.isfile(model_name), f'Could not find PixArt checkpoint at {model_name}'
return torch.load(model_name, map_location=lambda storage, loc: storage)
def download_model(model_name):
"""
Downloads a pre-trained PixArt model from the web.
"""
assert model_name in pretrained_models
local_path = f'output/pretrained_models/{model_name}'
if not os.path.isfile(local_path):
os.makedirs('output/pretrained_models', exist_ok=True)
web_path = f'https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/{model_name}'
download_url(web_path, 'output/pretrained_models')
return torch.load(local_path, map_location=lambda storage, loc: storage)
def download_other(model_name, model_zoo, output_dir):
"""
Downloads an auxiliary file (VAE or T5 weights/config) listed in model_zoo into output_dir.
"""
assert model_name in model_zoo
local_path = os.path.join(output_dir, model_name)
if not os.path.isfile(local_path):
os.makedirs(output_dir, exist_ok=True)
web_path = f'https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/{model_name}'
print(web_path)
download_url(web_path, os.path.join(output_dir, model_name.split('/')[0]))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--model_names', nargs='+', type=str, default=pretrained_models)
args = parser.parse_args()
model_names = args.model_names
model_names = set(model_names)
# Download the T5 and VAE files first, then the PixArt checkpoints
for t5_model in t5_models:
download_other(t5_model, t5_models, 'output/pretrained_models/t5_ckpts')
for vae_model in vae_models:
download_other(vae_model, vae_models, 'output/pretrained_models/')
for model in model_names:
download_model(model)
print('Done.')
================================================
FILE: PixArt-alpha-ToCa/tools/extract_features.py
================================================
import os
from pathlib import Path
import sys
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
from PIL import Image
import torch
from torchvision import transforms as T
import numpy as np
import json
from tqdm import tqdm
import argparse
import threading
from queue import Queue
from torch.utils.data import DataLoader, RandomSampler
from accelerate import Accelerator
from torchvision.transforms.functional import InterpolationMode
from torchvision.datasets.folder import default_loader
from diffusion.model.t5 import T5Embedder
from diffusers.models import AutoencoderKL
from diffusion.data.datasets.InternalData import InternalData
from diffusion.utils.misc import SimpleTimer
from diffusion.utils.data_sampler import AspectRatioBatchSampler
from diffusion.data.builder import DATASETS
from diffusion.data import ASPECT_RATIO_512, ASPECT_RATIO_1024
def get_closest_ratio(height: float, width: float, ratios: dict):
aspect_ratio = height / width
closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))
return ratios[closest_ratio], float(closest_ratio)
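The bucketing done by `get_closest_ratio` can be checked on a small example. The sketch below repeats the function and runs it against a toy table; `TOY_RATIOS` is an assumption made for illustration, while the real `ASPECT_RATIO_512` and `ASPECT_RATIO_1024` tables are imported from `diffusion.data` and are not reproduced here.

```python
def get_closest_ratio(height: float, width: float, ratios: dict):
    """Pick the bucket whose key is numerically nearest to height / width."""
    aspect_ratio = height / width
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))
    return ratios[closest_ratio], float(closest_ratio)


# Toy bucket table (assumed values): key is height/width, value is [height, width].
TOY_RATIOS = {
    "0.5": [352.0, 704.0],
    "1.0": [512.0, 512.0],
    "2.0": [704.0, 352.0],
}

if __name__ == "__main__":
    # height=600, width=900 gives a ratio of about 0.667, nearest to the "0.5" bucket.
    size, ratio = get_closest_ratio(600, 900, TOY_RATIOS)
    print(size, ratio)
```

The returned size is what the dataset later feeds to `T.Resize` and `T.CenterCrop`, and the returned ratio is the key used for `ratio_nums` and `ratio_index`.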
@DATASETS.register_module()
class DatasetMS(InternalData):
def __init__(self, root, image_list_json=None, transform=None, resolution=1024, load_vae_feat=False, aspect_ratio_type=None, start_index=0, end_index=100000000, **kwargs):
if image_list_json is None:
image_list_json = ['data_info.json']
assert os.path.isabs(root), 'root must be an absolute path'
self.root = root
self.img_dir_name = 'InternalImgs' # change this according to your data structure
self.json_dir_name = 'InternalData' # change this according to your data structure
self.transform = transform
self.load_vae_feat = load_vae_feat
self.resolution = resolution
self.meta_data_clean = []
self.img_samples = []
self.txt_feat_samples = []
self.aspect_ratio = aspect_ratio_type
assert self.aspect_ratio in [ASPECT_RATIO_1024, ASPECT_RATIO_512]
self.ratio_index = {}
self.ratio_nums = {}
for k, v in self.aspect_ratio.items():
self.ratio_index[float(k)] = [] # used for self.getitem
self.ratio_nums[float(k)] = 0 # used for batch-sampler
image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]
for json_file in image_list_json:
meta_data = self.load_json(os.path.join(self.root, 'partition', json_file))
meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]
self.meta_data_clean.extend(meta_data_clean)
self.img_samples.extend([os.path.join(self.root.replace(self.json_dir_name, self.img_dir_name), item['path']) for item in meta_data_clean])
self.img_samples = self.img_samples[start_index: end_index]
# scan the dataset to collect aspect-ratio statistics
for i, info in enumerate(self.meta_data_clean[:len(self.meta_data_clean)//3]):
ori_h, ori_w = info['height'], info['width']
closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)
self.ratio_nums[closest_ratio] += 1
if len(self.ratio_index[closest_ratio]) == 0:
self.ratio_index[closest_ratio].append(i)
# Set loader and extensions
if self.load_vae_feat:
raise ValueError("No VAE loader here")
self.loader = default_loader
def __getitem__(self, idx):
data_info = {}
for _ in range(20):
try:
img_path = self.img_samples[idx]
img = self.loader(img_path)
if self.transform:
img = self.transform(img)
# Calculate closest aspect ratio and resize & crop image[w, h]
if isinstance(img, Image.Image):
h, w = (img.size[1], img.size[0])
assert (h, w) == (self.meta_data_clean[idx]['height'], self.meta_data_clean[idx]['width'])
closest_size, closest_ratio = get_closest_ratio(h, w, self.aspect_ratio)
closest_size = list(map(lambda x: int(x), closest_size))
transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(closest_size, interpolation=InterpolationMode.BICUBIC), # Image.BICUBIC
T.CenterCrop(closest_size),
T.ToTensor(),
T.Normalize([.5], [.5]),
])
img = transform(img)
data_info['img_hw'] = torch.tensor([h, w], dtype=torch.float32)
data_info['aspect_ratio'] = closest_ratio
# change the path according to your data structure
return img, '_'.join(self.img_samples[idx].rsplit('/', 2)[-2:]) # change from 'serial-number-of-dir/serial-number-of-image.png' ---> 'serial-number-of-dir_serial-number-of-image.png'
except Exception as e:
print(f"Error details: {str(e)}")
idx = np.random.randint(len(self))
raise RuntimeError('Too many bad data.')
def get_data_info(self, idx):
data_info = self.meta_data_clean[idx]
return {'height': data_info['height'], 'width': data_info['width']}
def extract_caption_t5_do(q):
while not q.empty():
item = q.get()
extract_caption_t5_job(item)
q.task_done()
def extract_caption_t5_job(item):
global mutex
global t5
global t5_save_dir
with torch.no_grad():
caption = item['prompt'].strip()
if isinstance(caption, str):
caption = [caption]
save_path = os.path.join(t5_save_dir, Path(item['path']).stem)
if os.path.exists(f"{save_path}.npz"):
return
try:
with mutex: # the lock is released automatically, even if the T5 call raises
caption_emb, emb_mask = t5.get_text_embeddings(caption)
emb_dict = {
'caption_feature': caption_emb.float().cpu().data.numpy(),
'attention_mask': emb_mask.cpu().data.numpy(),
}
np.savez_compressed(save_path, **emb_dict)
except Exception as e:
print(e)
def extract_caption_t5():
global t5
global t5_save_dir
# global images_extension
t5 = T5Embedder(device="cuda", local_cache=True, cache_dir=f'{args.pretrained_models_dir}/t5_ckpts', model_max_length=120)
t5_save_dir = args.t5_save_root
os.makedirs(t5_save_dir, exist_ok=True)
train_data_json = json.load(open(args.json_path, 'r'))
train_data = train_data_json[args.start_index: args.end_index]
global mutex
mutex = threading.Lock()
jobs = Queue()
for item in tqdm(train_data):
jobs.put(item)
for _ in range(20):
worker = threading.Thread(target=extract_caption_t5_do, args=(jobs,))
worker.start()
jobs.join()
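The worker loop above checks `q.empty()` before calling `q.get()`. With several threads, another worker can take the last item between those two calls, which leaves a thread blocked inside `get()`. A hedged sketch of a non-blocking alternative built on `get_nowait` and `queue.Empty` is shown below; `drain`, `run_workers`, and `handle` are hypothetical names, and the squaring lambda merely stands in for the T5 embedding job.

```python
import threading
from queue import Empty, Queue


def drain(jobs: Queue, handle, results: list, lock: threading.Lock) -> None:
    """Pull items until the queue is exhausted, never blocking on an empty queue."""
    while True:
        try:
            item = jobs.get_nowait()
        except Empty:
            return  # Nothing left; let the thread exit.
        try:
            value = handle(item)
            with lock:  # The lock is released even if append raises.
                results.append(value)
        finally:
            jobs.task_done()  # Always mark the item done so jobs.join() can return.


def run_workers(items, handle, num_workers: int = 4) -> list:
    jobs: Queue = Queue()
    for item in items:
        jobs.put(item)
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=drain, args=(jobs, handle, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    jobs.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run_workers(range(10), lambda x: x * x)))
```

Calling `task_done()` in a `finally` block keeps `jobs.join()` from hanging when a single job fails.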
def extract_img_vae_do(q):
while not q.empty():
item = q.get()
extract_img_vae_job(item)
q.task_done()
def extract_img_vae_job(item):
return
def extract_img_vae():
vae = AutoencoderKL.from_pretrained(f'{args.pretrained_models_dir}/sd-vae-ft-ema').to(device)
train_data_json = json.load(open(args.json_path, 'r'))
image_names = set()
vae_save_root = f'{args.vae_save_root}/{image_resize}resolution'
os.umask(0o000) # file permission: 666; dir permission: 777
os.makedirs(vae_save_root, exist_ok=True)
vae_save_dir = os.path.join(vae_save_root, 'noflip')
os.makedirs(vae_save_dir, exist_ok=True)
for item in train_data_json:
image_name = item['path']
if image_name in image_names:
continue
image_names.add(image_name)
lines = sorted(image_names)
lines = lines[args.start_index: args.end_index]
_, images_extension = os.path.splitext(lines[0])
transform = T.Compose([
T.Lambda(lambda img: img.convert('RGB')),
T.Resize(image_resize), # Image.BICUBIC
T.CenterCrop(image_resize),
T.ToTensor(),
T.Normalize([.5], [.5]),
])
os.umask(0o000) # file permission: 666; dir permission: 777
for image_name in tqdm(lines):
save_path = os.path.join(vae_save_dir, Path(image_name).stem)
if os.path.exists(f"{save_path}.npy"):
continue
try:
img = Image.open(f'{args.dataset_root}/{image_name}')
img = transform(img).to(device)[None]
with torch.no_grad():
posterior = vae.encode(img).latent_dist
z = torch.cat([posterior.mean, posterior.std], dim=1).detach().cpu().numpy().squeeze()
np.save(save_path, z)
except Exception as e:
print(e)
print(image_name)
def save_results(results, paths, signature, work_dir):
timer = SimpleTimer(len(results), log_interval=100, desc="Saving Results")
# save to npy
new_paths = []
os.umask(0o000) # file permission: 666; dir permission: 777
for res, p in zip(results, paths):
file_name = p.split('.')[0] + '.npy'
new_folder = signature
save_folder = os.path.join(work_dir, new_folder)
if os.path.exists(save_folder):
raise FileExistsError(f"{save_folder} exists. BE careful not to overwrite your files. Comment this error raising for overwriting!!")
os.makedirs(save_folder, exist_ok=True)
new_paths.append(os.path.join(new_folder, file_name))
np.save(os.path.join(save_folder, file_name), res)
timer.log()
# save paths
with open(os.path.join(work_dir, f"VAE-{signature}.txt"), 'w') as f:
f.write('\n'.join(new_paths))
def inference(vae, dataloader, signature, work_dir):
timer = SimpleTimer(len(dataloader), log_interval=100, desc="VAE-Inference")
for batch in dataloader:
with torch.no_grad():
with torch.cuda.amp.autocast(enabled=True):
posterior = vae.encode(batch[0]).latent_dist
results = torch.cat([posterior.mean, posterior.std], dim=1).detach().cpu().numpy()
path = batch[1]
save_results(results, path, signature=signature, work_dir=work_dir)
timer.log()
def extract_img_vae_multiscale(bs=1):
assert image_resize in [512, 1024]
work_dir = os.path.abspath(args.vae_save_root)
os.umask(0o000) # file permission: 666; dir permission: 777
os.makedirs(work_dir, exist_ok=True)
accelerator = Accelerator(mixed_precision='fp16')
vae = AutoencoderKL.from_pretrained(f'{args.pretrained_models_dir}/sd-vae-ft-ema').to(device)
signature = 'ms'
aspect_ratio_type = ASPECT_RATIO_1024 if image_resize == 1024 else ASPECT_RATIO_512
dataset = DatasetMS(args.dataset_root, image_list_json=[args.json_file], transform=None, sample_subset=None,
aspect_ratio_type=aspect_ratio_type, start_index=args.start_index, end_index=args.end_index)
# create AspectRatioBatchSampler
sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset, batch_size=bs, aspect_ratios=dataset.aspect_ratio, ratio_nums=dataset.ratio_nums)
# create DataLoader
dataloader = DataLoader(dataset, batch_sampler=sampler, num_workers=13, pin_memory=True)
dataloader = accelerator.prepare(dataloader, )
inference(vae, dataloader, signature=signature, work_dir=work_dir)
accelerator.wait_for_everyone()
print('done')
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument("--multi_scale", action='store_true', default=False, help="multi-scale feature extraction")
parser.add_argument("--img_size", default=512, type=int, help="image scale for multi-scale feature extraction")
parser.add_argument('--start_index', default=0, type=int)
parser.add_argument('--end_index', default=1000000, type=int)
parser.add_argument('--json_path', type=str)
parser.add_argument('--t5_save_root', default='data/data_toy/caption_feature_wmask', type=str)
parser.add_argument('--vae_save_root', default='data/data_toy/img_vae_features', type=str)
parser.add_argument('--dataset_root', default='data/data_toy', type=str)
parser.add_argument('--pretrained_models_dir', default='output/pretrained_models', type=str)
### for multi-scale (ms) VAE feature extraction
parser.add_argument('--json_file', type=str)
return parser.parse_args()
if __name__ == '__main__':
args = get_args()
device = "cuda" if torch.cuda.is_available() else "cpu"
image_resize = args.img_size
# prepare extracted caption t5 features for training
extract_caption_t5()
# prepare extracted image vae features for training
if args.multi_scale:
print(f'Extracting Multi-scale Image Resolution based on {image_resize}')
extract_img_vae_multiscale(bs=1) # recommend bs = 1 for AspectRatioBatchSampler
else:
print(f'Extracting Single Image Resolution {image_resize}')
extract_img_vae()
================================================
FILE: PixArt-alpha-ToCa/train.sh
================================================
CUDA_VISIBLE_DEVICES=5,6,7 python -m torch.distributed.launch --nproc_per_node=3 \
--master_port=26662 train_scripts/train_controlnet.py \
configs/pixart_app_config/PixArt_xl2_img1024_controlHed_Half.py \
--work-dir output/debug
================================================
FILE: PixArt-alpha-ToCa/train_latents.py
================================================
import os
import sys
import types
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import argparse
import datetime
import time
import warnings
warnings.filterwarnings("ignore") # ignore warning
import torch
import torch.nn as nn
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from diffusers.models import AutoencoderKL
from torch.utils.data import RandomSampler
from mmcv.runner import LogBuffer
from copy import deepcopy
from PIL import Image
import numpy as np
from diffusion import IDDPM
from diffusion.utils.checkpoint import save_checkpoint, load_checkpoint
from diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.model.builder import build_model
from diffusion.utils.logger import get_root_logger
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
def set_fsdp_env():
os.environ["ACCELERATE_USE_FSDP"] = 'true'
os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'
def ema_update(model_dest: nn.Module, model_src: nn.Module, rate):
param_dict_src = dict(model_src.named_parameters())
for p_name, p_dest in model_dest.named_parameters():
p_src = param_dict_src[p_name]
assert p_src is not p_dest
p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)
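The update in `ema_update` computes `p_dest = rate * p_dest + (1 - rate) * p_src` for every parameter. The sketch below reproduces that arithmetic on plain Python floats so it can be checked by hand; `ema_step` is a hypothetical name and the rate of 0.9 is chosen only for illustration (the training script reads `config.ema_rate`).

```python
def ema_step(dest: dict, src: dict, rate: float) -> None:
    """In-place EMA on plain floats: dest = rate * dest + (1 - rate) * src."""
    for name, src_value in src.items():
        dest[name] = rate * dest[name] + (1.0 - rate) * src_value


if __name__ == "__main__":
    ema = {"w": 0.0}   # EMA copy, starting at zero
    live = {"w": 1.0}  # "trained" value, held constant here
    for _ in range(3):
        ema_step(ema, live, rate=0.9)
    # Starting from 0 with a constant target of 1, n steps give 1 - rate**n.
    print(ema["w"])
```

A rate close to 1 makes the EMA weights trail the live weights slowly, which is why the update is applied only when gradients are synchronized and an optimizer step has actually happened.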
def train():
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
start_step = start_epoch * len(train_dataloader)
global_step = 0
total_steps = len(train_dataloader) * config.num_epochs
# load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start= time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
# if load_vae_feat:
z = batch[0]
# else:
# with torch.no_grad():
# with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):
# posterior = vae.encode(batch[0]).latent_dist
# if config.sample_posterior:
# z = posterior.sample()
# else:
# z = posterior.mode()
clean_images = z * config.scale_factor
y = batch[1]
y_mask = batch[2]
data_info = batch[3]
# Sample a random timestep for each image
bs = clean_images.shape[0]
timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()
grad_norm = None
with accelerator.accumulate(model):
# Predict the noise residual
optimizer.zero_grad()
loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))
loss = loss_term['loss'].mean()
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
if accelerator.sync_gradients:
ema_update(model_ema, model, config.ema_rate)
lr = lr_scheduler.get_last_lr()[0]
logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
# logging on terminal
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
global_step += 1
data_time_start= time.time()
synchronize()
if accelerator.is_main_process:
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
synchronize()
if accelerator.is_main_process:
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000)
save_checkpoint(os.path.join(config.output_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
########### EVAL ###################
if epoch % config.save_image_epochs == 0 or epoch == config.num_epochs:
if config.validation_prompts is not None:
logger.info("Running inference for collecting generated images...")
assert config.eval_sampler in ['iddpm', 'dpm-solver', 'sa-solver']
sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}
sample_steps = config.eval_steps if config.eval_steps != -1 else sample_steps_dict[config.eval_sampler]
# base_ratios = eval(f'ASPECT_RATIO_{config.image_size}_TEST')
eval_dir = os.path.join(config.output_dir, 'eval')
os.makedirs(eval_dir, exist_ok=True)
save_path = os.path.join(eval_dir, f'{epoch}_{global_step}.png')
model.eval()
images = []
# device = t5.device
for ip, prompt in enumerate(config.validation_prompts):
prompts = [prompt]
# prompts = []
# prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device, show=False) # ar for aspect ratio
# if config.image_size == 1024:
# latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
# else:
# hw = torch.tensor([[config.image_size, config.image_size]], dtype=torch.float, device=device).repeat(bs, 1)
# ar = torch.tensor([[1.]], device=device).repeat(bs, 1)
# latent_size_h, latent_size_w = latent_size, latent_size
# prompts.append(prompt_clean.strip())
null_y = model.module.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]
with torch.no_grad():
caption_embs, emb_masks, len_prompts = val_txt_embs[ip]
# caption_embs, emb_masks = t5.get_text_embeddings(prompts)
# caption_embs = caption_embs.float()[:, None]
print(f'finish embedding')
n = len_prompts
if config.eval_sampler == 'iddpm':
# Create sampling noise:
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)
model_kwargs = dict(y=torch.cat([caption_embs, null_y]),
cfg_scale=config.cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
diffusion = IDDPM(str(sample_steps))
# Sample images:
samples = diffusion.p_sample_loop(
model.module.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,
device=device
)
samples, _ = samples.chunk(2, dim=0) # Remove null class samples
elif config.eval_sampler == 'dpm-solver':
# Create sampling noise:
z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
dpm_solver = DPMS(model.module.forward_with_dpmsolver,
condition=caption_embs,
uncondition=null_y,
cfg_scale=config.cfg_scale,
model_kwargs=model_kwargs)
samples = dpm_solver.sample(
z,
steps=sample_steps,
order=2,
skip_type="time_uniform",
method="multistep",
)
elif config.eval_sampler == 'sa-solver':
# Create sampling noise:
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
sa_solver = SASolverSampler(model.module.forward_with_dpmsolver, device=device)
samples = sa_solver.sample(
S=25,
batch_size=n,
shape=(4, latent_size_h, latent_size_w),
eta=1,
conditioning=caption_embs,
unconditional_conditioning=null_y,
unconditional_guidance_scale=config.cfg_scale,
model_kwargs=model_kwargs,
)[0]
samples = vae.decode(samples / 0.18215).sample
# decode image
image = make_grid(samples, nrow=1, normalize=True, value_range=(-1, 1))
image = image.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to("cpu", torch.uint8).numpy()
image = Image.fromarray(image)
images.append(image)
image_grid = make_image_grid(images, 2, len(images)//2)
image_grid.save(save_path)
for tracker in accelerator.trackers:
if tracker.name == "tensorboard":
np_images = np.stack([np.asarray(img) for img in images])
tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
elif tracker.name == "comet_ml":
logger.info('Logging validation images')
tracker.writer.log_image(image_grid, name=f"{epoch}", step=global_step)
else:
logger.warning(f"image logging not implemented for {tracker.name}")
del images, image, samples, image_grid
torch.cuda.empty_cache()
model.train()
synchronize()
def parse_args():
parser = argparse.ArgumentParser(description="PixArt training script.")
parser.add_argument("config", type=str, help="config")
parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
parser.add_argument('--work-dir', help='the dir to save logs and models')
parser.add_argument('--resume-from', help='the dir to resume the training')
parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')
parser.add_argument('--local-rank', type=int, default=-1)
parser.add_argument('--local_rank', type=int, default=-1)
parser.add_argument('--debug', action='store_true')
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument(
"--tracker_project_name",
type=str,
default="text2image-fine-tune",
help=(
"The `project_name` argument passed to Accelerator.init_trackers for"
" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
),
)
parser.add_argument("--loss_report_name", type=str, default="loss")
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.resume_from is not None:
config.load_from = None
config.resume_from = dict(
checkpoint=args.resume_from,
load_ema=False,
resume_optimizer=True,
resume_lr_scheduler=True)
if args.debug:
config.log_interval = 1
config.train_batch_size = 8
config.valid_num = 100
os.umask(0o000)
config.output_dir = os.path.join(config.work_dir,
f"""{config.model}_{config.dataset_alias}_{config.image_size}_batch{config.train_batch_size}_{config.lr_schedule}_lr{config.optimizer['lr']}_warmup{config.lr_schedule_args['num_warmup_steps']}_gas{config.gradient_accumulation_steps}""")
os.makedirs(config.output_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False
if args.report_to == "comet_ml":
import comet_ml
comet_ml.init(
project_name=args.tracker_project_name,
)
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with=args.report_to,
project_dir=os.path.join(config.output_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.output_dir, 'train_log.log'))
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.output_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512, 1024]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
model_kwargs={"window_block_indexes": config.window_block_indexes, "window_size": config.window_size,
"use_rel_pos": config.use_rel_pos, "lewei_scale": config.lewei_scale, 'config':config,
'model_max_length': config.model_max_length}
if config.validation_prompts is not None:
logger.info('Precompute validation prompt embeddings')
from diffusion.model.utils import prepare_prompt_ar
from diffusion import IDDPM, DPMS, SASolverSampler
from diffusion.model.t5 import T5Embedder
from diffusion.data.datasets import ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST
from diffusers.utils import make_image_grid
from torchvision.utils import make_grid
t5 = T5Embedder(device="cuda", local_cache=True, cache_dir='output/pretrained_models/t5_ckpts', torch_dtype=torch.float)
device = t5.device
base_ratios = {256: ASPECT_RATIO_256_TEST, 512: ASPECT_RATIO_512_TEST, 1024: ASPECT_RATIO_1024_TEST}[int(config.image_size)]
pbs = 1
val_txt_embs = []
for prompt in config.validation_prompts:
prompts = []
prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device, show=False) # ar for aspect ratio
if config.image_size == 1024:
latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)
else:
hw = torch.tensor([[config.image_size, config.image_size]], dtype=torch.float, device=device).repeat(pbs, 1)
ar = torch.tensor([[1.]], device=device).repeat(pbs, 1)
latent_size_h, latent_size_w = latent_size, latent_size
prompts.append(prompt_clean.strip())
with torch.no_grad():
caption_embs, emb_masks = t5.get_text_embeddings(prompts)
caption_embs = caption_embs.float()[:, None]
val_txt_embs.append([caption_embs, emb_masks, len(prompts)])
del t5
import gc # garbage collect library
gc.collect()
torch.cuda.empty_cache()
logger.info('[ DONE ]')
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, snr=config.snr_loss)
model = build_model(config.model,
config.grad_checkpointing,
config.get('fp32_attention', False),
input_size=latent_size,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
**model_kwargs).train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
logger.info(f"T5 max token length: {config.model_max_length}")
model_ema = deepcopy(model).eval()
if config.load_from is not None:
if args.load_from is not None:
config.load_from = args.load_from
missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
ema_update(model_ema, model, 0.)
if not config.data.load_vae_feat:
vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)
# used for balanced sampling
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
logger.info(f'Batch size {config.train_batch_size}')
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer, **config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
tracker_config = dict(vars(config))
accelerator.init_trackers(args.tracker_project_name, tracker_config)
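if args.report_to == "comet_ml":
    # The Comet tracker only exists when Comet was selected via --report_to.
    accelerator.get_tracker("comet_ml").writer.add_tags([config.model,
                                                         config.dataset_alias,
                                                         config.image_size,
                                                         config.lr_schedule,
                                                         f'bs{config.train_batch_size}',
                                                         f'gs{config.gradient_accumulation_steps}'
                                                         ])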
start_epoch = 0
if config.resume_from is not None and config.resume_from['checkpoint'] is not None:
start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,
model=model,
model_ema=model_ema,
optimizer=optimizer,
lr_scheduler=lr_scheduler,
)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model, model_ema = accelerator.prepare(model, model_ema)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
train()
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train.py
================================================
import argparse
import datetime
import os
import sys
import time
import types
import warnings
from copy import deepcopy
from pathlib import Path

# Put the repository root on sys.path before the `diffusion` imports below are resolved.
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))

import torch
import torch.nn as nn
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from diffusers.models import AutoencoderKL
from mmcv.runner import LogBuffer
from torch.utils.data import RandomSampler
from diffusion import IDDPM
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.model.builder import build_model
from diffusion.utils.checkpoint import save_checkpoint, load_checkpoint
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
from diffusion.utils.dist_utils import get_world_size, clip_grad_norm_
from diffusion.utils.logger import get_root_logger
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr

warnings.filterwarnings("ignore") # ignore warning


def set_fsdp_env():
    # Environment variables read by accelerate when FSDP is enabled.
    os.environ["ACCELERATE_USE_FSDP"] = 'true'
    os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
    os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
    os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'


def ema_update(model_dest: nn.Module, model_src: nn.Module, rate):
    # In-place update: dest = rate * dest + (1 - rate) * src.
    # With rate=0. this copies the source weights into the EMA model.
    param_dict_src = dict(model_src.named_parameters())
    for p_name, p_dest in model_dest.named_parameters():
        p_src = param_dict_src[p_name]
        assert p_src is not p_dest
        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)
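

# Illustrative sketch, not part of the original script: the update rule used by
# `ema_update` above, written for plain Python floats so the arithmetic can be
# checked by hand. `_ema_step_sketch` is a hypothetical helper name; the real
# function applies the same formula in place to every parameter tensor.
def _ema_step_sketch(ema_values, src_values, rate):
    """Return rate * ema + (1 - rate) * src for each pair of values."""
    return [rate * e + (1.0 - rate) * s for e, s in zip(ema_values, src_values)]


# With rate=0.0 the result is an exact copy of the source values, which is how
# the script initialises `model_ema` (see `ema_update(model_ema, model, 0.)`).
# With a rate close to 1.0 the EMA values move only slightly toward the source.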
def train():
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
start_step = start_epoch * len(train_dataloader)
global_step = 0
total_steps = len(train_dataloader) * config.num_epochs
load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start = time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
if load_vae_feat:
z = batch[0]
else:
with torch.no_grad():
with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):
posterior = vae.encode(batch[0]).latent_dist
if config.sample_posterior:
z = posterior.sample()
else:
z = posterior.mode()
clean_images = z * config.scale_factor
y = batch[1]
y_mask = batch[2]
data_info = batch[3]
# Sample a random timestep for each image
bs = clean_images.shape[0]
timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()
grad_norm = None
with accelerator.accumulate(model):
# Predict the noise residual
optimizer.zero_grad()
loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))
loss = loss_term['loss'].mean()
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
if accelerator.sync_gradients:
ema_update(model_ema, model, config.ema_rate)
lr = lr_scheduler.get_last_lr()[0]
logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
global_step += 1
data_time_start = time.time()
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
accelerator.wait_for_everyone()
if accelerator.is_main_process:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
accelerator.wait_for_everyone()
if accelerator.is_main_process:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
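

# Illustrative sketch, not part of the original script: the step counting and
# ETA arithmetic used inside `train()` above, pulled out into pure functions.
# `_global_step_sketch` and `_eta_seconds_sketch` are hypothetical names. The
# sketch assumes, as the loop does, that `epoch` is 1-based and `step` is 0-based.
def _global_step_sketch(epoch, step, steps_per_epoch):
    """1-based count of loop iterations completed across all epochs."""
    return (epoch - 1) * steps_per_epoch + step + 1


def _eta_seconds_sketch(elapsed, done_this_run, total_steps, start_step):
    """Average seconds per iteration in this run times the iterations still left."""
    avg_time = elapsed / (done_this_run + 1)
    return int(avg_time * (total_steps - start_step - done_this_run - 1))


# A checkpoint is written whenever the global step is a multiple of
# `config.save_model_steps`; `start_step` accounts for epochs finished before a resume.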


def parse_args():
    parser = argparse.ArgumentParser(description="Train a PixArt model.")
    parser.add_argument("config", type=str, help="path to the training config file")
    parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
    parser.add_argument('--work-dir', help='the dir to save logs and models')
    parser.add_argument('--resume-from', help='the checkpoint to resume training from')
    parser.add_argument('--load-from', default=None, help='the checkpoint to load weights from before training')
    # Both spellings map to `args.local_rank`; launchers differ in which one they pass.
    parser.add_argument('--local-rank', type=int, default=-1)
    parser.add_argument('--local_rank', type=int, default=-1)
    parser.add_argument('--debug', action='store_true')
    parser.add_argument(
        "--report_to",
        type=str,
        default="tensorboard",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
            ' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
        ),
    )
    parser.add_argument(
        "--tracker_project_name",
        type=str,
        default="text2image-fine-tune",
        help=(
            "The `project_name` argument passed to Accelerator.init_trackers. For"
            " more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
        ),
    )
    parser.add_argument("--loss_report_name", type=str, default="loss")
    args = parser.parse_args()
    return args


if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.resume_from is not None:
config.load_from = None
config.resume_from = dict(
checkpoint=args.resume_from,
load_ema=False,
resume_optimizer=True,
resume_lr_scheduler=True)
if args.debug:
config.log_interval = 1
config.train_batch_size = 8
config.valid_num = 100
os.umask(0o000)
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with=args.report_to,
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512, 1024]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
model_kwargs={"window_block_indexes": config.window_block_indexes, "window_size": config.window_size,
"use_rel_pos": config.use_rel_pos, "lewei_scale": config.lewei_scale, 'config':config,
'model_max_length': config.model_max_length}
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, snr=config.snr_loss)
model = build_model(config.model,
config.grad_checkpointing,
config.get('fp32_attention', False),
input_size=latent_size,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
**model_kwargs).train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
model_ema = deepcopy(model).eval()
if config.load_from is not None:
if args.load_from is not None:
config.load_from = args.load_from
missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
ema_update(model_ema, model, 0.)
if not config.data.load_vae_feat:
vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)
# used for balanced sampling
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer, **config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
tracker_config = dict(vars(config))
try:
accelerator.init_trackers(args.tracker_project_name, tracker_config)
except Exception:
accelerator.init_trackers(f"tb_{timestamp}")
start_epoch = 0
if config.resume_from is not None and config.resume_from['checkpoint'] is not None:
start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,
model=model,
model_ema=model_ema,
optimizer=optimizer,
lr_scheduler=lr_scheduler,
)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model, model_ema = accelerator.prepare(model, model_ema)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
train()
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_controlnet.py
================================================
import argparse
import datetime
import os
import sys
import time
import types
import warnings
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import torch
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from mmcv.runner import LogBuffer
from torch.utils.data import RandomSampler
from diffusion import IDDPM
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.model.builder import build_model
from diffusion.model.nets import PixArtMS, ControlPixArtHalf, ControlPixArtMSHalf
from diffusion.utils.checkpoint import save_checkpoint, load_checkpoint
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
from diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_
from diffusion.utils.logger import get_root_logger
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
warnings.filterwarnings("ignore") # ignore warning


def set_fsdp_env():
    # Environment variables read by accelerate when FSDP is enabled.
    os.environ["ACCELERATE_USE_FSDP"] = 'true'
    os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
    os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
    os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'


def train():
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
start_step = start_epoch * len(train_dataloader)
global_step = 0
total_steps = len(train_dataloader) * config.num_epochs
load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
if not load_vae_feat:
raise ValueError("Only support load vae features for now.")
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start = time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
z = batch[0] # pre-computed VAE latents, e.g. 4 x 4 x 128 x 128 (a 3 x 1024 x 1024 image encodes to 4 x 128 x 128)
clean_images = z * config.scale_factor # scale the latents by the VAE scale factor
y = batch[1] # T5 caption features, e.g. 4 x 1 x 120 x 4096 (120 tokens, 4096 dims each)
y_mask = batch[2] # caption attention mask, e.g. 4 x 1 x 1 x 120; marks which tokens are valid
data_info = batch[3]
# Sample a random timestep for each image
bs = clean_images.shape[0]
timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()
grad_norm = None
with accelerator.accumulate(model):
# Predict the noise residual
optimizer.zero_grad()
loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info, c=data_info['condition'] * config.scale_factor))
loss = loss_term['loss'].mean()
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
lr = lr_scheduler.get_last_lr()[0]
logs = {"loss": accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{(epoch - 1) * len(train_dataloader) + step + 1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['img_hw'][0][0].item()}, {data_info['img_hw'][0][1].item()}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
if (global_step + 1) % 1000 == 0 and config.s3_work_dir is not None:
logger.info(f"s3_work_dir: {config.s3_work_dir}")
global_step += 1
data_time_start = time.time()
synchronize()
if accelerator.is_main_process:
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
os.umask(0o000) # file permission: 666; dir permission: 777
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
synchronize()
# After each epoch, save a checkpoint if the epoch matches the save schedule or is the last one
if accelerator.is_main_process:
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000) # file permission: 666; dir permission: 777
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()


def parse_args():
    parser = argparse.ArgumentParser(description="Train a PixArt ControlNet model.")
    parser.add_argument("config", type=str, help="path to the training config file")
    parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
    parser.add_argument('--work-dir', help='the dir to save logs and models')
    parser.add_argument('--resume_from', help='the checkpoint to resume training from')
    # Both spellings map to `args.local_rank`; launchers differ in which one they pass.
    parser.add_argument('--local-rank', type=int, default=-1)
    parser.add_argument('--local_rank', type=int, default=-1)
    parser.add_argument('--debug', action='store_true')
    parser.add_argument(
        "--report_to",
        type=str,
        default="tensorboard",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
            ' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
        ),
    )
    parser.add_argument(
        "--tracker_project_name",
        type=str,
        default="text2image-fine-tune",
        help=(
            "The `project_name` argument passed to Accelerator.init_trackers. For"
            " more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
        ),
    )
    parser.add_argument('--lr', type=float, default=2e-4)
    parser.add_argument('--data_root', type=str, default=None)
    parser.add_argument('--resume_optimizer', action='store_true')
    parser.add_argument('--resume_lr_scheduler', action='store_true')
    args = parser.parse_args()
    return args


if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.data_root:
config.data_root = args.data_root
if args.resume_from is not None:
config.load_from = None
config.resume_from = dict(
checkpoint=args.resume_from,
load_ema=False,
resume_optimizer=args.resume_optimizer,
resume_lr_scheduler=args.resume_lr_scheduler)
if args.debug:
config.log_interval = 1
config.train_batch_size = 6
config.optimizer.update({'lr': args.lr})
os.umask(0o000) # file permission: 666; dir permission: 777
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=9600) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with=args.report_to,
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [512, 1024]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
model_kwargs={"window_block_indexes": config.window_block_indexes, "window_size": config.window_size,
"use_rel_pos": config.use_rel_pos, "lewei_scale": config.lewei_scale, 'config':config,
'model_max_length': config.model_max_length}
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps))
model: PixArtMS = build_model(config.model,
config.grad_checkpointing,
config.get('fp32_attention', False),
input_size=latent_size,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
**model_kwargs)
if config.load_from is not None and args.resume_from is None:
# load from PixArt model
missing, unexpected = load_checkpoint(config.load_from, model)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
if image_size == 1024:
model: ControlPixArtMSHalf = ControlPixArtMSHalf(model, copy_blocks_num=config.copy_blocks_num).train()
else:
model: ControlPixArtHalf = ControlPixArtHalf(model, copy_blocks_num=config.copy_blocks_num).train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
logger.info(f"T5 max token length: {config.model_max_length}")
# if args.local_rank == 0:
# for name, params in model.named_parameters():
# if params.requires_grad == False: logger.info(f"freeze param: {name}")
#
# for name, params in model.named_parameters():
# if params.requires_grad == True: logger.info(f"trainable param: {name}")
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type, train_ratio=config.train_ratio)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=1)
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer, **config.auto_lr)
optimizer = build_optimizer(model.controlnet, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
tracker_config = dict(vars(config))
try:
accelerator.init_trackers(args.tracker_project_name, tracker_config)
except Exception:
accelerator.init_trackers(f"tb_{timestamp}")
start_epoch = 0
if config.resume_from is not None and config.resume_from['checkpoint'] is not None:
if not (args.resume_optimizer and args.resume_lr_scheduler):
missing, unexpected = load_checkpoint(args.resume_from, model)
else:
start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,
model=model,
optimizer=optimizer,
lr_scheduler=lr_scheduler,
)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model = accelerator.prepare(model,)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
train()
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_diffusers.py
================================================
import argparse
import datetime
import os
import sys
import time
import types
import warnings
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import accelerate
import gc
import numpy as np
import torch
import torch.nn as nn
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from copy import deepcopy
from diffusers import AutoencoderKL, Transformer2DModel, PixArtAlphaPipeline, DPMSolverMultistepScheduler
from mmcv.runner import LogBuffer
from packaging import version
from torch.utils.data import RandomSampler
from transformers import T5Tokenizer, T5EncoderModel
from diffusion import IDDPM
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
from diffusion.utils.dist_utils import get_world_size, clip_grad_norm_, flush
from diffusion.utils.logger import get_root_logger, rename_file_with_creation_time
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
warnings.filterwarnings("ignore") # ignore warning
def set_fsdp_env():
    # Configure Accelerate's FSDP integration through environment variables.
    os.environ["ACCELERATE_USE_FSDP"] = 'true'
    os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
    os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
    os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'Transformer2DModel'
def ema_update(model_dest: nn.Module, model_src: nn.Module, rate):
    # In-place exponential moving average: dest = rate * dest + (1 - rate) * src.
    param_dict_src = dict(model_src.named_parameters())
    for p_name, p_dest in model_dest.named_parameters():
        p_src = param_dict_src[p_name]
        assert p_src is not p_dest
        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)
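# Illustrative sketch (not part of the original training script): the EMA rule applied by
# `ema_update` above, written for plain Python floats so it can be checked without torch.
# `ema_step_sketch` is a hypothetical name introduced only for this example.
def ema_step_sketch(dest, src, rate):
    """Return rate * d + (1 - rate) * s for each pair of values in dest and src."""
    return [rate * d + (1.0 - rate) * s for d, s in zip(dest, src)]
# With rate=0.0 the result is a copy of the source values; larger rates keep more of
# the previous destination values.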
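def token_drop(y, y_mask, force_drop_ids=None):
    """
    Randomly replaces caption embeddings (and their attention masks) with the
    unconditional embedding to enable classifier-free guidance.
    Relies on the module-level `config`, `uncond_prompt_embeds` and
    `uncond_prompt_attention_mask` set in the main block.
    """
    if force_drop_ids is None:
        drop_ids = torch.rand(y.shape[0]).cuda() < config.class_dropout_prob
    else:
        drop_ids = force_drop_ids == 1
    y = torch.where(drop_ids[:, None, None], uncond_prompt_embeds, y)
    y_mask = torch.where(drop_ids[:, None], uncond_prompt_attention_mask, y_mask)
    return y, y_mask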
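# Illustrative sketch (a toy under stated assumptions, not the training code): the
# per-sample caption dropping that `token_drop` performs, on plain lists instead of
# tensors. `token_drop_sketch`, `null_embed` and `drop_prob` are hypothetical names.
import random

def token_drop_sketch(captions, null_embed, drop_prob, rng=None, force_drop_ids=None):
    """Replace each caption with `null_embed` with probability `drop_prob`."""
    rng = rng or random.Random()
    if force_drop_ids is None:
        # One independent draw per sample, mirroring torch.rand(batch) < prob.
        drop = [rng.random() < drop_prob for _ in captions]
    else:
        # A value of 1 forces the drop for that sample.
        drop = [flag == 1 for flag in force_drop_ids]
    return [null_embed if d else c for c, d in zip(captions, drop)]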
def get_null_embed(npz_file, max_length=120):
    # Load the cached unconditional (empty-prompt) embedding if it exists;
    # otherwise compute it with the T5 text encoder and cache it with torch.save.
    if os.path.exists(npz_file) and (npz_file.endswith('.npz') or npz_file.endswith('.pth')):
        data = torch.load(npz_file)
        uncond_prompt_embeds = data['uncond_prompt_embeds'].to(accelerator.device)
        uncond_prompt_attention_mask = data['uncond_prompt_attention_mask'].to(accelerator.device)
    else:
        tokenizer = T5Tokenizer.from_pretrained(args.pipeline_load_from, subfolder="tokenizer")
        text_encoder = T5EncoderModel.from_pretrained(args.pipeline_load_from, subfolder="text_encoder")
        uncond = tokenizer("", max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
        uncond_prompt_embeds = text_encoder(uncond.input_ids, attention_mask=uncond.attention_mask)[0]
        torch.save({
            'uncond_prompt_embeds': uncond_prompt_embeds.cpu(),
            'uncond_prompt_attention_mask': uncond.attention_mask.cpu()
        }, npz_file)
        uncond_prompt_embeds = uncond_prompt_embeds.to(accelerator.device)
        uncond_prompt_attention_mask = uncond.attention_mask.to(accelerator.device)
    return uncond_prompt_embeds, uncond_prompt_attention_mask
def prepare_vis():
if accelerator.is_main_process:
# preparing embeddings for visualization. We put it here for saving GPU memory
validation_prompts = [
"dog",
"portrait photo of a girl, photograph, highly detailed face, depth of field",
"Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
"A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece",
]
logger.info("Preparing Visualization prompt embeddings...")
logger.info(f"Loading text encoder and tokenizer from {args.pipeline_load_from} ...")
skip = True
for prompt in validation_prompts:
if not os.path.exists(f'output/tmp/{prompt}_{max_length}token.pth'):
skip = False
break
if accelerator.is_main_process and not skip:
print("Saving visualization prompt text embeddings at output/tmp/")
tokenizer = T5Tokenizer.from_pretrained(args.pipeline_load_from, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(args.pipeline_load_from, subfolder="text_encoder").to(accelerator.device)
for prompt in validation_prompts:
caption_token = tokenizer(prompt, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt").to(accelerator.device)
caption_emb = text_encoder(caption_token.input_ids, attention_mask=caption_token.attention_mask)[0]
torch.save({'caption_embeds': caption_emb, 'emb_mask': caption_token.attention_mask}, f'output/tmp/{prompt}_{max_length}token.pth')
flush()
@torch.inference_mode()
def log_validation(model, accelerator, weight_dtype, step):
logger.info("Running validation... ")
model = accelerator.unwrap_model(model)
pipeline = PixArtAlphaPipeline.from_pretrained(
args.pipeline_load_from,
transformer=model,
tokenizer=None,
text_encoder=None,
torch_dtype=weight_dtype,
)
pipeline = pipeline.to(accelerator.device)
pipeline.set_progress_bar_config(disable=True)
generator = torch.Generator(device=accelerator.device).manual_seed(0)
validation_prompts = [
"dog",
"portrait photo of a girl, photograph, highly detailed face, depth of field",
"Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
"A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece",
]
image_logs = []
images = []
latents = []
for _, prompt in enumerate(validation_prompts):
embed = torch.load(f'output/tmp/{prompt}_{max_length}token.pth', map_location='cpu')
caption_embs, emb_masks = embed['caption_embeds'].to(accelerator.device), embed['emb_mask'].to(accelerator.device)
latents.append(pipeline(
num_inference_steps=14,
num_images_per_prompt=1,
generator=generator,
guidance_scale=4.5,
prompt_embeds=caption_embs,
prompt_attention_mask=emb_masks,
negative_prompt=None,
negative_prompt_embeds=uncond_prompt_embeds,
negative_prompt_attention_mask=uncond_prompt_attention_mask,
output_type="latent",
).images)
flush()
for latent in latents:
images.append(pipeline.vae.decode(latent.to(weight_dtype) / pipeline.vae.config.scaling_factor, return_dict=False)[0])
for prompt, image in zip(validation_prompts, images):
image = pipeline.image_processor.postprocess(image, output_type="pil")
image_logs.append({"validation_prompt": prompt, "images": image})
for tracker in accelerator.trackers:
if tracker.name == "tensorboard":
for log in image_logs:
images = log["images"]
validation_prompt = log["validation_prompt"]
formatted_images = []
for image in images:
formatted_images.append(np.asarray(image))
formatted_images = np.stack(formatted_images)
tracker.writer.add_images(validation_prompt, formatted_images, step, dataformats="NHWC")
elif tracker.name == "wandb":
import wandb
formatted_images = []
for log in image_logs:
images = log["images"]
validation_prompt = log["validation_prompt"]
for image in images:
image = wandb.Image(image, caption=validation_prompt)
formatted_images.append(image)
tracker.log({"validation": formatted_images})
else:
logger.warning(f"image logging not implemented for {tracker.name}")
del pipeline
gc.collect()
torch.cuda.empty_cache()
return image_logs
def train(model):
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
global_step = start_step + 1
load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start= time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
if load_vae_feat:
z = batch[0]
else:
with torch.no_grad():
with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):
posterior = vae.encode(batch[0]).latent_dist
if config.sample_posterior:
z = posterior.sample()
else:
z = posterior.mode()
latents = (z * config.scale_factor).to(weight_dtype)
y = batch[1].squeeze(1).to(weight_dtype)
y_mask = batch[2].squeeze(1).squeeze(1).to(weight_dtype)
y, y_mask = token_drop(y, y_mask) # classifier-free guidance
data_info = {'resolution': batch[3]['img_hw'].to(weight_dtype), 'aspect_ratio': batch[3]['aspect_ratio'].to(weight_dtype),}
# Sample a random timestep for each image
bs = latents.shape[0]
timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=latents.device).long()
grad_norm = None
with accelerator.accumulate(model):
# Predict the noise residual
optimizer.zero_grad()
loss_term = train_diffusion.training_losses_diffusers(
model, latents, timesteps,
model_kwargs = dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),
)
loss = loss_term['loss'].mean()
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
# if accelerator.sync_gradients:
# ema_update(model_ema, accelerator.unwrap_model(model), config.ema_rate)
lr = lr_scheduler.get_last_lr()[0]
logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step - start_step)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{global_step}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}," \
f"s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), "
# f"s:({data_info['resolution'][0][0].item() * relative_to_1024 // 8}, {data_info['resolution'][0][1].item() * relative_to_1024 // 8}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step)
global_step += 1
data_time_start= time.time()
accelerator.wait_for_everyone()
if accelerator.is_main_process:
if global_step % config.save_model_steps == 0:
save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f"checkpoint-{global_step}")
os.umask(0o000)
logger.info(f"Start to save state to {save_path}")
accelerator.save_state(save_path)
logger.info(f"Saved state to {save_path}")
if global_step % config.eval_sampling_steps == 0 or (step + 1) == 1:
log_validation(model, accelerator, weight_dtype, global_step)
accelerator.wait_for_everyone()
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000)
save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f"checkpoint-{global_step}")
logger.info(f"Start to save state to {save_path}")
model = accelerator.unwrap_model(model)
model.save_pretrained(save_path)
logger.info(f"Saved state to {save_path}")
def parse_args():
    parser = argparse.ArgumentParser(description="Train PixArt with a diffusers Transformer2DModel.")
    parser.add_argument("config", type=str, help="config")
    parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
    parser.add_argument('--work-dir', help='the dir to save logs and models')
    parser.add_argument('--resume-from', help='the dir to resume the training')
    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')
    # Both spellings are accepted because different launchers pass different forms.
    parser.add_argument('--local-rank', type=int, default=-1)
    parser.add_argument('--local_rank', type=int, default=-1)
    parser.add_argument('--debug', action='store_true')
    parser.add_argument("--pipeline_load_from", default='output/pretrained_models/pixart_omega_sdxl_256px_diffusers_from512', type=str, help="path for loading text_encoder, tokenizer and vae")
    parser.add_argument(
        "--report_to",
        type=str,
        default="tensorboard",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
            ' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
        ),
    )
    parser.add_argument(
        "--tracker_project_name",
        type=str,
        default="text2image-pixart-omega",
        help=(
            "The `project_name` argument passed to Accelerator.init_trackers for"
            " more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
        ),
    )
    parser.add_argument("--loss_report_name", type=str, default="loss")
    args = parser.parse_args()
    return args
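# Illustrative sketch (hypothetical helper, not called by this script): how the
# `--resume-from latest` branch in the main block picks a checkpoint directory.
# Directory names are assumed to follow the `checkpoint-<global_step>` pattern
# that `train()` uses when saving state.
def latest_checkpoint_sketch(dir_names):
    """Return the `checkpoint-<step>` name with the largest step, or None."""
    candidates = [d for d in dir_names if d.startswith("checkpoint")]
    candidates = sorted(candidates, key=lambda name: int(name.split("-")[1]))
    return candidates[-1] if candidates else None
# The numeric sort key matters: a plain string sort would rank "checkpoint-900"
# after "checkpoint-1000".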
if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.resume_from is not None:
config.resume_from = args.resume_from
if args.debug:
config.log_interval = 1
config.train_batch_size = 32
config.valid_num = 100
os.umask(0o000)
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False  # no trailing comma: `False,` would create the truthy tuple (False,)
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with=args.report_to,
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
log_name = 'train_log.log'
if accelerator.is_main_process:
if os.path.exists(os.path.join(config.work_dir, log_name)):
rename_file_with_creation_time(os.path.join(config.work_dir, log_name))
logger = get_root_logger(os.path.join(config.work_dir, log_name))
logger.info(accelerator.state)
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512, 1024]
latent_size = int(image_size) // 8
relative_to_1024 = float(image_size / 1024)
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
# Create for unconditional prompt embedding for classifier free guidance
logger.info("Embedding for classifier free guidance")
max_length = config.model_max_length
uncond_prompt_embeds, uncond_prompt_attention_mask = get_null_embed(
f'output/pretrained_models/null_embed_diffusers_{max_length}token.pth', max_length=max_length
)
# preparing embeddings for visualization. We put it here for saving GPU memory
prepare_vis()
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, snr=config.snr_loss)
model = Transformer2DModel.from_pretrained(config.load_from, subfolder="transformer").train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
logger.info(f"lewei scale: {model.pos_embed.interpolation_scale} base size: {model.pos_embed.base_size}")
# model_ema = deepcopy(model).eval()
# 9. Handle mixed precision and device placement
# For mixed precision training we cast all non-trainable weights to half precision.
# These weights are only used for inference, so keeping them in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# 11. Enable optimizations
# model.enable_xformers_memory_efficient_attention() # not available for now
# for name, params in model.named_parameters():
# if params.requires_grad == False: logger.info(f"freeze param: {name}")
#
# for name, params in model.named_parameters():
# if params.requires_grad == True: logger.info(f"trainable param: {name}")
# 10. Handle saving and loading of checkpoints
# `accelerate` 0.16.0 will have better support for customized saving
if version.parse(accelerate.__version__) >= version.parse("0.16.0"):
# create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format
def save_model_hook(models, weights, output_dir):
if accelerator.is_main_process:
transformer_ = accelerator.unwrap_model(models[0])
# save weights in diffusers format (save_pretrained) to be able to load them back
transformer_.save_pretrained(output_dir)
for _, model in enumerate(models):
# make sure to pop weight so that corresponding model is not saved again
weights.pop()
def load_model_hook(models, input_dir):
for i in range(len(models)):
# pop models so that they are not loaded again
model = models.pop()
# load diffusers style into model
load_model = Transformer2DModel.from_pretrained(input_dir)
model.register_to_config(**load_model.config)
model.load_state_dict(load_model.state_dict())
del load_model
accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
if config.grad_checkpointing:
model.enable_gradient_checkpointing()
if not config.data.load_vae_feat:
vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
logger.info(f"ratio of real user prompt: {config.real_prompt_ratio}")
dataset = build_dataset(
config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type,
real_prompt_ratio=config.real_prompt_ratio, max_length=max_length, config=config,
)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)
# used for balanced sampling
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer, **config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
tracker_config = dict(vars(config))
accelerator.init_trackers(f"tb_{timestamp}_{args.tracker_project_name}")
logger.info(f"Training tracker at tb_{timestamp}_{args.tracker_project_name}")
start_epoch = 0
start_step = 0
total_steps = len(train_dataloader) * config.num_epochs
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
# model, model_ema = accelerator.prepare(model, model_ema)
model = accelerator.prepare(model)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
if config.resume_from is not None:
if config.resume_from != "latest":
path = os.path.basename(config.resume_from)
else:
# Get the most recent checkpoint
dirs = os.listdir(os.path.join(config.work_dir, 'checkpoints'))
dirs = [d for d in dirs if d.startswith("checkpoint")]
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
path = dirs[-1] if len(dirs) > 0 else None
if path is None:
accelerator.print(f"Checkpoint '{config.resume_from}' does not exist. Starting a new training run.")
config.resume_from = None
else:
accelerator.print(f"Resuming from checkpoint {path}")
accelerator.load_state(os.path.join(config.work_dir, 'checkpoints', path))
start_step = int(path.split("-")[1])
start_epoch = start_step // len(train_dataloader)
train(model)
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_dreambooth.py
================================================
import os
import sys
import types
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import argparse
import datetime
import time
import warnings
warnings.filterwarnings("ignore") # ignore warning
from mmcv.runner import LogBuffer
from copy import deepcopy
from diffusion.utils.checkpoint import save_checkpoint, load_checkpoint
import torch
import torch.nn as nn
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from torch.utils.data import RandomSampler
from diffusion import IDDPM
from diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.model.builder import build_model
from diffusion.utils.logger import get_root_logger
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.model.t5 import T5Embedder
from diffusion.utils.data_sampler import AspectRatioBatchSampler
def set_fsdp_env():
    # Configure Accelerate's FSDP integration through environment variables.
    os.environ["ACCELERATE_USE_FSDP"] = 'true'
    os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
    os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
    os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'
def ema_update(model_dest: nn.Module, model_src: nn.Module, rate):
    # In-place exponential moving average: dest = rate * dest + (1 - rate) * src.
    param_dict_src = dict(model_src.named_parameters())
    for p_name, p_dest in model_dest.named_parameters():
        p_src = param_dict_src[p_name]
        assert p_src is not p_dest
        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)
def train():
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
start_step = start_epoch * len(train_dataloader)
global_step = 0
total_steps = len(train_dataloader) * config.num_epochs
# txt related
prompt = config.data.prompt if isinstance(config.data.prompt, list) else [config.data.prompt]
llm_embed_model = T5Embedder(device="cpu", local_cache=True, cache_dir='output/pretrained_models/t5_ckpts', torch_dtype=torch.float)
prompt_embs, attention_mask = llm_embed_model.get_text_embeddings(prompt)
prompt_embs, attention_mask = prompt_embs[None].cuda(), attention_mask[None].cuda()
del llm_embed_model
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start= time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
z = batch[0]
clean_images = z * config.scale_factor
y = prompt_embs
y_mask = attention_mask
data_info = batch[1]
# Sample a random timestep for each image
bs = clean_images.shape[0]
timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()
grad_norm = None
with accelerator.accumulate(model):
# Predict the noise residual
optimizer.zero_grad()
loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))
loss = loss_term['loss'].mean()
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
if accelerator.sync_gradients:
ema_update(model_ema, model, config.ema_rate)
lr = lr_scheduler.get_last_lr()[0]
logs = {"loss": accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Steps [{(epoch-1)*len(train_dataloader)+step+1}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
global_step += 1
data_time_start= time.time()
synchronize()
if accelerator.is_main_process:
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
synchronize()
if accelerator.is_main_process:
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
def parse_args():
    parser = argparse.ArgumentParser(description="Fine-tune PixArt with DreamBooth.")
    parser.add_argument("config", type=str, help="config")
    parser.add_argument('--work-dir', help='the dir to save logs and models')
    parser.add_argument('--resume-from', help='the dir to resume the training')
    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')
    parser.add_argument('--local-rank', type=int, default=-1)
    parser.add_argument('--local_rank', type=int, default=-1)
    parser.add_argument('--debug', action='store_true')
    parser.add_argument('--save_step', type=int, default=100)
    parser.add_argument('--lr', type=float, default=5e-6)
    parser.add_argument('--train_class', type=str)
    parser.add_argument('--prompt', type=str, default='a photo of sks dog')
    args = parser.parse_args()
    return args
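# Illustrative sketch (hypothetical helper, not called by this script): the 1-based
# global step that `train()` computes inline for logging and for the
# `save_model_steps` check, given a 1-based epoch and a 0-based in-epoch step.
def global_step_sketch(epoch, step, steps_per_epoch):
    """Return (epoch - 1) * steps_per_epoch + step + 1."""
    return (epoch - 1) * steps_per_epoch + step + 1
# Example: with 10 steps per epoch, the first batch of epoch 2 is global step 11.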
if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.resume_from is not None:
config.resume_from = dict(
checkpoint=args.resume_from,
load_ema=False,
resume_optimizer=True,
resume_lr_scheduler=True)
if args.debug:
config.log_interval = 1
config.train_batch_size = 1
config.save_model_steps=args.save_step
config.data.update({'prompt': [args.prompt], 'root': args.train_class})
config.optimizer.update({'lr': args.lr})
os.umask(0o000)
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False  # no trailing comma: `False,` would create the truthy tuple (False,)
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with="tensorboard",
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
model_kwargs={"window_block_indexes": config.window_block_indexes, "window_size": config.window_size,
"use_rel_pos": config.use_rel_pos, "lewei_scale": config.lewei_scale, 'config':config,
'model_max_length': config.model_max_length}
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps))
eval_diffusion = IDDPM(str(config.eval_sampling_steps))
model = build_model(config.model,
config.grad_checkpointing,
config.get('fp32_attention', False),
input_size=latent_size,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
**model_kwargs).train()
logger.info(f"{config.model} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
model_ema = deepcopy(model).eval()
if config.load_from is not None:
if args.load_from is not None:
config.load_from = args.load_from
missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))
# model.reparametrize()
if accelerator.is_main_process:
print('Warning Missing keys: ', missing)
print('Warning Unexpected keys', unexpected)
ema_update(model_ema, model, 0.)
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
logger.warning(f"Training prompt: {config.data['prompt']}, Training data class: {config.data['root']}")
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=1)
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer,
**config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
accelerator.init_trackers(f"tb_{timestamp}")
start_epoch = 0
if config.resume_from is not None and config.resume_from['checkpoint'] is not None:
start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,
model=model,
model_ema=model_ema,
optimizer=optimizer,
lr_scheduler=lr_scheduler,
)
if accelerator.is_main_process:
print('Warning Missing keys: ', missing)
print('Warning Unexpected keys', unexpected)
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model, model_ema = accelerator.prepare(model, model_ema)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
train()
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_pixart_lcm.py
================================================
import os
import sys
import types
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import argparse
import datetime
import time
import warnings
warnings.filterwarnings("ignore") # ignore warning
import torch
import torch.nn as nn
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from diffusers.models import AutoencoderKL
from torch.utils.data import RandomSampler
from mmcv.runner import LogBuffer
from copy import deepcopy
import numpy as np
import torch.nn.functional as F
from tqdm import tqdm
from diffusion import IDDPM
from diffusion.utils.checkpoint import save_checkpoint, load_checkpoint
from diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.model.builder import build_model
from diffusion.utils.logger import get_root_logger
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
from diffusion.lcm_scheduler import LCMScheduler
from torchvision.utils import save_image
def set_fsdp_env():
os.environ["ACCELERATE_USE_FSDP"] = 'true'
os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'
def ema_update(model_dest: nn.Module, model_src: nn.Module, rate):
param_dict_src = dict(model_src.named_parameters())
for p_name, p_dest in model_dest.named_parameters():
p_src = param_dict_src[p_name]
assert p_src is not p_dest
p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)
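# Illustrative sketch (not part of the training script): the update rule used by
# `ema_update` above, written for plain Python lists so it runs without PyTorch.
# The helper name `ema_update_lists` and the sample values are hypothetical.

```python
def ema_update_lists(dest, src, rate):
    """In-place EMA: dest <- rate * dest + (1 - rate) * src, element by element."""
    for i, (d, s) in enumerate(zip(dest, src)):
        dest[i] = rate * d + (1 - rate) * s
    return dest


ema = [1.0, 2.0]
online = [3.0, 6.0]

# rate = 0 discards the old destination values, so dest becomes a copy of src.
ema_update_lists(ema, online, 0.0)
copied = list(ema)

# rate = 0.5 averages the current EMA values with the new source values.
online = [4.0, 8.0]
ema_update_lists(ema, online, 0.5)
```

# A rate close to 1 (for example a typical EMA decay) would move `dest` only
# slightly towards `src` on each call.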
def append_dims(x, target_dims):
"""Appends dimensions to the end of a tensor until it has target_dims dimensions."""
dims_to_append = target_dims - x.ndim
if dims_to_append < 0:
raise ValueError(f"input has {x.ndim} dims but target_dims is {target_dims}, which is less")
return x[(...,) + (None,) * dims_to_append]
# From LCMScheduler.get_scalings_for_boundary_condition_discrete
def scalings_for_boundary_conditions(timestep, sigma_data=0.5, timestep_scaling=10.0):
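# Scale the timestep once; the default timestep_scaling of 10.0 is equivalent to dividing by 0.1.
scaled_timestep = timestep_scaling * timestep
c_skip = sigma_data**2 / (scaled_timestep**2 + sigma_data**2)
c_out = scaled_timestep / (scaled_timestep**2 + sigma_data**2) ** 0.5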
return c_skip, c_out
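# Illustrative sketch (not part of the training script): a scalar version of the
# boundary scalings above, assuming the default arguments (sigma_data=0.5 and a
# timestep scaled by 10). The name `boundary_scalings` is hypothetical. It shows
# the boundary condition: at timestep 0 the skip path passes the input through
# unchanged, while at large timesteps the model output dominates.

```python
def boundary_scalings(timestep, sigma_data=0.5, timestep_scaling=10.0):
    """Return (c_skip, c_out) for a single scalar timestep."""
    scaled = timestep_scaling * timestep
    c_skip = sigma_data ** 2 / (scaled ** 2 + sigma_data ** 2)
    c_out = scaled / (scaled ** 2 + sigma_data ** 2) ** 0.5
    return c_skip, c_out


# At t = 0 the combination c_skip * x + c_out * f(x) reduces to x itself.
c_skip_0, c_out_0 = boundary_scalings(0)

# At the last training timestep the skip weight is close to 0 and c_out close to 1.
c_skip_999, c_out_999 = boundary_scalings(999)
```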
def extract_into_tensor(a, t, x_shape):
b, *_ = t.shape
out = a.gather(-1, t)
return out.reshape(b, *((1,) * (len(x_shape) - 1)))
class DDIMSolver:
def __init__(self, alpha_cumprods, timesteps=1000, ddim_timesteps=50):
# DDIM sampling parameters
step_ratio = timesteps // ddim_timesteps
self.ddim_timesteps = (np.arange(1, ddim_timesteps + 1) * step_ratio).round().astype(np.int64) - 1
self.ddim_alpha_cumprods = alpha_cumprods[self.ddim_timesteps]
self.ddim_alpha_cumprods_prev = np.asarray(
[alpha_cumprods[0]] + alpha_cumprods[self.ddim_timesteps[:-1]].tolist()
)
# convert to torch tensors
self.ddim_timesteps = torch.from_numpy(self.ddim_timesteps).long()
self.ddim_alpha_cumprods = torch.from_numpy(self.ddim_alpha_cumprods)
self.ddim_alpha_cumprods_prev = torch.from_numpy(self.ddim_alpha_cumprods_prev)
def to(self, device):
self.ddim_timesteps = self.ddim_timesteps.to(device)
self.ddim_alpha_cumprods = self.ddim_alpha_cumprods.to(device)
self.ddim_alpha_cumprods_prev = self.ddim_alpha_cumprods_prev.to(device)
return self
def ddim_step(self, pred_x0, pred_noise, timestep_index):
alpha_cumprod_prev = extract_into_tensor(self.ddim_alpha_cumprods_prev, timestep_index, pred_x0.shape)
dir_xt = (1.0 - alpha_cumprod_prev).sqrt() * pred_noise
x_prev = alpha_cumprod_prev.sqrt() * pred_x0 + dir_xt
return x_prev
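# Illustrative sketch (not part of the training script): the timestep grid built
# in `DDIMSolver.__init__` and the (start, end) pair that `train()` derives from a
# sampled index, in plain Python. The function names are hypothetical, and the
# values 1000 and 50 are only the defaults of `DDIMSolver`; the real values come
# from `config.train_sampling_steps` and `config.num_ddim_timesteps`.

```python
def ddim_timesteps(timesteps=1000, num_ddim_timesteps=50):
    """Evenly spaced timesteps, shifted by -1 so the last one is timesteps - 1."""
    step_ratio = timesteps // num_ddim_timesteps
    return [i * step_ratio - 1 for i in range(1, num_ddim_timesteps + 1)]


def training_pair(index, timesteps=1000, num_ddim_timesteps=50):
    """Start timestep for `index` and the end timestep one solver step earlier."""
    topk = timesteps // num_ddim_timesteps
    start = ddim_timesteps(timesteps, num_ddim_timesteps)[index]
    end = max(start - topk, 0)  # mirrors the torch.where(timesteps < 0, 0, ...) clamp
    return start, end


grid = ddim_timesteps()
first_pair = training_pair(0)
last_pair = training_pair(49)
```

# With these defaults the grid runs 19, 39, ..., 999, and the lowest index is the
# only one whose end timestep needs clamping to 0.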
@torch.no_grad()
def log_validation(model, step, device):
if hasattr(model, 'module'):
model = model.module
scheduler = LCMScheduler(beta_start=0.0001, beta_end=0.02, beta_schedule="linear", prediction_type="epsilon")
scheduler.set_timesteps(4, 50)
infer_timesteps = scheduler.timesteps
dog_embed = torch.load('data/tmp/dog.pth', map_location='cpu')
caption_embs, emb_masks = dog_embed['dog_text'].to(device), dog_embed['dog_mask'].to(device)
hw = torch.tensor([[1024, 1024]], dtype=torch.float, device=device).repeat(1, 1)
ar = torch.tensor([[1.]], device=device).repeat(1, 1)
# Create sampling noise:
# The VAE downsamples by 8, so a 1024x1024 image corresponds to 128x128 latents.
infer_latents = torch.randn(1, 4, 1024 // 8, 1024 // 8, device=device)
model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)
logger.info("Running validation... ")
# 7. LCM MultiStep Sampling Loop:
for i, t in tqdm(list(enumerate(infer_timesteps))):
ts = torch.full((1,), t, device=device, dtype=torch.long)
# model prediction (v-prediction, eps, x)
model_pred = model(infer_latents, ts, caption_embs, **model_kwargs)[:, :4]
# compute the previous noisy sample x_t -> x_t-1
infer_latents, denoised = scheduler.step(model_pred, i, t, infer_latents, return_dict=False)
samples = vae.decode(denoised / 0.18215).sample
torch.cuda.empty_cache()
save_image(samples[0], f'output_cv/vis/{step}.jpg', nrow=1, normalize=True, value_range=(-1, 1))
def train():
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
start_step = start_epoch * len(train_dataloader)
global_step = 0
total_steps = len(train_dataloader) * config.num_epochs
load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
# Create uncond embeds for classifier free guidance
uncond_prompt_embeds = model.module.y_embedder.y_embedding.repeat(config.train_batch_size, 1, 1, 1)
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start = time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
if load_vae_feat:
z = batch[0]
else:
with torch.no_grad():
with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):
posterior = vae.encode(batch[0]).latent_dist
if config.sample_posterior:
z = posterior.sample()
else:
z = posterior.mode()
latents = z * config.scale_factor
y = batch[1]
y_mask = batch[2]
data_info = batch[3]
# Reset the gradient norm; it is only set on steps where gradients are synchronized
grad_norm = None
with accelerator.accumulate(model):
# Clear any gradients left over from the previous step
optimizer.zero_grad()
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image t_n ~ U[0, N - k - 1] without bias.
topk = config.train_sampling_steps // config.num_ddim_timesteps
index = torch.randint(0, config.num_ddim_timesteps, (bsz,), device=latents.device).long()
start_timesteps = solver.ddim_timesteps[index]
timesteps = start_timesteps - topk
timesteps = torch.where(timesteps < 0, torch.zeros_like(timesteps), timesteps)
# Get boundary scalings for start_timesteps and (end) timesteps.
c_skip_start, c_out_start = scalings_for_boundary_conditions(start_timesteps)
c_skip_start, c_out_start = [append_dims(x, latents.ndim) for x in [c_skip_start, c_out_start]]
c_skip, c_out = scalings_for_boundary_conditions(timesteps)
c_skip, c_out = [append_dims(x, latents.ndim) for x in [c_skip, c_out]]
# Sample a random guidance scale w from U[w_min, w_max] and embed it
# w = (config.w_max - config.w_min) * torch.rand((bsz,)) + config.w_min
w = config.cfg_scale * torch.ones((bsz,))
w = w.reshape(bsz, 1, 1, 1)
w = w.to(device=latents.device, dtype=latents.dtype)
# Get online LCM prediction on z_{t_{n + k}}, w, c, t_{n + k}
_, pred_x_0, noisy_model_input = train_diffusion.training_losses(model, latents, start_timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info), noise=noise)
model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0
# Use the ODE solver to predict the kth step in the augmented PF-ODE trajectory after
# noisy_latents with both the conditioning embedding c and unconditional embedding 0
# Get teacher model prediction on noisy_latents and conditional embedding
with torch.no_grad():
with torch.autocast("cuda"):
cond_teacher_output, cond_pred_x0, _ = train_diffusion.training_losses(model_teacher, latents, start_timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info), noise=noise)
# Get teacher model prediction on noisy_latents and unconditional embedding
uncond_teacher_output, uncond_pred_x0, _ = train_diffusion.training_losses(model_teacher, latents, start_timesteps, model_kwargs=dict(y=uncond_prompt_embeds, mask=y_mask, data_info=data_info), noise=noise)
# Perform "CFG" to get x_prev estimate (using the LCM paper's CFG formulation)
pred_x0 = cond_pred_x0 + w * (cond_pred_x0 - uncond_pred_x0)
pred_noise = cond_teacher_output + w * (cond_teacher_output - uncond_teacher_output)
x_prev = solver.ddim_step(pred_x0, pred_noise, index)
# Get target LCM prediction on x_prev, w, c, t_n
with torch.no_grad():
with torch.autocast("cuda", enabled=True):
_, pred_x_0, _ = train_diffusion.training_losses(model_ema, x_prev.float(), timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info), skip_noise=True)
target = c_skip * x_prev + c_out * pred_x_0
# Calculate loss
if config.loss_type == "l2":
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
elif config.loss_type == "huber":
loss = torch.mean(torch.sqrt((model_pred.float() - target.float()) ** 2 + config.huber_c**2) - config.huber_c)
else:
raise ValueError(f"Unsupported loss_type: {config.loss_type!r} (expected 'l2' or 'huber')")
# Backpropagation on the online student model (`model`)
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad(set_to_none=True)
if accelerator.sync_gradients:
ema_update(model_ema, model, config.ema_decay)
lr = lr_scheduler.get_last_lr()[0]
logs = {"loss": accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
global_step += 1
data_time_start = time.time()
synchronize()
torch.cuda.empty_cache()
if accelerator.is_main_process:
# log_validation(model_ema, step, model.device)
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
synchronize()
if accelerator.is_main_process:
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000)
save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),
epoch=epoch,
step=(epoch - 1) * len(train_dataloader) + step + 1,
model=accelerator.unwrap_model(model),
model_ema=accelerator.unwrap_model(model_ema),
optimizer=optimizer,
lr_scheduler=lr_scheduler
)
synchronize()
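# Illustrative sketch (not part of the training script): the "huber" branch of the
# loss in `train()` above is a pseudo-Huber loss, sqrt(d^2 + c^2) - c averaged over
# elements. The plain-Python helpers below are hypothetical, and huber_c = 0.5 is
# an assumed value chosen only to keep the arithmetic easy to follow; the real
# value comes from `config.huber_c`.

```python
import math


def pseudo_huber(pred, target, huber_c):
    """Mean of sqrt((p - t)^2 + c^2) - c over all elements."""
    terms = [math.sqrt((p - t) ** 2 + huber_c ** 2) - huber_c for p, t in zip(pred, target)]
    return sum(terms) / len(terms)


def mse(pred, target):
    """Mean squared error, matching the "l2" branch."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Identical inputs give zero loss for both variants.
zero_loss = pseudo_huber([1.0, 2.0], [1.0, 2.0], 0.5)

# For a large error the pseudo-Huber loss grows roughly linearly, the L2 loss quadratically.
large_huber = pseudo_huber([10.0], [0.0], 0.5)
large_mse = mse([10.0], [0.0])
```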
def parse_args():
parser = argparse.ArgumentParser(description="Train PixArt with latent consistency distillation (LCM).")
parser.add_argument("config", type=str, help="config")
parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
parser.add_argument('--work-dir', help='the dir to save logs and models')
parser.add_argument('--resume-from', help='the dir to resume the training')
parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')
parser.add_argument('--local-rank', type=int, default=-1)
parser.add_argument('--local_rank', type=int, default=-1)
parser.add_argument('--debug', action='store_true')
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.resume_from is not None:
config.load_from = None
config.resume_from = dict(
checkpoint=args.resume_from,
load_ema=False,
resume_optimizer=True,
resume_lr_scheduler=True)
if args.debug:
config.log_interval = 1
config.train_batch_size = 11
config.valid_num = 100
config.load_from = None
os.umask(0o000)
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with="tensorboard",
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
model_kwargs={"window_block_indexes": config.window_block_indexes, "window_size": config.window_size,
"use_rel_pos": config.use_rel_pos, "lewei_scale": config.lewei_scale, 'config':config,
'model_max_length': config.model_max_length}
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma,
snr=config.snr_loss, return_startx=True)
model = build_model(config.model,
config.grad_checkpointing,
config.get('fp32_attention', False),
input_size=latent_size,
learn_sigma=learn_sigma,
pred_sigma=pred_sigma,
**model_kwargs).train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
if config.load_from is not None:
if args.load_from is not None:
config.load_from = args.load_from
missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
model_ema = deepcopy(model).eval()
model_teacher = deepcopy(model).eval()
if not config.data.load_vae_feat:
vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)
# used for balanced sampling
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer,
**config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
accelerator.init_trackers(f"tb_{timestamp}")
start_epoch = 0
if config.resume_from is not None and config.resume_from['checkpoint'] is not None:
start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,
model=model,
model_ema=model_ema,
optimizer=optimizer,
lr_scheduler=lr_scheduler,
)
logger.warning(f'Missing keys: {missing}')
logger.warning(f'Unexpected keys: {unexpected}')
solver = DDIMSolver(train_diffusion.alphas_cumprod, timesteps=config.train_sampling_steps, ddim_timesteps=config.num_ddim_timesteps)
solver.to(accelerator.device)
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model, model_ema, model_teacher = accelerator.prepare(model, model_ema, model_teacher)
# model, model_ema = accelerator.prepare(model, model_ema)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
train()
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_pixart_lcm_lora.py
================================================
import os
import sys
import types
from pathlib import Path
current_file_path = Path(__file__).resolve()
sys.path.insert(0, str(current_file_path.parent.parent))
import argparse
import datetime
import time
import warnings
warnings.filterwarnings("ignore") # ignore warning
import torch
from accelerate import Accelerator, InitProcessGroupKwargs
from accelerate.utils import DistributedType
from torch.utils.data import RandomSampler
from mmcv.runner import LogBuffer
import torch.nn.functional as F
import numpy as np
import re
from packaging import version
import accelerate
from diffusion import IDDPM
from diffusion.utils.dist_utils import get_world_size, clip_grad_norm_
from diffusion.data.builder import build_dataset, build_dataloader, set_data_root
from diffusion.utils.logger import get_root_logger
from diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow
from diffusion.utils.optimizer import build_optimizer, auto_scale_lr
from diffusion.utils.lr_scheduler import build_lr_scheduler
from diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict
from diffusers import AutoencoderKL, Transformer2DModel, StableDiffusionPipeline, PixArtAlphaPipeline
def set_fsdp_env():
os.environ["ACCELERATE_USE_FSDP"] = 'true'
os.environ["FSDP_AUTO_WRAP_POLICY"] = 'TRANSFORMER_BASED_WRAP'
os.environ["FSDP_BACKWARD_PREFETCH"] = 'BACKWARD_PRE'
os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = 'PixArtBlock'
def filter_keys(key_set):
def _f(dictionary):
return {k: v for k, v in dictionary.items() if k in key_set}
return _f
def append_dims(x, target_dims):
"""Appends dimensions to the end of a tensor until it has target_dims dimensions."""
dims_to_append = target_dims - x.ndim
if dims_to_append < 0:
raise ValueError(f"input has {x.ndim} dims but target_dims is {target_dims}, which is less")
return x[(...,) + (None,) * dims_to_append]
# From LCMScheduler.get_scalings_for_boundary_condition_discrete
def scalings_for_boundary_conditions(timestep, sigma_data=0.5, timestep_scaling=10.0):
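# Scale the timestep once; the default timestep_scaling of 10.0 is equivalent to dividing by 0.1.
scaled_timestep = timestep_scaling * timestep
c_skip = sigma_data**2 / (scaled_timestep**2 + sigma_data**2)
c_out = scaled_timestep / (scaled_timestep**2 + sigma_data**2) ** 0.5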
return c_skip, c_out
# Compare LCMScheduler.step, Step 4
def predicted_origin(model_output, timesteps, sample, prediction_type, alphas, sigmas):
if prediction_type == "epsilon":
sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)
alphas = extract_into_tensor(alphas, timesteps, sample.shape)
pred_x_0 = (sample - sigmas * model_output) / alphas
elif prediction_type == "v_prediction":
sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)
alphas = extract_into_tensor(alphas, timesteps, sample.shape)
pred_x_0 = alphas * sample - sigmas * model_output
else:
raise ValueError(f"Prediction type {prediction_type} currently not supported.")
return pred_x_0
def extract_into_tensor(a, t, x_shape):
b, *_ = t.shape
out = a.gather(-1, t)
return out.reshape(b, *((1,) * (len(x_shape) - 1)))
class DDIMSolver:
def __init__(self, alpha_cumprods, timesteps=1000, ddim_timesteps=50):
# DDIM sampling parameters
step_ratio = timesteps // ddim_timesteps
self.ddim_timesteps = (np.arange(1, ddim_timesteps + 1) * step_ratio).round().astype(np.int64) - 1
self.ddim_alpha_cumprods = alpha_cumprods[self.ddim_timesteps]
self.ddim_alpha_cumprods_prev = np.asarray(
[alpha_cumprods[0]] + alpha_cumprods[self.ddim_timesteps[:-1]].tolist()
)
# convert to torch tensors
self.ddim_timesteps = torch.from_numpy(self.ddim_timesteps).long()
self.ddim_alpha_cumprods = torch.from_numpy(self.ddim_alpha_cumprods)
self.ddim_alpha_cumprods_prev = torch.from_numpy(self.ddim_alpha_cumprods_prev)
def to(self, device):
self.ddim_timesteps = self.ddim_timesteps.to(device)
self.ddim_alpha_cumprods = self.ddim_alpha_cumprods.to(device)
self.ddim_alpha_cumprods_prev = self.ddim_alpha_cumprods_prev.to(device)
return self
def ddim_step(self, pred_x0, pred_noise, timestep_index):
alpha_cumprod_prev = extract_into_tensor(self.ddim_alpha_cumprods_prev, timestep_index, pred_x0.shape)
dir_xt = (1.0 - alpha_cumprod_prev).sqrt() * pred_noise
x_prev = alpha_cumprod_prev.sqrt() * pred_x0 + dir_xt
return x_prev
def train(model):
if config.get('debug_nan', False):
DebugUnderflowOverflow(model)
logger.info('NaN debugger registered. Start to detect overflow during training.')
time_start, last_tic = time.time(), time.time()
log_buffer = LogBuffer()
global_step = start_step
load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)
# Create uncond embeds for classifier free guidance
uncond_prompt_embeds = torch.load('output/pretrained_models/null_embed.pth', map_location='cpu').to(accelerator.device).repeat(config.train_batch_size, 1, 1, 1)
# Now you train the model
for epoch in range(start_epoch + 1, config.num_epochs + 1):
data_time_start = time.time()
data_time_all = 0
for step, batch in enumerate(train_dataloader):
data_time_all += time.time() - data_time_start
if load_vae_feat:
z = batch[0]
else:
with torch.no_grad():
with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):
posterior = vae.encode(batch[0]).latent_dist
if config.sample_posterior:
z = posterior.sample()
else:
z = posterior.mode()
latents = (z * config.scale_factor).to(weight_dtype)
y = batch[1].squeeze(1).to(weight_dtype)
y_mask = batch[2].squeeze(1).squeeze(1).to(weight_dtype)
data_info = {'resolution': batch[3]['img_hw'].to(weight_dtype), 'aspect_ratio': batch[3]['aspect_ratio'].to(weight_dtype),}
# Reset the gradient norm; it is only set on steps where gradients are synchronized
grad_norm = None
with accelerator.accumulate(model):
# Clear any gradients left over from the previous step
optimizer.zero_grad()
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image t_n ~ U[0, N - k - 1] without bias.
topk = config.train_sampling_steps // config.num_ddim_timesteps
index = torch.randint(0, config.num_ddim_timesteps, (bsz,), device=latents.device).long()
start_timesteps = solver.ddim_timesteps[index]
timesteps = start_timesteps - topk
timesteps = torch.where(timesteps < 0, torch.zeros_like(timesteps), timesteps)
# Get boundary scalings for start_timesteps and (end) timesteps.
c_skip_start, c_out_start = scalings_for_boundary_conditions(start_timesteps)
c_skip_start, c_out_start = [append_dims(x, latents.ndim) for x in [c_skip_start, c_out_start]]
c_skip, c_out = scalings_for_boundary_conditions(timesteps)
c_skip, c_out = [append_dims(x, latents.ndim) for x in [c_skip, c_out]]
# Sample a random guidance scale w from U[w_min, w_max] and embed it
# w = (config.w_max - config.w_min) * torch.rand((bsz,)) + config.w_min
w = config.cfg_scale * torch.ones((bsz,))
w = w.reshape(bsz, 1, 1, 1)
w = w.to(device=latents.device, dtype=latents.dtype)
# Get online LCM prediction on z_{t_{n + k}}, w, c, t_{n + k}
_, pred_x_0, noisy_model_input = train_diffusion.training_losses_diffusers(
model, latents, start_timesteps,
model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),
noise=noise
)
model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0
with torch.no_grad():
with torch.autocast("cuda"):
cond_teacher_output, cond_pred_x0, _ = train_diffusion.training_losses_diffusers(
model_teacher, latents, start_timesteps,
model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),
noise=noise
)
# Get teacher model prediction on noisy_latents and unconditional embedding
uncond_teacher_output, uncond_pred_x0, _ = train_diffusion.training_losses_diffusers(
model_teacher, latents, start_timesteps,
model_kwargs=dict(encoder_hidden_states=uncond_prompt_embeds, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),
noise=noise
)
# Perform "CFG" to get x_prev estimate (using the LCM paper's CFG formulation)
pred_x0 = cond_pred_x0 + w * (cond_pred_x0 - uncond_pred_x0)
pred_noise = cond_teacher_output + w * (cond_teacher_output - uncond_teacher_output)
x_prev = solver.ddim_step(pred_x0, pred_noise, index)
# Get target LCM prediction on x_prev, w, c, t_n
with torch.no_grad():
with torch.autocast("cuda", enabled=True):
_, pred_x_0, _ = train_diffusion.training_losses_diffusers(
model, x_prev.float(), timesteps,
model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),
skip_noise=True
)
target = c_skip * x_prev + c_out * pred_x_0
# Calculate loss
if config.loss_type == "l2":
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
elif config.loss_type == "huber":
loss = torch.mean(torch.sqrt((model_pred.float() - target.float()) ** 2 + config.huber_c**2) - config.huber_c)
else:
raise ValueError(f"Unsupported loss_type: {config.loss_type!r} (expected 'l2' or 'huber')")
accelerator.backward(loss)
if accelerator.sync_gradients:
grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad(set_to_none=True)
lr = lr_scheduler.get_last_lr()[0]
logs = {"loss": accelerator.gather(loss).mean().item()}
if grad_norm is not None:
logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())
log_buffer.update(logs)
if (step + 1) % config.log_interval == 0 or (step + 1) == 1:
t = (time.time() - last_tic) / config.log_interval
t_d = data_time_all / config.log_interval
avg_time = (time.time() - time_start) / (global_step + 1)
eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))
eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))
# avg_loss = sum(loss_buffer) / len(loss_buffer)
log_buffer.average()
info = f"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, " \
f"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), "
info += ', '.join([f"{k}:{v:.4f}" for k, v in log_buffer.output.items()])
logger.info(info)
last_tic = time.time()
log_buffer.clear()
data_time_all = 0
logs.update(lr=lr)
accelerator.log(logs, step=global_step + start_step)
global_step += 1
data_time_start = time.time()
accelerator.wait_for_everyone()
if accelerator.is_main_process:
if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:
save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f"checkpoint-{(epoch - 1) * len(train_dataloader) + step + 1}")
os.umask(0o000)
logger.info(f"Start to save state to {save_path}")
accelerator.save_state(save_path)
logger.info(f"Saved state to {save_path}")
accelerator.wait_for_everyone()
if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:
os.umask(0o000)
save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f"checkpoint-{(epoch - 1) * len(train_dataloader) + step + 1}")
logger.info(f"Start to save state to {save_path}")
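# Keep `model` bound to the prepared (wrapped) model so later epochs keep training it.
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(save_path)
lora_state_dict = get_peft_model_state_dict(unwrapped_model, adapter_name="default")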
StableDiffusionPipeline.save_lora_weights(os.path.join(save_path, "transformer_lora"), lora_state_dict)
logger.info(f"Saved state to {save_path}")
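# Illustrative sketch (not part of the training script): the teacher step in
# `train()` above combines conditional and unconditional predictions with the
# guidance scale w, then takes one DDIM step to obtain x_prev. The plain-Python
# helpers and the sample numbers below are hypothetical and use lists instead of
# tensors.

```python
import math


def cfg_combine(cond, uncond, w):
    """Guided prediction: cond + w * (cond - uncond), element by element."""
    return [c + w * (c - u) for c, u in zip(cond, uncond)]


def ddim_step(pred_x0, pred_noise, alpha_cumprod_prev):
    """Deterministic DDIM update: sqrt(a_prev) * x0 + sqrt(1 - a_prev) * noise."""
    a = math.sqrt(alpha_cumprod_prev)
    s = math.sqrt(1.0 - alpha_cumprod_prev)
    return [a * x + s * e for x, e in zip(pred_x0, pred_noise)]


# When both predictions agree, guidance has no effect regardless of w.
same = cfg_combine([1.0, 2.0], [1.0, 2.0], 4.5)

# Otherwise the result is pushed away from the unconditional prediction.
pushed = cfg_combine([1.0], [0.0], 2.0)

# With alpha_cumprod_prev = 1 (no noise left) the step returns the predicted x0.
x_prev = ddim_step([2.0], [5.0], 1.0)
```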
def parse_args():
parser = argparse.ArgumentParser(description="Train a PixArt LCM-LoRA.")
parser.add_argument("config", type=str, help="config")
parser.add_argument("--cloud", action='store_true', default=False, help="cloud or local machine")
parser.add_argument("--work-dir", default='output', help='the dir to save logs and models')
parser.add_argument("--resume-from", help='the dir to resume the training')
parser.add_argument("--local-rank", type=int, default=-1)
parser.add_argument("--local_rank", type=int, default=-1)
parser.add_argument("--debug", action='store_true')
parser.add_argument("--lora_rank", type=int, default=64, help="The rank of the LoRA projection matrix.", )
args = parser.parse_args()
return args
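# Illustrative sketch, not part of the training script: `--local-rank` and `--local_rank` are both
# registered above, presumably so that either spelling a launcher passes is accepted. argparse maps
# hyphens to underscores when deriving the destination, so both options write to `local_rank`.
# The helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Different option strings, same derived destination: dest="local_rank".
    parser.add_argument("--local-rank", type=int, default=-1)
    parser.add_argument("--local_rank", type=int, default=-1)
    return parser


hyphen = build_parser().parse_args(["--local-rank", "3"])
underscore = build_parser().parse_args(["--local_rank", "2"])
print(hyphen.local_rank, underscore.local_rank)
```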
if __name__ == '__main__':
args = parse_args()
config = read_config(args.config)
config.resume_from = None
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
config.work_dir = args.work_dir
if args.cloud:
config.data_root = '/data/data'
if args.resume_from is not None:
config.resume_from = args.resume_from
if args.debug:
config.log_interval = 1
config.train_batch_size = 4
config.valid_num = 10
config.save_model_steps = 10
os.umask(0o000)
os.makedirs(config.work_dir, exist_ok=True)
init_handler = InitProcessGroupKwargs()
init_handler.timeout = datetime.timedelta(seconds=5400) # change timeout to avoid a strange NCCL bug
# Initialize accelerator and tensorboard logging
if config.use_fsdp:
init_train = 'FSDP'
from accelerate import FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
set_fsdp_env()
fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)
else:
init_train = 'DDP'
fsdp_plugin = None
even_batches = True
if config.multi_scale:
even_batches = False  # no trailing comma: `False,` would create the truthy tuple `(False,)`
accelerator = Accelerator(
mixed_precision=config.mixed_precision,
gradient_accumulation_steps=config.gradient_accumulation_steps,
log_with="tensorboard",
project_dir=os.path.join(config.work_dir, "logs"),
fsdp_plugin=fsdp_plugin,
even_batches=even_batches,
kwargs_handlers=[init_handler]
)
logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))
logger.info(accelerator.state)
config.seed = init_random_seed(config.get('seed', None))
set_random_seed(config.seed)
if accelerator.is_main_process:
config.dump(os.path.join(config.work_dir, 'config.py'))
logger.info(f"Config: \n{config.pretty_text}")
logger.info(f"World_size: {get_world_size()}, seed: {config.seed}")
logger.info(f"Initializing: {init_train} for training")
image_size = config.image_size # @param [256, 512]
latent_size = int(image_size) // 8
pred_sigma = getattr(config, 'pred_sigma', True)
learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma
# prepare null_embedding for training
if not os.path.exists('output/pretrained_models/null_embed.pth'):
logger.info("Creating output/pretrained_models/null_embed.pth")
os.makedirs('output/pretrained_models/', exist_ok=True)
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16, use_safetensors=True,).to("cuda")
torch.save(pipe.encode_prompt(""), 'output/pretrained_models/null_embed.pth')
del pipe
torch.cuda.empty_cache()
# build models
train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, return_startx=True)
model_teacher = Transformer2DModel.from_pretrained(config.load_from, subfolder="transformer")
model_teacher.requires_grad_(False)
model = Transformer2DModel.from_pretrained(config.load_from, subfolder="transformer").train()
logger.info(f"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}")
lora_config = LoraConfig(
r=config.lora_rank,
target_modules=[
"to_q",
"to_k",
"to_v",
"to_out.0",
"proj_in",
"proj_out",
"ff.net.0.proj",
"ff.net.2",
"proj",
"linear",
"linear_1",
"linear_2",
# "scale_shift_table", # not available due to the implementation in huggingface/peft, working on it.
],
)
print(lora_config)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# 9. Handle mixed precision and device placement
# For mixed precision training we cast all non-trainable weights to half-precision
# as these weights are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# 11. Enable optimizations
# model.enable_xformers_memory_efficient_attention()
# model_teacher.enable_xformers_memory_efficient_attention()
lora_layers = filter(lambda p: p.requires_grad, model.parameters())
# for name, params in model.named_parameters():
# if params.requires_grad == False: logger.info(f"freeze param: {name}")
#
# for name, params in model.named_parameters():
# if params.requires_grad == True: logger.info(f"trainable param: {name}")
# 10. Handle saving and loading of checkpoints
# `accelerate` 0.16.0 will have better support for customized saving
if version.parse(accelerate.__version__) >= version.parse("0.16.0"):
# create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format
def save_model_hook(models, weights, output_dir):
if accelerator.is_main_process:
transformer_ = accelerator.unwrap_model(models[0])
lora_state_dict = get_peft_model_state_dict(transformer_, adapter_name="default")
StableDiffusionPipeline.save_lora_weights(os.path.join(output_dir, "transformer_lora"), lora_state_dict)
# save weights in peft format to be able to load them back
transformer_.save_pretrained(output_dir)
for _, model in enumerate(models):
# make sure to pop weight so that corresponding model is not saved again
weights.pop()
def load_model_hook(models, input_dir):
# load the LoRA into the model
transformer_ = accelerator.unwrap_model(models[0])
transformer_.load_adapter(input_dir, "default", is_trainable=True)
for _ in range(len(models)):
# pop models so that they are not loaded again
models.pop()
accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
if config.grad_checkpointing:
model.enable_gradient_checkpointing()
if not config.data.load_vae_feat:
vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()
# prepare for FSDP clip grad norm calculation
if accelerator.distributed_type == DistributedType.FSDP:
for m in accelerator._models:
m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)
# build dataloader
set_data_root(config.data_root)
dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)
if config.multi_scale:
batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,
ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)
# used for balanced sampling
# batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,
# batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,
# ratio_nums=dataset.ratio_nums)
train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)
else:
train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)
# build optimizer and lr scheduler
lr_scale_ratio = 1
if config.get('auto_lr', None):
lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,
config.optimizer,
**config.auto_lr)
optimizer = build_optimizer(model, config.optimizer)
lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)
timestamp = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime())
if accelerator.is_main_process:
accelerator.init_trackers(f"tb_{timestamp}")
start_epoch = 0
start_step = 0
total_steps = len(train_dataloader) * config.num_epochs
solver = DDIMSolver(train_diffusion.alphas_cumprod, timesteps=config.train_sampling_steps, ddim_timesteps=config.num_ddim_timesteps)
solver.to(accelerator.device)
# Prepare everything
# There is no specific order to remember, you just need to unpack the
# objects in the same order you gave them to the prepare method.
model, model_teacher = accelerator.prepare(model, model_teacher)
optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)
if config.resume_from is not None:
if config.resume_from != "latest":
path = os.path.basename(config.resume_from)
else:
# Get the most recent checkpoint
dirs = os.listdir(os.path.join(config.work_dir, 'checkpoints'))
dirs = [d for d in dirs if d.startswith("checkpoint")]
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
path = dirs[-1] if len(dirs) > 0 else None
if path is None:
accelerator.print(f"Checkpoint '{config.resume_from}' does not exist. Starting a new training run.")
config.resume_from = None
else:
accelerator.print(f"Resuming from checkpoint {path}")
accelerator.load_state(os.path.join(config.work_dir, 'checkpoints', path))
start_step = int(path.split("-")[1])
start_epoch = start_step // len(train_dataloader)
train(model)
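# Illustrative sketch, not part of the training script: the resume logic above picks the newest
# `checkpoint-<step>` directory and derives the starting step and epoch from its name. The numeric
# sort key matters because a plain string sort would rank `checkpoint-900` after `checkpoint-1000`.
# The helper names `latest_checkpoint` and `resume_position` are hypothetical.

```python
from typing import List, Optional, Tuple


def latest_checkpoint(dirs: List[str]) -> Optional[str]:
    """Pick the `checkpoint-<step>` entry with the largest numeric step, if any."""
    candidates = [d for d in dirs if d.startswith("checkpoint")]
    candidates.sort(key=lambda name: int(name.split("-")[1]))
    return candidates[-1] if candidates else None


def resume_position(path: str, steps_per_epoch: int) -> Tuple[int, int]:
    """Return (start_step, start_epoch) encoded in a checkpoint directory name."""
    start_step = int(path.split("-")[1])
    return start_step, start_step // steps_per_epoch


newest = latest_checkpoint(["checkpoint-900", "checkpoint-1000", "logs"])
print(newest, resume_position(newest, steps_per_epoch=300))
```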
================================================
FILE: PixArt-alpha-ToCa/train_scripts/train_pixart_lora_hf.py
================================================
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning script for Stable Diffusion for text2image with support for LoRA."""
import argparse
import logging
import math
import os
import random
import shutil
from pathlib import Path
from typing import List, Union
import datasets
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
import transformers
import accelerate
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import ProjectConfiguration, set_seed
from datasets import load_dataset
from huggingface_hub import create_repo, upload_folder
from packaging import version
from peft import LoraConfig, get_peft_model_state_dict, get_peft_model, PeftModel
from torchvision import transforms
from tqdm.auto import tqdm
import diffusers
from diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, StableDiffusionPipeline, PixArtAlphaPipeline, Transformer2DModel
from transformers import T5EncoderModel, T5Tokenizer
from diffusers.optimization import get_scheduler
from diffusers.training_utils import compute_snr
from diffusers.utils import check_min_version, is_wandb_available
from diffusers.utils.import_utils import is_xformers_available
# Will error if the minimal version of diffusers is not installed. Remove at your own risk.
check_min_version("0.25.0.dev0")
logger = get_logger(__name__, log_level="INFO")
# TODO: This function should be removed once training scripts are rewritten in PEFT
def text_encoder_lora_state_dict(text_encoder):
    state_dict = {}
    def text_encoder_attn_modules(text_encoder):
        from transformers import CLIPTextModel, CLIPTextModelWithProjection
        attn_modules = []
        if isinstance(text_encoder, (CLIPTextModel, CLIPTextModelWithProjection)):
            for i, layer in enumerate(text_encoder.text_model.encoder.layers):
                name = f"text_model.encoder.layers.{i}.self_attn"
                mod = layer.self_attn
                attn_modules.append((name, mod))
        return attn_modules
    for name, module in text_encoder_attn_modules(text_encoder):
        for k, v in module.q_proj.lora_linear_layer.state_dict().items():
            state_dict[f"{name}.q_proj.lora_linear_layer.{k}"] = v
        for k, v in module.k_proj.lora_linear_layer.state_dict().items():
            state_dict[f"{name}.k_proj.lora_linear_layer.{k}"] = v
        for k, v in module.v_proj.lora_linear_layer.state_dict().items():
            state_dict[f"{name}.v_proj.lora_linear_layer.{k}"] = v
        for k, v in module.out_proj.lora_linear_layer.state_dict().items():
            state_dict[f"{name}.out_proj.lora_linear_layer.{k}"] = v
    return state_dict
def save_model_card(repo_id: str, images=None, base_model: str = None, dataset_name: str = None, repo_folder=None):
    img_str = ""
    # `images` may be None when no validation images were generated.
    for i, image in enumerate(images or []):
        image.save(os.path.join(repo_folder, f"image_{i}.png"))
        img_str += f"![img_{i}](./image_{i}.png)\n"
    yaml = f"""
---
license: creativeml-openrail-m
base_model: {base_model}
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- lora
inference: true
---
"""
    model_card = f"""
# LoRA text2image fine-tuning - {repo_id}
These are LoRA adaptation weights for {base_model}. The weights were fine-tuned on the {dataset_name} dataset. You can find some example images below. \n
{img_str}
"""
    with open(os.path.join(repo_folder, "README.md"), "w") as f:
        f.write(yaml + model_card)
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--revision",
type=str,
default=None,
required=False,
help="Revision of pretrained model identifier from huggingface.co/models.",
)
parser.add_argument(
"--variant",
type=str,
default=None,
help="Variant of the model files of the pretrained model identifier from huggingface.co/models, e.g. 'fp16'.",
)
parser.add_argument(
"--dataset_name",
type=str,
default=None,
help=(
"The name of the Dataset (from the HuggingFace hub) to train on (could be your own, possibly private,"
" dataset). It can also be a path pointing to a local copy of a dataset in your filesystem,"
" or to a folder containing files that 🤗 Datasets can understand."
),
)
parser.add_argument(
"--dataset_config_name",
type=str,
default=None,
help="The config of the Dataset, leave as None if there's only one config.",
)
parser.add_argument(
"--train_data_dir",
type=str,
default=None,
help=(
"A folder containing the training data. Folder contents must follow the structure described in"
" https://huggingface.co/docs/datasets/image_dataset#imagefolder. In particular, a `metadata.jsonl` file"
" must exist to provide the captions for the images. Ignored if `dataset_name` is specified."
),
)
parser.add_argument(
"--image_column", type=str, default="image", help="The column of the dataset containing an image."
)
parser.add_argument(
"--caption_column",
type=str,
default="text",
help="The column of the dataset containing a caption or a list of captions.",
)
parser.add_argument(
"--validation_prompt", type=str, default=None, help="A prompt that is sampled during training for inference."
)
parser.add_argument(
"--num_validation_images",
type=int,
default=4,
help="Number of images that should be generated during validation with `validation_prompt`.",
)
parser.add_argument(
"--validation_epochs",
type=int,
default=1,
help=(
"Run fine-tuning validation every X epochs. The validation process consists of running the prompt"
" `args.validation_prompt` multiple times: `args.num_validation_images`."
),
)
parser.add_argument(
"--max_train_samples",
type=int,
default=None,
help=(
"For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
),
)
parser.add_argument(
"--output_dir",
type=str,
default="sd-model-finetuned-lora",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--cache_dir",
type=str,
default=None,
help="The directory where the downloaded models and datasets will be stored.",
)
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument(
"--resolution",
type=int,
default=512,
help=(
"The resolution for input images, all the images in the train/validation dataset will be resized to this"
" resolution"
),
)
parser.add_argument(
"--center_crop",
default=False,
action="store_true",
help=(
"Whether to center crop the input images to the resolution. If not set, the images will be randomly"
" cropped. The images will be resized to the resolution first before cropping."
),
)
parser.add_argument(
"--random_flip",
action="store_true",
help="whether to randomly flip images horizontally",
)
parser.add_argument(
"--train_batch_size", type=int, default=16, help="Batch size (per device) for the training dataloader."
)
parser.add_argument("--num_train_epochs", type=int, default=100)
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of update steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--gradient_checkpointing",
action="store_true",
help="Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=1e-6,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument(
"--scale_lr",
action="store_true",
default=False,
help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
)
parser.add_argument(
"--lr_scheduler",
type=str,
default="constant",
help=(
'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
' "constant", "constant_with_warmup"]'
),
)
parser.add_argument(
"--lr_warmup_steps", type=int, default=500, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument(
"--snr_gamma",
type=float,
default=None,
help="SNR weighting gamma to be used if rebalancing the loss. Recommended value is 5.0. "
"More details here: https://arxiv.org/abs/2303.09556.",
)
parser.add_argument(
"--use_8bit_adam", action="store_true", help="Whether or not to use 8-bit Adam from bitsandbytes."
)
parser.add_argument(
"--use_dora",
action="store_true",
default=False,
help="Whether or not to use DoRA. For more information, see"
" https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.use_dora"
)
parser.add_argument(
"--use_rslora",
action="store_true",
default=False,
help="Whether or not to use rank-stabilized LoRA (rsLoRA). For more information, see"
" https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.use_rslora"
)
parser.add_argument(
"--allow_tf32",
action="store_true",
help=(
"Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see"
" https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices"
),
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
# ----Diffusion Training Arguments----
parser.add_argument(
"--proportion_empty_prompts",
type=float,
default=0,
help="Proportion of image prompts to be replaced with empty strings. Defaults to 0 (no prompt replacement).",
)
parser.add_argument(
"--prediction_type",
type=str,
default=None,
help="The prediction_type that shall be used for training. Choose between 'epsilon' or 'v_prediction' or leave `None`. If left to `None` the default prediction type of the scheduler: `noise_scheduler.config.prediction_type` is chosen.",
)
parser.add_argument(
"--hub_model_id",
type=str,
default=None,
help="The name of the repository to keep in sync with the local `output_dir`.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10 and an Nvidia Ampere GPU. Defaults to the value of the accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
parser.add_argument(
"--checkpointing_steps",
type=int,
default=500,
help=(
"Save a checkpoint of the training state every X updates. These checkpoints are only suitable for resuming"
" training using `--resume_from_checkpoint`."
),
)
parser.add_argument(
"--checkpoints_total_limit",
type=int,
default=None,
help=("Max number of checkpoints to store."),
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help=(
"Whether training should be resumed from a previous checkpoint. Use a path saved by"
' `--checkpointing_steps`, or `"latest"` to automatically select the last available checkpoint.'
),
)
parser.add_argument(
"--enable_xformers_memory_efficient_attention", action="store_true", help="Whether or not to use xformers."
)
parser.add_argument("--noise_offset", type=float, default=0, help="The scale of noise offset.")
parser.add_argument(
"--rank",
type=int,
default=4,
help=("The dimension of the LoRA update matrices."),
)
parser.add_argument("--local-rank", type=int, default=-1)
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
# Sanity checks
if args.dataset_name is None and args.train_data_dir is None:
raise ValueError("Need either a dataset name or a training folder.")
if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:
raise ValueError("`--proportion_empty_prompts` must be in the range [0, 1].")
return args
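# Illustrative sketch, not part of the training script: the end of `parse_args` lets a `LOCAL_RANK`
# environment variable override the CLI value whenever it is set and differs. The same rule is
# written here as a pure function so it can be checked in isolation. The helper name
# `resolve_local_rank` is hypothetical.

```python
from typing import Mapping


def resolve_local_rank(cli_value: int, env: Mapping[str, str]) -> int:
    """Prefer LOCAL_RANK from the environment when it is set and differs from the CLI value."""
    env_value = int(env.get("LOCAL_RANK", -1))
    if env_value != -1 and env_value != cli_value:
        return env_value
    return cli_value


# A launcher that exports LOCAL_RANK=2 wins over the CLI default of -1.
print(resolve_local_rank(-1, {"LOCAL_RANK": "2"}))
```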
DATASET_NAME_MAPPING = {"lambdalabs/pokemon-blip-captions": ("image", "text"),}
def main():
args = parse_args()
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_config=accelerator_project_config,
)
if args.report_to == "wandb":
if not is_wandb_available():
raise ImportError("Make sure to install wandb if you want to use it for logging during training.")
import wandb
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
logger.info(accelerator.state, main_process_only=False)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_warning()
diffusers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
diffusers.utils.logging.set_verbosity_error()
# If passed along, set the training seed now.
if args.seed is not None:
set_seed(args.seed)
# Handle the repository creation
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
if args.push_to_hub:
repo_id = create_repo(repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token).repo_id
# See Section 3.1. of the paper.
max_length = 120
# For mixed precision training we cast all non-trainable weights (vae, non-lora text_encoder and non-lora transformer) to half-precision
# as these weights are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# Load scheduler, tokenizer and models.
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler", torch_dtype=weight_dtype)
tokenizer = T5Tokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer", revision=args.revision, torch_dtype=weight_dtype)
text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision, torch_dtype=weight_dtype)
text_encoder.requires_grad_(False)
text_encoder.to(accelerator.device)
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae", revision=args.revision, variant=args.variant, torch_dtype=weight_dtype)
vae.requires_grad_(False)
vae.to(accelerator.device)
transformer = Transformer2DModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="transformer", torch_dtype=weight_dtype)
# Freeze the base transformer before adding the LoRA adapters; only the adapter weights are trained.
transformer.requires_grad_(False)
lora_config = LoraConfig(
r=args.rank,
init_lora_weights="gaussian",
target_modules=[
"to_k",
"to_q",
"to_v",
"to_out.0",
"proj_in",
"proj_out",
"ff.net.0.proj",
"ff.net.2",
"proj",
"linear",
"linear_1",
"linear_2",
# "scale_shift_table", # not available due to the implementation in huggingface/peft, working on it.
],
use_dora=args.use_dora,
use_rslora=args.use_rslora,
)
# Move the transformer to the device (vae and text_encoder were already moved above)
transformer.to(accelerator.device)
def cast_training_params(model: Union[torch.nn.Module, List[torch.nn.Module]], dtype=torch.float32):
if not isinstance(model, list):
model = [model]
for m in model:
for param in m.parameters():
# only upcast trainable parameters into fp32
if param.requires_grad:
param.data = param.to(dtype)
transformer = get_peft_model(transformer, lora_config)
if args.mixed_precision == "fp16":
# only upcast trainable parameters (LoRA) into fp32
cast_training_params(transformer, dtype=torch.float32)
transformer.print_trainable_parameters()
# 10. Handle saving and loading of checkpoints
# `accelerate` 0.16.0 will have better support for customized saving
if version.parse(accelerate.__version__) >= version.parse("0.16.0"):
# create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format
def save_model_hook(models, weights, output_dir):
if accelerator.is_main_process:
transformer_ = accelerator.unwrap_model(transformer)
lora_state_dict = get_peft_model_state_dict(transformer_, adapter_name="default")
StableDiffusionPipeline.save_lora_weights(os.path.join(output_dir, "transformer_lora"), lora_state_dict)
# save weights in peft format to be able to load them back
transformer_.save_pretrained(output_dir)
for _, model in enumerate(models):
# make sure to pop weight so that corresponding model is not saved again
weights.pop()
def load_model_hook(models, input_dir):
# load the LoRA into the model
transformer_ = accelerator.unwrap_model(transformer)
transformer_.load_adapter(input_dir, "default", is_trainable=True)
for _ in range(len(models)):
# pop models so that they are not loaded again
models.pop()
accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
import xformers
xformers_version = version.parse(xformers.__version__)
if xformers_version == version.parse("0.0.16"):
logger.warning(
"xFormers 0.0.16 cannot be used for training on some GPUs. If you observe problems during training, please update xFormers to at least 0.0.17. See https://huggingface.co/docs/diffusers/main/en/optimization/xformers for more details."
)
transformer.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
lora_layers = filter(lambda p: p.requires_grad, transformer.parameters())
# Enable TF32 for faster training on Ampere GPUs,
# cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
if args.allow_tf32:
torch.backends.cuda.matmul.allow_tf32 = True
if args.gradient_checkpointing:
transformer.enable_gradient_checkpointing()
if args.scale_lr:
args.learning_rate = args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
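# Illustrative sketch, not part of the training script: worked arithmetic for `--scale_lr`. The base
# rate 1e-6 and per-device batch size 16 are the script defaults; 2 accumulation steps and 4
# processes are assumed values chosen for the example. The helper name `scaled_learning_rate` is
# hypothetical.

```python
def scaled_learning_rate(base_lr: float, grad_accum_steps: int, per_device_batch: int, num_processes: int) -> float:
    """Scale the base learning rate linearly with the effective batch size."""
    return base_lr * grad_accum_steps * per_device_batch * num_processes


# Effective batch size is 2 * 16 * 4 = 128 samples per optimizer update.
lr = scaled_learning_rate(1e-6, grad_accum_steps=2, per_device_batch=16, num_processes=4)
print(f"{lr:.3e}")
```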
# Initialize the optimizer
if args.use_8bit_adam:
try:
import bitsandbytes as bnb
except ImportError:
raise ImportError("Please install bitsandbytes to use 8-bit Adam. You can do so by running `pip install bitsandbytes`")
optimizer_cls = bnb.optim.AdamW8bit
else:
optimizer_cls = torch.optim.AdamW
optimizer = optimizer_cls(
lora_layers,
lr=args.learning_rate,
betas=(args.adam_beta1, args.adam_beta2),
weight_decay=args.adam_weight_decay,
eps=args.adam_epsilon,
)
# Get the datasets: you can either provide your own training and evaluation files (see below)
# or specify a Dataset from the hub (the dataset will be downloaded automatically from the datasets Hub).
# In distributed training, the load_dataset function guarantees that only one local process can concurrently
# download the dataset.
if args.dataset_name is not None:
# Downloading and loading a dataset from the hub.
dataset = load_dataset(
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
data_dir=args.train_data_dir,
)
else:
data_files = {}
if args.train_data_dir is not None:
data_files["train"] = os.path.join(args.train_data_dir, "**")
dataset = load_dataset(
"imagefolder",
data_files=data_files,
cache_dir=args.cache_dir,
)
# See more about loading custom images at
# https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder
# Preprocessing the datasets.
# We need to tokenize inputs and targets.
column_names = dataset["train"].column_names
# 6. Get the column names for input/target.
dataset_columns = DATASET_NAME_MAPPING.get(args.dataset_name, None)
if args.image_column is None:
image_column = dataset_columns[0] if dataset_columns is not None else column_names[0]
else:
image_column = args.image_column
if image_column not in column_names:
raise ValueError(
f"'--image_column' value '{args.image_column}' needs to be one of: {', '.join(column_names)}"
)
if args.caption_column is None:
caption_column = dataset_columns[1] if dataset_columns is not None else column_names[1]
else:
caption_column = args.caption_column
if caption_column not in column_names:
raise ValueError(
f"'--caption_column' value '{args.caption_column}' needs to be one of: {', '.join(column_names)}"
)

    # Preprocessing the datasets.
    # We need to tokenize input captions and transform the images.
    def tokenize_captions(examples, is_train=True, proportion_empty_prompts=0., max_length=120):
        captions = []
        for caption in examples[caption_column]:
            if random.random() < proportion_empty_prompts:
                captions.append("")
            elif isinstance(caption, str):
                captions.append(caption)
            elif isinstance(caption, (list, np.ndarray)):
                # take a random caption if there are multiple
                captions.append(random.choice(caption) if is_train else caption[0])
            else:
                raise ValueError(
                    f"Caption column `{caption_column}` should contain either strings or lists of strings."
                )
        inputs = tokenizer(captions, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
        return inputs.input_ids, inputs.attention_mask

    # Preprocessing the datasets.
    train_transforms = transforms.Compose(
        [
            transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR),
            transforms.CenterCrop(args.resolution) if args.center_crop else transforms.RandomCrop(args.resolution),
            transforms.RandomHorizontalFlip() if args.random_flip else transforms.Lambda(lambda x: x),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),
        ]
    )

    def preprocess_train(examples):
        images = [image.convert("RGB") for image in examples[image_column]]
        examples["pixel_values"] = [train_transforms(image) for image in images]
        examples["input_ids"], examples['prompt_attention_mask'] = tokenize_captions(examples, proportion_empty_prompts=args.proportion_empty_prompts, max_length=max_length)
        return examples

    with accelerator.main_process_first():
        if args.max_train_samples is not None:
            dataset["train"] = dataset["train"].shuffle(seed=args.seed).select(range(args.max_train_samples))
        # Set the training transforms
        train_dataset = dataset["train"].with_transform(preprocess_train)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()
        input_ids = torch.stack([example["input_ids"] for example in examples])
        prompt_attention_mask = torch.stack([example["prompt_attention_mask"] for example in examples])
        return {"pixel_values": pixel_values, "input_ids": input_ids, 'prompt_attention_mask': prompt_attention_mask}

    # DataLoaders creation:
    train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        shuffle=True,
        collate_fn=collate_fn,
        batch_size=args.train_batch_size,
        num_workers=args.dataloader_num_workers,
    )

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if args.max_train_steps is None:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
        overrode_max_train_steps = True

    lr_scheduler = get_scheduler(
        args.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=args.lr_warmup_steps * accelerator.num_processes,
        num_training_steps=args.max_train_steps * accelerator.num_processes,
    )

    # Prepare everything with our `accelerator`.
    transformer, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(transformer, optimizer, train_dataloader, lr_scheduler)

    # We need to recalculate our total training steps as the size of the training dataloader may have changed.
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if overrode_max_train_steps:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    # Afterwards we recalculate our number of training epochs
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)

    # We need to initialize the trackers we use, and also store our configuration.
    # The trackers initialize automatically on the main process.
    if accelerator.is_main_process:
        accelerator.init_trackers("text2image-fine-tune", config=vars(args))

    # Train!
    total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps

    logger.info("***** Running training *****")
    logger.info(f" Num examples = {len(train_dataset)}")
    logger.info(f" Num Epochs = {args.num_train_epochs}")
    logger.info(f" Instantaneous batch size per device = {args.train_batch_size}")
    logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
    logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
    logger.info(f" Total optimization steps = {args.max_train_steps}")
    global_step = 0
    first_epoch = 0

    # Potentially load in the weights and states from a previous save
    if args.resume_from_checkpoint:
        if args.resume_from_checkpoint != "latest":
            path = os.path.basename(args.resume_from_checkpoint)
        else:
            # Get the most recent checkpoint
            dirs = os.listdir(args.output_dir)
            dirs = [d for d in dirs if d.startswith("checkpoint")]
            dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
            path = dirs[-1] if len(dirs) > 0 else None

        if path is None:
            accelerator.print(
                f"Checkpoint '{args.resume_from_checkpoint}' does not exist. Starting a new training run."
            )
            args.resume_from_checkpoint = None
            initial_global_step = 0
        else:
            accelerator.print(f"Resuming from checkpoint {path}")
            accelerator.load_state(os.path.join(args.output_dir, path))
            global_step = int(path.split("-")[1])

            initial_global_step = global_step
            first_epoch = global_step // num_update_steps_per_epoch
    else:
        initial_global_step = 0

    progress_bar = tqdm(
        range(0, args.max_train_steps),
        initial=initial_global_step,
        desc="Steps",
        # Only show the progress bar once on each machine.
        disable=not accelerator.is_local_main_process,
    )

    for epoch in range(first_epoch, args.num_train_epochs):
        transformer.train()
        train_loss = 0.0
        for step, batch in enumerate(train_dataloader):
            with accelerator.accumulate(transformer):
                # Convert images to latent space
                latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
                latents = latents * vae.config.scaling_factor

                # Sample noise that we'll add to the latents
                noise = torch.randn_like(latents)
                if args.noise_offset:
                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise
                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)

                bsz = latents.shape[0]
                # Sample a random timestep for each image
                timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)
                timesteps = timesteps.long()

                # Add noise to the latents according to the noise magnitude at each timestep
                # (this is the forward diffusion process)
                noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

                # Get the text embedding for conditioning
                prompt_embeds = text_encoder(batch["input_ids"], attention_mask=batch['prompt_attention_mask'])[0]
                prompt_attention_mask = batch['prompt_attention_mask']

                # Get the target for loss depending on the prediction type
                if args.prediction_type is not None:
                    # set prediction_type of scheduler if defined
                    noise_scheduler.register_to_config(prediction_type=args.prediction_type)

                if noise_scheduler.config.prediction_type == "epsilon":
                    target = noise
                elif noise_scheduler.config.prediction_type == "v_prediction":
                    target = noise_scheduler.get_velocity(latents, noise, timesteps)
                else:
                    raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")

                # Prepare micro-conditions.
                added_cond_kwargs = {"resolution": None, "aspect_ratio": None}
                if getattr(transformer, 'module', transformer).config.sample_size == 128:
                    resolution = torch.tensor([args.resolution, args.resolution]).repeat(bsz, 1)
                    aspect_ratio = torch.tensor([float(args.resolution / args.resolution)]).repeat(bsz, 1)
                    resolution = resolution.to(dtype=weight_dtype, device=latents.device)
                    aspect_ratio = aspect_ratio.to(dtype=weight_dtype, device=latents.device)
                    added_cond_kwargs = {"resolution": resolution, "aspect_ratio": aspect_ratio}

                # Predict the noise residual and compute loss
                model_pred = transformer(noisy_latents,
                                         encoder_hidden_states=prompt_embeds,
                                         encoder_attention_mask=prompt_attention_mask,
                                         timestep=timesteps,
                                         added_cond_kwargs=added_cond_kwargs).sample.chunk(2, 1)[0]

                if args.snr_gamma is None:
                    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
                else:
                    # Compute loss-weights as per Section 3.4 of https://arxiv.org/abs/2303.09556.
                    # Since we predict the noise instead of x_0, the original formulation is slightly changed.
                    # This is discussed in Section 4.2 of the same paper.
                    snr = compute_snr(noise_scheduler, timesteps)
                    if noise_scheduler.config.prediction_type == "v_prediction":
                        # Velocity objective requires that we add one to SNR values before we divide by them.
                        snr = snr + 1
                    mse_loss_weights = (torch.stack([snr, args.snr_gamma * torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / snr)

                    loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
                    loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights
                    loss = loss.mean()

                # Gather the losses across all processes for logging (if we use distributed training).
                avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean()
                train_loss += avg_loss.item() / args.gradient_accumulation_steps

                # Backpropagate
                accelerator.backward(loss)
                if accelerator.sync_gradients:
                    params_to_clip = lora_layers
                    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            # Checks if the accelerator has performed an optimization step behind the scenes
            if accelerator.sync_gradients:
                progress_bar.update(1)
                global_step += 1
                accelerator.log({"train_loss": train_loss}, step=global_step)
                train_loss = 0.0

                if global_step % args.checkpointing_steps == 0:
                    if accelerator.is_main_process:
                        # _before_ saving state, check if this save would set us over the `checkpoints_total_limit`
                        if args.checkpoints_total_limit is not None:
                            checkpoints = os.listdir(args.output_dir)
                            checkpoints = [d for d in checkpoints if d.startswith("checkpoint")]
                            checkpoints = sorted(checkpoints, key=lambda x: int(x.split("-")[1]))

                            # before we save the new checkpoint, we need to have at _most_ `checkpoints_total_limit - 1` checkpoints
                            if len(checkpoints) >= args.checkpoints_total_limit:
                                num_to_remove = len(checkpoints) - args.checkpoints_total_limit + 1
                                removing_checkpoints = checkpoints[0:num_to_remove]

                                logger.info(f"{len(checkpoints)} checkpoints already exist, removing {len(removing_checkpoints)} checkpoints")
                                logger.info(f"removing checkpoints: {', '.join(removing_checkpoints)}")

                                for removing_checkpoint in removing_checkpoints:
                                    removing_checkpoint = os.path.join(args.output_dir, removing_checkpoint)
                                    shutil.rmtree(removing_checkpoint)

                        save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
                        accelerator.save_state(save_path)

                        unwrapped_transformer = accelerator.unwrap_model(transformer, keep_fp32_wrapper=False)
                        transformer_lora_state_dict = get_peft_model_state_dict(unwrapped_transformer)

                        StableDiffusionPipeline.save_lora_weights(
                            save_directory=save_path,
                            unet_lora_layers=transformer_lora_state_dict,
                            safe_serialization=True,
                        )

                        logger.info(f"Saved state to {save_path}")

            logs = {"step_loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
            progress_bar.set_postfix(**logs)

            if global_step >= args.max_train_steps:
                break

        if accelerator.is_main_process:
            if args.validation_prompt is not None and epoch % args.validation_epochs == 0:
                logger.info(
                    f"Running validation... \n Generating {args.num_validation_images} images with prompt:"
                    f" {args.validation_prompt}."
                )
                # create pipeline
                pipeline = DiffusionPipeline.from_pretrained(
                    args.pretrained_model_name_or_path,
                    transformer=accelerator.unwrap_model(transformer, keep_fp32_wrapper=False),
                    text_encoder=text_encoder, vae=vae,
                    torch_dtype=weight_dtype,
                )
                pipeline = pipeline.to(accelerator.device)
                pipeline.set_progress_bar_config(disable=True)

                # run inference
                generator = torch.Generator(device=accelerator.device)
                if args.seed is not None:
                    generator = generator.manual_seed(args.seed)
                images = []
                for _ in range(args.num_validation_images):
                    images.append(pipeline(args.validation_prompt, num_inference_steps=20, generator=generator).images[0])

                for tracker in accelerator.trackers:
                    if tracker.name == "tensorboard":
                        np_images = np.stack([np.asarray(img) for img in images])
                        tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
                    if tracker.name == "wandb":
                        tracker.log(
                            {
                                "validation": [wandb.Image(image, caption=f"{i}: {args.validation_prompt}") for i, image in enumerate(images)]
                            }
                        )

                del pipeline
                torch.cuda.empty_cache()

    # Save the lora layers
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        transformer = accelerator.unwrap_model(transformer, keep_fp32_wrapper=False)
        transformer.save_pretrained(args.output_dir)
        lora_state_dict = get_peft_model_state_dict(transformer)
        StableDiffusionPipeline.save_lora_weights(os.path.join(args.output_dir, "transformer_lora"), lora_state_dict)

        if args.push_to_hub:
            save_model_card(
                repo_id,
                images=images,
                base_model=args.pretrained_model_name_or_path,
                dataset_name=args.dataset_name,
                repo_folder=args.output_dir,
            )
            upload_folder(
                repo_id=repo_id,
                folder_path=args.output_dir,
                commit_message="End of training",
                ignore_patterns=["step_*", "epoch_*"],
            )

    # Final inference
    # Load previous transformer
    transformer = Transformer2DModel.from_pretrained(args.pretrained_model_name_or_path, subfolder='transformer', torch_dtype=weight_dtype)
    # load lora weight
    transformer = PeftModel.from_pretrained(transformer, args.output_dir)
    # Load previous pipeline
    pipeline = DiffusionPipeline.from_pretrained(args.pretrained_model_name_or_path, transformer=transformer, text_encoder=text_encoder, vae=vae, torch_dtype=weight_dtype,)
    pipeline = pipeline.to(accelerator.device)

    del transformer
    torch.cuda.empty_cache()

    # run inference
    generator = torch.Generator(device=accelerator.device)
    if args.seed is not None:
        generator = generator.manual_seed(args.seed)
    images = []
    for _ in range(args.num_validation_images):
        images.append(pipeline(args.validation_prompt, num_inference_steps=20, generator=generator).images[0])

    if accelerator.is_main_process:
        for tracker in accelerator.trackers:
            if len(images) != 0:
                if tracker.name == "tensorboard":
                    np_images = np.stack([np.asarray(img) for img in images])
                    tracker.writer.add_images("test", np_images, epoch, dataformats="NHWC")
                if tracker.name == "wandb":
                    tracker.log(
                        {
                            "test": [
                                wandb.Image(image, caption=f"{i}: {args.validation_prompt}")
                                for i, image in enumerate(images)
                            ]
                        }
                    )

    accelerator.end_training()


if __name__ == "__main__":
    main()
================================================
FILE: PixArt-alpha-ToCa-tools/clip_score.py
================================================
import os
import torch
from PIL import Image
from torchvision.transforms import ToTensor
from torchmetrics.multimodal.clip_score import CLIPScore
from tqdm import tqdm
import torch.multiprocessing as mp


# Load prompts file
def load_prompts(txt_file):
    with open(txt_file, "r") as f:
        prompts = f.read().splitlines()
    return prompts


# Find matching image file: first, directly use the prompt as the filename,
# and if not found, match using a prefix
def find_image_file(image_folder, prompt):
    img_filename = prompt + ".jpg"  # Assume filename is {prompt}.jpg
    img_path = os.path.join(image_folder, img_filename)
    if os.path.exists(img_path):
        return img_path
    # If direct match fails, use prefix matching
    for file in os.listdir(image_folder):
        if file.startswith(prompt[:20]):  # Use the first 20 characters as a prefix for matching
            return os.path.join(image_folder, file)
    return None


# Load a batch of images and convert them to Tensors
def load_images(image_folder, prompts_batch):
    images = []
    valid_prompts = []
    for prompt in prompts_batch:
        img_path = find_image_file(image_folder, prompt)
        if img_path:
            try:
                image = Image.open(img_path).convert("RGB")
                image_tensor = ToTensor()(image).unsqueeze(0)  # Shape (1, C, H, W)
                images.append(image_tensor)
                valid_prompts.append(prompt)
            except Exception as e:
                print(f"Error loading image {img_path}: {e}")
        else:
            print(f"No image found for prompt: {prompt}")
    if len(images) > 0:
        images_tensor = torch.cat(images, dim=0)  # Combine into a single batch (N, C, H, W)
        return images_tensor, valid_prompts
    else:
        return None, None


# Single task: process a batch of prompts and corresponding images, and calculate CLIP Score
def process_batch(prompts_batch, image_folder, model_path, device):
    clip_score_metric = CLIPScore(model_name_or_path=model_path).to(device)
    # Load image batch
    images_tensor, valid_prompts = load_images(image_folder, prompts_batch)
    if images_tensor is not None:
        images_tensor = images_tensor.to(device)
        with torch.no_grad():  # Avoid building computation graph, reducing memory consumption
            # Calculate CLIP Score for each image and prompt
            for i, prompt in enumerate(valid_prompts):
                clip_score_metric.update(images_tensor[i].unsqueeze(0).float(), prompt)
        # Release memory
        del images_tensor
        torch.cuda.empty_cache()
        return clip_score_metric.compute().item()
    else:
        return None


# Split data into batches
def chunked(iterable, batch_size):
    """Yield successive batch_size-sized chunks from iterable."""
    for i in range(0, len(iterable), batch_size):
        yield iterable[i:i + batch_size]


# Main processing function
def main_worker(rank, prompts, image_folder, model_path, device, batch_size, queue):
    # Split into batches
    prompts_batches = list(chunked(prompts, batch_size))
    clip_scores = []
    for batch in prompts_batches:
        score = process_batch(batch, image_folder, model_path, device)
        if score is not None:
            clip_scores.append(score)
        # After processing each batch, send information to the main process
        queue.put(1)  # Send signal indicating one batch is processed
    queue.put(clip_scores)  # Put final result into the queue for the main process


def main(prompt_file="prompts.txt", image_folder="images", batch_size=64, num_workers=4):
    # Load prompts
    prompts = load_prompts(prompt_file)
    model_path = "/root/autodl-tmp/pretrained_models/clip-vit-large-patch14"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Create multiprocessing queue
    queue = mp.Queue()
    # Start multiple processes
    processes = []
    # Ceil-divide so trailing prompts are not dropped when len(prompts) is not a multiple of num_workers
    chunk_size = (len(prompts) + num_workers - 1) // num_workers
    total_batches = 0
    for rank in range(num_workers):
        worker_prompts = prompts[rank * chunk_size: (rank + 1) * chunk_size]
        # Count batches per worker so the total matches the progress signals the workers actually send
        total_batches += (len(worker_prompts) + batch_size - 1) // batch_size
        p = mp.Process(target=main_worker, args=(rank, worker_prompts, image_folder, model_path, device, batch_size, queue))
        p.start()
        processes.append(p)
    # Use tqdm to create a progress bar
    with tqdm(total=total_batches, desc="Processing batches") as pbar:
        all_scores = []
        finished_workers = 0
        # Each worker sends one signal per batch, then a final list of scores.
        # Drain the queue until every worker has delivered its list, so no result
        # is lost and join() below cannot block on unread queue items.
        while finished_workers < num_workers:
            result = queue.get()
            if isinstance(result, list):  # If it's a list, it means final scores
                all_scores.extend(result)
                finished_workers += 1
            else:
                pbar.update(1)  # Update progress bar
    # Wait for subprocesses to end
    for p in processes:
        p.join()
    # Calculate final result (unweighted mean of the per-batch scores)
    if all_scores:
        final_clip_score = sum(all_scores) / len(all_scores)
        print(f"Final averaged CLIP Score for folder '{image_folder}': {final_clip_score}")
    else:
        print(f"No valid images found in folder '{image_folder}'.")


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Calculate CLIP Score for images and prompts with batch parallel processing.")
    parser.add_argument("--prompt_file", type=str, default="/root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt", help="Path to the prompts text file.")
    parser.add_argument("--image_folder", type=str, default="/root/autodl-tmp/vis/2024-09-04_custom_epochunknown_stepunknown_scale4.5_step20_size256_bs100_sampdpm-solver_seed0", help="Path to the folder containing images.")
    parser.add_argument("--batch_size", type=int, default=64, help="Number of images to process in each batch.")
    parser.add_argument("--num_workers", type=int, default=4, help="Number of parallel workers.")
    args = parser.parse_args()
    # Set multiprocessing start method to 'spawn', suitable for CUDA
    mp.set_start_method('spawn', force=True)
    main(prompt_file=args.prompt_file, image_folder=args.image_folder, batch_size=args.batch_size, num_workers=args.num_workers)
================================================
FILE: README.md
================================================
# **[ICLR 2025]** *ToCa*: Accelerating Diffusion Transformers with *To*ken-wise Feature *Ca*ching
## 🔥 News
* `2025/03/10` 🚀🚀 Our latest work "From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers" is released! Code is available at [TaylorSeer](https://github.com/Shenyi-Z/TaylorSeer)! TaylorSeer supports lossless compression at a rate of 4.99x on FLUX.1-dev (with a latency speedup of 3.53x) and high-quality acceleration at a compression rate of 5.00x on HunyuanVideo (with a latency speedup of 4.65x)! We hope *TaylorSeer* can move the paradigm of feature caching methods from reusing to forecasting. For more details, please refer to our latest research [paper](https://arxiv.org/abs/2503.06923).
* `2025/02/19` 🚀🚀 ToCa solution for **FLUX** has been officially released after adjustments, now achieving up to **3.14× lossless acceleration**!
* `2025/01/22` 💥💥 ToCa is honored to be accepted by ICLR 2025!
* `2024/12/29` 🚀🚀 We release our work [DuCa](https://arxiv.org/abs/2412.18911) about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of **2.50×** on [OpenSora](https://github.com/hpcaitech/Open-Sora)! 🎉 **DuCa also overcomes the limitation of ToCa by fully supporting FlashAttention, enabling broader compatibility and efficiency improvements.**
* `2024/12/24` 🤗🤗 We release an open-source repo "[Awesome-Token-Reduction-for-Model-Compression](https://github.com/xuyang-liu16/Awesome-Token-Reduction-for-Model-Compression)", which collects recent awesome token reduction papers! Feel free to contribute your suggestions!
* `2024/12/20` 💥💥 Our ToCa has achieved nearly lossless acceleration of **1.51×** on [FLUX](https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell), feel free to check the latest version of our [paper](https://arxiv.org/pdf/2410.05317#page=19)!
* `2024/12/10` 💥💥 Our team's recent work, **SiTo** (https://github.com/EvelynZhang-epiclab/SiTo), has been accepted to **AAAI 2025**. It accelerates diffusion models through adaptive **Token Pruning**.
* `2024/10/16` 🤗🤗 Users with autodl accounts can now quickly experience [OpenSora-ToCa](https://www.codewithgpu.com/i/Shenyi-Z/ToCa/OpenSora-ToCa) by directly using our publicly available image!
* `2024/10/12` 🚀🚀 We release our work [ToCa](https://arxiv.org/abs/2410.05317) about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of **2.36×** on [OpenSora](https://github.com/hpcaitech/Open-Sora)!
* `2024/07/15` 🤗🤗 We release an open-source repo "[Awesome-Generation-Acceleration](https://github.com/xuyang-liu16/Awesome-Generation-Acceleration)", which collects recent awesome generation acceleration papers! Feel free to contribute your suggestions!
## TODO:
- [x] Support for FLOPs calculation
- [x] Add the FLUX version of ToCa
- [ ] Further optimize the code logic to reduce the time consumption of tensor operations
## Dependencies
``` cmd
Python>=3.9
CUDA>=11.8
```
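
The minimums above can be checked before installing anything else. The snippet below is a minimal sketch, not part of this repo: `parse_version` and `check_environment` are hypothetical helper names, and it assumes the CUDA version of interest is the one PyTorch was built against (`torch.version.cuda`), which may differ from the driver or toolkit installed on the machine.

```python
import sys

MIN_PYTHON = (3, 9)   # from "Python>=3.9"
MIN_CUDA = (11, 8)    # from "CUDA>=11.8"


def parse_version(text):
    """Turn a string such as '11.8' into (11, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_environment():
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.9")
    try:
        import torch  # optional here: only needed to read the CUDA build version
    except ImportError:
        problems.append("PyTorch is not installed, so the CUDA version was not checked")
        return problems
    cuda = torch.version.cuda  # None for CPU-only builds of PyTorch
    if cuda is None:
        problems.append("PyTorch was built without CUDA support")
    elif parse_version(cuda) < MIN_CUDA:
        problems.append(f"PyTorch was built against CUDA {cuda}, older than 11.8")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment meets the stated minimums"]:
        print(line)
```

Comparing tuples of integers rather than strings avoids the usual pitfall where `"11.10" < "11.8"` as text.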
## 🛠 Installation
``` cmd
git clone https://github.com/Shenyi-Z/ToCa.git
```
### Environment Settings
#### Original Models (recommended)
We evaluated our method in the same environments as the original models, so you can set up each environment by following the requirements of the corresponding original model.
Links:
| Original Models | urls |
| :--------------: | :------------------------------------------: |
| DiT | https://github.com/facebookresearch/DiT |
| PixArt-α | https://github.com/PixArt-alpha/PixArt-alpha |
| OpenSora | https://github.com/hpcaitech/Open-Sora |
| FLUX | https://github.com/black-forest-labs/flux |
We also provide replicas of our own environments, exported as conda YAML files:
##### DiT
```bash
cd DiT-ToCa
conda env create -f environment-dit.yml
```
##### PixArt-α
```bash
cd PixArt-alpha-ToCa
conda env create -f environment-pixart.yml
```
##### OpenSora
```bash
cd Open-Sora
conda env create -f environment-opensora.yml
pip install -v . # for development mode, `pip install -v -e .`
```
## 🚀 Run and evaluation
### Run DiT-ToCa
#### DDPM-250 Steps
sample images for **visualization**
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 250 --cache-type attention --fresh-threshold 4 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --soft-fresh-weight 0.25
```
sample images for **evaluation** (e.g., 50k)
```bash
cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --num-fid-samples 50000
```
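
As a rough sanity check on the evaluation command above, the sketch below works out how many images each process generates. It assumes `sample_ddp.py` keeps the padding rule of the upstream DiT script, where the total is rounded up to a multiple of the global batch size so that every process runs the same number of iterations; that rule is an assumption here, and `ddp_sampling_plan` is a hypothetical helper, not a function in this repo.

```python
import math


def ddp_sampling_plan(num_fid_samples, per_proc_batch_size, world_size):
    """Hypothetical helper: per-process workload under the assumed round-up rule."""
    global_batch_size = per_proc_batch_size * world_size
    # Assumption: round the requested sample count up to a multiple of the global batch.
    total_samples = math.ceil(num_fid_samples / global_batch_size) * global_batch_size
    samples_per_proc = total_samples // world_size
    iterations = samples_per_proc // per_proc_batch_size
    return {
        "global_batch_size": global_batch_size,
        "total_samples": total_samples,
        "samples_per_proc": samples_per_proc,
        "iterations": iterations,
    }


# Values taken from the command above: --num-fid-samples 50000,
# --per-proc-batch-size 150, --nproc_per_node=6
plan = ddp_sampling_plan(50000, 150, 6)
print(plan)
```

Under that assumption, slightly more than 50k images are generated and the surplus would have to be dropped before computing FID. Lowering `--per-proc-batch-size` reduces peak memory at the cost of more iterations.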
#### DDIM-50 Steps
sample images for **visualization**
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --soft-fresh-weight 0.25 --ddim-sample
```
sample images for **evaluation** (e.g., 50k)
```bash
cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 50 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --fresh-threshold 3 --soft-fresh-weight 0.25 --num-fid-samples 50000 --ddim-sample
```
#### test FLOPs
Add `--test-FLOPs` to the sampling command. For example:
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --soft-fresh-weight 0.25 --ddim-sample --test-FLOPs
```
### Run PixArt-α-ToCa
sample images for **visualization**
```bash
cd PixArt-alpha-ToCa
python scripts/inference.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/test.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
```
sample images for **evaluation** (e.g., 30k for COCO, 1.6k for PartiPrompts)
```bash
cd PixArt-alpha-ToCa
torchrun --nproc_per_node=6 scripts/inference_ddp.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
```
(Besides, if you need our npz file: https://drive.google.com/file/d/1vUdoSgdIvtXo1cAS_aOFCJ1-XC_i1KEQ/view?usp=sharing)
### Run OpenSora-ToCa
sample video for **visualization**
```bash
cd Open-Sora
python scripts/inference.py configs/opensora-v1-2/inference/sample.py --num-frames 2s --resolution 480p --aspect-ratio 9:16 --prompt "a beautiful waterfall"
```
sample video for **VBench evaluation**
```bash
cd Open-Sora
bash eval/vbench/launch.sh /root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors 51 opensora-ToCa 480p 9:16
```
(Remember to replace "/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors" with your own path!)
### Run FLUX-ToCa
First, you need to enter the environment adapted for FLUX. While the official documentation uses `venv` to build the environment, you can also set it up using `conda`, which you might be more familiar with.
How to build a conda environment for FLUX?
```bash
cd flux-ToCa
conda create -n flux python=3.10
conda activate flux  # activate the new environment before installing into it
pip install -e ".[all]"
```
For interactive sampling run
```bash
python -m flux --name <name> --loop
```
Or to generate a single sample run
```bash
python -m flux --name <name> \
--height <height> --width <width> \
--prompt "<prompt>"
```
Typically, the `--name` argument should be set to `flux-dev`.
Generate image samples with a txt file
```bash
python src/sample.py --prompt_file <prompt_file> --width 1024 --height 1024 --model_name flux-dev --add_sampling_metadata --output_dir <output_dir> --num_steps 50
```
The `--add_sampling_metadata` parameter is used to control whether the prompt is added to the image's EXIF metadata.
We also provide a FLOPs-testing mode, but **in this mode no samples are generated**.
```bash
python src/sample.py --prompt_file <prompt_file> --width 1024 --height 1024 --model_name flux-dev --add_sampling_metadata --output_dir <output_dir> --num_steps 50 --test_FLOPs
```
Use the framework of Geneval for evaluation
```bash
python src/geneval_flux.py /root/geneval/prompts/evaluation_metadata.jsonl --model_name flux-dev --n_samples 4 --steps 50 --width 1024 --height 1024 --seed 42 --output_dir /root/autodl-tmp/samples/flux-ToCa
```
How to prepare environment for geneval?
The environment required for Geneval's metric computation is somewhat specific. As of February 2025, it is not yet possible to set up the environment directly using the default method provided in the project. However, we can follow the guidance in this Geneval issue [https://github.com/djghosh13/geneval/issues/12](https://github.com/djghosh13/geneval/issues/12) to set up the environment. The instructions are very detailed.
#### Awesome acceleration results for the Latest Version of ToCa on FLUX
| Method | Geneval $\uparrow$ (overall score) | ImageReward $\uparrow$ (DrawBench200) | FLOPs $\downarrow$ | Latency $\downarrow$ | Compress Ratio $\uparrow$ | Speed Up $\uparrow$ |
| ------------ | :-----------------------------------: | :-------------------------------------: | :----------------: | :------------------: | :-----------------------: | :-----------------: |
| **original** | 0.6752 | 0.9898 | 3719.50 | 33.87s | 1.00 | 1.00 |
| 60% steps | 0.6700 | 0.9739 | 2231.70 | 20.49s | 1.67 | 1.65 |
| 50% steps | 0.6656 | 0.9429 | 1859.75 | 17.12s | 2.00 | 1.98 |
| 40% steps | 0.6606 | 0.9317 | 1487.80 | 13.77s | 2.62 | 2.45 |
| **FORA3** | 0.6594 | 0.9227 | 1320.07 | 12.98s | 2.82 | 2.61 |
| **ToCa4-01** | 0.6748 | **0.9798** | 1263.22 | 11.91s | 2.94 | 2.84 |
| **ToCa5-01** | **0.6750** | 0.9731 | 1126.76 | 10.80s | 3.30 | 3.14 |
| **ToCa6-01** | 0.6653 | 0.9493 | 990.30 | 9.48s | 3.76 | 3.57 |
Explanation of the Improved ToCa
The **acceleration effect has significantly improved while maintaining generation quality** compared with the previous version. This is because, in the current version of the code, we have further optimized ToCa and adopted more reliable metrics (Image Reward on DrawBench200, Geneval).
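
The last two columns of the table can be reproduced from the FLOPs and latency columns. The sketch below assumes that "Compress Ratio" is the baseline FLOPs divided by the method's FLOPs and that "Speed Up" is the baseline latency divided by the method's latency; the README does not state these definitions, but they match the FORA and ToCa rows after rounding to two decimals. The helper name `ratios` is hypothetical.

```python
# Baseline values from the "original" row of the table above.
ORIGINAL_FLOPS = 3719.50
ORIGINAL_LATENCY_S = 33.87

# (FLOPs, latency in seconds) copied from the table.
ROWS = {
    "FORA3": (1320.07, 12.98),
    "ToCa4-01": (1263.22, 11.91),
    "ToCa5-01": (1126.76, 10.80),
    "ToCa6-01": (990.30, 9.48),
}


def ratios(flops, latency_s):
    """Assumed definitions: both ratios are baseline / method, rounded to 2 decimals."""
    compress_ratio = round(ORIGINAL_FLOPS / flops, 2)
    speed_up = round(ORIGINAL_LATENCY_S / latency_s, 2)
    return compress_ratio, speed_up


for name, (flops, latency_s) in ROWS.items():
    compress_ratio, speed_up = ratios(flops, latency_s)
    print(f"{name}: compress ratio {compress_ratio:.2f}, speed-up {speed_up:.2f}")
```

The measured speed-up is consistently a little lower than the FLOPs ratio, which suggests that part of the latency is not captured by the FLOPs count.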
## 👍 Acknowledgements
- Thanks to [DiT](https://github.com/facebookresearch/DiT) for their great work and codebase upon which we build DiT-ToCa.
- Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) for their great work and codebase upon which we build PixArt-α-ToCa.
- Thanks to [OpenSora](https://github.com/hpcaitech/Open-Sora) for their great work and codebase upon which we build OpenSora-ToCa.
- Thanks to [FLUX](https://github.com/black-forest-labs/flux) for their great work and codebase upon which we build FLUX-ToCa.
## 📌 Citation
```bibtex
@article{zou2024accelerating,
title={Accelerating Diffusion Transformers with Token-wise Feature Caching},
author={Zou, Chang and Liu, Xuyang and Liu, Ting and Huang, Siteng and Zhang, Linfeng},
journal={arXiv preprint arXiv:2410.05317},
year={2024}
}
```
## :e-mail: Contact
If you have any questions, please email [`shenyizou@outlook.com`](mailto:shenyizou@outlook.com).
================================================
FILE: flux-ToCa/.gitignore
================================================
# Created by https://www.toptal.com/developers/gitignore/api/linux,windows,macos,visualstudiocode,python
# Edit at https://www.toptal.com/developers/gitignore?templates=linux,windows,macos,visualstudiocode,python
### Linux ###
*~
# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*
# KDE directory preferences
.directory
# Linux trash folder which might appear on any partition or disk
.Trash-*
# .nfs files are created when an open file is removed but is still being accessed
.nfs*
### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# Local History for Visual Studio Code
.history/
### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide
### Windows ###
# Windows thumbnail cache files
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db
# Dump file
*.stackdump
# Folder config file
[Dd]esktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp
# Windows shortcuts
*.lnk
# End of https://www.toptal.com/developers/gitignore/api/linux,windows,macos,visualstudiocode,python
================================================
FILE: flux-ToCa/LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: flux-ToCa/README.md
================================================
# FLUX
by Black Forest Labs: https://blackforestlabs.ai. Documentation for our API can be found here: [docs.bfl.ml](https://docs.bfl.ml/).

This repo contains minimal inference code to run image generation & editing with our Flux models.
## Local installation
```bash
cd $HOME && git clone https://github.com/black-forest-labs/flux
cd $HOME/flux
# Using pyvenv
python3.10 -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
```
### Models
We are offering an extensive suite of models. For more information about the individual models, please refer to the link under **Usage**.
| Name | Usage | HuggingFace repo | License |
| --------------------------- | ---------------------------------------------------------- | -------------------------------------------------------------- | --------------------------------------------------------------------- |
| `FLUX.1 [schnell]` | [Text to Image](docs/text-to-image.md) | https://huggingface.co/black-forest-labs/FLUX.1-schnell | [apache-2.0](model_licenses/LICENSE-FLUX1-schnell) |
| `FLUX.1 [dev]` | [Text to Image](docs/text-to-image.md) | https://huggingface.co/black-forest-labs/FLUX.1-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Fill [dev]` | [In/Out-painting](docs/fill.md) | https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Canny [dev]` | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Depth [dev]` | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Canny [dev] LoRA` | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Depth [dev] LoRA` | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 Redux [dev]` | [Image variation](docs/image-variation.md) | https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |
| `FLUX.1 [pro]` | [Text to Image](docs/text-to-image.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX1.1 [pro]` | [Text to Image](docs/text-to-image.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX1.1 [pro] Ultra/raw` | [Text to Image](docs/text-to-image.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX.1 Fill [pro]` | [In/Out-painting](docs/fill.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX.1 Canny [pro]` | [Structural Conditioning](docs/structural-conditioning.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX.1 Depth [pro]` | [Structural Conditioning](docs/structural-conditioning.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX1.1 Redux [pro]` | [Image variation](docs/image-variation.md) | [Available in our API.](https://docs.bfl.ml/) | |
| `FLUX1.1 Redux [pro] Ultra` | [Image variation](docs/image-variation.md) | [Available in our API.](https://docs.bfl.ml/) | |
The weights of the autoencoder are also released under [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) and can be found in the HuggingFace repos above.
## API usage
Our API offers access to our models. It is documented here:
[docs.bfl.ml](https://docs.bfl.ml/).
In this repository we also offer an easy Python interface. To use it, you
first need to register with the API on [api.bfl.ml](https://api.bfl.ml/) and
create a new API key.
To use the API key, either run `export BFL_API_KEY=<your_key_here>` or provide
it via the `api_key=<your_key_here>` parameter. You are also expected to
have installed the package as described above.
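The two ways of supplying the key can be pictured as a small lookup helper. The sketch below is illustrative only: `resolve_api_key` is a hypothetical name, and the precedence it uses (explicit argument first, then the `BFL_API_KEY` environment variable) is an assumption rather than documented behavior of `flux.api`.

```python
import os
from typing import Optional

ENV_VAR = "BFL_API_KEY"  # environment variable named in this README


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to the environment."""
    if api_key is not None:
        return api_key
    key = os.environ.get(ENV_VAR)
    if not key:
        # Neither source provided a key, so fail early with a clear message.
        raise ValueError(f"No API key: pass api_key=... or export {ENV_VAR}")
    return key
```

Failing early when neither source is set tends to give a clearer error than a rejected request later on.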
Usage from python:
```python
from flux.api import ImageRequest
# this will create an api request directly but not block until the generation is finished
request = ImageRequest("A beautiful beach", name="flux.1.1-pro")
# or: request = ImageRequest("A beautiful beach", name="flux.1.1-pro", api_key="your_key_here")
# any of the following will block until the generation is finished
request.url
# -> https:<...>/sample.jpg
request.bytes
# -> b"..." bytes for the generated image
request.save("outputs/api.jpg")
# saves the sample to local storage
request.image
# -> a PIL image
```
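The comments in the example above describe a request that returns immediately and only blocks when a result is accessed. A minimal sketch of that pattern, using only the standard library, is shown here. `LazyRequest`, its `delay` parameter, and the placeholder URL are all hypothetical; this is not how `flux.api.ImageRequest` is implemented, only an illustration of non-blocking creation followed by blocking access.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class LazyRequest:
    """Hypothetical stand-in for a request object; not the real ImageRequest."""

    def __init__(self, prompt: str, delay: float = 0.05):
        pool = ThreadPoolExecutor(max_workers=1)
        # submit() returns immediately; the work runs in a background thread.
        self._future: Future = pool.submit(self._generate, prompt, delay)
        # Already-submitted work still completes after a non-waiting shutdown.
        pool.shutdown(wait=False)

    @staticmethod
    def _generate(prompt: str, delay: float) -> str:
        time.sleep(delay)  # simulate server-side generation time
        return f"https://example.invalid/{prompt.replace(' ', '_')}.jpg"

    @property
    def url(self) -> str:
        # result() blocks until the background work has finished.
        return self._future.result()


request = LazyRequest("A beautiful beach")  # returns without waiting
print(request.url)  # blocks until the simulated generation is done
```

Repeated accesses to `url` return the same cached result, since a finished future keeps its value.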
Usage from the command line:
```bash
$ python -m flux.api --prompt="A beautiful beach" url
https:<...>/sample.jpg
# generate and save the result
$ python -m flux.api --prompt="A beautiful beach" save outputs/api
# open the image directly
$ python -m flux.api --prompt="A beautiful beach" image show
```
## Citation
If you find the provided code or models useful for your research, consider citing them as:
```bib
@misc{flux2024,
author={Black Forest Labs},
title={FLUX},
year={2024},
howpublished={\url{https://github.com/black-forest-labs/flux}},
}
```
================================================
FILE: flux-ToCa/demo_gr.py
================================================
import os
import time
import uuid
import gradio as gr
import numpy as np
import torch
from einops import rearrange
from PIL import ExifTags, Image
from transformers import pipeline
from flux.cli import SamplingOptions
from flux.sampling import get_noise, get_schedule, prepare, unpack
from flux.ideas import denoise_cache
from flux.util import configs, embed_watermark, load_ae, load_clip, load_flow_model, load_t5
NSFW_THRESHOLD = 0.85
def get_models(name: str, device: torch.device, offload: bool, is_schnell: bool):
t5 = load_t5(device, max_length=256 if is_schnell else 512)
clip = load_clip(device)
model = load_flow_model(name, device="cpu" if offload else device)
ae = load_ae(name, device="cpu" if offload else device)
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
return model, ae, t5, clip, nsfw_classifier
class FluxGenerator:
def __init__(self, model_name: str, device: str, offload: bool):
self.device = torch.device(device)
self.offload = offload
self.model_name = model_name
self.is_schnell = model_name == "flux-schnell"
self.model, self.ae, self.t5, self.clip, self.nsfw_classifier = get_models(
model_name,
device=self.device,
offload=self.offload,
is_schnell=self.is_schnell,
)
@torch.inference_mode()
def generate_image(
self,
width,
height,
num_steps,
guidance,
seed,
prompt,
init_image=None,
image2image_strength=0.0,
add_sampling_metadata=True,
):
seed = int(seed)
if seed == -1:
seed = None
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
)
if opts.seed is None:
opts.seed = torch.Generator(device="cpu").seed()
print(f"Generating '{opts.prompt}' with seed {opts.seed}")
t0 = time.perf_counter()
if init_image is not None:
if isinstance(init_image, np.ndarray):
init_image = torch.from_numpy(init_image).permute(2, 0, 1).float() / 255.0
init_image = init_image.unsqueeze(0)
init_image = init_image.to(self.device)
init_image = torch.nn.functional.interpolate(init_image, (opts.height, opts.width))
if self.offload:
self.ae.encoder.to(self.device)
init_image = self.ae.encode(init_image.to(self.device))
if self.offload:
self.ae = self.ae.cpu()
torch.cuda.empty_cache()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=self.device,
dtype=torch.bfloat16,
seed=opts.seed,
)
timesteps = get_schedule(
opts.num_steps,
x.shape[-1] * x.shape[-2] // 4,
shift=(not self.is_schnell),
)
if init_image is not None:
t_idx = int((1 - image2image_strength) * num_steps)
t = timesteps[t_idx]
timesteps = timesteps[t_idx:]
x = t * x + (1.0 - t) * init_image.to(x.dtype)
if self.offload:
self.t5, self.clip = self.t5.to(self.device), self.clip.to(self.device)
inp = prepare(t5=self.t5, clip=self.clip, img=x, prompt=opts.prompt)
# offload TEs to CPU, load model to gpu
if self.offload:
self.t5, self.clip = self.t5.cpu(), self.clip.cpu()
torch.cuda.empty_cache()
self.model = self.model.to(self.device)
# denoise initial noise
x = denoise_cache(self.model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if self.offload:
self.model.cpu()
torch.cuda.empty_cache()
self.ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=self.device.type, dtype=torch.bfloat16):
x = self.ae.decode(x)
if self.offload:
self.ae.decoder.cpu()
torch.cuda.empty_cache()
t1 = time.perf_counter()
print(f"Done in {t1 - t0:.1f}s.")
# bring into PIL format
x = x.clamp(-1, 1)
x = embed_watermark(x.float())
x = rearrange(x[0], "c h w -> h w c")
img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())
nsfw_score = [x["score"] for x in self.nsfw_classifier(img) if x["label"] == "nsfw"][0]
if nsfw_score < NSFW_THRESHOLD:
filename = f"output/gradio/{uuid.uuid4()}.jpg"
os.makedirs(os.path.dirname(filename), exist_ok=True)
exif_data = Image.Exif()
if init_image is None:
exif_data[ExifTags.Base.Software] = "AI generated;txt2img;flux"
else:
exif_data[ExifTags.Base.Software] = "AI generated;img2img;flux"
exif_data[ExifTags.Base.Make] = "Black Forest Labs"
exif_data[ExifTags.Base.Model] = self.model_name
if add_sampling_metadata:
exif_data[ExifTags.Base.ImageDescription] = prompt
img.save(filename, format="jpeg", exif=exif_data, quality=95, subsampling=0)
return img, str(opts.seed), filename, None
else:
return None, str(opts.seed), None, "Your generated image may contain NSFW content."
def create_demo(
model_name: str, device: str = "cuda" if torch.cuda.is_available() else "cpu", offload: bool = False
):
generator = FluxGenerator(model_name, device, offload)
is_schnell = model_name == "flux-schnell"
with gr.Blocks() as demo:
gr.Markdown(f"# Flux Image Generation Demo - Model: {model_name}")
with gr.Row():
with gr.Column():
prompt = gr.Textbox(
label="Prompt",
value='a photo of a forest with mist swirling around the tree trunks. The word "FLUX" is painted over it in big, red brush strokes with visible texture',
)
do_img2img = gr.Checkbox(label="Image to Image", value=False, interactive=not is_schnell)
init_image = gr.Image(label="Input Image", visible=False)
image2image_strength = gr.Slider(
0.0, 1.0, 0.8, step=0.1, label="Noising strength", visible=False
)
with gr.Accordion("Advanced Options", open=False):
width = gr.Slider(128, 8192, 1360, step=16, label="Width")
height = gr.Slider(128, 8192, 768, step=16, label="Height")
num_steps = gr.Slider(1, 50, 4 if is_schnell else 50, step=1, label="Number of steps")
guidance = gr.Slider(
1.0, 10.0, 3.5, step=0.1, label="Guidance", interactive=not is_schnell
)
seed = gr.Textbox(-1, label="Seed (-1 for random)")
add_sampling_metadata = gr.Checkbox(
label="Add sampling parameters to metadata?", value=True
)
generate_btn = gr.Button("Generate")
with gr.Column():
output_image = gr.Image(label="Generated Image")
seed_output = gr.Number(label="Used Seed")
warning_text = gr.Textbox(label="Warning", visible=False)
download_btn = gr.File(label="Download full-resolution")
def update_img2img(do_img2img):
return {
init_image: gr.update(visible=do_img2img),
image2image_strength: gr.update(visible=do_img2img),
}
do_img2img.change(update_img2img, do_img2img, [init_image, image2image_strength])
generate_btn.click(
fn=generator.generate_image,
inputs=[
width,
height,
num_steps,
guidance,
seed,
prompt,
init_image,
image2image_strength,
add_sampling_metadata,
],
outputs=[output_image, seed_output, download_btn, warning_text],
)
return demo
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="Flux")
parser.add_argument(
"--name", type=str, default="flux-schnell", choices=list(configs.keys()), help="Model name"
)
parser.add_argument(
"--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu", help="Device to use"
)
parser.add_argument("--offload", action="store_true", help="Offload model to CPU when not in use")
parser.add_argument("--share", action="store_true", help="Create a public link to your demo")
args = parser.parse_args()
demo = create_demo(args.name, args.device, args.offload)
demo.launch(share=args.share)
================================================
FILE: flux-ToCa/demo_st.py
================================================
import os
import re
import time
from glob import iglob
from io import BytesIO
import streamlit as st
import torch
from einops import rearrange
from fire import Fire
from PIL import ExifTags, Image
from st_keyup import st_keyup
from torchvision import transforms
from transformers import pipeline
from flux.cli import SamplingOptions
from flux.sampling import get_noise, get_schedule, prepare, unpack
from flux.ideas import denoise_cache
from flux.util import (
configs,
embed_watermark,
load_ae,
load_clip,
load_flow_model,
load_t5,
)
NSFW_THRESHOLD = 0.85
@st.cache_resource()
def get_models(name: str, device: torch.device, offload: bool, is_schnell: bool):
t5 = load_t5(device, max_length=256 if is_schnell else 512)
clip = load_clip(device)
model = load_flow_model(name, device="cpu" if offload else device)
ae = load_ae(name, device="cpu" if offload else device)
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
return model, ae, t5, clip, nsfw_classifier
def get_image() -> torch.Tensor | None:
image = st.file_uploader("Input", type=["jpg", "JPEG", "png"])
if image is None:
return None
image = Image.open(image).convert("RGB")
transform = transforms.Compose(
[
transforms.ToTensor(),
transforms.Lambda(lambda x: 2.0 * x - 1.0),
]
)
img: torch.Tensor = transform(image)
return img[None, ...]
@torch.inference_mode()
def main(
device: str = "cuda" if torch.cuda.is_available() else "cpu",
offload: bool = False,
output_dir: str = "output",
):
torch_device = torch.device(device)
names = list(configs.keys())
name = st.selectbox("Which model to load?", names)
if name is None or not st.checkbox("Load model", False):
return
is_schnell = name == "flux-schnell"
model, ae, t5, clip, nsfw_classifier = get_models(
name,
device=torch_device,
offload=offload,
is_schnell=is_schnell,
)
do_img2img = (
st.checkbox(
"Image to Image",
False,
disabled=is_schnell,
help="Partially noise an image and denoise again to get variations.\n\nOnly works for flux-dev",
)
and not is_schnell
)
if do_img2img:
init_image = get_image()
if init_image is None:
st.warning("Please add an image to do image to image")
image2image_strength = st.number_input("Noising strength", min_value=0.0, max_value=1.0, value=0.8)
if init_image is not None:
h, w = init_image.shape[-2:]
st.write(f"Got image of size {w}x{h} ({h*w/1e6:.2f}MP)")
resize_img = st.checkbox("Resize image", False) or init_image is None
else:
init_image = None
resize_img = True
image2image_strength = 0.0
# allow for packing and conversion to latent space
width = int(
16 * (st.number_input("Width", min_value=128, value=1360, step=16, disabled=not resize_img) // 16)
)
height = int(
16 * (st.number_input("Height", min_value=128, value=768, step=16, disabled=not resize_img) // 16)
)
num_steps = int(st.number_input("Number of steps", min_value=1, value=(4 if is_schnell else 50)))
guidance = float(st.number_input("Guidance", min_value=1.0, value=3.5, disabled=is_schnell))
seed_str = st.text_input("Seed", disabled=is_schnell)
if seed_str.isdecimal():
seed = int(seed_str)
else:
st.info("No seed set, set to positive integer to enable")
seed = None
save_samples = st.checkbox("Save samples?", not is_schnell)
add_sampling_metadata = st.checkbox("Add sampling parameters to metadata?", True)
default_prompt = (
"a photo of a forest with mist swirling around the tree trunks. The word "
'"FLUX" is painted over it in big, red brush strokes with visible texture'
)
prompt = st_keyup("Enter a prompt", value=default_prompt, debounce=300, key="interactive_text")
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
if len(fns) > 0:
idx = max(int(fn.split("_")[-1].split(".")[0]) for fn in fns) + 1
else:
idx = 0
rng = torch.Generator(device="cpu")
if "seed" not in st.session_state:
st.session_state.seed = rng.seed()
def increment_counter():
st.session_state.seed += 1
def decrement_counter():
if st.session_state.seed > 0:
st.session_state.seed -= 1
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
)
if name == "flux-schnell":
cols = st.columns([5, 1, 1, 5])
with cols[1]:
st.button("↩", on_click=increment_counter)
with cols[2]:
st.button("↪", on_click=decrement_counter)
if is_schnell or st.button("Sample"):
if is_schnell:
opts.seed = st.session_state.seed
elif opts.seed is None:
opts.seed = rng.seed()
print(f"Generating '{opts.prompt}' with seed {opts.seed}")
t0 = time.perf_counter()
if init_image is not None:
if resize_img:
init_image = torch.nn.functional.interpolate(init_image, (opts.height, opts.width))
else:
h, w = init_image.shape[-2:]
init_image = init_image[..., : 16 * (h // 16), : 16 * (w // 16)]
opts.height = init_image.shape[-2]
opts.width = init_image.shape[-1]
if offload:
ae.encoder.to(torch_device)
init_image = ae.encode(init_image.to(torch_device))
if offload:
ae = ae.cpu()
torch.cuda.empty_cache()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=torch_device,
dtype=torch.bfloat16,
seed=opts.seed,
)
# x holds latents; dividing their spatial size by 4 (2x2 packing) gives the sequence length,
# which equals the pixel count divided by 16**2
timesteps = get_schedule(
opts.num_steps,
(x.shape[-1] * x.shape[-2]) // 4,
shift=(not is_schnell),
)
if init_image is not None:
t_idx = int((1 - image2image_strength) * num_steps)
t = timesteps[t_idx]
timesteps = timesteps[t_idx:]
x = t * x + (1.0 - t) * init_image.to(x.dtype)
if offload:
t5, clip = t5.to(torch_device), clip.to(torch_device)
inp = prepare(t5=t5, clip=clip, img=x, prompt=opts.prompt)
# offload TEs to CPU, load model to gpu
if offload:
t5, clip = t5.cpu(), clip.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
# denoise initial noise
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
if offload:
ae.decoder.cpu()
torch.cuda.empty_cache()
t1 = time.perf_counter()
fn = output_name.format(idx=idx)
print(f"Done in {t1 - t0:.1f}s.")
# bring into PIL format and save
x = x.clamp(-1, 1)
x = embed_watermark(x.float())
x = rearrange(x[0], "c h w -> h w c")
img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())
nsfw_score = [x["score"] for x in nsfw_classifier(img) if x["label"] == "nsfw"][0]
if nsfw_score < NSFW_THRESHOLD:
buffer = BytesIO()
exif_data = Image.Exif()
if init_image is None:
exif_data[ExifTags.Base.Software] = "AI generated;txt2img;flux"
else:
exif_data[ExifTags.Base.Software] = "AI generated;img2img;flux"
exif_data[ExifTags.Base.Make] = "Black Forest Labs"
exif_data[ExifTags.Base.Model] = name
if add_sampling_metadata:
exif_data[ExifTags.Base.ImageDescription] = prompt
img.save(buffer, format="jpeg", exif=exif_data, quality=95, subsampling=0)
img_bytes = buffer.getvalue()
if save_samples:
print(f"Saving {fn}")
with open(fn, "wb") as file:
file.write(img_bytes)
idx += 1
st.session_state["samples"] = {
"prompt": opts.prompt,
"img": img,
"seed": opts.seed,
"bytes": img_bytes,
}
opts.seed = None
else:
st.warning("Your generated image may contain NSFW content.")
st.session_state["samples"] = None
samples = st.session_state.get("samples", None)
if samples is not None:
st.image(samples["img"], caption=samples["prompt"])
st.download_button(
"Download full-resolution",
samples["bytes"],
file_name="generated.jpg",
mime="image/jpeg",
)
st.write(f"Seed: {samples['seed']}")
def app():
Fire(main)
if __name__ == "__main__":
app()
================================================
FILE: flux-ToCa/demo_st_fill.py
================================================
import os
import re
import tempfile
import time
from glob import iglob
from io import BytesIO
import numpy as np
import streamlit as st
import torch
from einops import rearrange
from PIL import ExifTags, Image
from st_keyup import st_keyup
from streamlit_drawable_canvas import st_canvas
from transformers import pipeline
from flux.sampling import denoise, get_noise, get_schedule, prepare_fill, unpack
from flux.ideas import denoise_cache
from flux.util import embed_watermark, load_ae, load_clip, load_flow_model, load_t5
NSFW_THRESHOLD = 0.85
def add_border_and_mask(image, zoom_all=1.0, zoom_left=0, zoom_right=0, zoom_up=0, zoom_down=0, overlap=0):
"""Adds a black border around the image with individual side control and mask overlap"""
orig_width, orig_height = image.size
# Calculate padding for each side (in pixels)
left_pad = int(orig_width * zoom_left)
right_pad = int(orig_width * zoom_right)
top_pad = int(orig_height * zoom_up)
bottom_pad = int(orig_height * zoom_down)
# Calculate overlap in pixels
overlap_left = int(orig_width * overlap)
overlap_right = int(orig_width * overlap)
overlap_top = int(orig_height * overlap)
overlap_bottom = int(orig_height * overlap)
# If using the all-sides zoom, add it to each side
if zoom_all > 1.0:
extra_each_side = (zoom_all - 1.0) / 2
left_pad += int(orig_width * extra_each_side)
right_pad += int(orig_width * extra_each_side)
top_pad += int(orig_height * extra_each_side)
bottom_pad += int(orig_height * extra_each_side)
# Calculate new dimensions (ensure they're multiples of 32)
new_width = 32 * round((orig_width + left_pad + right_pad) / 32)
new_height = 32 * round((orig_height + top_pad + bottom_pad) / 32)
# Create new image with black border
bordered_image = Image.new("RGB", (new_width, new_height), (0, 0, 0))
# Paste original image in position
paste_x = left_pad
paste_y = top_pad
bordered_image.paste(image, (paste_x, paste_y))
# Create mask (white where the border is, black where the original image was)
mask = Image.new("L", (new_width, new_height), 255) # White background
# Paste black rectangle with overlap adjustment
mask.paste(
0,
(
paste_x + overlap_left, # Left edge moves right
paste_y + overlap_top, # Top edge moves down
paste_x + orig_width - overlap_right, # Right edge moves left
paste_y + orig_height - overlap_bottom, # Bottom edge moves up
),
)
return bordered_image, mask
@st.cache_resource()
def get_models(name: str, device: torch.device, offload: bool):
t5 = load_t5(device, max_length=128)
clip = load_clip(device)
model = load_flow_model(name, device="cpu" if offload else device)
ae = load_ae(name, device="cpu" if offload else device)
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
return model, ae, t5, clip, nsfw_classifier
def resize(img: Image.Image, min_mp: float = 0.5, max_mp: float = 2.0) -> Image.Image:
width, height = img.size
mp = (width * height) / 1_000_000 # Current megapixels
if min_mp <= mp <= max_mp:
# Even if MP is in range, ensure dimensions are multiples of 32
new_width = int(32 * round(width / 32))
new_height = int(32 * round(height / 32))
if new_width != width or new_height != height:
return img.resize((new_width, new_height), Image.Resampling.LANCZOS)
return img
# Calculate scaling factor
if mp < min_mp:
scale = (min_mp / mp) ** 0.5
else: # mp > max_mp
scale = (max_mp / mp) ** 0.5
new_width = int(32 * round(width * scale / 32))
new_height = int(32 * round(height * scale / 32))
return img.resize((new_width, new_height), Image.Resampling.LANCZOS)
def clear_canvas_state():
"""Clear all canvas-related state"""
keys_to_clear = ["canvas", "last_image_dims"]
for key in keys_to_clear:
if key in st.session_state:
del st.session_state[key]
def set_new_image(img: Image.Image):
"""Safely set a new image and clear relevant state"""
st.session_state["current_image"] = img
clear_canvas_state()
st.rerun()
def downscale_image(img: Image.Image, scale_factor: float) -> Image.Image:
"""Downscale image by a given factor while maintaining 32-pixel multiple dimensions"""
if scale_factor >= 1.0:
return img
width, height = img.size
new_width = int(32 * round(width * scale_factor / 32))
new_height = int(32 * round(height * scale_factor / 32))
# Ensure minimum dimensions
new_width = max(64, new_width) # minimum 64 pixels
new_height = max(64, new_height) # minimum 64 pixels
return img.resize((new_width, new_height), Image.Resampling.LANCZOS)
@torch.inference_mode()
def main(
device: str = "cuda" if torch.cuda.is_available() else "cpu",
offload: bool = False,
output_dir: str = "output",
):
torch_device = torch.device(device)
st.title("Flux Fill: Inpainting & Outpainting")
# Model selection and loading
name = "flux-dev-fill"
if not st.checkbox("Load model", False):
return
try:
model, ae, t5, clip, nsfw_classifier = get_models(
name,
device=torch_device,
offload=offload,
)
except Exception as e:
st.error(f"Error loading models: {e}")
return
# Mode selection
mode = st.radio("Select Mode", ["Inpainting", "Outpainting"])
# Image handling - either from previous generation or new upload
if "input_image" in st.session_state:
image = st.session_state["input_image"]
del st.session_state["input_image"]
set_new_image(image)
st.write("Continuing from previous result")
else:
uploaded_image = st.file_uploader("Upload image", type=["jpg", "jpeg", "png"])
if uploaded_image is None:
st.warning("Please upload an image")
return
if (
"current_image_name" not in st.session_state
or st.session_state["current_image_name"] != uploaded_image.name
):
try:
image = Image.open(uploaded_image).convert("RGB")
st.session_state["current_image_name"] = uploaded_image.name
set_new_image(image)
except Exception as e:
st.error(f"Error loading image: {e}")
return
else:
image = st.session_state.get("current_image")
if image is None:
st.error("Error: Image state is invalid. Please reupload the image.")
clear_canvas_state()
return
# Add downscale control
with st.expander("Image Size Control"):
current_mp = (image.size[0] * image.size[1]) / 1_000_000
st.write(f"Current image size: {image.size[0]}x{image.size[1]} ({current_mp:.1f}MP)")
scale_factor = st.slider(
"Downscale Factor",
min_value=0.1,
max_value=1.0,
value=1.0,
step=0.1,
help="1.0 = original size, 0.5 = half size, etc.",
)
if scale_factor < 1.0 and st.button("Apply Downscaling"):
image = downscale_image(image, scale_factor)
set_new_image(image)
st.rerun()
# Resize image with validation
try:
original_mp = (image.size[0] * image.size[1]) / 1_000_000
image = resize(image)
width, height = image.size
current_mp = (width * height) / 1_000_000
if width % 32 != 0 or height % 32 != 0:
st.error("Error: Image dimensions must be multiples of 32")
return
st.write(f"Image dimensions: {width}x{height} pixels")
if original_mp != current_mp:
st.write(
f"Image has been resized from {original_mp:.1f}MP to {current_mp:.1f}MP to stay within bounds (0.5MP - 2MP)"
)
except Exception as e:
st.error(f"Error processing image: {e}")
return
if mode == "Outpainting":
# Outpainting controls
zoom_all = st.slider("Zoom Out Amount (All Sides)", min_value=1.0, max_value=3.0, value=1.0, step=0.1)
with st.expander("Advanced Zoom Controls"):
st.info("These controls add additional zoom to specific sides")
col1, col2 = st.columns(2)
with col1:
zoom_left = st.slider("Left", min_value=0.0, max_value=1.0, value=0.0, step=0.1)
zoom_right = st.slider("Right", min_value=0.0, max_value=1.0, value=0.0, step=0.1)
with col2:
zoom_up = st.slider("Up", min_value=0.0, max_value=1.0, value=0.0, step=0.1)
zoom_down = st.slider("Down", min_value=0.0, max_value=1.0, value=0.0, step=0.1)
overlap = st.slider("Overlap", min_value=0.01, max_value=0.25, value=0.01, step=0.01)
# Generate bordered image and mask
image_for_generation, mask = add_border_and_mask(
image,
zoom_all=zoom_all,
zoom_left=zoom_left,
zoom_right=zoom_right,
zoom_up=zoom_up,
zoom_down=zoom_down,
overlap=overlap,
)
width, height = image_for_generation.size
# Show preview
col1, col2 = st.columns(2)
with col1:
st.image(image_for_generation, caption="Image with Border")
with col2:
st.image(mask, caption="Mask (white areas will be generated)")
else: # Inpainting mode
# Canvas setup with dimension tracking
canvas_key = f"canvas_{width}_{height}"
if "last_image_dims" not in st.session_state:
st.session_state.last_image_dims = (width, height)
elif st.session_state.last_image_dims != (width, height):
clear_canvas_state()
st.session_state.last_image_dims = (width, height)
st.rerun()
try:
canvas_result = st_canvas(
fill_color="rgba(255, 255, 255, 0.0)",
stroke_width=st.slider("Brush size", 1, 500, 50),
stroke_color="#fff",
background_image=image,
height=height,
width=width,
drawing_mode="freedraw",
key=canvas_key,
display_toolbar=True,
)
except Exception as e:
st.error(f"Error creating canvas: {e}")
clear_canvas_state()
st.rerun()
return
# Sampling parameters
num_steps = int(st.number_input("Number of steps", min_value=1, value=50))
guidance = float(st.number_input("Guidance", min_value=1.0, value=30.0))
seed_str = st.text_input("Seed")
if seed_str.isdecimal():
seed = int(seed_str)
else:
st.info("No seed set, using random seed")
seed = None
save_samples = st.checkbox("Save samples?", True)
add_sampling_metadata = st.checkbox("Add sampling parameters to metadata?", True)
# Prompt input
prompt = st_keyup("Enter a prompt", value="", debounce=300, key="interactive_text")
# Setup output path
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
idx = len(fns)
if st.button("Generate"):
valid_input = False
if mode == "Inpainting" and canvas_result.image_data is not None:
valid_input = True
# Create mask from canvas
try:
mask = Image.fromarray(canvas_result.image_data)
mask = mask.getchannel("A") # Get alpha channel
mask_array = np.array(mask)
mask_array = (mask_array > 0).astype(np.uint8) * 255
mask = Image.fromarray(mask_array)
image_for_generation = image
except Exception as e:
st.error(f"Error creating mask: {e}")
return
elif mode == "Outpainting":
valid_input = True
# image_for_generation and mask are already set above
if not valid_input:
st.error("Please draw a mask or configure outpainting settings")
return
# Create temporary files
with (
tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp_img,
tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp_mask,
):
try:
image_for_generation.save(tmp_img.name)
mask.save(tmp_mask.name)
except Exception as e:
st.error(f"Error saving temporary files: {e}")
return
try:
# Generate inpainting/outpainting
rng = torch.Generator(device="cpu")
if seed is None:
seed = rng.seed()
print(f"Generating with seed {seed}:\n{prompt}")
t0 = time.perf_counter()
x = get_noise(
1,
height,
width,
device=torch_device,
dtype=torch.bfloat16,
seed=seed,
)
if offload:
t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)
inp = prepare_fill(
t5,
clip,
x,
prompt=prompt,
ae=ae,
img_cond_path=tmp_img.name,
mask_path=tmp_mask.name,
)
timesteps = get_schedule(num_steps, inp["img"].shape[1], shift=True)
if offload:
t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=guidance)
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
x = unpack(x.float(), height, width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
t1 = time.perf_counter()
print(f"Done in {t1 - t0:.1f}s")
# Process and display result
x = x.clamp(-1, 1)
x = embed_watermark(x.float())
x = rearrange(x[0], "c h w -> h w c")
img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())
nsfw_score = [x["score"] for x in nsfw_classifier(img) if x["label"] == "nsfw"][0]
if nsfw_score < NSFW_THRESHOLD:
buffer = BytesIO()
exif_data = Image.Exif()
exif_data[ExifTags.Base.Software] = "AI generated;inpainting;flux"
exif_data[ExifTags.Base.Make] = "Black Forest Labs"
exif_data[ExifTags.Base.Model] = name
if add_sampling_metadata:
exif_data[ExifTags.Base.ImageDescription] = prompt
img.save(buffer, format="jpeg", exif=exif_data, quality=95, subsampling=0)
img_bytes = buffer.getvalue()
if save_samples:
fn = output_name.format(idx=idx)
print(f"Saving {fn}")
with open(fn, "wb") as file:
file.write(img_bytes)
st.session_state["samples"] = {
"prompt": prompt,
"img": img,
"seed": seed,
"bytes": img_bytes,
}
else:
st.warning("Your generated image may contain NSFW content.")
st.session_state["samples"] = None
except Exception as e:
st.error(f"Error during generation: {e}")
return
finally:
# Clean up temporary files
try:
os.unlink(tmp_img.name)
os.unlink(tmp_mask.name)
except Exception as e:
print(f"Error cleaning up temporary files: {e}")
# Display results
samples = st.session_state.get("samples", None)
if samples is not None:
st.image(samples["img"], caption=samples["prompt"])
col1, col2 = st.columns(2)
with col1:
st.download_button(
"Download full-resolution",
samples["bytes"],
file_name="generated.jpg",
mime="image/jpeg",
)
with col2:
if st.button("Continue from this image"):
# Store the generated image
new_image = samples["img"]
# Clear ALL canvas state
clear_canvas_state()
if "samples" in st.session_state:
del st.session_state["samples"]
# Set as current image
st.session_state["current_image"] = new_image
st.rerun()
st.write(f"Seed: {samples['seed']}")
if __name__ == "__main__":
st.set_page_config(layout="wide")
main()
================================================
FILE: flux-ToCa/docs/fill.md
================================================
## Models
FLUX.1 Fill introduces advanced inpainting and outpainting capabilities. It allows for seamless edits that integrate naturally with existing images.
| Name | HuggingFace repo | License | sha256sum |
| ------------------- | -------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |
| `FLUX.1 Fill [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 03e289f530df51d014f48e675a9ffa2141bc003259bf5f25d75b957e920a41ca |
| `FLUX.1 Fill [pro]` | Only available in our API. |
## Examples


## Open-weights usage
The weights will be downloaded automatically from HuggingFace once you start one of the demos. To download `FLUX.1 Fill [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). Alternatively, if you have downloaded the model weights manually from [here](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev), you can specify the downloaded paths via environment variables:
```bash
export FLUX_DEV_FILL=
export AE=
```
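If you downloaded the weights manually, it may be worth comparing the file against the `sha256sum` column in the table above before exporting the paths. The following is a minimal sketch using only the standard library. It assumes the listed checksum refers to the downloaded weights file; `sha256_of_file` and `matches_checksum` are hypothetical helper names, not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; not part of the flux package.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large checkpoints are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum` with the path to the downloaded file and the table value for `FLUX.1 Fill [dev]` should return `True` for an intact download.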
For interactive sampling run
```bash
python -m src.flux.cli_fill --loop
```
Or to generate a single sample run
```bash
python -m src.flux.cli_fill \
--img_cond_path \
--img_mask_path
```
The input_mask should be an image of the same size as the conditioning image that only contains black and white pixels; see [an example mask](../assets/cup_mask.png) for [this image](../assets/cup.png).
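The mask requirements above can be checked before sampling. The following is a rough sketch, assuming Pillow and NumPy are installed. `binarize_mask` and `check_mask` are hypothetical helper names, and the threshold of 128 is an arbitrary choice for illustration rather than something the repository prescribes.

```python
import numpy as np
from PIL import Image


def binarize_mask(mask, threshold=128):
    """Convert any image to a strictly black-and-white single-channel mask.

    Hypothetical helper; the threshold is an assumption for illustration.
    """
    gray = np.array(mask.convert("L"))
    # Pixels at or above the threshold become white (255), all others black (0).
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    return Image.fromarray(binary)


def check_mask(image, mask):
    """Return True if the mask has the image's size and contains only 0 and 255."""
    if image.size != mask.size:
        return False
    values = np.unique(np.array(mask.convert("L")))
    return bool(np.isin(values, (0, 255)).all())
```

A mask that fails `check_mask` only because of soft edges can be passed through `binarize_mask` first; a size mismatch has to be fixed by resizing or redrawing the mask.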
We also provide an interactive streamlit demo. The demo can be run via
```bash
streamlit run demo_st_fill.py
```
================================================
FILE: flux-ToCa/docs/image-variation.md
================================================
## Models
FLUX.1 Redux is an adapter for the FLUX.1 text-to-image base models, FLUX.1 [dev] and FLUX.1 [schnell], which can be used to generate image variations.
In addition, FLUX.1 Redux [pro] is available in our API. Beyond what the [dev] adapter offers, the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model, FLUX1.1 [pro] Ultra, which combines input images and text prompts to create high-quality 4-megapixel outputs with flexible aspect ratios.
| Name | HuggingFace repo | License | sha256sum |
| --------------------------- | ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |
| `FLUX.1 Redux [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | a1b3bdcb4bdc58ce04874b9ca776d61fc3e914bb6beab41efb63e4e2694dca45 |
| `FLUX.1 Redux [pro]` | [Available in our API.](https://docs.bfl.ml/) Supports image variations. |
| `FLUX1.1 Redux [pro] Ultra` | [Available in our API.](https://docs.bfl.ml/) Supports image variations based on a text prompt. |
## Examples

## Open-weights usage
The text-to-image base model weights and the autoencoder weights will be downloaded automatically from HuggingFace once you start the demo. To download `FLUX.1 [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). You need to manually download the adapter weights from [here](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev) and specify them via an environment variable `export FLUX_REDUX=`. In general, you may specify any manually downloaded weights via environment variables:
```bash
export FLUX_REDUX=
export FLUX_SCHNELL=
export FLUX_DEV=
export AE=
```
For interactive sampling run
```bash
python -m src.flux.cli_redux --loop --name
```
where `name` is one of `flux-dev` or `flux-schnell`.
================================================
FILE: flux-ToCa/docs/structural-conditioning.md
================================================
## Models
Structural conditioning uses canny edge or depth detection to maintain precise control during image transformations. By preserving the original image's structure through edge or depth maps, users can make text-guided edits while keeping the core composition intact. This is particularly effective for retexturing images. We release four variations: two based on edge maps (full model and LoRA for FLUX.1 [dev]) and two based on depth maps (full model and LoRA for FLUX.1 [dev]).
| Name | HuggingFace repo | License | sha256sum |
| ------------------------- | -------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |
| `FLUX.1 Canny [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 996876670169591cb412b937fbd46ea14cbed6933aef17c48a2dcd9685c98cdb |
| `FLUX.1 Depth [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 41360d1662f44ca45bc1b665fe6387e91802f53911001630d970a4f8be8dac21 |
| `FLUX.1 Canny [dev] LoRA` | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 8eaa21b9c43d5e7242844deb64b8cf22ae9010f813f955ca8c05f240b8a98f7e |
| `FLUX.1 Depth [dev] LoRA` | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 1938b38ea0fdd98080fa3e48beb2bedfbc7ad102d8b65e6614de704a46d8b907 |
| `FLUX.1 Canny [pro]` | [Available in our API](https://docs.bfl.ml/). |
| `FLUX.1 Depth [pro]` | [Available in our API](https://docs.bfl.ml/). |
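To give a feel for what an edge-map condition looks like, here is a small NumPy-only sketch. It is a simplified stand-in for Canny (Sobel gradient magnitude with a single threshold, no non-maximum suppression or hysteresis) and is not the preprocessing used by this repository; `sobel_edge_map` and its default threshold are assumptions for illustration.

```python
import numpy as np


def sobel_edge_map(gray, threshold=0.25):
    """Return a binary edge map (uint8, 0 or 255) from a 2-D grayscale array.

    Simplified stand-in for Canny: Sobel gradient magnitude plus one
    relative threshold. Hypothetical helper, not part of the flux package.
    """
    gray = np.asarray(gray, dtype=np.float32)
    padded = np.pad(gray, 1, mode="edge")
    # Horizontal and vertical Sobel responses built from shifted views.
    gx = (
        (padded[:-2, 2:] + 2 * padded[1:-1, 2:] + padded[2:, 2:])
        - (padded[:-2, :-2] + 2 * padded[1:-1, :-2] + padded[2:, :-2])
    )
    gy = (
        (padded[2:, :-2] + 2 * padded[2:, 1:-1] + padded[2:, 2:])
        - (padded[:-2, :-2] + 2 * padded[:-2, 1:-1] + padded[:-2, 2:])
    )
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        # A constant image has no edges.
        return np.zeros(gray.shape, dtype=np.uint8)
    return np.where(magnitude / peak >= threshold, 255, 0).astype(np.uint8)
```

The resulting white-on-black map preserves the outline of the input while discarding texture and color, which is the kind of structure an edge-conditioned model is meant to follow during retexturing.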
## Examples


## Open-weights usage
The full model weights (`FLUX.1 Canny [dev]`, `FLUX.1 Depth [dev]`, `FLUX.1 [dev]`, and the autoencoder) will be downloaded automatically from HuggingFace once you start one of the demos. To download them, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). The LoRA weights are not downloaded automatically, but can be downloaded manually [here (Canny)](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora) and [here (Depth)](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora). You may specify any manually downloaded weights via environment variables (**necessary for LoRAs**):
```bash
export FLUX_DEV_DEPTH=
export FLUX_DEV_CANNY=
export FLUX_DEV_DEPTH_LORA=
export FLUX_DEV_CANNY_LORA=
export FLUX_REDUX=
export FLUX_SCHNELL=
export FLUX_DEV=
export AE=
```
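Because the LoRA checkpoints are not fetched automatically, a quick check that the relevant variables point at existing files can save a failed start. The following is a small sketch using only the standard library; `report_weight_paths` is a hypothetical helper and not part of this repository.

```python
import os


def report_weight_paths(names, environ=None):
    """Map each environment variable name to 'unset', 'missing', or 'ok'.

    Hypothetical helper for illustration. 'missing' means the variable is
    set but does not point at an existing file.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name, "")
        if not value:
            report[name] = "unset"
        elif os.path.isfile(value):
            report[name] = "ok"
        else:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    # Variable names taken from the export list above.
    for name, status in report_weight_paths(
        ["FLUX_DEV_CANNY_LORA", "FLUX_DEV_DEPTH_LORA", "FLUX_DEV", "AE"]
    ).items():
        print(f"{name}: {status}")
```

An `unset` entry for a full model is fine, since those weights are downloaded automatically; for the LoRA variables it means the corresponding `--name` option will likely not find its weights.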
For interactive sampling run
```bash
python -m src.flux.cli_control --loop --name
```
where `name` is one of `flux-dev-canny`, `flux-dev-depth`, `flux-dev-canny-lora`, or `flux-dev-depth-lora`.
## Diffusers usage
Flux Control (including the LoRAs) is also compatible with the `diffusers` Python library. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) to learn more.
================================================
FILE: flux-ToCa/docs/text-to-image.md
================================================
## Models
We currently offer four text-to-image models. `FLUX1.1 [pro]` is our most capable model which can generate images at up to 4MP while maintaining an impressive generation time of only 10 seconds per sample.
| Name | HuggingFace repo | License | sha256sum |
| ------------------------- | ------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |
| `FLUX.1 [schnell]` | https://huggingface.co/black-forest-labs/FLUX.1-schnell | [apache-2.0](model_licenses/LICENSE-FLUX1-schnell) | 9403429e0052277ac2a87ad800adece5481eecefd9ed334e1f348723621d2a0a |
| `FLUX.1 [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7 |
| `FLUX.1 [pro]` | [Available in our API](https://docs.bfl.ml/). |
| `FLUX1.1 [pro]` | [Available in our API](https://docs.bfl.ml/). |
| `FLUX1.1 [pro] Ultra/raw` | [Available in our API](https://docs.bfl.ml/). |
## Open-weights usage
The weights will be downloaded automatically from HuggingFace once you start one of the demos. To download `FLUX.1 [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login).
If you have downloaded the model weights manually, you can specify the downloaded paths via environment variables:
```bash
export FLUX_SCHNELL=
export FLUX_DEV=
export AE=
```
For interactive sampling run
```bash
python -m flux --name --loop
```
Or to generate a single sample run
```bash
python -m flux --name \
--height --width \
--prompt ""
```
We also provide a streamlit demo that does both text-to-image and image-to-image. The demo can be run via
```bash
streamlit run demo_st.py
```
We also offer a Gradio-based demo for an interactive experience. To run the Gradio demo:
```bash
python demo_gr.py --name flux-schnell --device cuda
```
Options:
- `--name`: Choose the model to use (options: "flux-schnell", "flux-dev")
- `--device`: Specify the device to use (default: "cuda" if available, otherwise "cpu")
- `--offload`: Offload model to CPU when not in use
- `--share`: Create a public link to your demo
To run the demo with the dev model and create a public link:
```bash
python demo_gr.py --name flux-dev --share
```
## Diffusers integration
`FLUX.1 [schnell]` and `FLUX.1 [dev]` are integrated with the [🧨 diffusers](https://github.com/huggingface/diffusers) library. To use it with diffusers, install it:
```shell
pip install git+https://github.com/huggingface/diffusers.git
```
Then you can use `FluxPipeline` to run the model:
```python
import torch
from diffusers import FluxPipeline
model_id = "black-forest-labs/FLUX.1-schnell"  # you can also use `black-forest-labs/FLUX.1-dev`
pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
prompt = "A cat holding a sign that says hello world"
seed = 42
image = pipe(
prompt,
output_type="pil",
num_inference_steps=4, #use a larger number if you are using [dev]
generator=torch.Generator("cpu").manual_seed(seed)
).images[0]
image.save("flux-schnell.png")
```
To learn more, check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation.
================================================
FILE: flux-ToCa/model_cards/FLUX.1-dev.md
================================================
![FLUX.1 [dev] Grid](../assets/dev_grid.jpg)
`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
# Key Features
1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`.
2. Competitive prompt following, matching the performance of closed source alternatives.
3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient.
4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.
5. Generated outputs can be used for personal, scientific, and commercial purposes, as described in the [flux-1-dev-non-commercial-license](./licence.md).
# Usage
We provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).
Developers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point.
## API Endpoints
The FLUX.1 models are also available via API from the following sources:
1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
2. [replicate.com](https://replicate.com/collections/flux)
3. [fal.ai](https://fal.ai/models/fal-ai/flux/dev)
## ComfyUI
`FLUX.1 [dev]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
---
# Limitations
- This model is not intended or able to provide factual information.
- As a statistical model this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.
# Out-of-Scope Use
The model and its derivatives may not be used
- In any way that violates any applicable national, federal, state, local or international law or regulation.
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
- To generate or disseminate personally identifiable information that can be used to harm an individual.
- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
- To create non-consensual nudity or illegal pornographic content.
- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
- Generating or facilitating large-scale disinformation campaigns.
# License
This model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
================================================
FILE: flux-ToCa/model_cards/FLUX.1-schnell.md
================================================
![FLUX.1 [schnell] Grid](../assets/schnell_grid.jpg)
`FLUX.1 [schnell]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
# Key Features
1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.
2. Trained using latent adversarial diffusion distillation, `FLUX.1 [schnell]` can generate high-quality images in only 1 to 4 steps.
3. Released under the `apache-2.0` licence, the model can be used for personal, scientific, and commercial purposes.
# Usage
We provide a reference implementation of `FLUX.1 [schnell]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).
Developers and creatives looking to build on top of `FLUX.1 [schnell]` are encouraged to use this as a starting point.
## API Endpoints
The FLUX.1 models are also available via API from the following sources:
1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
2. [replicate.com](https://replicate.com/collections/flux)
3. [fal.ai](https://fal.ai/models/fal-ai/flux/schnell)
## ComfyUI
`FLUX.1 [schnell]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
---
# Limitations
- This model is not intended or able to provide factual information.
- As a statistical model this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.
# Out-of-Scope Use
The model and its derivatives may not be used
- In any way that violates any applicable national, federal, state, local or international law or regulation.
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
- To generate or disseminate personally identifiable information that can be used to harm an individual.
- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
- To create non-consensual nudity or illegal pornographic content.
- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
- To generate or facilitate large-scale disinformation campaigns.
================================================
FILE: flux-ToCa/model_licenses/LICENSE-FLUX1-dev
================================================
FLUX.1 [dev] Non-Commercial License
Black Forest Labs, Inc. (“we” or “our” or “Company”) is pleased to make available the weights, parameters and inference code for the FLUX.1 [dev] Model (as defined below) freely available for your non-commercial and non-production use as set forth in this FLUX.1 [dev] Non-Commercial License (“License”). The “FLUX.1 [dev] Model” means the FLUX.1 [dev] AI models, including FLUX.1 [dev], FLUX.1 Fill [dev], FLUX.1 Depth [dev], FLUX.1 Canny [dev], FLUX.1 Redux [dev], FLUX.1 Canny [dev] LoRA and FLUX.1 Depth [dev] LoRA, and their elements which includes algorithms, software, checkpoints, parameters, source code (inference code, evaluation code, and if applicable, fine-tuning code) and any other materials associated with the FLUX.1 [dev] AI models made available by Company under this License, including if any, the technical documentation, manuals and instructions for the use and operation thereof (collectively, “FLUX.1 [dev] Model”).
By downloading, accessing, use, Distributing (as defined below), or creating a Derivative (as defined below) of the FLUX.1 [dev] Model, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to access, use, Distribute or create a Derivative of the FLUX.1 [dev] Model and you must immediately cease using the FLUX.1 [dev] Model. If you are agreeing to be bound by the terms of this License on behalf of your employer or other entity, you represent and warrant to us that you have full legal authority to bind your employer or such entity to this License. If you do not have the requisite authority, you may not accept the License or access the FLUX.1 [dev] Model on behalf of your employer or other entity.
1. Definitions. Capitalized terms used in this License but not defined herein have the following meanings:
a. “Derivative” means any (i) modified version of the FLUX.1 [dev] Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the FLUX.1 [dev] Model, or (iii) any other derivative work thereof. For the avoidance of doubt, Outputs are not considered Derivatives under this License.
b. “Distribution” or “Distribute” or “Distributing” means providing or making available, by any means, a copy of the FLUX.1 [dev] Models and/or the Derivatives as the case may be.
c. “Non-Commercial Purpose” means any of the following uses, but only so far as you do not receive any direct or indirect payment arising from the use of the model or its output: (i) personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, or otherwise not directly or indirectly connected to any commercial activities, business operations, or employment responsibilities; (ii) use by commercial or for-profit entities for testing, evaluation, or non-commercial research and development in a non-production environment, (iii) use by any charitable organization for charitable purposes, or for testing or evaluation. For clarity, use for revenue-generating activity or direct interactions with or impacts on end users, or use to train, fine tune or distill other models for commercial use is not a Non-Commercial purpose.
d. “Outputs” means any content generated by the operation of the FLUX.1 [dev] Models or the Derivatives from a prompt (i.e., text instructions) provided by users. For the avoidance of doubt, Outputs do not include any components of a FLUX.1 [dev] Models, such as any fine-tuned versions of the FLUX.1 [dev] Models, the weights, or parameters.
e. “you” or “your” means the individual or entity entering into this License with Company.
2. License Grant.
a. License. Subject to your compliance with this License, Company grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license to access, use, create Derivatives of, and Distribute the FLUX.1 [dev] Models solely for your Non-Commercial Purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Company’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License. Any restrictions set forth herein in regarding the FLUX.1 [dev] Model also applies to any Derivative you create or that are created on your behalf.
b. Non-Commercial Use Only. You may only access, use, Distribute, or creative Derivatives of or the FLUX.1 [dev] Model or Derivatives for Non-Commercial Purposes. If You want to use a FLUX.1 [dev] Model a Derivative for any purpose that is not expressly authorized under this License, such as for a commercial activity, you must request a license from Company, which Company may grant to you in Company’s sole discretion and which additional use may be subject to a fee, royalty or other revenue share. Please contact Company at the following e-mail address if you want to discuss such a license: info@blackforestlabs.ai.
c. Reserved Rights. The grant of rights expressly set forth in this License are the complete grant of rights to you in the FLUX.1 [dev] Model, and no other licenses are granted, whether by waiver, estoppel, implication, equity or otherwise. Company and its licensors reserve all rights not expressly granted by this License.
d. Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model.
3. Distribution. Subject to this License, you may Distribute copies of the FLUX.1 [dev] Model and/or Derivatives made by you, under the following conditions:
a. you must make available a copy of this License to third-party recipients of the FLUX.1 [dev] Models and/or Derivatives you Distribute, and specify that any rights to use the FLUX.1 [dev] Models and/or Derivatives shall be directly granted by Company to said third-party recipients pursuant to this License;
b. you must make prominently display the following notice alongside the Distribution of the FLUX.1 [dev] Model or Derivative (such as via a “Notice” text file distributed as part of such FLUX.1 [dev] Model or Derivative) (the “Attribution Notice”):
“The FLUX.1 [dev] Model is licensed by Black Forest Labs. Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs. Inc.
IN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.”
c. in the case of Distribution of Derivatives made by you, you must also include in the Attribution Notice a statement that you have modified the applicable FLUX.1 [dev] Model; and
d. in the case of Distribution of Derivatives made by you, any terms and conditions you impose on any third-party recipients relating to Derivatives made by or for you shall neither limit such third-party recipients’ use of the FLUX.1 [dev] Model or any Derivatives made by or for Company in accordance with this License nor conflict with any of its terms and conditions.
e. In the case of Distribution of Derivatives made by you, you must not misrepresent or imply, through any means, that the Derivatives made by or for you and/or any modified version of the FLUX.1 [dev] Model you Distribute under your name and responsibility is an official product of the Company or has been endorsed, approved or validated by the Company, unless you are authorized by Company to do so in writing.
4. Restrictions. You will not, and will not permit, assist or cause any third party to
a. use, modify, copy, reproduce, create Derivatives of, or Distribute the FLUX.1 [dev] Model (or any Derivative thereof, or any data produced by the FLUX.1 [dev] Model), in whole or in part, for (i) any commercial or production purposes, (ii) military purposes, (iii) purposes of surveillance, including any research or development relating to surveillance, (iv) biometric processing, (v) in any manner that infringes, misappropriates, or otherwise violates any third-party rights, or (vi) in any manner that violates any applicable law and violating any privacy or security laws, rules, regulations, directives, or governmental requirements (including the General Data Privacy Regulation (Regulation (EU) 2016/679), the California Consumer Privacy Act, and any and all laws governing the processing of biometric information), as well as all amendments and successor laws to any of the foregoing;
b. alter or remove copyright and other proprietary notices which appear on or in any portion of the FLUX.1 [dev] Model;
c. utilize any equipment, device, software, or other means to circumvent or remove any security or protection used by Company in connection with the FLUX.1 [dev] Model, or to circumvent or remove any usage restrictions, or to enable functionality disabled by FLUX.1 [dev] Model; or
d. offer or impose any terms on the FLUX.1 [dev] Model that alter, restrict, or are inconsistent with the terms of this License.
e. violate any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”) in connection with your use or Distribution of any FLUX.1 [dev] Model;
f. directly or indirectly Distribute, export, or otherwise transfer FLUX.1 [dev] Model (a) to any individual, entity, or country prohibited by Export Laws; (b) to anyone on U.S. or non-U.S. government restricted parties lists; or (c) for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications; 3) use or download FLUX.1 [dev] Model if you or they are (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) for any purpose prohibited by Export Laws; and (4) will not disguise your location through IP proxying or other methods.
5. DISCLAIMERS. THE FLUX.1 [dev] MODEL IS PROVIDED “AS IS” AND “WITH ALL FAULTS” WITH NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. COMPANY EXPRESSLY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, EXPRESS OR IMPLIED, WHETHER BY STATUTE, CUSTOM, USAGE OR OTHERWISE AS TO ANY MATTERS RELATED TO THE FLUX.1 [dev] MODEL, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, SATISFACTORY QUALITY, OR NON-INFRINGEMENT. COMPANY MAKES NO WARRANTIES OR REPRESENTATIONS THAT THE FLUX.1 [dev] MODEL WILL BE ERROR FREE OR FREE OF VIRUSES OR OTHER HARMFUL COMPONENTS, OR PRODUCE ANY PARTICULAR RESULTS.
6. LIMITATION OF LIABILITY. TO THE FULLEST EXTENT PERMITTED BY LAW, IN NO EVENT WILL COMPANY BE LIABLE TO YOU OR YOUR EMPLOYEES, AFFILIATES, USERS, OFFICERS OR DIRECTORS (A) UNDER ANY THEORY OF LIABILITY, WHETHER BASED IN CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY, WARRANTY, OR OTHERWISE UNDER THIS LICENSE, OR (B) FOR ANY INDIRECT, CONSEQUENTIAL, EXEMPLARY, INCIDENTAL, PUNITIVE OR SPECIAL DAMAGES OR LOST PROFITS, EVEN IF COMPANY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE FLUX.1 [dev] MODEL, ITS CONSTITUENT COMPONENTS, AND ANY OUTPUT (COLLECTIVELY, “MODEL MATERIALS”) ARE NOT DESIGNED OR INTENDED FOR USE IN ANY APPLICATION OR SITUATION WHERE FAILURE OR FAULT OF THE MODEL MATERIALS COULD REASONABLY BE ANTICIPATED TO LEAD TO SERIOUS INJURY OF ANY PERSON, INCLUDING POTENTIAL DISCRIMINATION OR VIOLATION OF AN INDIVIDUAL’S PRIVACY RIGHTS, OR TO SEVERE PHYSICAL, PROPERTY, OR ENVIRONMENTAL DAMAGE (EACH, A “HIGH-RISK USE”). IF YOU ELECT TO USE ANY OF THE MODEL MATERIALS FOR A HIGH-RISK USE, YOU DO SO AT YOUR OWN RISK. YOU AGREE TO DESIGN AND IMPLEMENT APPROPRIATE DECISION-MAKING AND RISK-MITIGATION PROCEDURES AND POLICIES IN CONNECTION WITH A HIGH-RISK USE SUCH THAT EVEN IF THERE IS A FAILURE OR FAULT IN ANY OF THE MODEL MATERIALS, THE SAFETY OF PERSONS OR PROPERTY AFFECTED BY THE ACTIVITY STAYS AT A LEVEL THAT IS REASONABLE, APPROPRIATE, AND LAWFUL FOR THE FIELD OF THE HIGH-RISK USE.
7. INDEMNIFICATION
You will indemnify, defend and hold harmless Company and our subsidiaries and affiliates, and each of our respective shareholders, directors, officers, employees, agents, successors, and assigns (collectively, the “Company Parties”) from and against any losses, liabilities, damages, fines, penalties, and expenses (including reasonable attorneys’ fees) incurred by any Company Party in connection with any claim, demand, allegation, lawsuit, proceeding, or investigation (collectively, “Claims”) arising out of or related to (a) your access to or use of the FLUX.1 [dev] Model (as well as any Output, results or data generated from such access or use), including any High-Risk Use (defined below); (b) your violation of this License; or (c) your violation, misappropriation or infringement of any rights of another (including intellectual property or other proprietary rights and privacy rights). You will promptly notify the Company Parties of any such Claims, and cooperate with Company Parties in defending such Claims. You will also grant the Company Parties sole control of the defense or settlement, at Company’s sole option, of any Claims. This indemnity is in addition to, and not in lieu of, any other indemnities or remedies set forth in a written agreement between you and Company or the other Company Parties.
8. Termination; Survival.
a. This License will automatically terminate upon any breach by you of the terms of this License.
b. We may terminate this License, in whole or in part, at any time upon notice (including electronic) to you.
c. If You initiate any legal action or proceedings against Company or any other entity (including a cross-claim or counterclaim in a lawsuit), alleging that the FLUX.1 [dev] Model or any Derivative, or any part thereof, infringe upon intellectual property or other rights owned or licensable by you, then any licenses granted to you under this License will immediately terminate as of the date such legal action or claim is filed or initiated.
d. Upon termination of this License, you must cease all use, access or Distribution of the FLUX.1 [dev] Model and any Derivatives. The following sections survive termination of this License 2(c), 2(d), 4-11.
9. Third Party Materials. The FLUX.1 [dev] Model may contain third-party software or other components (including free and open source software) (all of the foregoing, “Third Party Materials”), which are subject to the license terms of the respective third-party licensors. Your dealings or correspondence with third parties and your use of or interaction with any Third Party Materials are solely between you and the third party. Company does not control or endorse, and makes no representations or warranties regarding, any Third Party Materials, and your access to and use of such Third Party Materials are at your own risk.
10. Trademarks. You have not been granted any trademark license as part of this License and may not use any name or mark associated with Company without the prior written permission of Company, except to the extent necessary to make the reference required in the Attribution Notice as specified above or as is reasonably necessary in describing the FLUX.1 [dev] Model and its creators.
11. General. This License will be governed and construed under the laws of the State of Delaware without regard to conflicts of law provisions. If any provision or part of a provision of this License is unlawful, void or unenforceable, that provision or part of the provision is deemed severed from this License, and will not affect the validity and enforceability of any remaining provisions. The failure of Company to exercise or enforce any right or provision of this License will not operate as a waiver of such right or provision. This License does not confer any third-party beneficiary rights upon any other person or entity. This License, together with the Documentation, contains the entire understanding between you and Company regarding the subject matter of this License, and supersedes all other written or oral agreements and understandings between you and Company regarding such subject matter. No change or addition to any provision of this License will be binding unless it is in writing and signed by an authorized representative of both you and Company.
================================================
FILE: flux-ToCa/model_licenses/LICENSE-FLUX1-schnell
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
You must give any other recipients of the Work or Derivative Works a copy of this License; and
You must cause any modified files to carry prominent notices stating that You changed the files; and
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
================================================
FILE: flux-ToCa/pyproject.toml
================================================
[project]
name = "flux"
authors = [
{ name = "Black Forest Labs", email = "support@blackforestlabs.ai" },
]
description = "Inference codebase for FLUX"
readme = "README.md"
requires-python = ">=3.10"
license = { file = "LICENSE.md" }
dynamic = ["version"]
dependencies = [
"torch == 2.5.1",
"torchvision",
"einops",
"fire >= 0.6.0",
"huggingface-hub",
"safetensors",
"sentencepiece",
"transformers",
"tokenizers",
"protobuf",
"requests",
"invisible-watermark",
"ruff == 0.6.8",
]
[project.optional-dependencies]
streamlit = [
"streamlit",
"streamlit-drawable-canvas",
"streamlit-keyup",
]
gradio = [
"gradio",
]
all = [
"flux[streamlit]",
"flux[gradio]",
]
[project.scripts]
flux = "flux.cli:app"
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools>=64", "wheel", "setuptools_scm>=8"]
[tool.ruff]
line-length = 110
target-version = "py310"
extend-exclude = ["/usr/lib/*"]
[tool.ruff.lint]
ignore = [
"E501", # line too long - will be fixed in format
]
[tool.ruff.format]
quote-style = "double"
indent-style = "space"
line-ending = "auto"
skip-magic-trailing-comma = false
docstring-code-format = true
exclude = [
"src/flux/_version.py", # generated by setuptools_scm
]
[tool.ruff.lint.isort]
combine-as-imports = true
force-wrap-aliases = true
known-local-folder = ["src"]
known-first-party = ["flux"]
[tool.pyright]
include = ["src"]
exclude = [
"**/__pycache__", # cache directories
"./typings", # generated type stubs
]
stubPath = "./typings"
[tool.tomlsort]
in_place = true
no_sort_tables = true
spaces_before_inline_comment = 1
spaces_indent_inline_array = 2
trailing_comma_inline_array = true
sort_first = [
"project",
"build-system",
"tool.setuptools",
]
# needs to be last for CI reasons
[tool.setuptools_scm]
write_to = "src/flux/_version.py"
parentdir_prefix_version = "flux-"
fallback_version = "0.0.0"
version_scheme = "post-release"
================================================
FILE: flux-ToCa/setup.py
================================================
import setuptools
setuptools.setup()
================================================
FILE: flux-ToCa/src/flux/__init__.py
================================================
try:
    from ._version import (
        version as __version__,  # type: ignore
        version_tuple,
    )
except ImportError:
    __version__ = "unknown (no version information available)"
    version_tuple = (0, 0, "unknown", "noinfo")

from pathlib import Path

PACKAGE = __package__.replace("_", "-")
PACKAGE_ROOT = Path(__file__).parent
================================================
FILE: flux-ToCa/src/flux/__main__.py
================================================
from .cli import app
if __name__ == "__main__":
    app()
================================================
FILE: flux-ToCa/src/flux/_version.py
================================================
# file generated by setuptools_scm
# don't change, don't track in version control
TYPE_CHECKING = False
if TYPE_CHECKING:
    from typing import Tuple, Union

    VERSION_TUPLE = Tuple[Union[int, str], ...]
else:
    VERSION_TUPLE = object
version: str
__version__: str
__version_tuple__: VERSION_TUPLE
version_tuple: VERSION_TUPLE
__version__ = version = '0.0.post49+gd06f828.d20250206'
__version_tuple__ = version_tuple = (0, 0, 'gd06f828.d20250206')
================================================
FILE: flux-ToCa/src/flux/api.py
================================================
import io
import os
import time
from pathlib import Path

import requests
from PIL import Image

API_URL = "https://api.bfl.ml"
API_ENDPOINTS = {
    "flux.1-pro": "flux-pro",
    "flux.1-dev": "flux-dev",
    "flux.1.1-pro": "flux-pro-1.1",
}


class ApiException(Exception):
    def __init__(self, status_code: int, detail: str | list[dict] | None = None):
        super().__init__()
        self.detail = detail
        self.status_code = status_code

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        if self.detail is None:
            message = None
        elif isinstance(self.detail, str):
            message = self.detail
        else:
            message = "[" + ",".join(d["msg"] for d in self.detail) + "]"
        return f"ApiException({self.status_code=}, {message=}, detail={self.detail})"
class ImageRequest:
    def __init__(
        self,
        # api inputs
        prompt: str,
        name: str = "flux.1.1-pro",
        width: int | None = None,
        height: int | None = None,
        num_steps: int | None = None,
        prompt_upsampling: bool | None = None,
        seed: int | None = None,
        guidance: float | None = None,
        interval: float | None = None,
        safety_tolerance: int | None = None,
        # behavior of this class
        validate: bool = True,
        launch: bool = True,
        api_key: str | None = None,
    ):
        """
        Manages an image generation request to the API.

        All parameters not specified will use the API defaults.

        Args:
            prompt: Text prompt for image generation.
            width: Width of the generated image in pixels. Must be a multiple of 32.
            height: Height of the generated image in pixels. Must be a multiple of 32.
            name: Which model version to use
            num_steps: Number of steps for the image generation process.
            prompt_upsampling: Whether to perform upsampling on the prompt.
            seed: Optional seed for reproducibility.
            guidance: Guidance scale for image generation.
            interval: Interval value forwarded to the API. Validated to lie between
                1 and 4; rejected for `flux.1-dev` and `flux.1.1-pro`.
            safety_tolerance: Tolerance level for input and output moderation.
                Between 0 and 6, 0 being most strict, 6 being least strict.
            validate: Run input validation
            launch: Directly launches request
            api_key: Your API key if not provided by the environment

        Raises:
            ValueError: For invalid input, when `validate`
            ApiException: For errors raised from the API
        """
if validate:
if name not in API_ENDPOINTS.keys():
raise ValueError(f"Invalid model {name}")
elif width is not None and width % 32 != 0:
raise ValueError(f"width must be divisible by 32, got {width}")
elif width is not None and not (256 <= width <= 1440):
raise ValueError(f"width must be between 256 and 1440, got {width}")
elif height is not None and height % 32 != 0:
raise ValueError(f"height must be divisible by 32, got {height}")
elif height is not None and not (256 <= height <= 1440):
raise ValueError(f"height must be between 256 and 1440, got {height}")
elif num_steps is not None and not (1 <= num_steps <= 50):
raise ValueError(f"steps must be between 1 and 50, got {num_steps}")
elif guidance is not None and not (1.5 <= guidance <= 5.0):
raise ValueError(f"guidance must be between 1.5 and 5, got {guidance}")
elif interval is not None and not (1.0 <= interval <= 4.0):
raise ValueError(f"interval must be between 1 and 4, got {interval}")
elif safety_tolerance is not None and not (0 <= safety_tolerance <= 6.0):
raise ValueError(f"safety_tolerance must be between 0 and 6, got {safety_tolerance}")
if name == "flux.1-dev":
if interval is not None:
raise ValueError("Interval is not supported for flux.1-dev")
if name == "flux.1.1-pro":
if interval is not None or num_steps is not None or guidance is not None:
raise ValueError("Interval, num_steps and guidance are not supported for flux.1.1-pro")
self.name = name
self.request_json = {
"prompt": prompt,
"width": width,
"height": height,
"steps": num_steps,
"prompt_upsampling": prompt_upsampling,
"seed": seed,
"guidance": guidance,
"interval": interval,
"safety_tolerance": safety_tolerance,
}
self.request_json = {key: value for key, value in self.request_json.items() if value is not None}
self.request_id: str | None = None
self.result: dict | None = None
self._image_bytes: bytes | None = None
self._url: str | None = None
if api_key is None:
self.api_key = os.environ.get("BFL_API_KEY")
else:
self.api_key = api_key
if launch:
self.request()
def request(self):
"""
Request to generate the image.
"""
if self.request_id is not None:
return
response = requests.post(
f"{API_URL}/v1/{API_ENDPOINTS[self.name]}",
headers={
"accept": "application/json",
"x-key": self.api_key,
"Content-Type": "application/json",
},
json=self.request_json,
)
result = response.json()
if response.status_code != 200:
raise ApiException(status_code=response.status_code, detail=result.get("detail"))
# reuse the already decoded body instead of parsing the response twice
self.request_id = result["id"]
def retrieve(self) -> dict:
"""
Wait for the generation to finish and retrieve response.
"""
if self.request_id is None:
self.request()
while self.result is None:
response = requests.get(
f"{API_URL}/v1/get_result",
headers={
"accept": "application/json",
"x-key": self.api_key,
},
params={
"id": self.request_id,
},
)
result = response.json()
if "status" not in result:
raise ApiException(status_code=response.status_code, detail=result.get("detail"))
elif result["status"] == "Ready":
self.result = result["result"]
elif result["status"] == "Pending":
time.sleep(0.5)
else:
raise ApiException(status_code=200, detail=f"API returned status '{result['status']}'")
return self.result
@property
def bytes(self) -> bytes:
"""
Generated image as bytes.
"""
if self._image_bytes is None:
response = requests.get(self.url)
if response.status_code == 200:
self._image_bytes = response.content
else:
raise ApiException(status_code=response.status_code)
return self._image_bytes
@property
def url(self) -> str:
"""
Public url to retrieve the image from
"""
if self._url is None:
result = self.retrieve()
self._url = result["sample"]
return self._url
@property
def image(self) -> Image.Image:
"""
Load the image as a PIL Image
"""
return Image.open(io.BytesIO(self.bytes))
def save(self, path: str):
"""
Save the generated image to a local path
"""
suffix = Path(self.url).suffix
if not path.endswith(suffix):
path = path + suffix
Path(path).resolve().parent.mkdir(parents=True, exist_ok=True)
with open(path, "wb") as file:
file.write(self.bytes)
if __name__ == "__main__":
from fire import Fire
Fire(ImageRequest)
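
The `retrieve` method above polls `get_result` every 0.5 s with no upper bound. A minimal sketch of a bounded variant follows. It is not part of this module: `poll_until_ready`, `fetch_status`, `timeout_s` and the injectable `sleep` are hypothetical names introduced for illustration, and the status strings are the ones handled in `retrieve`.

```python
import time
from typing import Callable


def poll_until_ready(
    fetch_status: Callable[[], dict],  # hypothetical: returns the decoded JSON of one status query
    interval_s: float = 0.5,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is 'Ready'; raise on timeout or on an unexpected status."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "Ready":
            return payload["result"]
        if status != "Pending":
            raise RuntimeError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in real use `fetch_status` would wrap the `requests.get` call shown in `retrieve`.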
================================================
FILE: flux-ToCa/src/flux/cli.py
================================================
import os
import re
import time
from dataclasses import dataclass
from glob import iglob
import torch
from fire import Fire
from transformers import pipeline
from flux.sampling import denoise, get_noise, get_schedule, prepare, unpack
from flux.ideas import denoise_cache
from flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image
NSFW_THRESHOLD = 0.85
@dataclass
class SamplingOptions:
prompt: str
width: int
height: int
num_steps: int
guidance: float
seed: int | None
def parse_prompt(options: SamplingOptions) -> SamplingOptions | None:
user_question = "Next prompt (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write your prompt directly, leave this field empty "
"to repeat the prompt or write a command starting with a slash:\n"
"- '/w ' will set the width of the generated image\n"
"- '/h ' will set the height of the generated image\n"
"- '/s ' sets the next seed\n"
"- '/g ' sets the guidance (flux-dev only)\n"
"- '/n ' sets the number of steps\n"
"- '/q' to quit"
)
while (prompt := input(user_question)).startswith("/"):
if prompt.startswith("/w"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, width = prompt.split()
options.width = 16 * (int(width) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/h"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, height = prompt.split()
options.height = 16 * (int(height) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/g"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, guidance = prompt.split()
options.guidance = float(guidance)
print(f"Setting guidance to {options.guidance}")
elif prompt.startswith("/s"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, seed = prompt.split()
options.seed = int(seed)
print(f"Setting seed to {options.seed}")
elif prompt.startswith("/n"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, steps = prompt.split()
options.num_steps = int(steps)
print(f"Setting number of steps to {options.num_steps}")
elif prompt.startswith("/q"):
print("Quitting")
return None
else:
if not prompt.startswith("/h"):
print(f"Got invalid command '{prompt}'\n{usage}")
print(usage)
if prompt != "":
options.prompt = prompt
return options
@torch.inference_mode()
def main(
name: str = "flux-schnell",
width: int = 1360,
height: int = 768,
seed: int | None = None,
prompt: str = (
"a photo of a forest with mist swirling around the tree trunks. The word "
'"FLUX" is painted over it in big, red brush strokes with visible texture'
),
device: str = "cuda" if torch.cuda.is_available() else "cpu",
num_steps: int | None = None,
loop: bool = False,
guidance: float = 3.5,
offload: bool = False,
output_dir: str = "output",
add_sampling_metadata: bool = True,
):
"""
Sample the flux model. Either interactively (set `--loop`) or run for a
single image.
Args:
name: Name of the model to load
height: height of the sample in pixels (should be a multiple of 16)
width: width of the sample in pixels (should be a multiple of 16)
seed: Set a seed for sampling
output_dir: directory where the output images are saved as
`img_{idx}.jpg`, with `{idx}` replaced by the index of the sample
prompt: Prompt used for sampling
device: Pytorch device
num_steps: number of sampling steps (default 4 for schnell, 50 for guidance distilled)
loop: start an interactive session and sample multiple times
guidance: guidance value used for guidance distillation
add_sampling_metadata: Add the prompt to the image Exif metadata
"""
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
if name not in configs:
available = ", ".join(configs.keys())
raise ValueError(f"Got unknown model name: {name}, choose from {available}")
torch_device = torch.device(device)
if num_steps is None:
num_steps = 4 if name == "flux-schnell" else 50
# allow for packing and conversion to latent space
height = 16 * (height // 16)
width = 16 * (width // 16)
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
if len(fns) > 0:
idx = max(int(fn.split("_")[-1].split(".")[0]) for fn in fns) + 1
else:
idx = 0
# init all components
t5 = load_t5(torch_device, max_length=256 if name == "flux-schnell" else 512)
clip = load_clip(torch_device)
model = load_flow_model(name, device="cpu" if offload else torch_device)
ae = load_ae(name, device="cpu" if offload else torch_device)
rng = torch.Generator(device="cpu")
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
)
if loop:
opts = parse_prompt(opts)
while opts is not None:
if opts.seed is None:
opts.seed = rng.seed()
print(f"Generating with seed {opts.seed}:\n{opts.prompt}")
t0 = time.perf_counter()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=torch_device,
dtype=torch.bfloat16,
seed=opts.seed,
)
opts.seed = None
if offload:
ae = ae.cpu()
torch.cuda.empty_cache()
t5, clip = t5.to(torch_device), clip.to(torch_device)
inp = prepare(t5, clip, x, prompt=opts.prompt)
timesteps = get_schedule(opts.num_steps, inp["img"].shape[1], shift=(name != "flux-schnell"))
# offload TEs to CPU, load model to gpu
if offload:
t5, clip = t5.cpu(), clip.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
# denoise initial noise
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
fn = output_name.format(idx=idx)
print(f"Done in {t1 - t0:.1f}s. Saving {fn}")
# use opts.prompt so the metadata matches the prompt actually sampled in loop mode
idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)
if loop:
print("-" * 80)
opts = parse_prompt(opts)
else:
opts = None
def app():
Fire(main)
if __name__ == "__main__":
app()
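
Two small pieces of bookkeeping in `main` above are easy to misread: sizes are rounded down to a multiple of 16, and the next output index is derived from the `img_N.jpg` files already present. A minimal standalone sketch of both follows; `snap_to_multiple` and `next_output_index` are hypothetical helper names, not functions defined in this repository.

```python
import os
import re
from glob import iglob


def snap_to_multiple(value: int, base: int = 16) -> int:
    """Round down to the nearest multiple of `base`."""
    return base * (value // base)


def next_output_index(output_dir: str) -> int:
    """Return one past the largest N among existing img_N.jpg files, or 0 if there are none."""
    pattern = os.path.join(output_dir, "img_*.jpg")
    indices = []
    for fn in iglob(pattern):
        match = re.search(r"img_([0-9]+)\.jpg$", fn)
        if match:  # skip names such as img_final.jpg
            indices.append(int(match.group(1)))
    return max(indices) + 1 if indices else 0
```

Capturing the digits with a regex group avoids splitting the whole path on `_`, which would misbehave if the output directory name itself contained an underscore followed by non-numeric text.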
================================================
FILE: flux-ToCa/src/flux/cli_control.py
================================================
import os
import re
import time
from dataclasses import dataclass
from glob import iglob
import torch
from fire import Fire
from transformers import pipeline
from flux.modules.image_embedders import CannyImageEncoder, DepthImageEncoder
from flux.sampling import denoise, get_noise, get_schedule, prepare_control, unpack
from flux.ideas import denoise_cache
from flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image
@dataclass
class SamplingOptions:
prompt: str
width: int
height: int
num_steps: int
guidance: float
seed: int | None
img_cond_path: str
lora_scale: float | None
def parse_prompt(options: SamplingOptions) -> SamplingOptions | None:
user_question = "Next prompt (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write your prompt directly, leave this field empty "
"to repeat the prompt or write a command starting with a slash:\n"
"- '/w ' will set the width of the generated image\n"
"- '/h ' will set the height of the generated image\n"
"- '/s ' sets the next seed\n"
"- '/g ' sets the guidance (flux-dev only)\n"
"- '/n ' sets the number of steps\n"
"- '/q' to quit"
)
while (prompt := input(user_question)).startswith("/"):
if prompt.startswith("/w"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, width = prompt.split()
options.width = 16 * (int(width) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/h"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, height = prompt.split()
options.height = 16 * (int(height) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/g"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, guidance = prompt.split()
options.guidance = float(guidance)
print(f"Setting guidance to {options.guidance}")
elif prompt.startswith("/s"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, seed = prompt.split()
options.seed = int(seed)
print(f"Setting seed to {options.seed}")
elif prompt.startswith("/n"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, steps = prompt.split()
options.num_steps = int(steps)
print(f"Setting number of steps to {options.num_steps}")
elif prompt.startswith("/q"):
print("Quitting")
return None
else:
if not prompt.startswith("/h"):
print(f"Got invalid command '{prompt}'\n{usage}")
print(usage)
if prompt != "":
options.prompt = prompt
return options
def parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:
if options is None:
return None
user_question = "Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write the path to a conditioning image directly, leave this field empty "
"to repeat the conditioning image or write a command starting with a slash:\n"
"- '/q' to quit"
)
while True:
img_cond_path = input(user_question)
if img_cond_path.startswith("/"):
if img_cond_path.startswith("/q"):
print("Quitting")
return None
else:
if not img_cond_path.startswith("/h"):
print(f"Got invalid command '{img_cond_path}'\n{usage}")
print(usage)
continue
if img_cond_path == "":
break
if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(
(".jpg", ".jpeg", ".png", ".webp")
):
print(f"File '{img_cond_path}' does not exist or is not a valid image file")
continue
options.img_cond_path = img_cond_path
break
return options
def parse_lora_scale(options: SamplingOptions | None) -> tuple[SamplingOptions | None, bool]:
changed = False
if options is None:
return None, changed
user_question = "Next lora scale (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write the lora scale directly, leave this field empty "
"to repeat the lora scale or write a command starting with a slash:\n"
"- '/q' to quit"
)
while (prompt := input(user_question)).startswith("/"):
if prompt.startswith("/q"):
print("Quitting")
return None, changed
else:
if not prompt.startswith("/h"):
print(f"Got invalid command '{prompt}'\n{usage}")
print(usage)
if prompt != "":
options.lora_scale = float(prompt)
changed = True
return options, changed
@torch.inference_mode()
def main(
name: str,
width: int = 1024,
height: int = 1024,
seed: int | None = None,
prompt: str = "a robot made out of gold",
device: str = "cuda" if torch.cuda.is_available() else "cpu",
num_steps: int = 50,
loop: bool = False,
guidance: float | None = None,
offload: bool = False,
output_dir: str = "output",
add_sampling_metadata: bool = True,
img_cond_path: str = "assets/robot.webp",
lora_scale: float | None = 0.85,
):
"""
Sample the flux model. Either interactively (set `--loop`) or run for a
single image.
Args:
height: height of the sample in pixels (should be a multiple of 16)
width: width of the sample in pixels (should be a multiple of 16)
seed: Set a seed for sampling
output_dir: directory where the output images are saved as
`img_{idx}.jpg`, with `{idx}` replaced by the index of the sample
prompt: Prompt used for sampling
device: Pytorch device
num_steps: number of sampling steps (default 50)
loop: start an interactive session and sample multiple times
guidance: guidance value used for guidance distillation
add_sampling_metadata: Add the prompt to the image Exif metadata
img_cond_path: path to conditioning image (jpeg/png/webp)
lora_scale: scale applied to the LoRA modules (only used by the `-lora` models)
"""
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
assert name in [
"flux-dev-canny",
"flux-dev-depth",
"flux-dev-canny-lora",
"flux-dev-depth-lora",
], f"Got unknown model name: {name}"
if guidance is None:
if name in ["flux-dev-canny", "flux-dev-canny-lora"]:
guidance = 30.0
elif name in ["flux-dev-depth", "flux-dev-depth-lora"]:
guidance = 10.0
else:
raise NotImplementedError()
if name not in configs:
available = ", ".join(configs.keys())
raise ValueError(f"Got unknown model name: {name}, choose from {available}")
torch_device = torch.device(device)
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
if len(fns) > 0:
idx = max(int(fn.split("_")[-1].split(".")[0]) for fn in fns) + 1
else:
idx = 0
# init all components
t5 = load_t5(torch_device, max_length=512)
clip = load_clip(torch_device)
model = load_flow_model(name, device="cpu" if offload else torch_device)
ae = load_ae(name, device="cpu" if offload else torch_device)
# set lora scale
if "lora" in name and lora_scale is not None:
for _, module in model.named_modules():
if hasattr(module, "set_scale"):
module.set_scale(lora_scale)
if name in ["flux-dev-depth", "flux-dev-depth-lora"]:
img_embedder = DepthImageEncoder(torch_device)
elif name in ["flux-dev-canny", "flux-dev-canny-lora"]:
img_embedder = CannyImageEncoder(torch_device)
else:
raise NotImplementedError()
rng = torch.Generator(device="cpu")
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
img_cond_path=img_cond_path,
lora_scale=lora_scale,
)
if loop:
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
if "lora" in name:
opts, changed = parse_lora_scale(opts)
if changed:
# update the lora scale:
for _, module in model.named_modules():
if hasattr(module, "set_scale"):
module.set_scale(opts.lora_scale)
while opts is not None:
if opts.seed is None:
opts.seed = rng.seed()
print(f"Generating with seed {opts.seed}:\n{opts.prompt}")
t0 = time.perf_counter()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=torch_device,
dtype=torch.bfloat16,
seed=opts.seed,
)
opts.seed = None
if offload:
t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)
inp = prepare_control(
t5,
clip,
x,
prompt=opts.prompt,
ae=ae,
encoder=img_embedder,
img_cond_path=opts.img_cond_path,
)
timesteps = get_schedule(opts.num_steps, inp["img"].shape[1], shift=(name != "flux-schnell"))
# offload TEs and AE to CPU, load model to gpu
if offload:
t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
# denoise initial noise
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Done in {t1 - t0:.1f}s")
# use opts.prompt so the metadata matches the prompt actually sampled in loop mode
idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)
if loop:
print("-" * 80)
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
if "lora" in name:
opts, changed = parse_lora_scale(opts)
if changed:
# update the lora scale:
for _, module in model.named_modules():
if hasattr(module, "set_scale"):
module.set_scale(opts.lora_scale)
else:
opts = None
def app():
Fire(main)
if __name__ == "__main__":
app()
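
In the `parse_prompt` loops above, `/h` serves both as "set height" and as "help", so a bare `/h` lands in the height branch and is reported as an invalid command before the usage text is printed. A minimal sketch of splitting a command into name and optional argument first, so that a bare `/h` can be routed to help, is shown below. `parse_slash_command` is a hypothetical helper, not part of this repository.

```python
from __future__ import annotations


def parse_slash_command(line: str) -> tuple[str, str | None]:
    """Split '/w 1024' into ('w', '1024'); a bare '/h' yields ('h', None)."""
    if not line.startswith("/"):
        raise ValueError("not a slash command")
    parts = line[1:].split()
    if not parts or len(parts) > 2:
        raise ValueError(f"invalid command {line!r}")
    name = parts[0]
    arg = parts[1] if len(parts) == 2 else None
    return name, arg
```

A caller could then treat `("h", None)` as a help request and `("h", "768")` as a height change, instead of relying on the order of `startswith` checks.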
================================================
FILE: flux-ToCa/src/flux/cli_fill.py
================================================
import os
import re
import time
from dataclasses import dataclass
from glob import iglob
import torch
from fire import Fire
from PIL import Image
from transformers import pipeline
from flux.sampling import denoise, get_noise, get_schedule, prepare_fill, unpack
from flux.ideas import denoise_cache
from flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image
@dataclass
class SamplingOptions:
prompt: str
width: int
height: int
num_steps: int
guidance: float
seed: int | None
img_cond_path: str
img_mask_path: str
def parse_prompt(options: SamplingOptions) -> SamplingOptions | None:
user_question = "Next prompt (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write your prompt directly, leave this field empty "
"to repeat the prompt or write a command starting with a slash:\n"
"- '/s ' sets the next seed\n"
"- '/g ' sets the guidance (flux-dev only)\n"
"- '/n ' sets the number of steps\n"
"- '/q' to quit"
)
while (prompt := input(user_question)).startswith("/"):
if prompt.startswith("/g"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, guidance = prompt.split()
options.guidance = float(guidance)
print(f"Setting guidance to {options.guidance}")
elif prompt.startswith("/s"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, seed = prompt.split()
options.seed = int(seed)
print(f"Setting seed to {options.seed}")
elif prompt.startswith("/n"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, steps = prompt.split()
options.num_steps = int(steps)
print(f"Setting number of steps to {options.num_steps}")
elif prompt.startswith("/q"):
print("Quitting")
return None
else:
if not prompt.startswith("/h"):
print(f"Got invalid command '{prompt}'\n{usage}")
print(usage)
if prompt != "":
options.prompt = prompt
return options
def parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:
if options is None:
return None
user_question = "Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write the path to a conditioning image directly, leave this field empty "
"to repeat the conditioning image or write a command starting with a slash:\n"
"- '/q' to quit"
)
while True:
img_cond_path = input(user_question)
if img_cond_path.startswith("/"):
if img_cond_path.startswith("/q"):
print("Quitting")
return None
else:
if not img_cond_path.startswith("/h"):
print(f"Got invalid command '{img_cond_path}'\n{usage}")
print(usage)
continue
if img_cond_path == "":
break
if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(
(".jpg", ".jpeg", ".png", ".webp")
):
print(f"File '{img_cond_path}' does not exist or is not a valid image file")
continue
else:
with Image.open(img_cond_path) as img:
width, height = img.size
if width % 32 != 0 or height % 32 != 0:
print(f"Image dimensions must be divisible by 32, got {width}x{height}")
continue
options.img_cond_path = img_cond_path
break
return options
def parse_img_mask_path(options: SamplingOptions | None) -> SamplingOptions | None:
if options is None:
return None
user_question = "Next conditioning mask (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write the path to a conditioning mask directly, leave this field empty "
"to repeat the conditioning mask or write a command starting with a slash:\n"
"- '/q' to quit"
)
while True:
img_mask_path = input(user_question)
if img_mask_path.startswith("/"):
if img_mask_path.startswith("/q"):
print("Quitting")
return None
else:
if not img_mask_path.startswith("/h"):
print(f"Got invalid command '{img_mask_path}'\n{usage}")
print(usage)
continue
if img_mask_path == "":
break
if not os.path.isfile(img_mask_path) or not img_mask_path.lower().endswith(
(".jpg", ".jpeg", ".png", ".webp")
):
print(f"File '{img_mask_path}' does not exist or is not a valid image file")
continue
else:
with Image.open(img_mask_path) as img:
width, height = img.size
if width % 32 != 0 or height % 32 != 0:
print(f"Image dimensions must be divisible by 32, got {width}x{height}")
continue
else:
with Image.open(options.img_cond_path) as img_cond:
img_cond_width, img_cond_height = img_cond.size
if width != img_cond_width or height != img_cond_height:
print(
f"Mask dimensions must match conditioning image, got {width}x{height} and {img_cond_width}x{img_cond_height}"
)
continue
options.img_mask_path = img_mask_path
break
return options
@torch.inference_mode()
def main(
seed: int | None = None,
prompt: str = "a white paper cup",
device: str = "cuda" if torch.cuda.is_available() else "cpu",
num_steps: int = 50,
loop: bool = False,
guidance: float = 30.0,
offload: bool = False,
output_dir: str = "output",
add_sampling_metadata: bool = True,
img_cond_path: str = "assets/cup.png",
img_mask_path: str = "assets/cup_mask.png",
):
"""
Sample the flux model. Either interactively (set `--loop`) or run for a
single image. This demo assumes that the conditioning image and mask have
the same shape and that height and width are divisible by 32.
Args:
seed: Set a seed for sampling
output_dir: directory where the output images are saved as
`img_{idx}.jpg`, with `{idx}` replaced by the index of the sample
prompt: Prompt used for sampling
device: Pytorch device
num_steps: number of sampling steps (default 50)
loop: start an interactive session and sample multiple times
guidance: guidance value used for guidance distillation
add_sampling_metadata: Add the prompt to the image Exif metadata
img_cond_path: path to conditioning image (jpeg/png/webp)
img_mask_path: path to conditioning mask (jpeg/png/webp)
"""
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
name = "flux-dev-fill"
if name not in configs:
available = ", ".join(configs.keys())
raise ValueError(f"Got unknown model name: {name}, choose from {available}")
torch_device = torch.device(device)
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
if len(fns) > 0:
idx = max(int(fn.split("_")[-1].split(".")[0]) for fn in fns) + 1
else:
idx = 0
# init all components
t5 = load_t5(torch_device, max_length=128)
clip = load_clip(torch_device)
model = load_flow_model(name, device="cpu" if offload else torch_device)
ae = load_ae(name, device="cpu" if offload else torch_device)
rng = torch.Generator(device="cpu")
with Image.open(img_cond_path) as img:
width, height = img.size
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
img_cond_path=img_cond_path,
img_mask_path=img_mask_path,
)
if loop:
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
with Image.open(opts.img_cond_path) as img:
width, height = img.size
opts.height = height
opts.width = width
opts = parse_img_mask_path(opts)
while opts is not None:
if opts.seed is None:
opts.seed = rng.seed()
print(f"Generating with seed {opts.seed}:\n{opts.prompt}")
t0 = time.perf_counter()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=torch_device,
dtype=torch.bfloat16,
seed=opts.seed,
)
opts.seed = None
if offload:
t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)
inp = prepare_fill(
t5,
clip,
x,
prompt=opts.prompt,
ae=ae,
img_cond_path=opts.img_cond_path,
mask_path=opts.img_mask_path,
)
timesteps = get_schedule(opts.num_steps, inp["img"].shape[1], shift=(name != "flux-schnell"))
# offload TEs and AE to CPU, load model to gpu
if offload:
t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
# denoise initial noise
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Done in {t1 - t0:.1f}s")
# use opts.prompt so the metadata matches the prompt actually sampled in loop mode
idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)
if loop:
print("-" * 80)
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
with Image.open(opts.img_cond_path) as img:
width, height = img.size
opts.height = height
opts.width = width
opts = parse_img_mask_path(opts)
else:
opts = None
def app():
Fire(main)
if __name__ == "__main__":
app()
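
The fill demo above requires that the conditioning image and the mask have identical dimensions and that both width and height are divisible by 32. A minimal sketch of that check as a pure function over `(width, height)` pairs follows; `check_fill_sizes` is a hypothetical name, and the sizes would come from `Image.size` as in `parse_img_mask_path`.

```python
from __future__ import annotations


def check_fill_sizes(
    cond_size: tuple[int, int],
    mask_size: tuple[int, int],
    multiple: int = 32,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs are usable."""
    problems = []
    for label, (width, height) in (("image", cond_size), ("mask", mask_size)):
        if width % multiple != 0 or height % multiple != 0:
            problems.append(
                f"{label} dimensions must be divisible by {multiple}, got {width}x{height}"
            )
    if cond_size != mask_size:
        problems.append(
            f"mask {mask_size[0]}x{mask_size[1]} does not match image {cond_size[0]}x{cond_size[1]}"
        )
    return problems
```

Returning all problems at once, rather than stopping at the first, lets an interactive prompt report everything the user needs to fix in one pass.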
================================================
FILE: flux-ToCa/src/flux/cli_redux.py
================================================
import os
import re
import time
from dataclasses import dataclass
from glob import iglob
import torch
from fire import Fire
from transformers import pipeline
from flux.modules.image_embedders import ReduxImageEncoder
from flux.sampling import denoise, get_noise, get_schedule, prepare_redux, unpack
from flux.ideas import denoise_cache
from flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image
@dataclass
class SamplingOptions:
prompt: str
width: int
height: int
num_steps: int
guidance: float
seed: int | None
img_cond_path: str
def parse_prompt(options: SamplingOptions) -> SamplingOptions | None:
user_question = "Next command (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Leave this field empty to do nothing "
"or write a command starting with a slash:\n"
"- '/w ' will set the width of the generated image\n"
"- '/h ' will set the height of the generated image\n"
"- '/s ' sets the next seed\n"
"- '/g ' sets the guidance (flux-dev only)\n"
"- '/n ' sets the number of steps\n"
"- '/q' to quit"
)
while (prompt := input(user_question)).startswith("/"):
if prompt.startswith("/w"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, width = prompt.split()
options.width = 16 * (int(width) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/h"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, height = prompt.split()
options.height = 16 * (int(height) // 16)
print(
f"Setting resolution to {options.width} x {options.height} "
f"({options.height * options.width / 1e6:.2f}MP)"
)
elif prompt.startswith("/g"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, guidance = prompt.split()
options.guidance = float(guidance)
print(f"Setting guidance to {options.guidance}")
elif prompt.startswith("/s"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, seed = prompt.split()
options.seed = int(seed)
print(f"Setting seed to {options.seed}")
elif prompt.startswith("/n"):
if prompt.count(" ") != 1:
print(f"Got invalid command '{prompt}'\n{usage}")
continue
_, steps = prompt.split()
options.num_steps = int(steps)
print(f"Setting number of steps to {options.num_steps}")
elif prompt.startswith("/q"):
print("Quitting")
return None
else:
if not prompt.startswith("/h"):
print(f"Got invalid command '{prompt}'\n{usage}")
print(usage)
return options
def parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:
if options is None:
return None
user_question = "Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\n"
usage = (
"Usage: Either write the path to a conditioning image directly, leave this field empty "
"to repeat the conditioning image or write a command starting with a slash:\n"
"- '/q' to quit"
)
while True:
img_cond_path = input(user_question)
if img_cond_path.startswith("/"):
if img_cond_path.startswith("/q"):
print("Quitting")
return None
else:
if not img_cond_path.startswith("/h"):
print(f"Got invalid command '{img_cond_path}'\n{usage}")
print(usage)
continue
if img_cond_path == "":
break
if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(
(".jpg", ".jpeg", ".png", ".webp")
):
print(f"File '{img_cond_path}' does not exist or is not a valid image file")
continue
options.img_cond_path = img_cond_path
break
return options
@torch.inference_mode()
def main(
name: str = "flux-dev",
width: int = 1360,
height: int = 768,
seed: int | None = None,
device: str = "cuda" if torch.cuda.is_available() else "cpu",
num_steps: int | None = None,
loop: bool = False,
guidance: float = 2.5,
offload: bool = False,
output_dir: str = "output",
add_sampling_metadata: bool = True,
img_cond_path: str = "assets/robot.webp",
):
"""
Sample the flux model. Either interactively (set `--loop`) or run for a
single image.
Args:
name: Name of the model to load
width: width of the sample in pixels (should be a multiple of 16)
height: height of the sample in pixels (should be a multiple of 16)
seed: Set a seed for sampling
device: PyTorch device
num_steps: number of sampling steps (default 4 for schnell, 50 for guidance distilled)
loop: start an interactive session and sample multiple times
guidance: guidance value used for guidance distillation
offload: offload models to the CPU when they are not in use to reduce GPU memory usage
output_dir: directory for the output images, saved as `img_{idx}.jpg` where `{idx}`
is the index of the sample
add_sampling_metadata: Add the prompt to the image Exif metadata
img_cond_path: path to conditioning image (jpeg/png/webp)
"""
nsfw_classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection", device=device)
if name not in configs:
available = ", ".join(configs.keys())
raise ValueError(f"Got unknown model name: {name}, choose from {available}")
torch_device = torch.device(device)
if num_steps is None:
num_steps = 4 if name == "flux-schnell" else 50
output_name = os.path.join(output_dir, "img_{idx}.jpg")
if not os.path.exists(output_dir):
os.makedirs(output_dir)
idx = 0
else:
fns = [fn for fn in iglob(output_name.format(idx="*")) if re.search(r"img_[0-9]+\.jpg$", fn)]
if len(fns) > 0:
idx = max(int(fn.split("_")[-1].split(".")[0]) for fn in fns) + 1
else:
idx = 0
# init all components
t5 = load_t5(torch_device, max_length=256 if name == "flux-schnell" else 512)
clip = load_clip(torch_device)
model = load_flow_model(name, device="cpu" if offload else torch_device)
ae = load_ae(name, device="cpu" if offload else torch_device)
img_embedder = ReduxImageEncoder(torch_device)
rng = torch.Generator(device="cpu")
prompt = ""
opts = SamplingOptions(
prompt=prompt,
width=width,
height=height,
num_steps=num_steps,
guidance=guidance,
seed=seed,
img_cond_path=img_cond_path,
)
if loop:
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
while opts is not None:
if opts.seed is None:
opts.seed = rng.seed()
print(f"Generating with seed {opts.seed}:\n{opts.prompt}")
t0 = time.perf_counter()
# prepare input
x = get_noise(
1,
opts.height,
opts.width,
device=torch_device,
dtype=torch.bfloat16,
seed=opts.seed,
)
opts.seed = None
if offload:
ae = ae.cpu()
torch.cuda.empty_cache()
t5, clip = t5.to(torch_device), clip.to(torch_device)
inp = prepare_redux(
t5,
clip,
x,
prompt=opts.prompt,
encoder=img_embedder,
img_cond_path=opts.img_cond_path,
)
timesteps = get_schedule(opts.num_steps, inp["img"].shape[1], shift=(name != "flux-schnell"))
# offload TEs to CPU, load model to gpu
if offload:
t5, clip = t5.cpu(), clip.cpu()
torch.cuda.empty_cache()
model = model.to(torch_device)
# denoise initial noise
x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
# offload model, load autoencoder to gpu
if offload:
model.cpu()
torch.cuda.empty_cache()
ae.decoder.to(x.device)
# decode latents to pixel space
x = unpack(x.float(), opts.height, opts.width)
with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
x = ae.decode(x)
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Done in {t1 - t0:.1f}s")
# pass the prompt of the current sample so that the saved metadata matches what was generated
idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)
if loop:
print("-" * 80)
opts = parse_prompt(opts)
opts = parse_img_cond_path(opts)
else:
opts = None
def app():
Fire(main)
if __name__ == "__main__":
app()
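The `/w` and `/h` commands in the interactive loop round the requested size down to a multiple of 16 and then report the resolution in megapixels. A minimal standalone sketch of that arithmetic follows; the helper names `snap_to_multiple` and `megapixels` are illustrative only and do not exist in the repository.

```python
def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round `value` down to the nearest multiple of `multiple`."""
    return multiple * (value // multiple)


def megapixels(width: int, height: int) -> float:
    """Image area in megapixels."""
    return width * height / 1e6


# Hypothetical user input that is not aligned to 16 pixels.
width = snap_to_multiple(1365)
height = snap_to_multiple(775)
print(f"Setting resolution to {width} x {height} ({megapixels(width, height):.2f}MP)")
```

Rounding down rather than to the nearest multiple means the effective resolution never exceeds what the user asked for.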
================================================
FILE: flux-ToCa/src/flux/ideas/__init__.py
================================================
from .cache_denoise import denoise_cache
================================================
FILE: flux-ToCa/src/flux/ideas/cache_denoise.py
================================================
import torch
from ..model import Flux
from torch import Tensor
from ..modules.cache_functions import cache_init
def denoise_cache(
    model: Flux,
    # model input
    img: Tensor,
    img_ids: Tensor,
    txt: Tensor,
    txt_ids: Tensor,
    vec: Tensor,
    # sampling parameters
    timesteps: list[float],
    guidance: float = 4.0,
):
    # init cache
    cache_dic, current = cache_init(timesteps)
    # this is ignored for schnell
    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
    current['step'] = 0
    current['num_steps'] = len(timesteps) - 1
    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):
        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
        current['t'] = t_curr
        pred = model(
            img=img,
            img_ids=img_ids,
            txt=txt,
            txt_ids=txt_ids,
            y=vec,
            timesteps=t_vec,
            cache_dic=cache_dic,
            current=current,
            guidance=guidance_vec,
        )
        # Euler step from t_curr to t_prev along the predicted velocity
        img = img + (t_prev - t_curr) * pred
        current['step'] += 1
    return img
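The update `img = img + (t_prev - t_curr) * pred` in `denoise_cache` is a plain Euler step along the predicted velocity. The sketch below reproduces that loop with Python lists and a toy velocity function standing in for the model. The names `euler_sample` and `toy_velocity`, and the straight-line path between `noise` and `data`, are assumptions made for illustration and are not part of the repository.

```python
def euler_sample(velocity_fn, x, timesteps):
    """Integrate dx/dt = velocity_fn(x, t) along a decreasing list of timesteps."""
    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t_curr)
        # Same update rule as in denoise_cache: step size is (t_prev - t_curr) < 0.
        x = [xi + (t_prev - t_curr) * vi for xi, vi in zip(x, v)]
    return x


# Toy setup: samples move on the straight line x_t = (1 - t) * data + t * noise,
# whose velocity is constant and equal to (noise - data).
noise = [0.5, -1.0, 2.0]
data = [1.0, 1.0, 1.0]


def toy_velocity(x, t):
    return [n - d for n, d in zip(noise, data)]


timesteps = [1.0, 0.75, 0.5, 0.25, 0.0]
result = euler_sample(toy_velocity, noise, timesteps)
print(result)
```

Because the toy velocity is constant, the Euler steps land on `data` when integrating from `t = 1` to `t = 0`; with a learned velocity field the result is only an approximation whose error depends on the number of steps.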
================================================
FILE: flux-ToCa/src/flux/math.py
================================================
import torch
from einops import rearrange
from torch import Tensor
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, **kwargs) -> Tensor:
    cache_dic = kwargs.get('cache_dic', None)
    current = kwargs.get('current', None)
    q, k = apply_rope(q, k, pe)
    if cache_dic is None:
        x, score = dot_product_attention(q, k, v)
        #x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    elif cache_dic['cache_type'] == 'attention':
        x, score = dot_product_attention(q, k, v)
        # store the per-token attention score used to rank tokens for caching
        cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'] = score
    else:
        #x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        # when measuring FLOPs, use dot_product_attention rather than the fused kernel above
        x, score = dot_product_attention(q, k, v)
    x = rearrange(x, "B H L D -> B L (H D)")
    return x
def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
assert dim % 2 == 0
scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
omega = 1.0 / (theta**scale)
out = torch.einsum("...n,d->...nd", pos, omega)
out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
return out.float()
def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)
############################################################################################################
import math
def dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0,
        is_causal=False, scale=None, enable_gqa=False) -> tuple[torch.Tensor, torch.Tensor]:
    # Explicit (non-fused) scaled dot-product attention. Besides the attention output it
    # returns the attention map averaged over heads and query positions.
    L, S = query.size(-2), key.size(-2)
    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if is_causal:
        assert attn_mask is None
        # build the mask on the same device as the bias it is applied to
        temp_mask = torch.ones(L, S, dtype=torch.bool, device=query.device).tril(diagonal=0)
        attn_bias.masked_fill_(temp_mask.logical_not(), float("-inf"))
        attn_bias = attn_bias.to(query.dtype)
    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            attn_bias.masked_fill_(attn_mask.logical_not(), float("-inf"))
        else:
            attn_bias += attn_mask
    if enable_gqa:
        key = key.repeat_interleave(query.size(-3)//key.size(-3), -3)
        value = value.repeat_interleave(query.size(-3)//value.size(-3), -3)
    #attn_weight = query @ key.transpose(-2, -1) * scale_factor
    attn_weight = torch.matmul(query, key.transpose(-2, -1)) * scale_factor
    attn_weight += attn_bias
    #attn_weight = torch.softmax(attn_weight, dim=-1)
    #attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
    #
    #return torch.matmul(attn_weight, value)
    attn_map = torch.softmax(attn_weight, dim=-1)
    attn_weight = torch.dropout(attn_map, dropout_p, train=True)
    #return attn_weight @ value, attn_map.mean(dim=1).mean(dim=1)
    return torch.matmul(attn_weight, value), attn_map.mean(dim=1).mean(dim=1)
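`dot_product_attention` returns two things: the attention output and the attention map averaged down to one value per key token, which is what gets stored in `cache_dic['attn_map']`. The following is a simplified single-head, single-sample sketch of that idea using plain Python lists instead of tensors; `attention_with_scores` is a hypothetical name and the inputs are made-up numbers.

```python
import math


def attention_with_scores(q, k, v):
    """Softmax attention that also returns the mean attention each key receives."""
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for q_row in q:
        logits = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Weighted sum of the value rows for every query.
    out = [
        [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        for w in weights
    ]
    # Average over query positions: one importance score per key token.
    scores = [sum(w[j] for w in weights) / len(weights) for j in range(len(k))]
    return out, scores


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
out, scores = attention_with_scores(q, k, v)
print(scores)
```

Each row of the attention map sums to one, so the averaged scores also sum to one. The third key aligns with both queries and therefore receives the largest score.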
================================================
FILE: flux-ToCa/src/flux/model.py
================================================
from dataclasses import dataclass
import torch
from torch import Tensor, nn
from flux.modules.layers import (
DoubleStreamBlock,
EmbedND,
LastLayer,
MLPEmbedder,
SingleStreamBlock,
timestep_embedding,
)
from flux.modules.lora import LinearLora, replace_linear_with_lora
from flux.modules.cache_functions import cal_type
@dataclass
class FluxParams:
in_channels: int
out_channels: int
vec_in_dim: int
context_in_dim: int
hidden_size: int
mlp_ratio: float
num_heads: int
depth: int
depth_single_blocks: int
axes_dim: list[int]
theta: int
qkv_bias: bool
guidance_embed: bool
class Flux(nn.Module):
"""
Transformer model for flow matching on sequences.
"""
def __init__(self, params: FluxParams):
super().__init__()
self.params = params
self.in_channels = params.in_channels
self.out_channels = params.out_channels
if params.hidden_size % params.num_heads != 0:
raise ValueError(
f"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}"
)
pe_dim = params.hidden_size // params.num_heads
if sum(params.axes_dim) != pe_dim:
raise ValueError(f"Got {params.axes_dim} but expected positional dim {pe_dim}")
self.hidden_size = params.hidden_size
self.num_heads = params.num_heads
self.pe_embedder = EmbedND(dim=pe_dim, theta=params.theta, axes_dim=params.axes_dim)
self.img_in = nn.Linear(self.in_channels, self.hidden_size, bias=True)
self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
self.vector_in = MLPEmbedder(params.vec_in_dim, self.hidden_size)
self.guidance_in = (
MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if params.guidance_embed else nn.Identity()
)
self.txt_in = nn.Linear(params.context_in_dim, self.hidden_size)
self.double_blocks = nn.ModuleList(
[
DoubleStreamBlock(
self.hidden_size,
self.num_heads,
mlp_ratio=params.mlp_ratio,
qkv_bias=params.qkv_bias,
)
for _ in range(params.depth)
]
)
self.single_blocks = nn.ModuleList(
[
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio)
for _ in range(params.depth_single_blocks)
]
)
self.final_layer = LastLayer(self.hidden_size, 1, self.out_channels)
def forward(
self,
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
timesteps: Tensor,
y: Tensor,
guidance: Tensor | None = None,
*args,
**kwargs,
) -> Tensor:
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
cache_dic = kwargs.get('cache_dic', None)
current = kwargs.get('current', None)
# running on sequences img
img = self.img_in(img)
vec = self.time_in(timestep_embedding(timesteps, 256))
if self.params.guidance_embed:
if guidance is None:
raise ValueError("Didn't get guidance strength for guidance distilled model.")
vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
vec = vec + self.vector_in(y)
txt = self.txt_in(txt)
ids = torch.cat((txt_ids, img_ids), dim=1)
pe = self.pe_embedder(ids)
cal_type(cache_dic=cache_dic, current=current)
for i, block in enumerate(self.double_blocks):
current['layer'] = i
img, txt = block(img=img, txt=txt, vec=vec, pe=pe, cache_dic=cache_dic, current=current)
img = torch.cat((txt, img), 1)
for i, block in enumerate(self.single_blocks):
current['layer'] = i
img = block(img, vec=vec, pe=pe, cache_dic=cache_dic, current=current)
img = img[:, txt.shape[1] :, ...]
img = self.final_layer(img, vec) # (N, T, patch_size ** 2 * out_channels)
return img
class FluxLoraWrapper(Flux):
def __init__(
self,
lora_rank: int = 128,
lora_scale: float = 1.0,
*args,
**kwargs,
) -> None:
super().__init__(*args, **kwargs)
self.lora_rank = lora_rank
replace_linear_with_lora(
self,
max_rank=lora_rank,
scale=lora_scale,
)
def set_lora_scale(self, scale: float) -> None:
for module in self.modules():
if isinstance(module, LinearLora):
module.set_scale(scale=scale)
================================================
FILE: flux-ToCa/src/flux/modules/autoencoder.py
================================================
from dataclasses import dataclass
import torch
from einops import rearrange
from torch import Tensor, nn
@dataclass
class AutoEncoderParams:
resolution: int
in_channels: int
ch: int
out_ch: int
ch_mult: list[int]
num_res_blocks: int
z_channels: int
scale_factor: float
shift_factor: float
def swish(x: Tensor) -> Tensor:
return x * torch.sigmoid(x)
class AttnBlock(nn.Module):
def __init__(self, in_channels: int):
super().__init__()
self.in_channels = in_channels
self.norm = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)
self.q = nn.Conv2d(in_channels, in_channels, kernel_size=1)
self.k = nn.Conv2d(in_channels, in_channels, kernel_size=1)
self.v = nn.Conv2d(in_channels, in_channels, kernel_size=1)
self.proj_out = nn.Conv2d(in_channels, in_channels, kernel_size=1)
def attention(self, h_: Tensor) -> Tensor:
h_ = self.norm(h_)
q = self.q(h_)
k = self.k(h_)
v = self.v(h_)
b, c, h, w = q.shape
q = rearrange(q, "b c h w -> b 1 (h w) c").contiguous()
k = rearrange(k, "b c h w -> b 1 (h w) c").contiguous()
v = rearrange(v, "b c h w -> b 1 (h w) c").contiguous()
h_ = nn.functional.scaled_dot_product_attention(q, k, v)
return rearrange(h_, "b 1 (h w) c -> b c h w", h=h, w=w, c=c, b=b)
def forward(self, x: Tensor) -> Tensor:
return x + self.proj_out(self.attention(x))
class ResnetBlock(nn.Module):
def __init__(self, in_channels: int, out_channels: int):
super().__init__()
self.in_channels = in_channels
out_channels = in_channels if out_channels is None else out_channels
self.out_channels = out_channels
self.norm1 = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)
self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
self.norm2 = nn.GroupNorm(num_groups=32, num_channels=out_channels, eps=1e-6, affine=True)
self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
if self.in_channels != self.out_channels:
self.nin_shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
def forward(self, x):
h = x
h = self.norm1(h)
h = swish(h)
h = self.conv1(h)
h = self.norm2(h)
h = swish(h)
h = self.conv2(h)
if self.in_channels != self.out_channels:
x = self.nin_shortcut(x)
return x + h
class Downsample(nn.Module):
def __init__(self, in_channels: int):
super().__init__()
# no asymmetric padding in torch conv, must do it ourselves
self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=0)
def forward(self, x: Tensor):
pad = (0, 1, 0, 1)
x = nn.functional.pad(x, pad, mode="constant", value=0)
x = self.conv(x)
return x
class Upsample(nn.Module):
def __init__(self, in_channels: int):
super().__init__()
self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
def forward(self, x: Tensor):
x = nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
x = self.conv(x)
return x
class Encoder(nn.Module):
def __init__(
self,
resolution: int,
in_channels: int,
ch: int,
ch_mult: list[int],
num_res_blocks: int,
z_channels: int,
):
super().__init__()
self.ch = ch
self.num_resolutions = len(ch_mult)
self.num_res_blocks = num_res_blocks
self.resolution = resolution
self.in_channels = in_channels
# downsampling
self.conv_in = nn.Conv2d(in_channels, self.ch, kernel_size=3, stride=1, padding=1)
curr_res = resolution
in_ch_mult = (1,) + tuple(ch_mult)
self.in_ch_mult = in_ch_mult
self.down = nn.ModuleList()
block_in = self.ch
for i_level in range(self.num_resolutions):
block = nn.ModuleList()
attn = nn.ModuleList()
block_in = ch * in_ch_mult[i_level]
block_out = ch * ch_mult[i_level]
for _ in range(self.num_res_blocks):
block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))
block_in = block_out
down = nn.Module()
down.block = block
down.attn = attn
if i_level != self.num_resolutions - 1:
down.downsample = Downsample(block_in)
curr_res = curr_res // 2
self.down.append(down)
# middle
self.mid = nn.Module()
self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)
self.mid.attn_1 = AttnBlock(block_in)
self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)
# end
self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)
self.conv_out = nn.Conv2d(block_in, 2 * z_channels, kernel_size=3, stride=1, padding=1)
def forward(self, x: Tensor) -> Tensor:
# downsampling
hs = [self.conv_in(x)]
for i_level in range(self.num_resolutions):
for i_block in range(self.num_res_blocks):
h = self.down[i_level].block[i_block](hs[-1])
if len(self.down[i_level].attn) > 0:
h = self.down[i_level].attn[i_block](h)
hs.append(h)
if i_level != self.num_resolutions - 1:
hs.append(self.down[i_level].downsample(hs[-1]))
# middle
h = hs[-1]
h = self.mid.block_1(h)
h = self.mid.attn_1(h)
h = self.mid.block_2(h)
# end
h = self.norm_out(h)
h = swish(h)
h = self.conv_out(h)
return h
class Decoder(nn.Module):
def __init__(
self,
ch: int,
out_ch: int,
ch_mult: list[int],
num_res_blocks: int,
in_channels: int,
resolution: int,
z_channels: int,
):
super().__init__()
self.ch = ch
self.num_resolutions = len(ch_mult)
self.num_res_blocks = num_res_blocks
self.resolution = resolution
self.in_channels = in_channels
self.ffactor = 2 ** (self.num_resolutions - 1)
# compute in_ch_mult, block_in and curr_res at lowest res
block_in = ch * ch_mult[self.num_resolutions - 1]
curr_res = resolution // 2 ** (self.num_resolutions - 1)
self.z_shape = (1, z_channels, curr_res, curr_res)
# z to block_in
self.conv_in = nn.Conv2d(z_channels, block_in, kernel_size=3, stride=1, padding=1)
# middle
self.mid = nn.Module()
self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)
self.mid.attn_1 = AttnBlock(block_in)
self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)
# upsampling
self.up = nn.ModuleList()
for i_level in reversed(range(self.num_resolutions)):
block = nn.ModuleList()
attn = nn.ModuleList()
block_out = ch * ch_mult[i_level]
for _ in range(self.num_res_blocks + 1):
block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))
block_in = block_out
up = nn.Module()
up.block = block
up.attn = attn
if i_level != 0:
up.upsample = Upsample(block_in)
curr_res = curr_res * 2
self.up.insert(0, up) # prepend to get consistent order
# end
self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)
self.conv_out = nn.Conv2d(block_in, out_ch, kernel_size=3, stride=1, padding=1)
def forward(self, z: Tensor) -> Tensor:
# z to block_in
h = self.conv_in(z)
# middle
h = self.mid.block_1(h)
h = self.mid.attn_1(h)
h = self.mid.block_2(h)
# upsampling
for i_level in reversed(range(self.num_resolutions)):
for i_block in range(self.num_res_blocks + 1):
h = self.up[i_level].block[i_block](h)
if len(self.up[i_level].attn) > 0:
h = self.up[i_level].attn[i_block](h)
if i_level != 0:
h = self.up[i_level].upsample(h)
# end
h = self.norm_out(h)
h = swish(h)
h = self.conv_out(h)
return h
class DiagonalGaussian(nn.Module):
def __init__(self, sample: bool = True, chunk_dim: int = 1):
super().__init__()
self.sample = sample
self.chunk_dim = chunk_dim
def forward(self, z: Tensor) -> Tensor:
mean, logvar = torch.chunk(z, 2, dim=self.chunk_dim)
if self.sample:
std = torch.exp(0.5 * logvar)
return mean + std * torch.randn_like(mean)
else:
return mean
class AutoEncoder(nn.Module):
def __init__(self, params: AutoEncoderParams):
super().__init__()
self.encoder = Encoder(
resolution=params.resolution,
in_channels=params.in_channels,
ch=params.ch,
ch_mult=params.ch_mult,
num_res_blocks=params.num_res_blocks,
z_channels=params.z_channels,
)
self.decoder = Decoder(
resolution=params.resolution,
in_channels=params.in_channels,
ch=params.ch,
out_ch=params.out_ch,
ch_mult=params.ch_mult,
num_res_blocks=params.num_res_blocks,
z_channels=params.z_channels,
)
self.reg = DiagonalGaussian()
self.scale_factor = params.scale_factor
self.shift_factor = params.shift_factor
def encode(self, x: Tensor) -> Tensor:
z = self.reg(self.encoder(x))
z = self.scale_factor * (z - self.shift_factor)
return z
def decode(self, z: Tensor) -> Tensor:
z = z / self.scale_factor + self.shift_factor
return self.decoder(z)
def forward(self, x: Tensor) -> Tensor:
return self.decode(self.encode(x))
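`AutoEncoder.encode` applies `scale_factor * (z - shift_factor)` to the sampled latent and `AutoEncoder.decode` undoes it with `z / scale_factor + shift_factor` before calling the decoder. A small sketch of that pair of affine maps, using plain lists, is given below. The function names and the numeric values of `scale_factor` and `shift_factor` are illustrative assumptions, not the values of any released checkpoint.

```python
def to_model_space(z, scale_factor, shift_factor):
    """Normalization applied at the end of encode()."""
    return [scale_factor * (value - shift_factor) for value in z]


def to_latent_space(z, scale_factor, shift_factor):
    """Inverse normalization applied at the start of decode()."""
    return [value / scale_factor + shift_factor for value in z]


# Illustrative values only.
scale_factor, shift_factor = 0.5, 0.25
latent = [0.25, 1.25, -0.75]

normalized = to_model_space(latent, scale_factor, shift_factor)
restored = to_latent_space(normalized, scale_factor, shift_factor)
print(normalized, restored)
```

The two maps are exact inverses of each other, so the diffusion model works in the normalized space while the decoder always sees latents in the autoencoder's native range.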
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/__init__.py
================================================
from .cache_cutfresh import cache_cutfresh
from .fresh_ratio_scheduler import fresh_ratio_scheduler
from .score_evaluate import score_evaluate
from .global_force_fresh import global_force_fresh
from .update_cache import update_cache
from .force_init import force_init
from .attention import cached_attention_forward
from .cache_init import cache_init
from .cal_type import cal_type
from .force_scheduler import force_scheduler
from .support_set_selection import support_set_selection
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/attention.py
================================================
# Re-arranged attention module that also returns the attention map
from torch.jit import Final
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional, Union
#from xformers.ops.fmha.attn_bias import BlockDiagonalMask
def cached_attention_forward(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    #attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,
    attn_bias,
    p: float = 0.0,
    scale: Optional[float] = None
) -> tuple[torch.Tensor, torch.Tensor]:
    # fall back to the standard 1/sqrt(head_dim) scaling only when no scale is given
    if scale is None:
        scale = 1.0 / query.shape[-1] ** 0.5
    query = query * scale
    query = query.transpose(1, 2)
    key = key.transpose(1, 2)
    value = value.transpose(1, 2)
    attn = query @ key.transpose(-2, -1)
    if attn_bias is not None:
        attn_bias = attn_bias.materialize(shape=attn.shape, dtype=attn.dtype, device=attn.device)
        attn = attn + attn_bias
    #out_map = attn
    attn_map = attn.softmax(-1)
    attn = F.dropout(attn_map, p)
    attn = attn @ value
    # second return value: attention map averaged over heads
    return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/cache_cutfresh.py
================================================
from .fresh_ratio_scheduler import fresh_ratio_scheduler
from .score_evaluate import score_evaluate
#from .token_merge import token_merge
from .support_set_selection import support_set_selection
import torch
def cache_cutfresh(cache_dic, tokens, current):
'''
Cut fresh tokens from the input tokens and update the cache counter.
cache_dic: dict, the cache dictionary containing cache(main extra memory cost), indices and some other information.
tokens: torch.Tensor, the input tokens to be cut.
current: dict, the current step, layer, and module information. Particularly convenient for debugging.
'''
step = current['step']
layer = current['layer']
stream = current['stream']
module = current['module']
fresh_ratio = fresh_ratio_scheduler(cache_dic, current)
fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1)
# Generate the index tensor for fresh tokens
score = score_evaluate(cache_dic, tokens, current) # s1, s2, s3 mentioned in the paper
#score = local_selection_with_bonus(score, 0.4, 4) # Uniform Spatial Distribution s4 mentioned in the paper
indices = score.argsort(dim=-1, descending=True)
topk = int(fresh_ratio * score.shape[1])
fresh_indices = indices[:, :topk]
stale_indices = indices[:, topk:]
#fresh_indices = support_set_selection(tokens, fresh_ratio, 0.4, current, cache_dic) # (B, fresh_ratio * N) # 0.4
# (B, fresh_ratio *N)
# Updating the Cache Frequency Score s3 mentioned in the paper
# stale tokens index + 1 in each ***module***, fresh tokens index = 0
cache_dic['cache_index'][-1][layer][module] += 1
cache_dic['cache_index'][-1][layer][module].scatter_(dim=1, index=fresh_indices,
src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))
#cache_dic['cache_index']['layer_index'][module] += 1
#cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices,
# src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))
fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)
return fresh_indices, fresh_tokens
def local_selection_with_bonus(score, bonus_ratio, grid_size=2):
batch_size, num_tokens = score.shape
image_size = int(num_tokens ** 0.5)
block_size = grid_size * grid_size
assert num_tokens % block_size == 0, "The number of tokens must be divisible by the block size."
# Step 1: Reshape score to group it by blocks
score_reshaped = score.view(batch_size, image_size // grid_size, grid_size, image_size // grid_size, grid_size)
score_reshaped = score_reshaped.permute(0, 1, 3, 2, 4).contiguous()
score_reshaped = score_reshaped.view(batch_size, -1, block_size) # [batch_size, num_blocks, block_size]
# Step 2: Find the max token in each block
max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True) # [batch_size, num_blocks, 1]
# Step 3: Create a mask to identify max score tokens
mask = torch.zeros_like(score_reshaped)
mask.scatter_(-1, max_indices, 1) # Set mask to 1 at the max indices
# Step 4: Apply the bonus only to the max score tokens
score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio) # Apply bonus only to max tokens
# Step 5: Reshape the score back to its original shape
score_modified = score_reshaped.view(batch_size, image_size // grid_size, image_size // grid_size, grid_size, grid_size)
score_modified = score_modified.permute(0, 1, 3, 2, 4).contiguous()
score_modified = score_modified.view(batch_size, num_tokens)
return score_modified
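The selection step in `cache_cutfresh` sorts tokens by score, keeps the top `fresh_ratio` fraction as "fresh" tokens to recompute, and updates a per-token counter so that stale tokens age by one and fresh tokens are reset to zero. The sketch below shows that logic for a single sample with plain Python lists. The function name `select_fresh_tokens` and the `staleness` list (standing in for `cache_dic['cache_index']`) are hypothetical, and the scores are made-up numbers.

```python
def select_fresh_tokens(scores, fresh_ratio, staleness):
    """Pick the top-scoring tokens to recompute and update the staleness counters."""
    # Clamp the ratio to [0, 1], as cache_cutfresh does.
    fresh_ratio = min(max(fresh_ratio, 0.0), 1.0)
    # Token indices sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    topk = int(fresh_ratio * len(scores))
    fresh = order[:topk]
    # Every token ages by one step; refreshed tokens are reset to zero.
    updated = [count + 1 for count in staleness]
    for i in fresh:
        updated[i] = 0
    return fresh, updated


scores = [0.1, 0.9, 0.4, 0.7]
fresh, staleness = select_fresh_tokens(scores, fresh_ratio=0.5, staleness=[0, 2, 1, 5])
print(fresh, staleness)
```

In the repository the same operations run batched on tensors with `argsort`, `scatter_` and `gather`; the list version only makes the bookkeeping easier to follow.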
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/cache_init.py
================================================
def cache_init(timesteps, model_kwargs=None):
'''
Initialization for cache.
'''
cache_dic = {}
cache = {}
cache_index = {}
cache[-1]={}
cache_index[-1]={}
cache_index['layer_index']={}
cache_dic['attn_map'] = {}
cache_dic['attn_map'][-1] = {}
cache_dic['attn_map'][-1]['double_stream'] = {}
cache_dic['attn_map'][-1]['single_stream'] = {}
cache_dic['k-norm'] = {}
cache_dic['k-norm'][-1] = {}
cache_dic['k-norm'][-1]['double_stream'] = {}
cache_dic['k-norm'][-1]['single_stream'] = {}
cache_dic['v-norm'] = {}
cache_dic['v-norm'][-1] = {}
cache_dic['v-norm'][-1]['double_stream'] = {}
cache_dic['v-norm'][-1]['single_stream'] = {}
cache_dic['cross_attn_map'] = {}
cache_dic['cross_attn_map'][-1] = {}
cache[-1]['double_stream']={}
cache[-1]['single_stream']={}
cache_dic['cache_counter'] = 0
for j in range(19):
cache[-1]['double_stream'][j] = {}
cache_index[-1][j] = {}
cache_dic['attn_map'][-1]['double_stream'][j] = {}
cache_dic['attn_map'][-1]['double_stream'][j]['total'] = {}
cache_dic['attn_map'][-1]['double_stream'][j]['txt_mlp'] = {}
cache_dic['attn_map'][-1]['double_stream'][j]['img_mlp'] = {}
cache_dic['k-norm'][-1]['double_stream'][j] = {}
cache_dic['k-norm'][-1]['double_stream'][j]['txt_mlp'] = {}
cache_dic['k-norm'][-1]['double_stream'][j]['img_mlp'] = {}
cache_dic['v-norm'][-1]['double_stream'][j] = {}
cache_dic['v-norm'][-1]['double_stream'][j]['txt_mlp'] = {}
cache_dic['v-norm'][-1]['double_stream'][j]['img_mlp'] = {}
for j in range(38):
cache[-1]['single_stream'][j] = {}
cache_index[-1][j] = {}
cache_dic['attn_map'][-1]['single_stream'][j] = {}
cache_dic['attn_map'][-1]['single_stream'][j]['total'] = {}
cache_dic['k-norm'][-1]['single_stream'][j] = {}
cache_dic['k-norm'][-1]['single_stream'][j]['total'] = {}
cache_dic['v-norm'][-1]['single_stream'][j] = {}
cache_dic['v-norm'][-1]['single_stream'][j]['total'] = {}
mode = 'ToCa'
if mode == 'original':
cache_dic['cache_type'] = 'random' # model_kwargs['cache_type'] # no use
cache_dic['cache_index'] = cache_index
cache_dic['cache'] = cache
cache_dic['fresh_ratio_schedule'] = 'ToCa' # model_kwargs['ratio_scheduler']
cache_dic['fresh_ratio'] = 0.0 # model_kwargs['fresh_ratio']
cache_dic['fresh_threshold'] = 1 # model_kwargs['fresh_threshold']
cache_dic['force_fresh'] = 'global' # model_kwargs['force_fresh']
cache_dic['soft_fresh_weight'] = 0.0 # model_kwargs['soft_fresh_weight']
elif mode == 'ToCa':
cache_dic['cache_type'] = 'attention' # Attention cache type for ToCa, use Self-Attention Weight to evaluate the importance of each token
cache_dic['cache_index'] = cache_index
cache_dic['cache'] = cache
cache_dic['fresh_ratio_schedule'] = 'ToCa'
cache_dic['fresh_ratio'] = 0.1
cache_dic['fresh_threshold'] = 4
cache_dic['force_fresh'] = 'global'
cache_dic['soft_fresh_weight'] = 0.25
current = {}
current['final_time'] = timesteps[-2]
return cache_dic, current
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/cal_type.py
================================================
from .force_scheduler import force_scheduler
def cal_type(cache_dic, current):
    '''
    Determine calculation type for this step
    '''
    if cache_dic['fresh_ratio'] == 0.0:
        # FORA: Uniform
        first_step = (current['step'] == 0)
    else:
        # ToCa: First 3 steps enhanced
        first_step = (current['step'] <= 2)
    force_fresh = cache_dic['force_fresh']
    if not first_step:
        fresh_interval = cache_dic['cal_threshold']
    else:
        fresh_interval = cache_dic['fresh_threshold']
    if (first_step) or (cache_dic['cache_counter'] == fresh_interval - 1):
        current['type'] = 'full'
        cache_dic['cache_counter'] = 0
        force_scheduler(cache_dic, current)
    else:
        # ToCa
        cache_dic['cache_counter'] += 1
        current['type'] = 'ToCa'
    ######################################################################
    #if (current['step'] in [3,2,1,0]):
    #    current['type'] = 'full'
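The effect of `cal_type` over a whole sampling run is easiest to see as a schedule of `full` and `ToCa` steps. The sketch below replays the counter logic for the ToCa configuration, where the first three steps are always computed in full. It assumes a constant refresh interval, which matches `force_scheduler` when `linear_step_weight` is `0.0`; the function name `plan_steps` and its parameters are hypothetical.

```python
def plan_steps(num_steps, fresh_threshold=4, warmup_steps=3):
    """Return the computation type ('full' or 'ToCa') chosen for each step."""
    plan = []
    cache_counter = 0
    for step in range(num_steps):
        if step < warmup_steps or cache_counter == fresh_threshold - 1:
            # Full computation refreshes the cache and resets the counter.
            plan.append("full")
            cache_counter = 0
        else:
            # Cached step: only the selected fresh tokens are recomputed.
            cache_counter += 1
            plan.append("ToCa")
    return plan


plan = plan_steps(12)
print(plan)
```

After the warm-up, every full step is followed by `fresh_threshold - 1` cached steps, so with a threshold of 4 roughly one step in four pays the full cost.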
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/force_init.py
================================================
import torch
def force_init(cache_dic, current, tokens):
'''
Initialization for Force Activation step.
'''
cache_dic['cache_index'][-1][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)
#if current['layer'] == 0:
# cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/force_scheduler.py
================================================
import torch
def force_scheduler(cache_dic, current):
    if cache_dic['fresh_ratio'] == 0:
        # FORA
        linear_step_weight = 0.0
    else:
        # TokenCache
        linear_step_weight = 0.0
    step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])
    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)
    # No forced-refresh constraint is applied on sensitive steps, because the quality is already good enough.
    # Feel free to experiment with one.
    cache_dic['cal_threshold'] = threshold
    #return threshold
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/fresh_ratio_scheduler.py
================================================
import torch
def fresh_ratio_scheduler(cache_dic, current):
    '''
    Return the fresh ratio for the current step.
    '''
    fresh_ratio = cache_dic['fresh_ratio']
    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']
    step = current['step']
    num_steps = current['num_steps']
    threshold = cache_dic['fresh_threshold']
    weight = 0.9
    if fresh_ratio_schedule == 'constant':
        return fresh_ratio
    elif fresh_ratio_schedule == 'linear':
        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)
    elif fresh_ratio_schedule == 'exp':
        #return 0.5 * (0.052 ** (step/num_steps))
        return fresh_ratio * (weight ** (step / num_steps))
    elif fresh_ratio_schedule == 'linear-mode':
        mode = (step % threshold)/threshold - 0.5
        mode_weight = 0.1
        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)
    elif fresh_ratio_schedule == 'layerwise':
        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)
    elif fresh_ratio_schedule == 'linear-layerwise':
        step_weight = -0.9 #0.9
        step_factor = 1 - step_weight + 2 * step_weight * step / num_steps
        #if current['layer'] == 2:
        #    return 1.0
        #sigmoid
        #sigmoid_weight = 0.13
        #layer_factor = 2 * torch.sigmoid(torch.tensor([sigmoid_weight * (13.5 - current['layer'])]))
        layer_weight = 0.6
        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27
        module_weight = 1.0 #TokenCache N=8 2.5 N=6 2.5 #N=4 2.1
        module_time_weight = 0.6
        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)
        return fresh_ratio * layer_factor * step_factor * module_factor
    elif fresh_ratio_schedule == 'ToCa':
        step_weight = 0.0 #0.9
        step_factor = 1 - step_weight + 2 * step_weight * step / num_steps
        layer_weight = 0.5
        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27
        #module_weight = 1.0
        #module_time_weight = 0.6
        # With these values, cross-attn recomputes 60% * x of its tokens and mlp 160% * x, where x is the base fresh ratio.
        # Cross-attn has the most temporal redundancy and mlp the least, so cross-attn is recomputed less and mlp more.
        #module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)
        stream_weight = 0.6
        stream_factor = (1 - stream_weight) if current['stream']=='double_stream' else (1 + stream_weight)
        return fresh_ratio * layer_factor * step_factor * stream_factor #* module_factor
    else:
        raise ValueError("unrecognized fresh ratio schedule", fresh_ratio_schedule)
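To make the `'ToCa'` branch concrete, the sketch below evaluates the same product of factors for the first and last layer index. It is only an illustration: `toca_fresh_ratio_sketch` and the `last_layer` parameter are hypothetical names, and the constants (`0.5`, `0.6`, `27`) are copied from the branch above.

```python
import math

def toca_fresh_ratio_sketch(fresh_ratio, layer, stream,
                            layer_weight=0.5, stream_weight=0.6, last_layer=27):
    # step_weight is 0.0 in the 'ToCa' branch, so the step factor is always 1.
    step_factor = 1.0
    # Earlier layers get a larger share of fresh tokens, later layers a smaller one.
    layer_factor = 1 + layer_weight - 2 * layer_weight * layer / last_layer
    # Double-stream blocks are scaled down, single-stream blocks are scaled up.
    stream_factor = (1 - stream_weight) if stream == 'double_stream' else (1 + stream_weight)
    return fresh_ratio * layer_factor * step_factor * stream_factor

for layer, stream in [(0, 'double_stream'), (27, 'single_stream')]:
    print(layer, stream, round(toca_fresh_ratio_sketch(0.1, layer, stream), 4))
```

With a base ratio of 0.1, layer 0 of a double-stream block ends up at 0.1 × 1.5 × 0.4 and layer 27 of a single-stream block at 0.1 × 0.5 × 1.6.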
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/global_force_fresh.py
================================================
from .force_scheduler import force_scheduler
def global_force_fresh(cache_dic, current):
    '''
    Return whether to force fresh tokens globally.
    '''
    first_step = (current['step'] == 0)
    second_step = (current['step'] == 1)
    force_fresh = cache_dic['force_fresh']
    if not first_step:
        fresh_threshold = cache_dic['cal_threshold']
    else:
        fresh_threshold = cache_dic['fresh_threshold']
    if force_fresh == 'global':
        return (first_step or (current['step']% fresh_threshold == 0))
    elif force_fresh == 'local':
        return first_step
    elif force_fresh == 'none':
        return first_step
    else:
        raise ValueError("unrecognized force fresh strategy", force_fresh)
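The effect of the three strategies can be seen by listing which steps would be forced to a full refresh. The sketch below is a simplified stand-in: `forced_full_steps` is a hypothetical helper, and it assumes a constant integer threshold, whereas the function above reads `cal_threshold` (a tensor set by `force_scheduler`) after the first step.

```python
def forced_full_steps(num_steps, fresh_threshold, force_fresh='global'):
    steps = []
    for step in range(num_steps):
        first_step = (step == 0)
        if force_fresh == 'global':
            # Step 0 plus every multiple of the threshold.
            forced = first_step or (step % fresh_threshold == 0)
        elif force_fresh in ('local', 'none'):
            # Only the very first step is forced.
            forced = first_step
        else:
            raise ValueError("unrecognized force fresh strategy", force_fresh)
        if forced:
            steps.append(step)
    return steps

print(forced_full_steps(10, 3, 'global'))
print(forced_full_steps(10, 3, 'local'))
```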
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/score_evaluate.py
================================================
import torch
import torch.nn as nn
from .scores import attn_score, similarity_score, norm_score, k_norm_score, v_norm_score
def score_evaluate(cache_dic, tokens, current) -> torch.Tensor:
'''
Return the score tensor (B, N) for the given tokens.
'''
#if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):
# # abandoned branch, if you want to explore the local force fresh strategy, this may help.
# force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module
# force_len = force_fresh_mask.sum(dim=1)
# force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]
# force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]
# See the DiT-ToCa version for a fuller explanation if needed.
if cache_dic['cache_type'] == 'random':
score = torch.rand(tokens.shape[0], tokens.shape[1], device=tokens.device)
elif cache_dic['cache_type'] == 'straight':
score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)
elif cache_dic['cache_type'] == 'attention':
# cache_dic['attn_map'][step][layer] (B, N, N); the last dimension has already been softmaxed
score = attn_score(cache_dic, current)
#score = score + 0.0 * torch.rand_like(score, device= score.device)
elif cache_dic['cache_type'] == 'similarity':
score = similarity_score(cache_dic, current, tokens)
elif cache_dic['cache_type'] == 'norm':
score = norm_score(cache_dic, current, tokens)
elif cache_dic['cache_type'] == 'k-norm':
score = k_norm_score(cache_dic, current)
elif cache_dic['cache_type'] == 'v-norm':
score = v_norm_score(cache_dic, current)
elif cache_dic['cache_type'] == 'compress':
score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])
score1 = torch.cat([score1, score1], dim=0).to(tokens.device)
score2 = cache_dic['attn_map'][-1][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)
# normalize
score2 = score2 / score2.max(dim=1, keepdim=True)[0]
score = 0.5 * score1 + 0.5 * score2
# Abandoned branch; if you want to explore the local force fresh strategy, this may help.
#if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed
# #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)
# score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32,
# device=force_indices.device))
if cache_dic['force_fresh'] == 'global':
soft_step_score = cache_dic['cache_index'][-1][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])
#soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)
score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score
return score.to(tokens.device)
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/scores.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
def attn_score(cache_dic, current):
#self_attn_score = 1- cache_dic['attn_map'][-1][current['layer']].diagonal(dim1=1, dim2=2)
#self_attn_score = F.normalize(self_attn_score, dim=1, p=2)
#attention_score = F.normalize(cache_dic['attn_map'][-1][current['layer']].sum(dim=1), dim=1, p=2)
#cross_attn_map = F.threshold(cache_dic['cross_attn_map'][-1][current['layer']],threshold=0.0, value=0.0)
#cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)
# Note: It is important to use the same selection method for cfg and no cfg,
# because the influence of **Cross-Attention** in text-conditional models makes cfg and no cfg differ a lot.
# Same selection for cfg and no cfg
#cond_cmap, uncond_cmap = torch.split(cache_dic['attn_map'][-1][current['layer']], len(cache_dic['cross_attn_map'][-1][current['layer']]) // 2, dim=0)
#cond_weight = 0.5
#cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap
## Entropy score
#cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)
#cross_attention_score = F.normalize(1 + cross_attention_entropy, dim=1, p=2) # Note: the "1" here does not change the sorted order, but provides stability.
#score = cross_attention_score.repeat(2, 1)
if current['stream'] == 'double_stream':
score = F.normalize(cache_dic['attn_map'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)
elif current['stream'] == 'single_stream':
score = F.normalize(cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)
# You can try combining the self_attention_score (s1) and cross_attention_score (s2) into the final score; there is a balance between them.
#cross_weight = 0.0
#score = (1-cross_weight) * attention_score + cross_weight * cross_attention_score
return score
def similarity_score(cache_dic, current, tokens):
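# The cache is indexed by stream first, matching update_cache and the blocks in layers.py.
cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][-1][current['stream']][current['layer']][current['module']], dim=-1)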
return F.normalize(1- cosine_sim, dim=-1, p=2)
def norm_score(cache_dic, current, tokens):
norm = tokens.norm(dim=-1, p=2)
return F.normalize(norm, dim=-1, p=2)
def kv_norm_score(cache_dic, current):
# (B, N, num_heads)
#cond_k_norm, uncond_k_norm = torch.split(cache_dic['cache'][-1][current['layer']]['k_norm'], len(cache_dic['cache'][-1][current['layer']]['k_norm']) // 2, dim=0)
cond_v_norm, uncond_v_norm = torch.split(cache_dic['cache'][-1][current['layer']]['v_norm'], len(cache_dic['cache'][-1][current['layer']]['v_norm']) // 2, dim=0)
cond_weight = 0.5
#k_norm = cond_weight * cond_k_norm + (1 - cond_weight) * uncond_k_norm
v_norm = cond_weight * cond_v_norm + (1 - cond_weight) * uncond_v_norm
kv_norm = 1 -v_norm
## Compute the absolute difference between each element of the (B/2, N) tensor and its mean along the N dimension
#kv_norm_mean = kv_norm.mean(dim=-2, keepdim=True)
#kv_norm_diff = torch.abs(kv_norm - kv_norm_mean)
return F.normalize(kv_norm.sum(dim=-1), p=2).repeat(2, 1)
def k_norm_score(cache_dic, current):
# (B, N)
if current['stream'] == 'double_stream':
score = F.normalize(cache_dic['k-norm'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)
elif current['stream'] == 'single_stream':
score = F.normalize(cache_dic['k-norm'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)
return score
def v_norm_score(cache_dic, current):
# (B, N)
if current['stream'] == 'double_stream':
score = F.normalize(cache_dic['v-norm'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)
elif current['stream'] == 'single_stream':
score = F.normalize(cache_dic['v-norm'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)
return score
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/support_set_selection.py
================================================
import torch
from typing import Dict
def support_set_selection(x: torch.Tensor, fresh_ratio: float, base_ratio: float, current: Dict, cache_dic: Dict) -> torch.Tensor:
    #selection_start = 0
    #
    #if current['stream'] == 'single_stream':
    #    # only select from the img tokens
    #    x = x[:, cache_dic['txt_shape'] :]
    #    selection_start = cache_dic['txt_shape']
    B, N, H = x.shape
    num_total = int(fresh_ratio * N) # total number of tokens selected per batch
    base_count = int(base_ratio * num_total) # number of randomly selected tokens
    #base_count = 1
    add_count = num_total - base_count # number of tokens still to be selected from the candidate set
    # 1. Randomly select (B, base_count) tokens
    random_indices = torch.randperm(N, device=x.device)
    base_indices = random_indices[:base_count]
    other_indices = random_indices[base_count:]
    base_tokens = x.gather(dim=1, index=base_indices.unsqueeze(-1).expand(B, -1, H))
    #other_tokens = x.gather(dim=1, index=other_indices.unsqueeze(-1).expand(-1, -1, H))
    # 2. Compute the similarity between the remaining tokens and the selected tokens
    # normalize
    base_tokens = base_tokens / base_tokens.norm(dim=-1, keepdim=True)
    #other_tokens = other_tokens / other_tokens.norm(dim=-1, keepdim=True)
    x_norm = x / x.norm(dim=-1, keepdim=True)
    # Cosine similarity between the selected tokens and every token (x_norm covers all N tokens)
    similarity = torch.einsum('bnd,bmd->bnm', base_tokens, x_norm)
    # Take the minimum of each column
    min_similarity = similarity.min(dim=1).values
    #min_similarity = similarity.max(dim=1).values
    # 3. Select the tokens with the smallest similarity
    _, min_indices = min_similarity.topk(add_count, largest=False)
    #_, min_indices = min_similarity.topk(add_count, largest=True)
    # 4. Concatenate base_indices and min_indices
    #indices = torch.cat([base_indices, other_indices[min_indices]], dim=-1)
    indices = torch.cat([base_indices.expand(B, -1), min_indices], dim=-1) #+ selection_start
    return indices
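A compact, self-contained sketch of the same selection idea follows: pick a few random base tokens, then add the tokens whose minimum cosine similarity to the base set is smallest. The name `support_set_sketch`, the toy shapes and the `base_count >= 1` guard are assumptions made for illustration, and `torch.nn.functional.normalize` replaces the explicit division by the norm.

```python
import torch

def support_set_sketch(x, fresh_ratio, base_ratio):
    B, N, H = x.shape
    num_total = int(fresh_ratio * N)          # tokens selected per batch element
    base_count = int(base_ratio * num_total)  # tokens picked at random
    add_count = num_total - base_count        # tokens picked by dissimilarity
    # Assumption of this sketch: at least one base token, so the min below is defined.
    assert base_count >= 1
    # The same random base indices are shared by every batch element.
    base_indices = torch.randperm(N, device=x.device)[:base_count]
    x_norm = torch.nn.functional.normalize(x, dim=-1)
    base_tokens = x_norm[:, base_indices]                            # (B, base_count, H)
    similarity = torch.einsum('bnd,bmd->bnm', base_tokens, x_norm)   # (B, base_count, N)
    # For each token, its minimum cosine similarity to any base token.
    min_similarity = similarity.min(dim=1).values                    # (B, N)
    _, extra_indices = min_similarity.topk(add_count, largest=False)
    return torch.cat([base_indices.expand(B, -1), extra_indices], dim=-1)

torch.manual_seed(0)
tokens = torch.randn(2, 16, 8)
indices = support_set_sketch(tokens, fresh_ratio=0.5, base_ratio=0.5)
print(indices.shape)
```

As in the function above, nothing in this sketch prevents an index from appearing in both the random part and the dissimilarity part of the result.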
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/token_merge.py
================================================
import torch
def token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):
'''
An abandoned branch exploring whether token merging helps. The answer is no, at least not for a training-free strategy.
'''
if (current['layer'] % 1 == 0):
fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
method = 'similarity'
if method == 'distance':
descending = False
distance = torch.cdist(stale_tokens, fresh_tokens, p=1)
stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)
elif method == 'similarity':
descending = True
fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)
stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)
similarity = stale_tokens @ fresh_tokens.transpose(1, 2)
stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)
saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())
merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]
stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)
merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)
merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)
cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices
cache_dic['merged_stale_sequence'] = merged_stale_sequence
================================================
FILE: flux-ToCa/src/flux/modules/cache_functions/update_cache.py
================================================
import torch
def update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):
    '''
    Update the cache with the fresh tokens.
    '''
    step = current['step']
    layer = current['layer']
    module = current['module']
    # Overwrite, in place, the cached tokens at the fresh positions
    indices = fresh_indices
    cache_dic['cache'][-1][current['stream']][current['layer']][current['module']].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)
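The in-place `scatter_` call is easier to follow on a toy tensor. The sketch below uses made-up shapes and values (one batch element, four cached tokens of width two) and overwrites positions 1 and 3, leaving the other rows untouched.

```python
import torch

# Toy cache: (B=1, N=4, H=2), all zeros.
cache = torch.zeros(1, 4, 2)
# Positions of the tokens that were recomputed, and their new values.
fresh_indices = torch.tensor([[1, 3]])
fresh_tokens = torch.tensor([[[1.0, 2.0], [3.0, 4.0]]])

# Expand the (B, K) indices to (B, K, H) so that every feature of a token is written.
index = fresh_indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1])
cache.scatter_(dim=1, index=index, src=fresh_tokens)
print(cache[0])
```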
================================================
FILE: flux-ToCa/src/flux/modules/conditioner.py
================================================
from torch import Tensor, nn
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
class HFEmbedder(nn.Module):
def __init__(self, version: str, max_length: int, **hf_kwargs):
super().__init__()
self.is_clip = "openai" in version
self.max_length = max_length
self.output_key = "pooler_output" if self.is_clip else "last_hidden_state"
if self.is_clip:
self.tokenizer: CLIPTokenizer = CLIPTokenizer.from_pretrained(version, max_length=max_length)
self.hf_module: CLIPTextModel = CLIPTextModel.from_pretrained(version, **hf_kwargs)
else:
self.tokenizer: T5Tokenizer = T5Tokenizer.from_pretrained(version, max_length=max_length)
self.hf_module: T5EncoderModel = T5EncoderModel.from_pretrained(version, **hf_kwargs)
self.hf_module = self.hf_module.eval().requires_grad_(False)
def forward(self, text: list[str]) -> Tensor:
batch_encoding = self.tokenizer(
text,
truncation=True,
max_length=self.max_length,
return_length=False,
return_overflowing_tokens=False,
padding="max_length",
return_tensors="pt",
)
outputs = self.hf_module(
input_ids=batch_encoding["input_ids"].to(self.hf_module.device),
attention_mask=None,
output_hidden_states=False,
)
return outputs[self.output_key]
================================================
FILE: flux-ToCa/src/flux/modules/image_embedders.py
================================================
import os
import cv2
import numpy as np
import torch
from einops import rearrange, repeat
from PIL import Image
from safetensors.torch import load_file as load_sft
from torch import nn
from transformers import AutoModelForDepthEstimation, AutoProcessor, SiglipImageProcessor, SiglipVisionModel
from flux.util import print_load_warning
class DepthImageEncoder:
depth_model_name = "LiheYoung/depth-anything-large-hf"
def __init__(self, device):
self.device = device
self.depth_model = AutoModelForDepthEstimation.from_pretrained(self.depth_model_name).to(device)
self.processor = AutoProcessor.from_pretrained(self.depth_model_name)
def __call__(self, img: torch.Tensor) -> torch.Tensor:
hw = img.shape[-2:]
img = torch.clamp(img, -1.0, 1.0)
img_byte = ((img + 1.0) * 127.5).byte()
img = self.processor(img_byte, return_tensors="pt")["pixel_values"]
depth = self.depth_model(img.to(self.device)).predicted_depth
depth = repeat(depth, "b h w -> b 3 h w")
depth = torch.nn.functional.interpolate(depth, hw, mode="bicubic", antialias=True)
depth = depth / 127.5 - 1.0
return depth
class CannyImageEncoder:
def __init__(
self,
device,
min_t: int = 50,
max_t: int = 200,
):
self.device = device
self.min_t = min_t
self.max_t = max_t
def __call__(self, img: torch.Tensor) -> torch.Tensor:
assert img.shape[0] == 1, "Only batch size 1 is supported"
img = rearrange(img[0], "c h w -> h w c")
img = torch.clamp(img, -1.0, 1.0)
img_np = ((img + 1.0) * 127.5).numpy().astype(np.uint8)
# Apply Canny edge detection
canny = cv2.Canny(img_np, self.min_t, self.max_t)
# Convert back to torch tensor and reshape
canny = torch.from_numpy(canny).float() / 127.5 - 1.0
canny = rearrange(canny, "h w -> 1 1 h w")
canny = repeat(canny, "b 1 ... -> b 3 ...")
return canny.to(self.device)
class ReduxImageEncoder(nn.Module):
siglip_model_name = "google/siglip-so400m-patch14-384"
def __init__(
self,
device,
redux_dim: int = 1152,
txt_in_features: int = 4096,
redux_path: str | None = os.getenv("FLUX_REDUX"),
dtype=torch.bfloat16,
) -> None:
assert redux_path is not None, "Redux path must be provided"
super().__init__()
self.redux_dim = redux_dim
self.device = device if isinstance(device, torch.device) else torch.device(device)
self.dtype = dtype
with self.device:
self.redux_up = nn.Linear(redux_dim, txt_in_features * 3, dtype=dtype)
self.redux_down = nn.Linear(txt_in_features * 3, txt_in_features, dtype=dtype)
sd = load_sft(redux_path, device=str(device))
missing, unexpected = self.load_state_dict(sd, strict=False, assign=True)
print_load_warning(missing, unexpected)
self.siglip = SiglipVisionModel.from_pretrained(self.siglip_model_name).to(dtype=dtype)
self.normalize = SiglipImageProcessor.from_pretrained(self.siglip_model_name)
def __call__(self, x: Image.Image) -> torch.Tensor:
imgs = self.normalize.preprocess(images=[x], do_resize=True, return_tensors="pt", do_convert_rgb=True)
_encoded_x = self.siglip(**imgs.to(device=self.device, dtype=self.dtype)).last_hidden_state
projected_x = self.redux_down(nn.functional.silu(self.redux_up(_encoded_x)))
return projected_x
================================================
FILE: flux-ToCa/src/flux/modules/layers.py
================================================
import math
from dataclasses import dataclass
from typing import Optional
import torch
from einops import rearrange
from torch import Tensor, nn
from flux.math import attention, rope
from flux.modules.cache_functions import force_init, cache_cutfresh, update_cache
class EmbedND(nn.Module):
def __init__(self, dim: int, theta: int, axes_dim: list[int]):
super().__init__()
self.dim = dim
self.theta = theta
self.axes_dim = axes_dim
def forward(self, ids: Tensor) -> Tensor:
n_axes = ids.shape[-1]
emb = torch.cat(
[rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
dim=-3,
)
return emb.unsqueeze(1)
def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
"""
Create sinusoidal timestep embeddings.
:param t: a 1-D Tensor of N indices, one per batch element.
These may be fractional.
:param dim: the dimension of the output.
:param max_period: controls the minimum frequency of the embeddings.
:return: an (N, D) Tensor of positional embeddings.
"""
t = time_factor * t
half = dim // 2
freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(
t.device
)
args = t[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
if torch.is_floating_point(t):
embedding = embedding.to(t)
return embedding
class MLPEmbedder(nn.Module):
def __init__(self, in_dim: int, hidden_dim: int):
super().__init__()
self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True)
self.silu = nn.SiLU()
self.out_layer = nn.Linear(hidden_dim, hidden_dim, bias=True)
def forward(self, x: Tensor) -> Tensor:
return self.out_layer(self.silu(self.in_layer(x)))
class RMSNorm(torch.nn.Module):
def __init__(self, dim: int):
super().__init__()
self.scale = nn.Parameter(torch.ones(dim))
def forward(self, x: Tensor):
x_dtype = x.dtype
x = x.float()
rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)
return (x * rrms).to(dtype=x_dtype) * self.scale
class QKNorm(torch.nn.Module):
def __init__(self, dim: int):
super().__init__()
self.query_norm = RMSNorm(dim)
self.key_norm = RMSNorm(dim)
def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple[Tensor, Tensor]:
q = self.query_norm(q)
k = self.key_norm(k)
return q.to(v), k.to(v)
class SelfAttention(nn.Module):
def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False):
super().__init__()
self.num_heads = num_heads
head_dim = dim // num_heads
self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
self.norm = QKNorm(head_dim)
self.proj = nn.Linear(dim, dim)
def forward(self, x: Tensor, pe: Tensor) -> Tensor:
qkv = self.qkv(x)
q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
q, k = self.norm(q, k, v)
x = attention(q, k, v, pe=pe)
x = self.proj(x)
return x
@dataclass
class ModulationOut:
shift: Tensor
scale: Tensor
gate: Tensor
class Modulation(nn.Module):
def __init__(self, dim: int, double: bool):
super().__init__()
self.is_double = double
self.multiplier = 6 if double else 3
self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)
def forward(self, vec: Tensor) -> tuple[ModulationOut, ModulationOut | None]:
out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
return (
ModulationOut(*out[:3]),
ModulationOut(*out[3:]) if self.is_double else None,
)
class DoubleStreamBlock(nn.Module):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False):
super().__init__()
mlp_hidden_dim = int(hidden_size * mlp_ratio)
self.num_heads = num_heads
self.hidden_size = hidden_size
self.img_mod = Modulation(hidden_size, double=True)
self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.img_mlp = nn.Sequential(
nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
nn.GELU(approximate="tanh"),
nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
)
self.txt_mod = Modulation(hidden_size, double=True)
self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.txt_mlp = nn.Sequential(
nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
nn.GELU(approximate="tanh"),
nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
)
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor, **kwargs) -> tuple[Tensor, Tensor]:
cache_dic = kwargs.get('cache_dic', None)
current = kwargs.get('current', None)
if cache_dic is None:
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
# prepare image for attention
img_modulated = self.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = self.img_attn.qkv(img_modulated)
img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = self.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = self.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
q = torch.cat((txt_q, img_q), dim=2)
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn = attention(q, k, v, pe=pe)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img blocks
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
# calculate the txt blocks
txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
else:
current['stream'] = 'double_stream'
if current['type'] == 'full':
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
# prepare image for attention
img_modulated = self.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = self.img_attn.qkv(img_modulated)
img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
if cache_dic['cache_type'] == 'k-norm':
img_k_norm = img_k.norm(dim=-1, p=2).mean(dim=1)
cache_dic['k-norm'][-1][current['stream']][current['layer']]['img_mlp'] = img_k_norm
elif cache_dic['cache_type'] == 'v-norm':
img_v_norm = img_v.norm(dim=-1, p=2).mean(dim=1)
cache_dic['v-norm'][-1][current['stream']][current['layer']]['img_mlp'] = img_v_norm
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = self.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = self.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
if cache_dic['cache_type'] == 'k-norm':
txt_k_norm = txt_k.norm(dim=-1, p=2).mean(dim=1)
cache_dic['k-norm'][-1][current['stream']][current['layer']]['txt_mlp'] = txt_k_norm
elif cache_dic['cache_type'] == 'v-norm':
txt_v_norm = txt_v.norm(dim=-1, p=2).mean(dim=1)
cache_dic['v-norm'][-1][current['stream']][current['layer']]['txt_mlp'] = txt_v_norm
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
q = torch.cat((txt_q, img_q), dim=2)
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)
cache_dic['cache'][-1]['double_stream'][current['layer']]['attn'] = attn
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
cache_dic['txt_shape'] = txt.shape[1]
if cache_dic['cache_type'] == 'attention':
cache_dic['attn_map'][-1][current['stream']][current['layer']]['txt_mlp'] = cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'][:, : txt.shape[1]]
cache_dic['attn_map'][-1][current['stream']][current['layer']]['img_mlp'] = cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'][:, txt.shape[1] :]
current['module'] = 'img_mlp'
force_init(cache_dic=cache_dic, current=current, tokens=img)
# calculate the img blocks
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp'] = self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']
current['module'] = 'txt_mlp'
force_init(cache_dic=cache_dic, current=current, tokens=txt)
# calculate the txt blocks
txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp'] = self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']
elif current['type'] == 'ToCa':
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
attn = cache_dic['cache'][-1]['double_stream'][current['layer']]['attn']
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
current['module'] = 'img_mlp'
# calculate the img blocks
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
fresh_indices, fresh_tokens_img = cache_cutfresh(cache_dic=cache_dic, tokens=img, current=current)
fresh_tokens_img = self.img_mlp((1 + img_mod2.scale) * self.img_norm2(fresh_tokens_img) + img_mod2.shift)
update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_img, cache_dic=cache_dic, current=current)
img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']
current['module'] = 'txt_mlp'
# calculate the txt blocks
txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
fresh_indices, fresh_tokens_txt = cache_cutfresh(cache_dic=cache_dic, tokens=txt, current=current)
fresh_tokens_txt = self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(fresh_tokens_txt) + txt_mod2.shift)
update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_txt, cache_dic=cache_dic, current=current)
txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']
elif current['type'] == 'FORA':
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']
txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']
elif current['type'] == 'aggressive':
current['module'] = 'skipped'
else:
raise ValueError("Unknown cache type.")
return img, txt
class SingleStreamBlock(nn.Module):
"""
A DiT block with parallel linear layers as described in
https://arxiv.org/abs/2302.05442 and adapted modulation interface.
"""
def __init__(
self,
hidden_size: int,
num_heads: int,
mlp_ratio: float = 4.0,
qk_scale: float | None = None,
):
super().__init__()
self.hidden_dim = hidden_size
self.num_heads = num_heads
head_dim = hidden_size // num_heads
self.scale = qk_scale or head_dim**-0.5
self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
# qkv and mlp_in
self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim)
# proj and mlp_out
self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size)
self.norm = QKNorm(head_dim)
self.hidden_size = hidden_size
self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.mlp_act = nn.GELU(approximate="tanh")
self.modulation = Modulation(hidden_size, double=False)
# mlp_in
self.mlp_in = nn.Linear(hidden_size, self.mlp_hidden_dim)
def load_mlp_in_weights(self, linear1_weight: torch.Tensor, linear1_bias: Optional[torch.Tensor] = None):
"""
Split and load the weights of the original `linear1` layer, keeping only the MLP hidden layer part.
Parameters:
- linear1_weight: Tensor, with shape (hidden_size * 3 + mlp_hidden_dim, hidden_size)
- linear1_bias: Tensor, with shape (hidden_size * 3 + mlp_hidden_dim,) or None
"""
hidden_size = self.hidden_size
mlp_hidden_dim = self.mlp_hidden_dim
device = self.linear1.weight.device # target device
self.mlp_in.weight = torch.nn.Parameter(linear1_weight[hidden_size * 3:, :].to(device))
if linear1_bias is not None:
self.mlp_in.bias = torch.nn.Parameter(linear1_bias[hidden_size * 3:].to(device))
def forward(self, x: Tensor, vec: Tensor, pe: Tensor, **kwargs) -> Tensor:
cache_dic = kwargs.get('cache_dic', None)
current = kwargs.get('current', None)
mod, _ = self.modulation(vec)
if cache_dic is None:
x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)
# compute activation in mlp stream, cat again and run second linear layer
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
else:
current['stream'] = 'single_stream'
if current['type'] == 'full':
#if (current['layer'] == 0):
# print(current['step'])
x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp'] = mlp
current['module'] = 'attn'
q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
if cache_dic['cache_type'] == 'k-norm':
cache_dic['k-norm'][-1][current['stream']][current['layer']]['total'] = k.norm(dim=-1, p=2).mean(dim=1)
elif cache_dic['cache_type'] == 'v-norm':
cache_dic['v-norm'][-1][current['stream']][current['layer']]['total'] = v.norm(dim=-1, p=2).mean(dim=1)
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)
force_init(cache_dic=cache_dic, current=current, tokens=attn)
cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'] = attn
# compute activation in mlp stream, cat again and run second linear layer
current['module'] = 'mlp'
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
force_init(cache_dic=cache_dic, current=current, tokens=output)
current['module'] = 'total'
cache_dic['cache'][-1]['single_stream'][current['layer']]['total'] = output
elif current['type'] == 'ToCa':
self.load_mlp_in_weights(self.linear1.weight, self.linear1.bias)
current['module'] = 'mlp'
fresh_indices, fresh_tokens_mlp = cache_cutfresh(cache_dic=cache_dic, tokens=x, current=current)
x_mod = (1 + mod.scale) * self.pre_norm(fresh_tokens_mlp) + mod.shift
#cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp']
mlp_fresh = self.mlp_in(x_mod)
#_, mlp_fresh1 = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
update_cache(fresh_indices=fresh_indices, fresh_tokens=mlp_fresh, cache_dic=cache_dic, current=current)
# compute attention
fake_fresh_attn = torch.gather(input = cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'], dim = 1,
index = fresh_indices.unsqueeze(-1).expand(-1, -1, cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'].shape[-1]))
current['module'] = 'total'
fresh_tokens_output = self.linear2(torch.cat((fake_fresh_attn, self.mlp_act(mlp_fresh)), 2))
update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_output, cache_dic=cache_dic, current=current)
#attn = cache_dic['cache'][-1]['single_stream'][current['layer']]['attn']
#mlp = cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp']
# compute activation in mlp stream, cat again and run second linear layer
#output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
output = cache_dic['cache'][-1]['single_stream'][current['layer']]['total']
elif current['type'] == 'FORA':
output = cache_dic['cache'][-1]['single_stream'][current['layer']]['total']
elif current['type'] == 'aggressive':
current['module'] = 'skipped'
if current['layer'] == 37:  # index of the last single-stream block when depth_single_blocks=38
x = cache_dic['cache'][-1]['aggressive_output']
return x
else:
raise ValueError("Unknown cache type.")
if current['layer'] == 37:
cache_dic['cache'][-1]['aggressive_output'] = x
return x + mod.gate * output
class LastLayer(nn.Module):
def __init__(self, hidden_size: int, patch_size: int, out_channels: int):
super().__init__()
self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))
def forward(self, x: Tensor, vec: Tensor) -> Tensor:
shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)
x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]
x = self.linear(x)
return x
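# Illustration for SingleStreamBlock.load_mlp_in_weights above: a minimal sketch, using plain
# Python lists instead of tensors, of why slicing the rows of the fused `linear1` weight from
# `3 * hidden_size` onward reproduces the MLP part of the fused output. The helper `matvec` and
# the toy sizes are assumptions made for this example only.

```python
def matvec(weight, bias, x):
    """Compute weight @ x + bias for a row-major weight matrix (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


hidden_size, mlp_hidden_dim = 2, 3
num_rows = 3 * hidden_size + mlp_hidden_dim

# Fused layer: the first 3 * hidden_size rows produce q, k, v; the rest produce the MLP input.
fused_weight = [[float(r + c) for c in range(hidden_size)] for r in range(num_rows)]
fused_bias = [0.5 * r for r in range(num_rows)]

# Keep only the MLP rows, mirroring linear1_weight[hidden_size * 3:, :].
mlp_weight = fused_weight[3 * hidden_size:]
mlp_bias = fused_bias[3 * hidden_size:]

x = [1.0, -2.0]
fused_out = matvec(fused_weight, fused_bias, x)
mlp_only = matvec(mlp_weight, mlp_bias, x)

# The sliced layer yields exactly the tail of the fused output.
print(mlp_only == fused_out[3 * hidden_size:])
```

# Each output row depends only on its own weight row and bias entry, so the slice can be
# evaluated on a subset of tokens without computing q, k and v.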
================================================
FILE: flux-ToCa/src/flux/modules/lora.py
================================================
import torch
from torch import nn
def replace_linear_with_lora(
module: nn.Module,
max_rank: int,
scale: float = 1.0,
) -> None:
for name, child in module.named_children():
if isinstance(child, nn.Linear):
new_lora = LinearLora(
in_features=child.in_features,
out_features=child.out_features,
bias=child.bias,
rank=max_rank,
scale=scale,
dtype=child.weight.dtype,
device=child.weight.device,
)
new_lora.weight = child.weight
new_lora.bias = child.bias if child.bias is not None else None
setattr(module, name, new_lora)
else:
replace_linear_with_lora(
module=child,
max_rank=max_rank,
scale=scale,
)
class LinearLora(nn.Linear):
def __init__(
self,
in_features: int,
out_features: int,
bias: bool,
rank: int,
dtype: torch.dtype,
device: torch.device,
lora_bias: bool = True,
scale: float = 1.0,
*args,
**kwargs,
) -> None:
super().__init__(
in_features=in_features,
out_features=out_features,
bias=bias is not None,
device=device,
dtype=dtype,
*args,
**kwargs,
)
assert isinstance(scale, float), "scale must be a float"
self.scale = scale
self.rank = rank
self.lora_bias = lora_bias
self.dtype = dtype
self.device = device
if rank > (new_rank := min(self.out_features, self.in_features)):
self.rank = new_rank
self.lora_A = nn.Linear(
in_features=in_features,
out_features=self.rank,
bias=False,
dtype=dtype,
device=device,
)
self.lora_B = nn.Linear(
in_features=self.rank,
out_features=out_features,
bias=self.lora_bias,
dtype=dtype,
device=device,
)
def set_scale(self, scale: float) -> None:
assert isinstance(scale, float), "scalar value must be a float"
self.scale = scale
def forward(self, input: torch.Tensor) -> torch.Tensor:
base_out = super().forward(input)
_lora_out_B = self.lora_B(self.lora_A(input))
lora_update = _lora_out_B * self.scale
return base_out + lora_update
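# Illustration for LinearLora.forward above: a minimal sketch of the low-rank update
# y = W x + scale * B (A x), written with plain lists. Biases are left out for brevity, and the
# names `matvec`, `lora_forward` and `clamp_rank` are hypothetical helpers, not part of this module.

```python
def matvec(weight, x):
    """Row-major matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def lora_forward(x, base_w, lora_a, lora_b, scale=1.0):
    """Base projection plus the scaled low-rank correction."""
    base_out = matvec(base_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base_out, update)]


def clamp_rank(rank, in_features, out_features):
    """The rank can never exceed the smaller of the two layer dimensions."""
    return min(rank, in_features, out_features)


base_w = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity as the frozen base weight
lora_a = [[1.0, 1.0]]              # rank-1 down projection (1x2)
lora_b = [[0.5], [0.0]]            # rank-1 up projection (2x1)

y = lora_forward([2.0, 3.0], base_w, lora_a, lora_b, scale=2.0)
print(y, clamp_rank(16, 2, 2))
```

# Setting scale to 0.0 recovers the base layer, which is why set_scale is enough to switch the
# adapter's influence on and off.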
================================================
FILE: flux-ToCa/src/flux/sampling.py
================================================
import math
from typing import Callable
import numpy as np
import torch
from einops import rearrange, repeat
from PIL import Image
from torch import Tensor
from .model import Flux
from .modules.autoencoder import AutoEncoder
from .modules.conditioner import HFEmbedder
from .modules.image_embedders import CannyImageEncoder, DepthImageEncoder, ReduxImageEncoder
from .modules.cache_functions import cache_init
def get_noise(
num_samples: int,
height: int,
width: int,
device: torch.device,
dtype: torch.dtype,
seed: int,
):
return torch.randn(
num_samples,
16,
# allow for packing
2 * math.ceil(height / 16),
2 * math.ceil(width / 16),
device=device,
dtype=dtype,
generator=torch.Generator(device=device).manual_seed(seed),
)
def prepare(t5: HFEmbedder, clip: HFEmbedder, img: Tensor, prompt: str | list[str]) -> dict[str, Tensor]:
bs, c, h, w = img.shape
if bs == 1 and not isinstance(prompt, str):
bs = len(prompt)
img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img.shape[0] == 1 and bs > 1:
img = repeat(img, "1 ... -> bs ...", bs=bs)
img_ids = torch.zeros(h // 2, w // 2, 3)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
#small_img_ids = torch.zeros((h // 2) // 2, (w // 2) // 2, 3)
#small_img_ids[..., 1] = small_img_ids[..., 1] + torch.arange((h // 2) // 2)[:, None]
#small_img_ids[..., 2] = small_img_ids[..., 2] + torch.arange((w // 2) // 2)[None, :]
#small_img_ids = repeat(small_img_ids, "h w c -> b (h w) c", b=bs)
if isinstance(prompt, str):
prompt = [prompt]
txt = t5(prompt)
if txt.shape[0] == 1 and bs > 1:
txt = repeat(txt, "1 ... -> bs ...", bs=bs)
txt_ids = torch.zeros(bs, txt.shape[1], 3)
vec = clip(prompt)
if vec.shape[0] == 1 and bs > 1:
vec = repeat(vec, "1 ... -> bs ...", bs=bs)
return {
"img": img,
#"img_ids": [img_ids.to(img.device), small_img_ids.to(img.device)],
"img_ids": img_ids.to(img.device),
"txt": txt.to(img.device),
"txt_ids": txt_ids.to(img.device),
"vec": vec.to(img.device),
}
def prepare_control(
t5: HFEmbedder,
clip: HFEmbedder,
img: Tensor,
prompt: str | list[str],
ae: AutoEncoder,
encoder: DepthImageEncoder | CannyImageEncoder,
img_cond_path: str,
) -> dict[str, Tensor]:
# load and encode the conditioning image
bs, _, h, w = img.shape
if bs == 1 and not isinstance(prompt, str):
bs = len(prompt)
img_cond = Image.open(img_cond_path).convert("RGB")
width = w * 8
height = h * 8
img_cond = img_cond.resize((width, height), Image.LANCZOS)
img_cond = np.array(img_cond)
img_cond = torch.from_numpy(img_cond).float() / 127.5 - 1.0
img_cond = rearrange(img_cond, "h w c -> 1 c h w")
with torch.no_grad():
img_cond = encoder(img_cond)
img_cond = ae.encode(img_cond)
img_cond = img_cond.to(torch.bfloat16)
img_cond = rearrange(img_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img_cond.shape[0] == 1 and bs > 1:
img_cond = repeat(img_cond, "1 ... -> bs ...", bs=bs)
return_dict = prepare(t5, clip, img, prompt)
return_dict["img_cond"] = img_cond
return return_dict
def prepare_fill(
t5: HFEmbedder,
clip: HFEmbedder,
img: Tensor,
prompt: str | list[str],
ae: AutoEncoder,
img_cond_path: str,
mask_path: str,
) -> dict[str, Tensor]:
# load and encode the conditioning image and the mask
bs, _, _, _ = img.shape
if bs == 1 and not isinstance(prompt, str):
bs = len(prompt)
img_cond = Image.open(img_cond_path).convert("RGB")
img_cond = np.array(img_cond)
img_cond = torch.from_numpy(img_cond).float() / 127.5 - 1.0
img_cond = rearrange(img_cond, "h w c -> 1 c h w")
mask = Image.open(mask_path).convert("L")
mask = np.array(mask)
mask = torch.from_numpy(mask).float() / 255.0
mask = rearrange(mask, "h w -> 1 1 h w")
with torch.no_grad():
img_cond = img_cond.to(img.device)
mask = mask.to(img.device)
img_cond = img_cond * (1 - mask)
img_cond = ae.encode(img_cond)
mask = mask[:, 0, :, :]
mask = mask.to(torch.bfloat16)
mask = rearrange(
mask,
"b (h ph) (w pw) -> b (ph pw) h w",
ph=8,
pw=8,
)
mask = rearrange(mask, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if mask.shape[0] == 1 and bs > 1:
mask = repeat(mask, "1 ... -> bs ...", bs=bs)
img_cond = img_cond.to(torch.bfloat16)
img_cond = rearrange(img_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img_cond.shape[0] == 1 and bs > 1:
img_cond = repeat(img_cond, "1 ... -> bs ...", bs=bs)
img_cond = torch.cat((img_cond, mask), dim=-1)
return_dict = prepare(t5, clip, img, prompt)
return_dict["img_cond"] = img_cond.to(img.device)
return return_dict
def prepare_redux(
t5: HFEmbedder,
clip: HFEmbedder,
img: Tensor,
prompt: str | list[str],
encoder: ReduxImageEncoder,
img_cond_path: str,
) -> dict[str, Tensor]:
bs, _, h, w = img.shape
if bs == 1 and not isinstance(prompt, str):
bs = len(prompt)
img_cond = Image.open(img_cond_path).convert("RGB")
with torch.no_grad():
img_cond = encoder(img_cond)
img_cond = img_cond.to(torch.bfloat16)
if img_cond.shape[0] == 1 and bs > 1:
img_cond = repeat(img_cond, "1 ... -> bs ...", bs=bs)
img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img.shape[0] == 1 and bs > 1:
img = repeat(img, "1 ... -> bs ...", bs=bs)
img_ids = torch.zeros(h // 2, w // 2, 3)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
if isinstance(prompt, str):
prompt = [prompt]
txt = t5(prompt)
txt = torch.cat((txt, img_cond.to(txt)), dim=-2)
if txt.shape[0] == 1 and bs > 1:
txt = repeat(txt, "1 ... -> bs ...", bs=bs)
txt_ids = torch.zeros(bs, txt.shape[1], 3)
vec = clip(prompt)
if vec.shape[0] == 1 and bs > 1:
vec = repeat(vec, "1 ... -> bs ...", bs=bs)
return {
"img": img,
"img_ids": img_ids.to(img.device),
"txt": txt.to(img.device),
"txt_ids": txt_ids.to(img.device),
"vec": vec.to(img.device),
}
def time_shift(mu: float, sigma: float, t: Tensor):
return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
def get_lin_function(
x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15
) -> Callable[[float], float]:
m = (y2 - y1) / (x2 - x1)
b = y1 - m * x1
return lambda x: m * x + b
def get_schedule(
num_steps: int,
image_seq_len: int,
base_shift: float = 0.5,
max_shift: float = 1.15,
shift: bool = True,
) -> list[float]:
# extra step for zero
timesteps = torch.linspace(1, 0, num_steps + 1)
# shifting the schedule to favor high timesteps for higher signal images
if shift:
# estimate mu based on linear estimation between two points
mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
timesteps = time_shift(mu, 1.0, timesteps)
return timesteps.tolist()
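# Illustration for get_schedule above: a minimal pure-Python sketch of the shifted schedule,
# assuming sigma = 1.0 and the default anchor points (256, 0.5) and (4096, 1.15). The name
# `shifted_schedule` is hypothetical; t = 0 is handled explicitly because plain Python raises on
# 1 / 0, whereas the tensor version evaluates to 0 there.

```python
import math


def time_shift(mu, sigma, t):
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)


def shifted_schedule(num_steps, image_seq_len, base_shift=0.5, max_shift=1.15):
    # Linear interpolation of mu between (256, base_shift) and (4096, max_shift).
    slope = (max_shift - base_shift) / (4096 - 256)
    mu = base_shift + slope * (image_seq_len - 256)
    # num_steps + 1 evenly spaced points from 1 down to 0 (the extra point is the final zero).
    raw = [1 - i / num_steps for i in range(num_steps + 1)]
    return [time_shift(mu, 1.0, t) if t > 0 else 0.0 for t in raw]


schedule = shifted_schedule(num_steps=4, image_seq_len=4080)
print(schedule)
```

# With mu > 0 every interior timestep moves toward 1, so longer token sequences spend more of
# the steps at high noise levels.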
def denoise(
model: Flux,
# model input
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
vec: Tensor,
# sampling parameters
timesteps: list[float],
guidance: float = 4.0,
# extra img tokens
img_cond: Tensor | None = None,
):
# this is ignored for schnell
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
pred = model(
img=torch.cat((img, img_cond), dim=-1) if img_cond is not None else img,
#img_ids=img_ids[1] if small else img_ids[0],
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
guidance=guidance_vec,
)
img = img + (t_prev - t_curr) * pred
return img
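# Illustration for the update `img = img + (t_prev - t_curr) * pred` in denoise above: a minimal
# sketch of the same Euler loop on a toy problem. The constant velocity field below is an
# assumption standing in for the model prediction, and `euler_denoise` is a hypothetical name.
# Note that `t_prev` is the next, lower timestep in the schedule.

```python
def euler_denoise(x, velocity, timesteps):
    """Integrate dx/dt = velocity(x, t) along a decreasing list of timesteps."""
    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):
        pred = velocity(x, t_curr)
        x = [xi + (t_prev - t_curr) * vi for xi, vi in zip(x, pred)]
    return x


data = [0.2, -0.4]
noise = [1.0, 0.5]


def velocity(x, t):
    # The straight-line path x_t = (1 - t) * data + t * noise has constant velocity noise - data.
    return [n - d for n, d in zip(noise, data)]


timesteps = [1.0, 0.75, 0.5, 0.25, 0.0]
result = euler_denoise(noise, velocity, timesteps)
print(result)
```

# For this toy field the Euler steps land on `data` up to floating-point error, whatever the
# number of steps; a learned field is not constant, so the step count matters there.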
def unpack(x: Tensor, height: int, width: int) -> Tensor:
return rearrange(
x,
"b (h w) (c ph pw) -> b c (h ph) (w pw)",
h=math.ceil(height / 16),
w=math.ceil(width / 16),
ph=2,
pw=2,
)
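# Illustration for get_noise, prepare and unpack above: a minimal sketch of the 2x2 patch
# packing "c (h ph) (w pw) -> (h w) (c ph pw)" on nested lists, plus the resulting token count
# for a given pixel size. The names `packed_shape`, `pack` and `unpack_tokens` are hypothetical,
# and the batch dimension is omitted.

```python
import math


def packed_shape(height, width, channels=16):
    """Return (num_tokens, token_dim) for an image of the given pixel size."""
    h = math.ceil(height / 16)
    w = math.ceil(width / 16)
    return h * w, channels * 2 * 2


def pack(latent):
    """Nested lists [c][2h][2w] -> list of h * w tokens, each of length c * 4."""
    c, rows, cols = len(latent), len(latent[0]), len(latent[0][0])
    return [
        [latent[ch][2 * i + ph][2 * j + pw] for ch in range(c) for ph in range(2) for pw in range(2)]
        for i in range(rows // 2)
        for j in range(cols // 2)
    ]


def unpack_tokens(tokens, h, w):
    """Inverse of pack: tokens -> nested lists [c][2h][2w]."""
    c = len(tokens[0]) // 4
    latent = [[[0] * (2 * w) for _ in range(2 * h)] for _ in range(c)]
    for idx, token in enumerate(tokens):
        i, j = divmod(idx, w)
        values = iter(token)
        for ch in range(c):
            for ph in range(2):
                for pw in range(2):
                    latent[ch][2 * i + ph][2 * j + pw] = next(values)
    return latent


# Toy latent with 2 channels, 2 rows and 4 columns; each value encodes its own position.
latent = [[[100 * ch + 10 * r + col for col in range(4)] for r in range(2)] for ch in range(2)]
tokens = pack(latent)
print(tokens[0], packed_shape(768, 1360))
```

# A 1360x768 image therefore becomes 4080 tokens of width 64, which is the sequence length that
# get_schedule receives as image_seq_len.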
####################################################################################################
from calflops import calculate_flops
def denoise_test_FLOPs(
model: Flux,
# model input
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
vec: Tensor,
# sampling parameters
timesteps: list[float],
guidance: float = 4.0,
):
# init cache
cache_dic, current = cache_init(timesteps)
# this is ignored for schnell
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
current['step']=0
current['num_steps'] = len(timesteps)-1
total_flops = 0
for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
inputs=dict(
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
cache_dic = cache_dic,
current = current,
guidance=guidance_vec,
)
flops, macs, params = calculate_flops(model=model,
kwargs = inputs,
print_results=False)
total_flops += convert_flops(flops)
current['step'] += 1
print(f"Total {total_flops * 1e-12} TFLOPs.")
return img
import re
def convert_flops(flops_str):
"""
Convert a FLOPS string (e.g. '12.34 GFLOPS', '1.2 TFLOPS') to its numeric value.
"""
# Match the number and the unit with a regular expression
match = re.match(r"([\d.]+)\s*([GT]?FLOPS)", flops_str.strip(), re.IGNORECASE)
if not match:
raise ValueError(f"Cannot parse FLOPS string: {flops_str}")
# Extract the number and the unit
value = float(match.group(1))
unit = match.group(2).upper()
# Scale the number according to the unit
if unit == "GFLOPS":
return value * 10**9
elif unit == "TFLOPS":
return value * 10**12
else:
raise ValueError(f"Unknown FLOPS unit: {unit}")
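# Illustration for convert_flops above: a minimal sketch of a table-driven variant that also
# accepts K and M prefixes and sums per-step values. The name `parse_flops`, the prefix table and
# the sample strings are assumptions for this example; they are not taken from calflops output.

```python
import re

# Decimal prefixes mapped to their multipliers; the empty prefix means plain FLOPS.
_UNIT_SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_flops(text):
    """Convert strings such as '12.34 GFLOPS' to a float number of FLOPs."""
    match = re.fullmatch(r"([\d.]+)\s*([KMGT]?)FLOPS", text.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"Cannot parse FLOPS string: {text}")
    return float(match.group(1)) * _UNIT_SCALE[match.group(2).upper()]


# Hypothetical per-step readings, accumulated the same way as total_flops in the loop above.
per_step = ["1.5 TFLOPS", "500 GFLOPS"]
total_tflops = sum(parse_flops(s) for s in per_step) / 1e12
print(f"Total {total_tflops} TFLOPs.")
```

# A lookup table keeps the unit handling in one place, so adding a prefix does not require
# another branch.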
================================================
FILE: flux-ToCa/src/flux/util.py
================================================
import os
from dataclasses import dataclass
import torch
from einops import rearrange
from huggingface_hub import hf_hub_download
from imwatermark import WatermarkEncoder
from PIL import ExifTags, Image
from safetensors.torch import load_file as load_sft
from flux.model import Flux, FluxLoraWrapper, FluxParams
from flux.modules.autoencoder import AutoEncoder, AutoEncoderParams
from flux.modules.conditioner import HFEmbedder
def save_image(
nsfw_classifier,
name: str,
output_name: str,
idx: int,
x: torch.Tensor,
add_sampling_metadata: bool,
prompt: str,
nsfw_threshold: float = 0.85,
) -> int:
fn = output_name.format(idx=idx)
print(f"Saving {fn}")
# bring into PIL format and save
x = x.clamp(-1, 1)
x = embed_watermark(x.float())
x = rearrange(x[0], "c h w -> h w c")
img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())
nsfw_score = [x["score"] for x in nsfw_classifier(img) if x["label"] == "nsfw"][0]
if nsfw_score < nsfw_threshold:
exif_data = Image.Exif()
exif_data[ExifTags.Base.Software] = "AI generated;txt2img;flux"
exif_data[ExifTags.Base.Make] = "Black Forest Labs"
exif_data[ExifTags.Base.Model] = name
if add_sampling_metadata:
exif_data[ExifTags.Base.ImageDescription] = prompt
img.save(fn, exif=exif_data, quality=95, subsampling=0)
idx += 1
else:
print("Your generated image may contain NSFW content.")
return idx
@dataclass
class ModelSpec:
params: FluxParams
ae_params: AutoEncoderParams
ckpt_path: str | None
lora_path: str | None
ae_path: str | None
repo_id: str | None
repo_flow: str | None
repo_ae: str | None
configs = {
"flux-dev": ModelSpec(
repo_id="black-forest-labs/FLUX.1-dev",
repo_flow="flux1-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV"),
lora_path=None,
params=FluxParams(
in_channels=64,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-schnell": ModelSpec(
repo_id="black-forest-labs/FLUX.1-schnell",
repo_flow="flux1-schnell.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_SCHNELL"),
lora_path=None,
params=FluxParams(
in_channels=64,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=False,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-dev-canny": ModelSpec(
repo_id="black-forest-labs/FLUX.1-Canny-dev",
repo_flow="flux1-canny-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV_CANNY"),
lora_path=None,
params=FluxParams(
in_channels=128,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-dev-canny-lora": ModelSpec(
repo_id="black-forest-labs/FLUX.1-dev",
repo_flow="flux1-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV"),
lora_path=os.getenv("FLUX_DEV_CANNY_LORA"),
params=FluxParams(
in_channels=128,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-dev-depth": ModelSpec(
repo_id="black-forest-labs/FLUX.1-Depth-dev",
repo_flow="flux1-depth-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV_DEPTH"),
lora_path=None,
params=FluxParams(
in_channels=128,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-dev-depth-lora": ModelSpec(
repo_id="black-forest-labs/FLUX.1-dev",
repo_flow="flux1-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV"),
lora_path=os.getenv("FLUX_DEV_DEPTH_LORA"),
params=FluxParams(
in_channels=128,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
"flux-dev-fill": ModelSpec(
repo_id="black-forest-labs/FLUX.1-Fill-dev",
repo_flow="flux1-fill-dev.safetensors",
repo_ae="ae.safetensors",
ckpt_path=os.getenv("FLUX_DEV_FILL"),
lora_path=None,
params=FluxParams(
in_channels=384,
out_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
mlp_ratio=4.0,
num_heads=24,
depth=19,
depth_single_blocks=38,
axes_dim=[16, 56, 56],
theta=10_000,
qkv_bias=True,
guidance_embed=True,
),
ae_path=os.getenv("AE"),
ae_params=AutoEncoderParams(
resolution=256,
in_channels=3,
ch=128,
out_ch=3,
ch_mult=[1, 2, 4, 4],
num_res_blocks=2,
z_channels=16,
scale_factor=0.3611,
shift_factor=0.1159,
),
),
}
def print_load_warning(missing: list[str], unexpected: list[str]) -> None:
if len(missing) > 0 and len(unexpected) > 0:
print(f"Got {len(missing)} missing keys:\n\t" + "\n\t".join(missing))
print("\n" + "-" * 79 + "\n")
print(f"Got {len(unexpected)} unexpected keys:\n\t" + "\n\t".join(unexpected))
elif len(missing) > 0:
print(f"Got {len(missing)} missing keys:\n\t" + "\n\t".join(missing))
elif len(unexpected) > 0:
print(f"Got {len(unexpected)} unexpected keys:\n\t" + "\n\t".join(unexpected))
def load_flow_model(
name: str, device: str | torch.device = "cuda", hf_download: bool = True, verbose: bool = False
) -> Flux:
# Loading Flux
print("Init model")
ckpt_path = configs[name].ckpt_path
lora_path = configs[name].lora_path
if (
ckpt_path is None
and configs[name].repo_id is not None
and configs[name].repo_flow is not None
and hf_download
):
ckpt_path = hf_hub_download(configs[name].repo_id, configs[name].repo_flow)
with torch.device("meta" if ckpt_path is not None else device):
if lora_path is not None:
model = FluxLoraWrapper(params=configs[name].params).to(torch.bfloat16)
else:
model = Flux(configs[name].params).to(torch.bfloat16)
if ckpt_path is not None:
print("Loading checkpoint")
# load_sft doesn't support torch.device
sd = load_sft(ckpt_path, device=str(device))
sd = optionally_expand_state_dict(model, sd)
missing, unexpected = model.load_state_dict(sd, strict=False, assign=True)
if verbose:
print_load_warning(missing, unexpected)
if configs[name].lora_path is not None:
print("Loading LoRA")
lora_sd = load_sft(configs[name].lora_path, device=str(device))
# loading the lora params + overwriting scale values in the norms
missing, unexpected = model.load_state_dict(lora_sd, strict=False, assign=True)
if verbose:
print_load_warning(missing, unexpected)
return model
def load_t5(device: str | torch.device = "cuda", max_length: int = 512) -> HFEmbedder:
# max length 64, 128, 256 and 512 should work (if your sequence is short enough)
return HFEmbedder("/root/autodl-tmp/pretrained_models/google/t5-v1_1-xxl", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
def load_clip(device: str | torch.device = "cuda") -> HFEmbedder:
return HFEmbedder("/root/autodl-tmp/pretrained_models/openai/clip-vit-large-patch14", max_length=77, torch_dtype=torch.bfloat16).to(device)
def load_ae(name: str, device: str | torch.device = "cuda", hf_download: bool = True) -> AutoEncoder:
ckpt_path = configs[name].ae_path
if (
ckpt_path is None
and configs[name].repo_id is not None
and configs[name].repo_ae is not None
and hf_download
):
ckpt_path = hf_hub_download(configs[name].repo_id, configs[name].repo_ae)
# Loading the autoencoder
print("Init AE")
with torch.device("meta" if ckpt_path is not None else device):
ae = AutoEncoder(configs[name].ae_params)
if ckpt_path is not None:
sd = load_sft(ckpt_path, device=str(device))
missing, unexpected = ae.load_state_dict(sd, strict=False, assign=True)
print_load_warning(missing, unexpected)
return ae
def optionally_expand_state_dict(model: torch.nn.Module, state_dict: dict) -> dict:
"""
Optionally expand the state dict to match the model's parameters shapes.
"""
for name, param in model.named_parameters():
if name in state_dict:
if state_dict[name].shape != param.shape:
print(
f"Expanding '{name}' with shape {state_dict[name].shape} to model parameter with shape {param.shape}."
)
# expand with zeros:
expanded_state_dict_weight = torch.zeros_like(param, device=state_dict[name].device)
slices = tuple(slice(0, dim) for dim in state_dict[name].shape)
expanded_state_dict_weight[slices] = state_dict[name]
state_dict[name] = expanded_state_dict_weight
return state_dict
class WatermarkEmbedder:
def __init__(self, watermark):
self.watermark = watermark
self.num_bits = len(WATERMARK_BITS)
self.encoder = WatermarkEncoder()
self.encoder.set_watermark("bits", self.watermark)
def __call__(self, image: torch.Tensor) -> torch.Tensor:
"""
Adds a predefined watermark to the input image
Args:
image: ([N,] B, RGB, H, W) in range [-1, 1]
Returns:
same as input but watermarked
"""
image = 0.5 * image + 0.5
squeeze = len(image.shape) == 4
if squeeze:
image = image[None, ...]
n = image.shape[0]
image_np = rearrange((255 * image).detach().cpu(), "n b c h w -> (n b) h w c").numpy()[:, :, :, ::-1]
# torch (b, c, h, w) in [0, 1] -> numpy (b, h, w, c) [0, 255]
# watermarking library expects input as cv2 BGR format
for k in range(image_np.shape[0]):
image_np[k] = self.encoder.encode(image_np[k], "dwtDct")
image = torch.from_numpy(rearrange(image_np[:, :, :, ::-1], "(n b) h w c -> n b c h w", n=n)).to(
image.device
)
image = torch.clamp(image / 255, min=0.0, max=1.0)
if squeeze:
image = image[0]
image = 2 * image - 1
return image
# A fixed 48-bit message that was chosen at random
WATERMARK_MESSAGE = 0b001010101111111010000111100111001111010100101110
# bin(x)[2:] gives bits of x as str, use int to convert them to 0/1
WATERMARK_BITS = [int(bit) for bit in bin(WATERMARK_MESSAGE)[2:]]
embed_watermark = WatermarkEmbedder(WATERMARK_BITS)
================================================
FILE: flux-ToCa/src/geneval_flux.py
================================================
import argparse
import json
import os
import torch
import numpy as np
from PIL import Image, ExifTags
from tqdm import tqdm, trange
from einops import rearrange
from torchvision.utils import make_grid
from torchvision.transforms import ToTensor
# --- Imports related to FLUX module ---
from flux.sampling import (
denoise_test_FLOPs,
get_noise,
get_schedule,
prepare,
unpack,
)
from flux.ideas import denoise_cache
from flux.util import (
embed_watermark,
load_ae,
load_clip,
load_flow_model,
load_t5,
)
from transformers import pipeline
# NSFW threshold (adjustable as needed)
NSFW_THRESHOLD = 0.85
def parse_args():
parser = argparse.ArgumentParser(description="Generate images using the FLUX model within the Geneval framework")
# Required: input JSONL metadata file, each line must contain at least the "prompt" key
parser.add_argument(
"metadata_file",
type=str,
help="JSONL file containing metadata for each prompt, each line is a JSON object"
)
# FLUX model related parameters
parser.add_argument(
"--model_name",
type=str,
default="flux-schnell",
choices=["flux-dev", "flux-schnell"],
help="FLUX model name"
)
parser.add_argument(
"--n_samples",
type=int,
default=1,
help="Number of images to generate per prompt"
)
parser.add_argument(
"--steps",
type=int,
default=None,
help="Number of sampling steps (if not specified: 4 for flux-schnell, 50 for flux-dev)"
)
parser.add_argument(
"--width",
type=int,
default=1360,
help="Width of the generated image (pixels)"
)
parser.add_argument(
"--height",
type=int,
default=768,
help="Height of the generated image (pixels)"
)
parser.add_argument(
"--guidance",
type=float,
default=3.5,
help="Conditional guidance scale"
)
parser.add_argument(
"--seed",
type=int,
default=42,
help="Random seed"
)
parser.add_argument(
"--batch_size",
type=int,
default=1,
help="Number of samples per batch during image generation"
)
# Output related parameters
parser.add_argument(
"--output_dir",
type=str,
default="outputs",
help="Output directory to save the generated results"
)
parser.add_argument(
"--skip_grid",
action="store_true",
help="Skip saving the overall grid image"
)
# Other options
parser.add_argument(
"--add_sampling_metadata",
action="store_true",
help="Add the prompt text to the metadata of the generated images"
)
parser.add_argument(
"--use_nsfw_filter",
action="store_true",
help="Enable NSFW content filtering (requires downloading the relevant model)"
)
parser.add_argument(
"--test_FLOPs",
action="store_true",
help="Test inference FLOPs only (no images will be generated)"
)
return parser.parse_args()
def main(args):
# Read the metadata file, each line is a JSON object (must contain at least the "prompt" field)
with open(args.metadata_file, "r", encoding="utf-8") as fp:
metadatas = [json.loads(line) for line in fp if line.strip()]
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# If NSFW filtering is enabled, load the corresponding classifier (please modify the model path or name accordingly)
if args.use_nsfw_filter:
nsfw_classifier = pipeline(
            "image-classification",
            model="/path/to/your/nsfw_model",  # Please replace with the actual NSFW model path
            device=0 if torch.cuda.is_available() else -1
        )
    else:
        nsfw_classifier = None

    # If sampling steps are not specified, set default steps based on the model name
    if args.steps is None:
        args.steps = 4 if args.model_name == "flux-schnell" else 50

    # Ensure the image width and height are multiples of 16 (required by FLUX)
    args.width = 16 * (args.width // 16)
    args.height = 16 * (args.height // 16)

    # Load FLUX model components onto the device (T5, CLIP, Flow model, autoencoder)
    t5 = load_t5(device, max_length=256 if args.model_name == "flux-schnell" else 512)
    clip = load_clip(device)
    model = load_flow_model(args.model_name, device=device)
    ae = load_ae(args.model_name, device=device)

    # Generate results for each prompt:
    # Each prompt gets its own subfolder (e.g., outputs/00000/) containing a samples/ directory,
    # an optional grid image grid.png, and the prompt's metadata in metadata.jsonl.
    for idx, metadata in enumerate(metadatas):
        prompt = metadata.get("prompt", "")
        print(f"Processing prompt {idx + 1}/{len(metadatas)}: '{prompt}'")

        # Define output directory and samples directory
        outpath = os.path.join(args.output_dir, f"{idx:05d}")
        sample_path = os.path.join(outpath, "samples")

        # If the output directory already exists, count the PNG files already in the samples folder
        existing_samples = []
        sample_count = 0
        if os.path.exists(sample_path):
            files = sorted(
                fname for fname in os.listdir(sample_path)
                if fname.endswith(".png") and fname != "grid.png"
            )
            sample_count = len(files)
            # Load existing images (to be used later for generating the grid image)
            for fname in files:
                full_path = os.path.join(sample_path, fname)
                try:
                    img = Image.open(full_path).convert("RGB")
                    existing_samples.append(ToTensor()(img))
                except Exception as e:
                    print(f"Failed to read existing image {full_path}: {e}")
            # If enough images have already been generated, skip this prompt
            if sample_count >= args.n_samples:
                print(f"Samples for prompt {idx + 1} already exist ({sample_count} images), skipping generation.")
                continue

        # Create output directory and samples subdirectory
        os.makedirs(outpath, exist_ok=True)
        os.makedirs(sample_path, exist_ok=True)

        # Save the current prompt's metadata to metadata.jsonl
        with open(os.path.join(outpath, "metadata.jsonl"), "w", encoding="utf-8") as fp:
            json.dump(metadata, fp)

        # Initialize: start counting from the number of existing images, and copy existing samples for later grid generation
        local_index = sample_count
        all_samples = existing_samples.copy()

        # The progress bar starts at the number of existing samples
        pbar = tqdm(total=args.n_samples, initial=sample_count, desc="Sampling")

        # For the current prompt, only generate the missing images
        while local_index < args.n_samples:
            current_bs = min(args.batch_size, args.n_samples - local_index)
            # Set the seed for the current batch (offset by the number of images already present for this prompt)
            seed = args.seed + local_index
            # Generate random noise
            x = get_noise(current_bs, args.height, args.width, device=device, dtype=torch.bfloat16, seed=seed)
            prompt_list = [prompt] * current_bs
            # Prepare input (prompt encoding, initial image noise, etc.)
            inp = prepare(t5, clip, x, prompt=prompt_list)
            # Compute the denoising schedule (note: the second argument is the image token sequence length, inp["img"].shape[1])
            timesteps = get_schedule(args.steps, inp["img"].shape[1], shift=(args.model_name != "flux-schnell"))

            with torch.no_grad():
                if args.test_FLOPs:
                    latent = denoise_test_FLOPs(model, **inp, timesteps=timesteps, guidance=args.guidance)
                else:
                    latent = denoise_cache(model, **inp, timesteps=timesteps, guidance=args.guidance)
                # Unpack latent to a shape suitable for the decoder input
                latent = unpack(latent.float(), args.height, args.width)
                # Decode to image with automatic mixed precision
                with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
                    decoded = ae.decode(latent)

            # Post-processing: clamp, embed watermark, and rearrange to [B, H, W, C] format
            decoded = decoded.clamp(-1, 1)
            decoded = embed_watermark(decoded.float())
            images_tensor = rearrange(decoded, "b c h w -> b h w c")

            # Iterate over each generated image in the current batch
            for i in range(current_bs):
                img_array = (127.5 * (images_tensor[i] + 1.0)).cpu().numpy().astype(np.uint8)
                img = Image.fromarray(img_array)

                # NSFW filtering (if enabled)
                if nsfw_classifier is not None:
                    nsfw_result = nsfw_classifier(img)
                    nsfw_score = next((res["score"] for res in nsfw_result if res["label"] == "nsfw"), 0.0)
                else:
                    nsfw_score = 0.0

                if nsfw_score < NSFW_THRESHOLD:
                    # Add sampling metadata (EXIF info); note: PNG format may not fully support EXIF
                    if args.add_sampling_metadata:
                        exif_data = Image.Exif()
                        exif_data[ExifTags.Base.Software] = "AI generated;txt2img;flux"
                        exif_data[ExifTags.Base.Make] = "Black Forest Labs"
                        exif_data[ExifTags.Base.Model] = args.model_name
                        exif_data[ExifTags.Base.ImageDescription] = prompt
                    else:
                        exif_data = None
                    sample_fname = os.path.join(sample_path, f"{local_index:05d}.png")
                    if exif_data is not None:
                        img.save(sample_fname, exif=exif_data)
                    else:
                        img.save(sample_fname)
                    all_samples.append(ToTensor()(img))
                else:
                    print("The generated image may contain inappropriate content and has been skipped.")
                # The index advances even when an image is skipped, so the loop always terminates
                local_index += 1
                pbar.update(1)
            # end for current batch
        pbar.close()

        # Unless grid generation is skipped, and if there is at least one sample, save a grid image (consistent with the Geneval format)
        if not args.skip_grid and len(all_samples) > 0:
            grid_tensor = torch.stack(all_samples, 0)
            grid = make_grid(grid_tensor, nrow=args.batch_size)
            grid = 255.0 * rearrange(grid, "c h w -> h w c").cpu().numpy()
            grid_img = Image.fromarray(grid.astype(np.uint8))
            grid_img.save(os.path.join(outpath, "grid.png"))
    # end for each prompt
    print("Generation completed.")


if __name__ == "__main__":
    args = parse_args()
    main(args)
'''
python src/geneval_flux.py /root/geneval/prompts/evaluation_metadata.jsonl --model_name flux-dev --n_samples 4 --steps 50 --width 1024 --height 1024 --seed 42 --output_dir /root/autodl-tmp/samples/geneval_original --batch_size 1
'''
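'''
Note on resuming: main() above generates only the images that are still missing
for a prompt and seeds each batch with args.seed plus the index of the first
image in that batch. The sketch below isolates that bookkeeping so it can be
checked without loading any model. It is a minimal illustration, not part of
the script: plan_batches and snap_to_multiple are hypothetical helper names,
and it assumes local_index advances once per image in the batch.

```python
def snap_to_multiple(value, multiple=16):
    # Round down to the nearest multiple, as done for width and height.
    return multiple * (value // multiple)


def plan_batches(n_samples, batch_size, existing, base_seed):
    # Hypothetical helper: list (start_index, batch_size, seed) for every
    # batch that still has to be generated for one prompt.
    plan = []
    local_index = existing
    while local_index < n_samples:
        current_bs = min(batch_size, n_samples - local_index)
        plan.append((local_index, current_bs, base_seed + local_index))
        local_index += current_bs  # one step per image in the batch
    return plan
```

Under these assumptions, a run interrupted after some complete batches resumes
with the same seeds for the remaining batches, provided batch_size and seed
are unchanged between runs.
'''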
================================================
FILE: flux-ToCa/src/sample.py
================================================
import os
import re
import time
from dataclasses import dataclass
from glob import iglob

import torch
from einops import rearrange
from PIL import ExifTags, Image
from transformers import pipeline
from tqdm import tqdm

from flux.sampling import denoise, get_noise, get_schedule, prepare, unpack, denoise_test_FLOPs
from flux.ideas import denoise_cache
from flux.util import configs, embed_watermark, load_ae, load_clip, load_flow_model, load_t5

NSFW_THRESHOLD = 0.85  # NSFW score threshold


@dataclass
class SamplingOptions:
    prompts: list[str]  # List of prompts
    width: int  # Image width
    height: int  # Image height
    num_steps: int | None  # Number of sampling steps (None selects a per-model default in main)
    guidance: float  # Guidance value
    seed: int | None  # Random seed
    num_images_per_prompt: int  # Number of images generated per prompt
    batch_size: int  # Batch size (number of prompts per batch)
    model_name: str  # Model name
    output_dir: str  # Output directory
    add_sampling_metadata: bool  # Whether to add the prompt to the image metadata
    use_nsfw_filter: bool  # Whether to enable NSFW filter
    test_FLOPs: bool  # Whether in FLOPs testing mode (in which case no images are generated)


def main(opts: SamplingOptions):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Optional NSFW classifier
    if opts.use_nsfw_filter:
        nsfw_classifier = pipeline(
            "image-classification",
            model="/root/autodl-tmp/pretrained_models/Falconsai/nsfw_image_detection",
            device=device
        )
    else:
        nsfw_classifier = None

    # Validate the model name and pick the default number of steps
    model_name = opts.model_name
    if model_name not in configs:
        available = ", ".join(configs.keys())
        raise ValueError(f"Unknown model name: {model_name}, available: {available}")
    if opts.num_steps is None:
        opts.num_steps = 4 if model_name == "flux-schnell" else 50

    # Ensure width and height are multiples of 16
    opts.width = 16 * (opts.width // 16)
    opts.height = 16 * (opts.height // 16)

    # Set output file name template and create the output directory
    output_name = os.path.join(opts.output_dir, "img_{idx}.jpg")
    os.makedirs(opts.output_dir, exist_ok=True)
    idx = 0  # Running image index

    # Initialize model components
    torch_device = device
    # Load T5 and CLIP models onto the selected device
    t5 = load_t5(torch_device, max_length=256 if model_name == "flux-schnell" else 512)
    clip = load_clip(torch_device)
    # Load the flow model and autoencoder onto the selected device
    model = load_flow_model(model_name, device=torch_device)
    ae = load_ae(model_name, device=torch_device)

    # Set random seed
    if opts.seed is not None:
        base_seed = opts.seed
    else:
        base_seed = torch.randint(0, 2**32, (1,)).item()

    prompts = opts.prompts
    total_images = len(prompts) * opts.num_images_per_prompt
    progress_bar = tqdm(total=total_images, desc="Generating images")

    # Calculate number of prompt batches (ceiling division)
    num_prompt_batches = (len(prompts) + opts.batch_size - 1) // opts.batch_size
    for batch_idx in range(num_prompt_batches):
        prompt_start = batch_idx * opts.batch_size
        prompt_end = min(prompt_start + opts.batch_size, len(prompts))
        batch_prompts = prompts[prompt_start:prompt_end]
        num_prompts_in_batch = len(batch_prompts)

        # For each prompt, generate the corresponding number of images
        for image_idx in range(opts.num_images_per_prompt):
            # One seed per batch (not per image), offset by the running image index
            seed = base_seed + idx
            idx += num_prompts_in_batch  # Advance the image index by the batch size

            # Prepare input
            batch_size = num_prompts_in_batch
            x = get_noise(
                batch_size,
                opts.height,
                opts.width,
                device=torch_device,
                dtype=torch.bfloat16,
                seed=seed,
            )
            # batch_prompts is a list containing the prompts for the current batch
            inp = prepare(t5, clip, x, prompt=batch_prompts)
            timesteps = get_schedule(opts.num_steps, inp["img"].shape[1], shift=(model_name != "flux-schnell"))

            # Denoise
            with torch.no_grad():
                if opts.test_FLOPs:
                    x = denoise_test_FLOPs(model, **inp, timesteps=timesteps, guidance=opts.guidance)
                else:
                    x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)
                # Decode latent variables
                x = unpack(x.float(), opts.height, opts.width)
                with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):
                    x = ae.decode(x)

            # Convert to PIL format and save
            x = x.clamp(-1, 1)
            x = embed_watermark(x.float())
            x = rearrange(x, "b c h w -> b h w c")
            for i in range(batch_size):
                img_array = x[i]
                img = Image.fromarray((127.5 * (img_array + 1.0)).cpu().byte().numpy())

                # Optional NSFW filtering
                if opts.use_nsfw_filter:
                    nsfw_result = nsfw_classifier(img)
                    nsfw_score = next((res["score"] for res in nsfw_result if res["label"] == "nsfw"), 0.0)
                else:
                    nsfw_score = 0.0  # If filter is not enabled, consider safe

                if nsfw_score < NSFW_THRESHOLD:
                    exif_data = Image.Exif()
                    exif_data[ExifTags.Base.Software] = "AI generated;txt2img;flux"
                    exif_data[ExifTags.Base.Make] = "Black Forest Labs"
                    exif_data[ExifTags.Base.Model] = model_name
                    if opts.add_sampling_metadata:
                        exif_data[ExifTags.Base.ImageDescription] = batch_prompts[i]
                    # Save image; idx was already advanced, so subtract the batch size to get this image's index
                    fn = output_name.format(idx=idx - num_prompts_in_batch + i)
                    img.save(fn, exif=exif_data, quality=95, subsampling=0)
                else:
                    print("The generated image may contain inappropriate content and has been skipped.")
                progress_bar.update(1)
    progress_bar.close()


def read_prompts(prompt_file: str):
    # Read one prompt per line, skipping blank lines
    with open(prompt_file, 'r', encoding='utf-8') as f:
        prompts = [line.strip() for line in f if line.strip()]
    return prompts


def app():
    import argparse

    parser = argparse.ArgumentParser(description="Generate images using the flux model.")
    parser.add_argument('--prompt_file', type=str, required=True, help='Path to the prompt text file.')
    parser.add_argument('--width', type=int, default=1360, help='Width of the generated image.')
    parser.add_argument('--height', type=int, default=768, help='Height of the generated image.')
    parser.add_argument('--num_steps', type=int, default=None, help='Number of sampling steps.')
    parser.add_argument('--guidance', type=float, default=3.5, help='Guidance value.')
    parser.add_argument('--seed', type=int, default=0, help='Random seed.')
    parser.add_argument('--num_images_per_prompt', type=int, default=1, help='Number of images generated per prompt.')
    parser.add_argument('--batch_size', type=int, default=1, help='Batch size (number of prompts per batch).')
    parser.add_argument('--model_name', type=str, default='flux-schnell', choices=['flux-dev', 'flux-schnell'], help='Model name.')
    parser.add_argument('--output_dir', type=str, default='/root/autodl-tmp/samples', help='Directory to save images.')
    parser.add_argument('--add_sampling_metadata', action='store_true', help='Whether to add prompts to image metadata.')
    parser.add_argument('--use_nsfw_filter', action='store_true', help='Enable NSFW filter.')
    parser.add_argument('--test_FLOPs', action='store_true', help='Test inference FLOPs.')
    args = parser.parse_args()

    prompts = read_prompts(args.prompt_file)
    opts = SamplingOptions(
        prompts=prompts,
        width=args.width,
        height=args.height,
        num_steps=args.num_steps,
        guidance=args.guidance,
        seed=args.seed,
        num_images_per_prompt=args.num_images_per_prompt,
        batch_size=args.batch_size,
        model_name=args.model_name,
        output_dir=args.output_dir,
        add_sampling_metadata=args.add_sampling_metadata,
        use_nsfw_filter=args.use_nsfw_filter,
        test_FLOPs=args.test_FLOPs,
    )
    main(opts)


if __name__ == '__main__':
    app()
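'''
Note on output naming: main() batches prompts, runs each batch
num_images_per_prompt times, and numbers files with a single running index, so
images of the same prompt are not necessarily stored under consecutive indices.
The sketch below reproduces only that index and seed arithmetic so the mapping
from file name to prompt can be inspected without running the model. It is an
illustration under the assumption that the loops behave as written above;
plan_outputs is a hypothetical helper, not part of this script.

```python
def plan_outputs(prompts, batch_size, num_images_per_prompt, base_seed):
    # Hypothetical helper: list (file_name, prompt, seed) in generation order.
    plan = []
    idx = 0  # running image index, as in main()
    num_batches = (len(prompts) + batch_size - 1) // batch_size  # ceiling division
    for b in range(num_batches):
        batch = prompts[b * batch_size:(b + 1) * batch_size]
        for _ in range(num_images_per_prompt):
            seed = base_seed + idx  # one seed shared by the whole batch
            for i, prompt in enumerate(batch):
                plan.append((f"img_{idx + i}.jpg", prompt, seed))
            idx += len(batch)
    return plan
```

For example, with three prompts, batch_size=2 and num_images_per_prompt=2, the
first prompt maps to img_0.jpg and img_2.jpg rather than img_0.jpg and
img_1.jpg.
'''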