Repository: andreasjansson/cog-stable-diffusion
Branch: main
Commit: c7cfdb1caa7f
Files: 6
Total size: 16.0 KB

Directory structure:
gitextract_ahtj3bew/

├── .gitignore
├── README.md
├── cog.yaml
├── image_to_image.py
├── predict.py
└── script/
    └── download-weights

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.cog/
__pycache__/
diffusers-cache/

================================================
FILE: README.md
================================================
# Stable Diffusion Cog model

This is an implementation of [Diffusers Stable Diffusion 1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) as a Cog model. [Cog packages machine learning models as standard containers.](https://github.com/replicate/cog)

First, download the pre-trained weights [with your Hugging Face auth token](https://huggingface.co/settings/tokens):

    cog run script/download-weights <your-hugging-face-auth-token>

Then, you can run predictions:

    cog predict -i prompt="monkey scuba diving"
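
If you have an init image, the same `cog predict` interface drives img2img and inpainting. A hypothetical invocation (the input names match `predict.py`; `@` is Cog's syntax for file inputs, and `input.png` is a placeholder):

    cog predict -i prompt="monkey scuba diving" -i init_image=@input.png -i prompt_strength=0.6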

Or, build a Docker image:

    cog build

Or, [push it to Replicate](https://replicate.com/docs/guides/push-a-model):

    cog push r8.im/...


================================================
FILE: cog.yaml
================================================
build:
  gpu: true
  cuda: "11.6.2"
  python_version: "3.10"
  python_packages:
    - "diffusers==0.2.4"
    - "torch==1.12.1 --extra-index-url=https://download.pytorch.org/whl/cu116"
    - "ftfy==6.1.1"
    - "scipy==1.9.0"
    - "transformers==4.21.1"
predict: "predict.py:Predictor"


================================================
FILE: image_to_image.py
================================================
import inspect
from typing import List, Optional, Union, Tuple

import numpy as np
import torch

from PIL import Image
from diffusers import (
    AutoencoderKL,
    DDIMScheduler,
    DiffusionPipeline,
    PNDMScheduler,
    LMSDiscreteScheduler,
    UNet2DConditionModel,
)
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer


def preprocess_init_image(image: Image.Image, width: int, height: int):
    image = image.resize((width, height), resample=Image.LANCZOS)
    image = np.array(image).astype(np.float32) / 255.0
    image = image[None].transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return 2.0 * image - 1.0


def preprocess_mask(mask: Image.Image, width: int, height: int):
    mask = mask.convert("L")
    mask = mask.resize((width // 8, height // 8), resample=Image.LANCZOS)
    mask = np.array(mask).astype(np.float32) / 255.0
    mask = np.tile(mask, (4, 1, 1))
    mask = mask[None]  # add a batch dimension: (4, h/8, w/8) -> (1, 4, h/8, w/8)
    mask = torch.from_numpy(mask)
    return mask


class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
    """
    From https://github.com/huggingface/diffusers/pull/241
    """

    def __init__(
        self,
        vae: AutoencoderKL,
        text_encoder: CLIPTextModel,
        tokenizer: CLIPTokenizer,
        unet: UNet2DConditionModel,
        scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
        safety_checker: StableDiffusionSafetyChecker,
        feature_extractor: CLIPFeatureExtractor,
    ):
        super().__init__()
        scheduler = scheduler.set_format("pt")
        self.register_modules(
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            unet=unet,
            scheduler=scheduler,
            safety_checker=safety_checker,
            feature_extractor=feature_extractor,
        )

    @torch.no_grad()
    def __call__(
        self,
        prompt: Union[str, List[str]],
        init_image: Optional[torch.FloatTensor],
        mask: Optional[torch.FloatTensor],
        width: int,
        height: int,
        prompt_strength: float = 0.8,
        num_inference_steps: int = 50,
        guidance_scale: float = 7.5,
        eta: float = 0.0,
        generator: Optional[torch.Generator] = None,
    ) -> dict:
        if isinstance(prompt, str):
            batch_size = 1
        elif isinstance(prompt, list):
            batch_size = len(prompt)
        else:
            raise ValueError(
                f"`prompt` has to be of type `str` or `list` but is {type(prompt)}"
            )

        if prompt_strength < 0 or prompt_strength > 1:
            raise ValueError(
                f"The value of prompt_strength should be in [0.0, 1.0] but is {prompt_strength}"
            )

        if mask is not None and init_image is None:
            raise ValueError(
                "If mask is defined, then init_image also needs to be defined"
            )

        if width % 8 != 0 or height % 8 != 0:
            raise ValueError("Width and height must both be divisible by 8")

        # set timesteps
        accepts_offset = "offset" in set(
            inspect.signature(self.scheduler.set_timesteps).parameters.keys()
        )
        extra_set_kwargs = {}
        offset = 0
        if accepts_offset:
            offset = 1
            extra_set_kwargs["offset"] = 1

        self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)

        if init_image is not None:
            init_latents_orig, latents, init_timestep = self.latents_from_init_image(
                init_image,
                prompt_strength,
                offset,
                num_inference_steps,
                batch_size,
                generator,
            )
        else:
            latents = torch.randn(
                (batch_size, self.unet.in_channels, height // 8, width // 8),
                generator=generator,
                device=self.device,
            )
            init_timestep = num_inference_steps

        do_classifier_free_guidance = guidance_scale > 1.0
        text_embeddings = self.embed_text(
            prompt, do_classifier_free_guidance, batch_size
        )

        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature.
        # eta (η) is only used with the DDIMScheduler; it is ignored by other schedulers.
        # eta corresponds to η in the DDIM paper: https://arxiv.org/abs/2010.02502
        # and should be in [0, 1]
        accepts_eta = "eta" in set(
            inspect.signature(self.scheduler.step).parameters.keys()
        )
        extra_step_kwargs = {}
        if accepts_eta:
            extra_step_kwargs["eta"] = eta

        mask_noise = torch.randn(latents.shape, generator=generator, device=self.device)

        # if we use LMSDiscreteScheduler, let's make sure latents are multiplied by sigmas
        if isinstance(self.scheduler, LMSDiscreteScheduler):
            latents = latents * self.scheduler.sigmas[0]

        t_start = max(num_inference_steps - init_timestep + offset, 0)
        for i, t in tqdm(enumerate(self.scheduler.timesteps[t_start:])):
            # expand the latents if we are doing classifier free guidance
            latent_model_input = (
                torch.cat([latents] * 2) if do_classifier_free_guidance else latents
            )

            if isinstance(self.scheduler, LMSDiscreteScheduler):
                sigma = self.scheduler.sigmas[i]
                latent_model_input = latent_model_input / ((sigma ** 2 + 1) ** 0.5)

            # predict the noise residual
            noise_pred = self.unet(
                latent_model_input, t, encoder_hidden_states=text_embeddings
            )["sample"]

            # perform guidance
            if do_classifier_free_guidance:
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + guidance_scale * (
                    noise_pred_text - noise_pred_uncond
                )

            # compute the previous noisy sample x_t -> x_t-1
            if isinstance(self.scheduler, LMSDiscreteScheduler):
                latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs)[
                    "prev_sample"
                ]
            else:
                latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs)[
                    "prev_sample"
                ]

            # replace the unmasked part with original latents, with added noise
            if mask is not None:
                timesteps = self.scheduler.timesteps[t_start + i]
                timesteps = torch.tensor(
                    [timesteps] * batch_size, dtype=torch.long, device=self.device
                )
                noisy_init_latents = self.scheduler.add_noise(init_latents_orig, mask_noise, timesteps)
                latents = noisy_init_latents * mask + latents * (1 - mask)

        # scale and decode the image latents with vae
        latents = 1 / 0.18215 * latents
        image = self.vae.decode(latents)

        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()

        # run safety checker
        safety_checker_input = self.feature_extractor(
            self.numpy_to_pil(image), return_tensors="pt"
        ).to(self.device)
        image, has_nsfw_concept = self.safety_checker(
            images=image, clip_input=safety_checker_input.pixel_values
        )

        image = self.numpy_to_pil(image)

        return {"sample": image, "nsfw_content_detected": has_nsfw_concept}

    def latents_from_init_image(
        self,
        init_image: torch.FloatTensor,
        prompt_strength: float,
        offset: int,
        num_inference_steps: int,
        batch_size: int,
        generator: Optional[torch.Generator],
    ) -> Tuple[torch.FloatTensor, torch.FloatTensor, int]:
        # encode the init image into latents and scale the latents
        init_latents = self.vae.encode(init_image.to(self.device)).sample()
        init_latents = 0.18215 * init_latents
        init_latents_orig = init_latents

        # prepare init_latents noise to latents
        init_latents = torch.cat([init_latents] * batch_size)

        # get the original timestep using init_timestep
        init_timestep = int(num_inference_steps * prompt_strength) + offset
        init_timestep = min(init_timestep, num_inference_steps)
        timesteps = self.scheduler.timesteps[-init_timestep]
        timesteps = torch.tensor(
            [timesteps] * batch_size, dtype=torch.long, device=self.device
        )

        # add noise to latents using the timesteps
        noise = torch.randn(init_latents.shape, generator=generator, device=self.device)
        init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)

        return init_latents_orig, init_latents, init_timestep

    def embed_text(
        self,
        prompt: Union[str, List[str]],
        do_classifier_free_guidance: bool,
        batch_size: int,
    ) -> torch.FloatTensor:
        # get prompt text embeddings
        text_input = self.tokenizer(
            prompt,
            padding="max_length",
            max_length=self.tokenizer.model_max_length,
            truncation=True,
            return_tensors="pt",
        )
        text_embeddings = self.text_encoder(text_input.input_ids.to(self.device))[0]

        # here `guidance_scale` is defined analogously to the guidance weight `w` of
        # equation (2) of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf .
        # `guidance_scale = 1` corresponds to doing no classifier-free guidance.
        # get unconditional embeddings for classifier free guidance
        if do_classifier_free_guidance:
            max_length = text_input.input_ids.shape[-1]
            uncond_input = self.tokenizer(
                [""] * batch_size,
                padding="max_length",
                max_length=max_length,
                return_tensors="pt",
            )
            uncond_embeddings = self.text_encoder(
                uncond_input.input_ids.to(self.device)
            )[0]

            # For classifier free guidance, we need to do two forward passes.
            # Here we concatenate the unconditional and text embeddings into a single batch
            # to avoid doing two forward passes
            text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

        return text_embeddings

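The guidance step inside `__call__` above is a linear extrapolation from the unconditional noise prediction toward the text-conditioned one. A minimal sketch of that arithmetic, with NumPy arrays standing in for the torch latent tensors:

```python
import numpy as np

def guide(noise_pred_uncond, noise_pred_text, guidance_scale):
    # Same formula as in __call__: extrapolate past the conditional
    # prediction, away from the unconditional one.
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

uncond = np.array([0.0, 1.0])
text = np.array([1.0, 1.0])
print(guide(uncond, text, 7.5))  # -> [7.5 1. ]: only components where the predictions differ move
print(guide(uncond, text, 1.0))  # -> [1. 1.]: guidance_scale = 1 returns the text prediction exactly
```

With `guidance_scale = 1.0` the result equals `noise_pred_text` unchanged, which is why the pipeline enables guidance only when `guidance_scale > 1.0`.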

================================================
FILE: predict.py
================================================
import os
from typing import List

import torch
from diffusers import PNDMScheduler, LMSDiscreteScheduler
from PIL import Image
from cog import BasePredictor, Input, Path

from image_to_image import (
    StableDiffusionImg2ImgPipeline,
    preprocess_init_image,
    preprocess_mask,
)


MODEL_CACHE = "diffusers-cache"


class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        print("Loading pipeline...")
        scheduler = PNDMScheduler(
            beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
        )
        self.pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",
            scheduler=scheduler,
            revision="fp16",
            torch_dtype=torch.float16,
            cache_dir=MODEL_CACHE,
            local_files_only=True,
        ).to("cuda")

    @torch.inference_mode()
    @torch.cuda.amp.autocast()
    def predict(
        self,
        prompt: str = Input(description="Input prompt", default=""),
        width: int = Input(
            description="Width of output image",
            choices=[128, 256, 512, 768, 1024],
            default=512,
        ),
        height: int = Input(
            description="Height of output image",
            choices=[128, 256, 512, 768],
            default=512,
        ),
        init_image: Path = Input(
            description="Initial image to generate variations of. Will be resized to the specified width and height", default=None
        ),
        mask: Path = Input(
            description="Black and white image to use as mask for inpainting over init_image. Black pixels are inpainted and white pixels are preserved. Experimental feature, tends to work better with prompt strength of 0.5-0.7",
            default=None,
        ),
        prompt_strength: float = Input(
            description="Prompt strength when using init image. 1.0 corresponds to full destruction of information in init image",
            default=0.8,
        ),
        num_outputs: int = Input(
            description="Number of images to output", choices=[1, 4], default=1
        ),
        num_inference_steps: int = Input(
            description="Number of denoising steps", ge=1, le=500, default=50
        ),
        guidance_scale: float = Input(
            description="Scale for classifier-free guidance", ge=1, le=20, default=7.5
        ),
        seed: int = Input(
            description="Random seed. Leave blank to randomize the seed", default=None
        ),
    ) -> List[Path]:
        """Run a single prediction on the model"""
        if seed is None:
            seed = int.from_bytes(os.urandom(2), "big")
        print(f"Using seed: {seed}")

        if init_image:
            init_image = Image.open(init_image).convert("RGB")
            init_image = preprocess_init_image(init_image, width, height).to("cuda")

            # use PNDM with init images
            scheduler = PNDMScheduler(
                beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
            )
        else:
            # use LMS without init images
            scheduler = LMSDiscreteScheduler(
                beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
            )

        self.pipe.scheduler = scheduler

        if mask:
            mask = Image.open(mask).convert("RGB")
            mask = preprocess_mask(mask, width, height).to("cuda")

        generator = torch.Generator("cuda").manual_seed(seed)
        output = self.pipe(
            prompt=[prompt] * num_outputs if prompt is not None else None,
            init_image=init_image,
            mask=mask,
            width=width,
            height=height,
            prompt_strength=prompt_strength,
            guidance_scale=guidance_scale,
            generator=generator,
            num_inference_steps=num_inference_steps,
        )
        if any(output["nsfw_content_detected"]):
            raise Exception("NSFW content detected, please try a different prompt")

        output_paths = []
        for i, sample in enumerate(output["sample"]):
            output_path = f"/tmp/out-{i}.png"
            sample.save(output_path)
            output_paths.append(Path(output_path))

        return output_paths

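The inpainting behaviour described for the `mask` input above ("black pixels are inpainted and white pixels are preserved") comes from the latent blend in `StableDiffusionImg2ImgPipeline.__call__`: `noisy_init_latents * mask + latents * (1 - mask)`. A small NumPy sketch of that blend, with dummy values standing in for real latents:

```python
import numpy as np

# 1.0 = white (preserve the noised init latents), 0.0 = black (take the freshly denoised latents)
mask = np.array([1.0, 1.0, 0.0, 0.0])

noisy_init_latents = np.array([9.0, 9.0, 9.0, 9.0])  # init-image latents with scheduler noise added
latents = np.array([2.0, 2.0, 2.0, 2.0])             # latents from the current denoising step

blended = noisy_init_latents * mask + latents * (1 - mask)
print(blended)  # [9. 9. 2. 2.] -- white region kept, black region inpainted
```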

================================================
FILE: script/download-weights
================================================
#!/usr/bin/env python

import os
import sys

import torch
from diffusers import StableDiffusionPipeline

os.makedirs("diffusers-cache", exist_ok=True)


pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    cache_dir="diffusers-cache",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=sys.argv[1],
)
SYMBOL INDEX (10 symbols across 2 files)

FILE: image_to_image.py
  function preprocess_init_image (line 21) | def preprocess_init_image(image: Image, width: int, height: int):
  function preprocess_mask (line 29) | def preprocess_mask(mask: Image, width: int, height: int):
  class StableDiffusionImg2ImgPipeline (line 39) | class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
    method __init__ (line 44) | def __init__(
    method __call__ (line 67) | def __call__(
    method latents_from_init_image (line 214) | def latents_from_init_image(
    method embed_text (line 245) | def embed_text(

FILE: predict.py
  class Predictor (line 20) | class Predictor(BasePredictor):
    method setup (line 21) | def setup(self):
    method predict (line 38) | def predict(
