Repository: TheMistoAI/MistoControlNet-Flux-dev
Branch: main
Commit: 12931880cf5b
Files: 13
Total size: 61.5 KB
Directory structure:
gitextract_y65za_rv/
├── .idea/
│ ├── .gitignore
│ ├── MistoControlNet-Flux-dev.iml
│ ├── inspectionProfiles/
│ │ └── profiles_settings.xml
│ ├── misc.xml
│ ├── modules.xml
│ └── vcs.xml
├── README.md
├── README_CN.md
├── __init__.py
├── modules/
│ ├── misto_controlnet.py
│ └── utils.py
├── nodes.py
└── workflows/
└── example_workflow.json
================================================
FILE CONTENTS
================================================
================================================
FILE: .idea/.gitignore
================================================
# Default ignored files
/shelf/
/workspace.xml
================================================
FILE: .idea/MistoControlNet-Flux-dev.iml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$" />
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
</module>
================================================
FILE: .idea/inspectionProfiles/profiles_settings.xml
================================================
<component name="InspectionProjectProfileManager">
<settings>
<option name="USE_PROJECT_PROFILE" value="false" />
<version value="1.0" />
</settings>
</component>
================================================
FILE: .idea/misc.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="Black">
<option name="sdkName" value="Python 3.11" />
</component>
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.11" project-jdk-type="Python SDK" />
</project>
================================================
FILE: .idea/modules.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/MistoControlNet-Flux-dev.iml" filepath="$PROJECT_DIR$/.idea/MistoControlNet-Flux-dev.iml" />
</modules>
</component>
</project>
================================================
FILE: .idea/vcs.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="" vcs="Git" />
</component>
</project>
================================================
FILE: README.md
================================================
# New Update!
We have just launched our latest product, Misto: the most powerful AI Mind Palace, built for all designers.
Everyone is warmly welcome to try it out.
### Website here: https://themisto.ai/

[Chinese README](README_CN.md)
## VERY IMPORTANT
<mark>!!! Please update the ComfyUI suite; the tensor mismatch problem has been fixed.
!!! Do not use AUTO CFG with our KSampler; it produces very poor results.
!!! Results are sensitive to both strength and prompt. Write your prompt carefully and try 0.5 as the starting ControlNet strength.
!!! A new example workflow has been added to the workflow folder; start with it.</mark>
## Summary
by TheMisto.ai @Shenzhen, China
This is a ControlNet network designed for any lineart or outline sketches, compatible with Flux1.dev. The ControlNet model parameters are approximately 1.4B.
This model is not compatible with XLabs loaders and samplers. Please use TheMisto.ai Flux ControlNet ComfyUI suite.
This is a Flow matching structure Flux-dev model, utilizing a scalable Transformer module as the backbone of this ControlNet.
We've implemented a dual-stream Transformer structure, which enhances alignment and expressiveness for various types of lineart and outline conditions without increasing inference time. The model has also been trained for alignment with both T5 and clip-l TextEncoders, ensuring balanced performance between conditioning images and text prompts.
For more details on the Flux.dev model structure, visit: https://huggingface.co/black-forest-labs/FLUX.1-dev
This ControlNet is compatible with Flux1.dev's fp16/fp8 models and other models quantized from Flux1.dev. ByteDance 8/16-step distilled models have not been tested.
- The example workflow uses the flux1-dev-Q4_K_S.gguf quantized model.
- Generation quality: Flux1.dev(fp16)>>Flux1.dev(fp8)>>Other quantized models
- Generation speed: Flux1.dev(fp16)<<< Flux1.dev(fp8) <<< Other quantized models
## Performance
### Performance Across Different Sizes and Scenarios
Tested in various common scenarios such as industrial design, architecture, interior design, animation, games, and photography.
Make sure to craft your prompts well—precision is more important than length!
Performance examples are shown below:

### Performance with Different Lineart or Scribble Preprocessors
Test Parameters:
- Prompt: "Hyper-realistic 3D render of a classic Formula 1 race car, bright red with Marlboro and Agip logos, number 1, large black tires, dramatic studio lighting, dark moody background, reflective floor, cinematic atmosphere, Octane render style, high detail"
- controlnet_strength: 0.65~0.8 (Recommended: Anyline at 0.6~0.7)
- steps: 30
- guidance: 4.0
- The quality of generated images is positively correlated with prompt quality. Controlnet_strength may vary for different types of lineart and outlines, so experiment with the settings!

### Recommended Settings
- Image resolution: 720px or above on the short edge
- controlnet strength: 0.6~0.85 (adjust as needed)
- guidance: 3.0~5.0 (adjust as needed)
- steps: 30 or more
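The step count feeds the flow-matching timestep schedule. As a point of reference, the shifted schedule computed by `get_schedule` in modules/utils.py can be sketched in plain Python (the `t == 0` guard is an addition here, since pure-Python division raises where the original torch code yields `inf`):

```python
import math

def shifted_schedule(num_steps: int, image_seq_len: int,
                     base_shift: float = 0.5, max_shift: float = 1.15) -> list[float]:
    # Linearly estimate mu from the packed image sequence length
    # (mirrors get_lin_function with x1=256, x2=4096).
    slope = (max_shift - base_shift) / (4096 - 256)
    mu = base_shift + slope * (image_seq_len - 256)

    def shift(t: float) -> float:
        # Mirrors time_shift(mu, 1.0, t): biases the schedule toward
        # high timesteps for larger images; t == 0 stays 0.
        return math.exp(mu) / (math.exp(mu) + (1 / t - 1)) if t > 0 else 0.0

    # num_steps + 1 values from 1 down to 0 (extra point for zero).
    return [shift(1 - i / num_steps) for i in range(num_steps + 1)]
```

With 30 steps, this yields 31 monotonically decreasing timesteps from 1.0 down to 0.0; larger images (bigger `image_seq_len`) spend proportionally more steps at high noise levels.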
## Huggingface:
[MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
## Usage
- Download the model from [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
- Place the model in the ComfyUI\models\TheMisto_model\ directory
- The directory is created automatically the first time you run the TheMisto.ai Flux ControlNet ComfyUI suite
- Run using ComfyUI; an example workflow can be found in the workflow folder
- Note: The width and height of the conditioning image must both be divisible by 16, or an error will occur.
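Since both sides of the conditioning image must be divisible by 16, it can help to compute a compliant size up front. The helper below is a hypothetical illustration (not part of this suite) of one way to round a target size:

```python
def round_to_multiple_of_16(width: int, height: int) -> tuple[int, int]:
    """Round both sides up to the next multiple of 16.

    Resize (or pad) the conditioning image to the returned size
    before feeding it to the ControlNet.
    """
    return ((width + 15) // 16 * 16, (height + 15) // 16 * 16)
```

For example, a 1000x750 image would be rounded to 1008x752.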
### ComfyUI

## Training Details
Under scaling laws, the Transformer structure has a significant impact on training time and compute (higher compute cost, longer training time).
The training cost for MistoLine_Flux1_dev is several times that of MistoLineSDXL.
We conducted extensive ablation experiments to balance performance with training costs.
This training was done on A100-80GB GPUs with bf16 mixed precision on the Flux1.dev series models. Apart from LoRA, consumer-grade GPUs are essentially unsuitable for this training.
In our experiments with larger-parameter models, multi-GPU, multi-node parallel training was required, which is costly.
If we reach 50,000 stars, we will open-source the Technical Report detailing more training details.
## License
Aligned with the FLUX.1 [dev] Non-Commercial License.
This ComfyUI node falls under ComfyUI's license.
This model is for research and educational purposes only and may not be used for any form of commercial purpose.
## Business Cooperation
For any custom model training, commercial cooperation, AI application development, or other business collaboration matters, please contact us.
- *Business:* info@themisto.ai
## Media
### International:
Website: https://themisto.ai/
Discord: https://discord.gg/fTyDB2CU
X: https://x.com/AiThemisto79359
### Mainland China:
Website: https://themisto.ai/
WeChat Official Account: TheMisto AI (Shenzhen Mixed Tuple Technology Co., Ltd.)
Xiaohongshu: TheMisto.ai (Xiaohongshu ID: 4211745997)
================================================
FILE: README_CN.md
================================================

## VERY IMPORTANT
<mark>!!! Please update the ComfyUI suite to the latest version; it fixes the tensor mismatch problem.
!!! Do not use Auto CFG with the KSampler; it can produce very poor results.
!!! Results are very sensitive to ControlNet strength and prompt. Write your prompt carefully and start from a ControlNet strength of 0.5.
!!! A new example workflow has been added to the ./workflow folder; start with it.</mark>

## Summary
by TheMisto.ai @Shenzhen, China
This is a ControlNet network for any lineart or outline sketches, built for Flux1.dev; the ControlNet has roughly 1.4B parameters.
This model is not compatible with XLabs loaders and samplers; please use the TheMisto.ai Flux ControlNet ComfyUI suite.
This is a flow-matching Flux-dev model that uses a scalable Transformer module as the backbone of this ControlNet.
We use a dual-stream Transformer structure, which improves expressiveness and alignment for different types of lineart and outline conditions without increasing inference time. Text alignment with both the T5 and clip-l TextEncoders was trained as well, so the model does not respond only to the conditioning image while losing prompt alignment; it aims for balanced performance on both conditioning images and text prompts. For the Flux.dev model structure, see: https://huggingface.co/black-forest-labs/FLUX.1-dev
This ControlNet works with Flux1.dev fp16/fp8 and other models quantized from Flux1.dev; ByteDance 8/16-step distilled models have not been tested.
The example workflow uses the flux1-dev-Q4_K_S.gguf quantized model.
Generation quality: Flux1.dev(fp16) >> Flux1.dev(fp8) >> other quantized models
Generation speed: Flux1.dev(fp16) <<< Flux1.dev(fp8) <<< other quantized models
### Performance
#### Performance across different sizes and scenarios
Tested on common scenarios such as industrial design, architecture, interior design, animation, games, and photography.
Write your prompts carefully; precision matters more than length!
Results are shown below:

#### Different lineart and scribble preprocessors
Test parameters:
- Prompt: "Hyper-realistic 3D render of a classic Formula 1 race car, bright red with Marlboro and Agip logos, number 1, large black tires, dramatic studio lighting, dark moody background, reflective floor, cinematic atmosphere, Octane render style, high detail"
- controlnet_strength: 0.65~0.8 (Recommended: Anyline at 0.6~0.7)
- steps: 30
- guidance: 4.0
- Generation quality correlates positively with prompt quality, and different types of lineart and outlines may need different controlnet_strength values, so experiment!

### Recommended settings
- Image resolution: 720px or above on the short edge
- controlnet strength: 0.6~0.85 (adjust as needed)
- guidance: 3.0~5.0 (adjust as needed)
- steps: 30 or more
### Huggingface:
[MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
## Usage
- Download the model from [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
- Place the model in the ComfyUI\models\TheMisto_model\ directory
- The directory is created automatically the first time you run the TheMisto.ai Flux ControlNet ComfyUI suite
- Run with ComfyUI; an example workflow is in the workflow folder
- Note: The width and height of the conditioning image must both be divisible by 16, or an error will occur
### ComfyUI

## Training details
Under scaling laws, the Transformer structure has a large impact on training time and compute (more compute, longer training). Training MistoLine_Flux1_dev cost several times as much as MistoLineSDXL. We ran extensive ablation experiments to balance performance against training cost.
Training used A100-80GB GPUs with bf16 mixed precision on the Flux1.dev series models; apart from LoRA, consumer-grade GPUs are essentially out of the question for this training.
Our experiments with larger-parameter models required multi-GPU, multi-node parallel training, which is costly.
If we reach 50,000 stars, we will open-source a Technical Report with more training details.
## License
- Aligned with the FLUX.1 [dev] Non-Commercial License
- This ComfyUI node falls under ComfyUI's license
- This model is for research and educational purposes only and may not be used commercially in any form
## Business Cooperation
For any custom model training, commercial cooperation, AI application development, or other business collaboration matters, please contact us. We also welcome inquiries from investors.
- Business: info@themisto.ai
- Investor relations: investment@themisto.ai
## WIP
- Flux1.dev-MistoCN-collection
- Flux1.dev-Misto-IPAdapter
Your stars are what drive our open-source work!
## One more thing

We will soon launch our own product: an extremely easy-to-use multimodal AI creative app, [Misto].
With the simplest, most inspiring experience, we want to rekindle everyone's desire to create.
Creativity within reach: expand the boundaries of imagination and let unlimited inspiration empower the super individual!
Supported platforms: all platforms
## Media
### International:
Website: https://themisto.ai/
Discord: https://discord.gg/fTyDB2CU
X: https://x.com/AiThemisto79359
### Mainland China:
Website: https://themisto.ai/
WeChat Official Account: TheMisto AI (Shenzhen Mixed Tuple Technology Co., Ltd.)
Xiaohongshu: TheMisto.ai (Xiaohongshu ID: 4211745997)
================================================
FILE: __init__.py
================================================
from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]
================================================
FILE: modules/misto_controlnet.py
================================================
import torch
from diffusers.utils import is_torch_version
from torch import Tensor, nn
from einops import rearrange
from typing import Any, Dict, Tuple, Union
from .utils import EmbedND, MLPEmbedder, DoubleStreamBlock, SingleStreamBlock, timestep_embedding
def zero_module(module):
for p in module.parameters():
nn.init.zeros_(p)
return module
class CondDownsamplBlock(nn.Module):
def __init__(self):
super().__init__()
self.encoder = nn.Sequential(
nn.Conv2d(3, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(16, 16, 1),
nn.SiLU(),
zero_module(nn.Conv2d(16, 16, 3, padding=1))
)
def forward(self, x):
return self.encoder(x)
class EnhanceControlnet(nn.Module):
def __init__(self, hidden_size):
super().__init__()
self.linear = nn.Linear(hidden_size, hidden_size)
self.act = nn.SiLU()
nn.init.eye_(self.linear.weight)
nn.init.zeros_(self.linear.bias)
def forward(self, x):
return self.act(self.linear(x))
class MistoControlNetFluxDev(nn.Module):
_supports_gradient_checkpointing = True
def __init__(
self,
in_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
num_heads=24,
num_transformer=2,
num_single_transformer=2,
guidance_embed=True,
):
super().__init__()
self.out_channels = in_channels
self.axes_dim = [16, 56, 56]
self.theta=10_000
self.guidance_embed = guidance_embed
if hidden_size % num_heads != 0:
raise ValueError(f"Hidden size {hidden_size} must be divisible by num_heads {num_heads}")
pe_dim = hidden_size // num_heads
if sum(self.axes_dim) != pe_dim:
raise ValueError(f"Got {self.axes_dim} but expected positional dim {pe_dim}")
self.hidden_size = hidden_size
self.num_heads = num_heads
self.pe_embedder = EmbedND(dim=pe_dim, theta=self.theta, axes_dim=self.axes_dim)
self.img_in = nn.Linear(in_channels, self.hidden_size, bias=True)
self.txt_in = nn.Linear(context_in_dim, self.hidden_size)
self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
self.vector_in = MLPEmbedder(vec_in_dim, self.hidden_size)
self.guidance_in = (MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if guidance_embed else nn.Identity())
self.pos_embed_input = nn.Linear(in_channels, self.hidden_size, bias=True)
self.gradient_checkpointing = False
self.double_blocks = nn.ModuleList(
[
DoubleStreamBlock(
self.hidden_size,
self.num_heads,
mlp_ratio=4.0,
qkv_bias=True,
)
for _ in range(num_transformer)
]
)
self.single_blocks = nn.ModuleList(
[
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=4.0)
for _ in range(num_single_transformer)
]
)
# ControlNet blocks
self.controlnet_blocks = nn.ModuleList([])
for _ in range(num_transformer):
controlnet_block = EnhanceControlnet(self.hidden_size)
controlnet_block = zero_module(controlnet_block)
self.controlnet_blocks.append(controlnet_block)
# single controlnet blocks
self.single_controlnet_blocks = nn.ModuleList([])
for _ in range(num_single_transformer):
controlnet_block = EnhanceControlnet(self.hidden_size)
controlnet_block = zero_module(controlnet_block)
self.single_controlnet_blocks.append(controlnet_block)
# Input processing
self.input_cond_block = CondDownsamplBlock()
def _set_gradient_checkpointing(self, module, value=False):
if hasattr(module, "gradient_checkpointing"):
module.gradient_checkpointing = value
@property
def attn_processors(self):
# set recursively
processors = {}
def fn_recursive_add_processors(name: str, module: torch.nn.Module, processors):
if hasattr(module, "set_processor"):
processors[f"{name}.processor"] = module.processor
for sub_name, child in module.named_children():
fn_recursive_add_processors(f"{name}.{sub_name}", child, processors)
return processors
for name, module in self.named_children():
fn_recursive_add_processors(name, module, processors)
return processors
def set_attn_processor(self, processor):
r"""
Sets the attention processor to use to compute attention.
Parameters:
processor (`dict` of `AttentionProcessor` or only `AttentionProcessor`):
The instantiated processor class or a dictionary of processor classes that will be set as the processor
for **all** `Attention` layers.
If `processor` is a dict, the key needs to define the path to the corresponding cross attention
processor. This is strongly recommended when setting trainable attention processors.
"""
count = len(self.attn_processors.keys())
if isinstance(processor, dict) and len(processor) != count:
raise ValueError(
f"A dict of processors was passed, but the number of processors {len(processor)} does not match the"
f" number of attention layers: {count}. Please make sure to pass {count} processor classes."
)
def fn_recursive_attn_processor(name: str, module: torch.nn.Module, processor):
if hasattr(module, "set_processor"):
if not isinstance(processor, dict):
module.set_processor(processor)
else:
module.set_processor(processor.pop(f"{name}.processor"))
for sub_name, child in module.named_children():
fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)
for name, module in self.named_children():
fn_recursive_attn_processor(name, module, processor)
def forward(
self,
img: Tensor,
img_ids: Tensor,
controlnet_cond: Tensor,
txt: Tensor,
txt_ids: Tensor,
timesteps: Tensor,
y: Tensor,
guidance: Tensor | None = None,
) -> Tuple[Tensor, Tensor]:
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
# running on sequences img
img = self.img_in(img)
controlnet_cond = self.input_cond_block(controlnet_cond)
controlnet_cond = rearrange(controlnet_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
controlnet_cond = self.pos_embed_input(controlnet_cond)
img = img + controlnet_cond
vec = self.time_in(timestep_embedding(timesteps, 256))
if self.guidance_embed:
if guidance is None:
raise ValueError("Guidance strength is required for a guidance-distilled model.")
vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
vec = vec + self.vector_in(y)
txt = self.txt_in(txt)
ids = torch.cat((txt_ids, img_ids), dim=1)
pe = self.pe_embedder(ids)
block_res_samples = ()
for block in self.double_blocks:
if self.training and self.gradient_checkpointing:
def create_custom_forward(module, return_dict=None):
def custom_forward(*inputs):
if return_dict is not None:
return module(*inputs, return_dict=return_dict)
else:
return module(*inputs)
return custom_forward
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
img, txt = torch.utils.checkpoint.checkpoint(
create_custom_forward(block),
img,
txt,
vec,
pe,
**ckpt_kwargs,
)
else:
img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
block_res_samples = block_res_samples + (img,)
img = torch.cat((txt, torch.zeros_like(img)), 1)
single_block_res_samples = ()
for index, block in enumerate(self.single_blocks):
if self.training and self.gradient_checkpointing:
def create_custom_forward(module, return_dict=None):
def custom_forward(*inputs):
if return_dict is not None:
return module(*inputs, return_dict=return_dict)
else:
return module(*inputs)
return custom_forward
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
img = torch.utils.checkpoint.checkpoint(
create_custom_forward(block),
img,
vec,
pe,
**ckpt_kwargs,
)
else:
img = block(img, vec=vec, pe=pe)
single_block_res_samples = single_block_res_samples+(img,)
controlnet_block_res_samples = ()
for block_res_sample, controlnet_block in zip(block_res_samples, self.controlnet_blocks):
block_res_sample = controlnet_block(block_res_sample)
controlnet_block_res_samples = controlnet_block_res_samples + (block_res_sample,)
single_controlnet_block_res_samples = ()
for single_block_res_sample, single_controlnet_block in zip(single_block_res_samples, self.single_controlnet_blocks):
single_block_res_sample = single_controlnet_block(single_block_res_sample)
single_controlnet_block_res_samples = single_controlnet_block_res_samples + (single_block_res_sample,)
return controlnet_block_res_samples,single_controlnet_block_res_samples
================================================
FILE: modules/utils.py
================================================
import math
from dataclasses import dataclass
from typing import Callable,Dict,Any
import torch
from einops import rearrange
from torch import Tensor, nn
from tqdm import tqdm
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
q, k = apply_rope(q, k, pe)
x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
x = rearrange(x, "B H L D -> B L (H D)")
return x
def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
assert dim % 2 == 0
scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
omega = 1.0 / (theta**scale)
out = torch.einsum("...n,d->...nd", pos, omega)
out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
return out.float()
def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)
class EmbedND(nn.Module):
def __init__(self, dim: int, theta: int, axes_dim: list[int]):
super().__init__()
self.dim = dim
self.theta = theta
self.axes_dim = axes_dim
def forward(self, ids: Tensor) -> Tensor:
n_axes = ids.shape[-1]
emb = torch.cat(
[rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
dim=-3,
)
return emb.unsqueeze(1)
def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
"""
Create sinusoidal timestep embeddings.
:param t: a 1-D Tensor of N indices, one per batch element.
These may be fractional.
:param dim: the dimension of the output.
:param max_period: controls the minimum frequency of the embeddings.
:param time_factor: scaling applied to t before computing the embedding.
:return: an (N, D) Tensor of positional embeddings.
"""
t = time_factor * t
half = dim // 2
freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(
t.device
)
args = t[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
if torch.is_floating_point(t):
embedding = embedding.to(t)
return embedding
class MLPEmbedder(nn.Module):
def __init__(self, in_dim: int, hidden_dim: int):
super().__init__()
self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True)
self.silu = nn.SiLU()
self.out_layer = nn.Linear(hidden_dim, hidden_dim, bias=True)
def forward(self, x: Tensor) -> Tensor:
return self.out_layer(self.silu(self.in_layer(x)))
class RMSNorm(torch.nn.Module):
def __init__(self, dim: int):
super().__init__()
self.scale = nn.Parameter(torch.ones(dim))
def forward(self, x: Tensor):
x_dtype = x.dtype
x = x.float()
rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)
return (x * rrms).to(dtype=x_dtype) * self.scale
class QKNorm(torch.nn.Module):
def __init__(self, dim: int):
super().__init__()
self.query_norm = RMSNorm(dim)
self.key_norm = RMSNorm(dim)
def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple[Tensor, Tensor]:
q = self.query_norm(q)
k = self.key_norm(k)
return q.to(v), k.to(v)
class SelfAttention(nn.Module):
def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False):
super().__init__()
self.num_heads = num_heads
head_dim = dim // num_heads
self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
self.norm = QKNorm(head_dim)
self.proj = nn.Linear(dim, dim)
def forward(self, x: Tensor, pe: Tensor) -> Tensor:
qkv = self.qkv(x)
q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
q, k = self.norm(q, k, v)
x = attention(q, k, v, pe=pe)
x = self.proj(x)
return x
@dataclass
class ModulationOut:
shift: Tensor
scale: Tensor
gate: Tensor
class Modulation(nn.Module):
def __init__(self, dim: int, double: bool):
super().__init__()
self.is_double = double
self.multiplier = 6 if double else 3
self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)
def forward(self, vec: Tensor) -> tuple[ModulationOut, ModulationOut | None]:
out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
return (
ModulationOut(*out[:3]),
ModulationOut(*out[3:]) if self.is_double else None,
)
class DoubleStreamBlock(nn.Module):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False):
super().__init__()
mlp_hidden_dim = int(hidden_size * mlp_ratio)
self.num_heads = num_heads
self.hidden_size = hidden_size
self.img_mod = Modulation(hidden_size, double=True)
self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.img_mlp = nn.Sequential(
nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
nn.GELU(approximate="tanh"),
nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
)
self.txt_mod = Modulation(hidden_size, double=True)
self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.txt_mlp = nn.Sequential(
nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
nn.GELU(approximate="tanh"),
nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
)
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor) -> tuple[Tensor, Tensor]:
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
# prepare image for attention
img_modulated = self.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = self.img_attn.qkv(img_modulated)
img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = self.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = self.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
q = torch.cat((txt_q, img_q), dim=2)
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn = attention(q, k, v, pe=pe)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img blocks
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
# calculate the txt blocks
txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
return img, txt
class SingleStreamBlock(nn.Module):
"""
A DiT block with parallel linear layers as described in
https://arxiv.org/abs/2302.05442 and adapted modulation interface.
"""
def __init__(
self,
hidden_size: int,
num_heads: int,
mlp_ratio: float = 4.0,
qk_scale: float | None = None,
):
super().__init__()
self.hidden_dim = hidden_size
self.num_heads = num_heads
head_dim = hidden_size // num_heads
self.scale = qk_scale or head_dim**-0.5
self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
# qkv and mlp_in
self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim)
# proj and mlp_out
self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size)
self.norm = QKNorm(head_dim)
self.hidden_size = hidden_size
self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.mlp_act = nn.GELU(approximate="tanh")
self.modulation = Modulation(hidden_size, double=False)
def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor:
mod, _ = self.modulation(vec)
x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe)
# compute activation in mlp stream, cat again and run second linear layer
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
return x + mod.gate * output
class LastLayer(nn.Module):
def __init__(self, hidden_size: int, patch_size: int, out_channels: int):
super().__init__()
self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))
def forward(self, x: Tensor, vec: Tensor) -> Tensor:
shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)
x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]
x = self.linear(x)
return x
def get_noise(
num_samples: int,
height: int,
width: int,
device: torch.device,
dtype: torch.dtype,
seed: int,
):
return torch.randn(
num_samples,
16,
# allow for packing
2 * math.ceil(height / 16),
2 * math.ceil(width / 16),
device=device,
dtype=dtype,
generator=torch.Generator(device=device).manual_seed(seed),
)
def time_shift(mu: float, sigma: float, t: Tensor):
return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
def get_lin_function(
x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15
) -> Callable[[float], float]:
m = (y2 - y1) / (x2 - x1)
b = y1 - m * x1
return lambda x: m * x + b
def get_schedule(
num_steps: int,
image_seq_len: int,
base_shift: float = 0.5,
max_shift: float = 1.15,
shift: bool = True,
) -> list[float]:
# extra step for zero
timesteps = torch.linspace(1, 0, num_steps + 1)
# shifting the schedule to favor high timesteps for higher signal images
if shift:
# estimate mu via linear interpolation between two points
mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
timesteps = time_shift(mu, 1.0, timesteps)
return timesteps.tolist()
def unpack(x: Tensor, height: int, width: int) -> Tensor:
return rearrange(
x,
"b (h w) (c ph pw) -> b c (h ph) (w pw)",
h=math.ceil(height / 16),
w=math.ceil(width / 16),
ph=2,
pw=2,
)
def forward_mistoCN(
model,
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
timesteps: Tensor,
y: Tensor,
block_controlnet_hidden_states=None,
single_controlnet_hidden_states=None,
guidance: Tensor | None = None,
):
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
# running on sequences img
img = model.img_in(img)
vec = model.time_in(timestep_embedding(timesteps, 256))
vec = vec + model.guidance_in(timestep_embedding(guidance, 256))
vec = vec + model.vector_in(y)
txt = model.txt_in(txt)
ids = torch.cat((txt_ids, img_ids), dim=1)
pe = model.pe_embedder(ids)
for index_block, block in enumerate(model.double_blocks):
img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
# controlnet residual
if block_controlnet_hidden_states is not None:
if len(block_controlnet_hidden_states) == 1:
img = img + block_controlnet_hidden_states[0]
else:
img = img + block_controlnet_hidden_states[index_block % 2]
img = torch.cat((txt, img), 1)
for index_block, block in enumerate(model.single_blocks):
img = block(img, vec=vec, pe=pe)
# controlnet residual
if single_controlnet_hidden_states is not None:
if len(single_controlnet_hidden_states) == 1:
img = img + single_controlnet_hidden_states[0]
else:
img = img + single_controlnet_hidden_states[index_block % 2]
img = img[:, txt.shape[1]:, ...]
img = model.final_layer(img, vec) # (N, T, patch_size ** 2 * out_channels)
return img
def denoise_controlnet(
pbar,
model,
controlnet,
# model input
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
vec: Tensor,
controlnet_cond,
neg_txt,
neg_txt_ids,
neg_vec,
# sampling parameters
timesteps: list[float],
guidance: float = 4.0,
controlnet_strength=1.0,
):
controlnet.to(img.device, dtype=img.dtype)
img_ids = img_ids.to(img.device, dtype=img.dtype)
controlnet_cond = controlnet_cond.to(img.device, dtype=img.dtype)
txt = txt.to(img.device, dtype=img.dtype)
txt_ids = txt_ids.to(img.device, dtype=img.dtype)
vec = vec.to(img.device, dtype=img.dtype)
    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
    for t_curr, t_prev in tqdm(zip(timesteps[:-1], timesteps[1:]), desc="Sampling", total=len(timesteps) - 1):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
block_res_samples, single_block_res_samples = controlnet(
img=img,
img_ids=img_ids,
controlnet_cond=controlnet_cond,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
guidance=guidance_vec,
)
pred = forward_mistoCN(
model=model,
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
guidance=guidance_vec,
            block_controlnet_hidden_states=[i * controlnet_strength for i in block_res_samples],
            single_controlnet_hidden_states=[i * controlnet_strength for i in single_block_res_samples],
        )
# negative
neg_block_res_samples, neg_single_block_res_samples = controlnet(
img=img,
img_ids=img_ids,
controlnet_cond=controlnet_cond,
txt=neg_txt,
txt_ids=neg_txt_ids,
y=neg_vec,
timesteps=t_vec,
guidance=guidance_vec,
)
neg_pred = forward_mistoCN(
model=model,
img=img,
img_ids=img_ids,
txt=neg_txt,
txt_ids=neg_txt_ids,
y=neg_vec,
timesteps=t_vec,
guidance=guidance_vec,
block_controlnet_hidden_states=[i * controlnet_strength for i in neg_block_res_samples],
single_controlnet_hidden_states=[i * controlnet_strength for i in neg_single_block_res_samples]
)
        f = 0.85  # fixed blend factor: interpolate from the negative toward the positive prediction
        pred = neg_pred + f * (pred - neg_pred)
img = img + (t_prev - t_curr) * pred
pbar.update(1)
return img
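The loop body above blends the two predictions with a hard-coded factor f = 0.85, then takes one Euler step along the flow. The update rule in isolation, with scalars standing in for tensors (the helper name `cfg_euler_step` is ours; the variable names mirror the code above):

```python
def cfg_euler_step(img, pred, neg_pred, t_curr, t_prev, f=0.85):
    # Classifier-free-guidance-style interpolation from the negative
    # prediction toward the positive one, with a fixed mixing factor.
    blended = neg_pred + f * (pred - neg_pred)
    # Rectified-flow Euler step: move from t_curr to t_prev along `blended`.
    return img + (t_prev - t_curr) * blended
```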
================================================
FILE: nodes.py
================================================
import torch
import os
import comfy.model_management
from comfy.utils import ProgressBar
import folder_paths
import numpy as np
from safetensors.torch import load_file
from einops import rearrange, repeat
from .modules.misto_controlnet import MistoControlNetFluxDev
from .modules.utils import get_schedule, get_noise, denoise_controlnet, unpack
import torch.nn.functional as F
dir_TheMistoModel = os.path.join(folder_paths.models_dir, "TheMisto_model")
os.makedirs(dir_TheMistoModel, exist_ok=True)
folder_paths.folder_names_and_paths["TheMisto_model"] = ([dir_TheMistoModel], folder_paths.supported_pt_extensions)
class LATENT_PROCESSOR_COMFY:
def __init__(self):
self.scale_factor = 0.3611
self.shift_factor = 0.1159
self.latent_rgb_factors =[
[-0.0404, 0.0159, 0.0609],
[ 0.0043, 0.0298, 0.0850],
[ 0.0328, -0.0749, -0.0503],
[-0.0245, 0.0085, 0.0549],
[ 0.0966, 0.0894, 0.0530],
[ 0.0035, 0.0399, 0.0123],
[ 0.0583, 0.1184, 0.1262],
[-0.0191, -0.0206, -0.0306],
[-0.0324, 0.0055, 0.1001],
[ 0.0955, 0.0659, -0.0545],
[-0.0504, 0.0231, -0.0013],
[ 0.0500, -0.0008, -0.0088],
[ 0.0982, 0.0941, 0.0976],
[-0.1233, -0.0280, -0.0897],
[-0.0005, -0.0530, -0.0020],
[-0.1273, -0.0932, -0.0680]
]
def __call__(self, x):
return (x / self.scale_factor) + self.shift_factor
def go_back(self, x):
return (x - self.shift_factor) * self.scale_factor
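`__call__` and `go_back` are exact inverses, converting between the raw Flux latent space and the scale/shift convention the VAE decode expects. A scalar sketch of the round trip (helper names are ours, for illustration):

```python
SCALE, SHIFT = 0.3611, 0.1159  # the factors used by LATENT_PROCESSOR_COMFY

def to_vae(x):
    return x / SCALE + SHIFT      # mirrors __call__

def from_vae(x):
    return (x - SHIFT) * SCALE    # mirrors go_back
```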
MAX_RESOLUTION = 16384
def prepare_sampling(t5_emb, clip_emb, img, batch_size):
    _, c, h, w = img.shape
    bs = batch_size
img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img.shape[0] == 1 and bs > 1:
img = repeat(img, "1 ... -> bs ...", bs=bs)
img_ids = torch.zeros(h // 2, w // 2, 3)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
if t5_emb.shape[0] == 1 and bs > 1:
t5_emb = repeat(t5_emb, "1 ... -> bs ...", bs=bs)
t5_emb_ids = torch.zeros(bs, t5_emb.shape[1], 3)
if clip_emb.shape[0] == 1 and bs > 1:
clip_emb = repeat(clip_emb, "1 ... -> bs ...", bs=bs)
return {
"img":img,
"img_ids":img_ids.to(img.device, dtype=img.dtype),
"txt":t5_emb.to(img.device, dtype=img.dtype),
"txt_ids":t5_emb_ids.to(img.device, dtype=img.dtype),
"vec":clip_emb.to(img.device, dtype=img.dtype)
}
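`prepare_sampling` gives every packed image token a 3-component positional id: component 0 is left at zero (it belongs to the text axis), while components 1 and 2 carry the patch row and column that feed RoPE. The same grid in plain Python (the helper `make_img_ids` is hypothetical, shown only to make the layout explicit):

```python
def make_img_ids(h, w):
    # One id per 2x2 latent patch, flattened row-major to match
    # the "h w c -> (h w) c" repeat in prepare_sampling above.
    return [[0, row, col] for row in range(h // 2) for col in range(w // 2)]
```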
def load_misto_transoformer_cn(device):
with torch.device(device):
controlnet = MistoControlNetFluxDev(
in_channels=64,
vec_in_dim=768,
context_in_dim=4096,
hidden_size=3072,
num_heads=24,
num_transformer=3,
num_single_transformer=2,
guidance_embed=True,
)
return controlnet
def img_preprocessor(image, res):
_, _, h, w = image.shape
scale = res / min(h, w)
new_h, new_w = int(h * scale), int(w * scale)
resized = F.interpolate(image, size=(new_h, new_w), mode='bilinear', align_corners=False)
crop_h = int((new_h // 16) * 16)
crop_w = int((new_w // 16) * 16)
start_h = (new_h - crop_h) // 2
start_w = (new_w - crop_w) // 2
cropped = resized[:, :, start_h:start_h + crop_h, start_w:start_w + crop_w]
return cropped
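`img_preprocessor` scales the short side to `res`, then center-crops both dimensions down to multiples of 16 so the result packs cleanly into 2x2 latent patches. The geometry alone, without torch (the helper name is ours):

```python
def target_geometry(h, w, res):
    # Scale so the short side becomes `res`, as in img_preprocessor above.
    scale = res / min(h, w)
    new_h, new_w = int(h * scale), int(w * scale)
    # Crop each side down to the nearest multiple of 16, centered.
    crop_h, crop_w = (new_h // 16) * 16, (new_w // 16) * 16
    start_h, start_w = (new_h - crop_h) // 2, (new_w - crop_w) // 2
    return (crop_h, crop_w), (start_h, start_w)
```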
class LoadMistoFluxControlNet:
@classmethod
def INPUT_TYPES(s):
return {"required": {
"model_name": (folder_paths.get_filename_list("TheMisto_model"),)
}}
RETURN_TYPES = ("MistoFluxControlNet",)
RETURN_NAMES = ("ControlNet",)
FUNCTION = "load_model"
CATEGORY = "TheMistoAINodes"
def load_model(self,model_name):
device=comfy.model_management.get_torch_device()
misto_cn = load_misto_transoformer_cn(device=device)
ckpt_path = os.path.join(dir_TheMistoModel, model_name)
        if model_name.endswith('.bin'):
            state_dict = torch.load(ckpt_path, map_location='cpu')
        else:
            state_dict = load_file(ckpt_path)
        missing, unexpected = misto_cn.load_state_dict(state_dict, strict=False)
        misto_cn.eval()
        print(missing, unexpected)
return (misto_cn,)
class ApplyMistoFluxControlNet:
@classmethod
def INPUT_TYPES(s):
return {"required": {"controlnet": ("MistoFluxControlNet",),
"image": ("IMAGE",),
"resolution": ("INT", {"default":960, "min": 512, "max": 4096}),
"strength": ("FLOAT", {"default": 0.85, "min": 0.0, "max": 2.0, "step": 0.01})
}}
RETURN_TYPES = ("ControlNetCondition","IMAGE")
RETURN_NAMES = ("controlnet_condition","cond_image")
FUNCTION = "embedding"
CATEGORY = "TheMistoAINodes"
def embedding(self, controlnet, image, resolution, strength):
cond_img = torch.from_numpy((np.array(image) * 2) - 1)
cond_img = cond_img.permute(0, 3, 1, 2)
res_img = img_preprocessor(image=cond_img, res=resolution)
out_img = res_img.permute(0, 2, 3, 1)
out_img = (out_img + 1) / 2
cond_out = {
"img": res_img,
"controlnet_strength": strength,
"model": controlnet,
}
return (cond_out,out_img)
class KSamplerTheMisto:
@classmethod
def INPUT_TYPES(s):
return {
"required": {
"model": ("MODEL",),
"ae":("VAE",),
"positive": ("CONDITIONING",),
"negative": ("CONDITIONING",),
"controlnet_condition": ("ControlNetCondition", {"default": None}),
"batch_size": ("INT", {"default":1, "min": 1, "max": 100}),
"guidance": ("FLOAT", {"default": 3.5, "min": 0.1, "max": 30}),
"seed": ("INT", {"default": 0, "min": 0, "max": 0xffffffffffffffff}),
"steps": ("INT", {"default": 20, "min": 1, "max": 100}),
},
}
RETURN_TYPES = ("IMAGE",)
RETURN_NAMES = ("image",)
FUNCTION = "sampling"
CATEGORY = "TheMistoAINodes"
    def sampling(self, model, ae, positive, negative, controlnet_condition, batch_size, guidance, seed, steps):
# device ,dtype and pbar
device = comfy.model_management.get_torch_device()
dtype_model = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
pbar = ProgressBar(steps+10)
pbar.update(1)
# model
comfy.model_management.load_model_gpu(model)
flux_model = model.model.diffusion_model
pbar.update(3)
# cn cond
cn_model = controlnet_condition['model']
cond_img = controlnet_condition['img'].to(torch.bfloat16).to(device)
cn_strength = controlnet_condition['controlnet_strength']
bc, c, h, w = cond_img.shape
height = (h//16) * 16
width = (w//16) * 16
pbar.update(2)
with torch.no_grad():
# set scheduler
            timesteps = get_schedule(steps, (width // 8) * (height // 8) // 4, shift=True)
            x = get_noise(1, height, width, device=device, dtype=dtype_model, seed=seed)
p_inp_cond = prepare_sampling(positive[0][0], positive[0][1]['pooled_output'], img=x, batch_size=batch_size)
n_inp_cond = prepare_sampling(negative[0][0], negative[0][1]['pooled_output'], img=x, batch_size=batch_size)
pbar.update(1)
# denoise
x = denoise_controlnet(
pbar=pbar,
model=flux_model, **p_inp_cond,
controlnet=cn_model,
timesteps=timesteps,
guidance=guidance,
controlnet_cond=cond_img,
                controlnet_strength=cn_strength,
neg_txt=n_inp_cond['txt'],
neg_txt_ids=n_inp_cond['txt_ids'],
neg_vec=n_inp_cond['vec'],
)
x = unpack(x.float(), height, width)
lat_processor = LATENT_PROCESSOR_COMFY()
x = lat_processor(x)
pbar.update(1)
return (ae.decode(x),)
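`get_schedule` is imported from `modules/utils.py` and not shown in this excerpt. As a rough sketch, the reference Flux implementation builds a uniform 1 -> 0 schedule and warps it by a shift `mu` interpolated linearly from the token count; the version below follows that upstream code (an assumption about the imported helper, not this repository's verbatim source) with the warp exponent fixed at 1:

```python
import math

def flux_schedule(num_steps, image_seq_len, base_shift=0.5, max_shift=1.15):
    # Interpolate mu from the packed sequence length (anchors at 256 and 4096
    # tokens in the upstream Flux code), then warp a uniform schedule.
    m = (max_shift - base_shift) / (4096 - 256)
    b = base_shift - m * 256
    mu = m * image_seq_len + b
    ts = [1 - i / num_steps for i in range(num_steps + 1)]
    return [math.exp(mu) / (math.exp(mu) + (1 / t - 1)) if t > 0 else 0.0
            for t in ts]
```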
NODE_CLASS_MAPPINGS = {
    "LoadTheMistoFluxControlNet": LoadMistoFluxControlNet,
    "ApplyTheMistoFluxControlNet": ApplyMistoFluxControlNet,
    "KSamplerTheMisto": KSamplerTheMisto,
}
NODE_DISPLAY_NAME_MAPPINGS = {
    "LoadTheMistoFluxControlNet": "Load MistoCN-Flux.dev",
    "ApplyTheMistoFluxControlNet": "Apply MistoCN-Flux.dev",
    "KSamplerTheMisto": "KSampler for MistoCN-Flux.dev",
}
================================================
FILE: workflows/example_workflow.json
================================================
{
"last_node_id": 40,
"last_link_id": 44,
"nodes": [
{
"id": 13,
"type": "VAELoader",
"pos": {
"0": 60,
"1": 327,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
19
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 10,
"type": "DualCLIPLoader",
"pos": {
"0": 58,
"1": 27,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 348.2605285644531,
"1": 106
},
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
24,
26
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp8_e4m3fn.safetensors",
"flux"
]
},
{
"id": 36,
"type": "AnyLineArtPreprocessor_aux",
"pos": {
"0": 250,
"1": 569,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 315,
"1": 178
},
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 38
}
],
"outputs": [
{
"name": "image",
"type": "IMAGE",
"links": [
37
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "AnyLineArtPreprocessor_aux"
},
"widgets_values": [
"lineart_standard",
960,
0,
1,
36,
1
]
},
{
"id": 17,
"type": "LoadTheMistoFluxControlNet",
"pos": {
"0": 54,
"1": 445,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 392,
"1": 60
},
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "ControlNet",
"type": "MistoFluxControlNet",
"links": [
40
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadTheMistoFluxControlNet"
},
"widgets_values": [
"mistoline_flux.dev_v1.safetensors"
]
},
{
"id": 37,
"type": "LoadImage",
"pos": {
"0": -92,
"1": 568,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": [
315,
314
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
38
],
"shape": 3,
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"image--73-.png",
"image"
]
},
{
"id": 26,
"type": "CLIPTextEncodeFlux",
"pos": {
"0": 819,
"1": 29,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 400,
"1": 200
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 24
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
25
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "CLIPTextEncodeFlux"
},
"widgets_values": [
"Model in modern attire with metallic accessories, in an old factory setting, Metallic sheen, Full-frame mirrorless, 35mm lens, f/2.8 aperture, ISO 500, off-camera flash",
"Model in modern attire with metallic accessories, in an old factory setting, Metallic sheen, Full-frame mirrorless, 35mm lens, f/2.8 aperture, ISO 500, off-camera flash",
3.5
]
},
{
"id": 4,
"type": "CheckpointLoaderSimple",
"pos": {
"0": 64,
"1": 179,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 315,
"1": 98
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
12
],
"slot_index": 0
},
{
"name": "CLIP",
"type": "CLIP",
"links": [],
"slot_index": 1
},
{
"name": "VAE",
"type": "VAE",
"links": [],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"flux1-dev-fp8.safetensors"
]
},
{
"id": 28,
"type": "CLIPTextEncodeFlux",
"pos": {
"0": 823,
"1": 273,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 399.6443786621094,
"1": 156.8167266845703
},
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 26
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
27
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "CLIPTextEncodeFlux"
},
"widgets_values": [
"out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.",
"out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.",
3.5
]
},
{
"id": 21,
"type": "Note",
"pos": {
"0": 587,
"1": 286,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 210,
"1": 122.03421020507812
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {
"text": ""
},
"widgets_values": [
"Make sure to craft your prompts well—precision is more important than length, otherwise the result could be very bad.\n\nPrompt非常重要,请用AI或者自己写一个好的prompt,不然效果会很糟糕。"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 39,
"type": "Note",
"pos": {
"0": 774,
"1": 629,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": [
210,
122.03421020507812
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {
"text": ""
},
"widgets_values": [
"Very sensitive to strength, please start with about 0.5 and gradually increase\n\n对strength非常敏感,请从0.5开始尝试逐步增加。"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 38,
"type": "PreviewImage",
"pos": {
"0": 1015,
"1": 479,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": [
332.628470953563,
319.48089726588853
],
"flags": {},
"order": 15,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 41
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 35,
"type": "ApplyTheMistoFluxControlNet",
"pos": {
"0": 593,
"1": 482,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 393,
"1": 102
},
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "controlnet",
"type": "MistoFluxControlNet",
"link": 40
},
{
"name": "image",
"type": "IMAGE",
"link": 37
}
],
"outputs": [
{
"name": "controlnet_condition",
"type": "ControlNetCondition",
"links": [
39
],
"shape": 3,
"slot_index": 0
},
{
"name": "cond_image",
"type": "IMAGE",
"links": [
41
],
"shape": 3,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "ApplyTheMistoFluxControlNet"
},
"widgets_values": [
960,
0.5
]
},
{
"id": 15,
"type": "LoraLoaderModelOnly",
"pos": {
"0": 498,
"1": -118,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 380.00384521484375,
"1": 82
},
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 12
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
44
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"Hyper-FLUX.1-dev-16steps-lora.safetensors",
0.13
]
},
{
"id": 19,
"type": "KSamplerTheMisto",
"pos": {
"0": 1251,
"1": -121,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 330,
"1": 234
},
"flags": {},
"order": 14,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 44
},
{
"name": "ae",
"type": "VAE",
"link": 19
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 25
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 27
},
{
"name": "controlnet_condition",
"type": "ControlNetCondition",
"link": 39
}
],
"outputs": [
{
"name": "image",
"type": "IMAGE",
"links": [
20
],
"slot_index": 0,
"shape": 3
}
],
"properties": {
"Node name for S&R": "KSamplerTheMisto"
},
"widgets_values": [
1,
4,
1039574509141705,
"randomize",
16
]
},
{
"id": 22,
"type": "PreviewImage",
"pos": {
"0": 1614,
"1": -128,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": [
461.88277135920316,
499.946617357832
],
"flags": {},
"order": 16,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 20
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 20,
"type": "Note",
"pos": {
"0": 586,
"1": 55,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": {
"0": 210,
"1": 122.03421020507812
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {
"text": ""
},
"widgets_values": [
"Make sure to craft your prompts well—precision is more important than length, otherwise the result could be very bad.\n\nPrompt非常重要,请用AI或者自己写一个好的prompt,不然效果会很糟糕。"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 40,
"type": "Note",
"pos": {
"0": 1258,
"1": -232,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0
},
"size": [
210,
67.63623725377664
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {
"text": ""
},
"widgets_values": [
"no autoCFG\n\n不要使用自动CFG"
],
"color": "#432",
"bgcolor": "#653"
}
],
"links": [
[
12,
4,
0,
15,
0,
"MODEL"
],
[
19,
13,
0,
19,
1,
"VAE"
],
[
20,
19,
0,
22,
0,
"IMAGE"
],
[
24,
10,
0,
26,
0,
"CLIP"
],
[
25,
26,
0,
19,
2,
"CONDITIONING"
],
[
26,
10,
0,
28,
0,
"CLIP"
],
[
27,
28,
0,
19,
3,
"CONDITIONING"
],
[
37,
36,
0,
35,
1,
"IMAGE"
],
[
38,
37,
0,
36,
0,
"IMAGE"
],
[
39,
35,
0,
19,
4,
"ControlNetCondition"
],
[
40,
17,
0,
35,
0,
"MistoFluxControlNet"
],
[
41,
35,
1,
38,
0,
"IMAGE"
],
[
44,
15,
0,
19,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
384.31753255840897,
385.7533316712783
]
}
},
"version": 0.4
}