Repository: TheMistoAI/MistoControlNet-Flux-dev
Branch: main
Commit: 12931880cf5b
Files: 13
Total size: 61.5 KB

Directory structure:
gitextract_y65za_rv/
├── .idea/
│   ├── .gitignore
│   ├── MistoControlNet-Flux-dev.iml
│   ├── inspectionProfiles/
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   ├── modules.xml
│   └── vcs.xml
├── README.md
├── README_CN.md
├── __init__.py
├── modules/
│   ├── misto_controlnet.py
│   └── utils.py
├── nodes.py
└── workflows/
    └── example_workflow.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .idea/.gitignore
================================================
# Default ignored files
/shelf/
/workspace.xml

================================================
FILE: .idea/MistoControlNet-Flux-dev.iml
================================================

================================================
FILE: .idea/inspectionProfiles/profiles_settings.xml
================================================

================================================
FILE: .idea/misc.xml
================================================

================================================
FILE: .idea/modules.xml
================================================

================================================
FILE: .idea/vcs.xml
================================================

================================================
FILE: README.md
================================================
# New Update!
We have just launched our latest product, Misto: the most powerful AI Mind Palace, built for all designers. Everyone is warmly welcome to try it out.

### Website here: https://themisto.ai/
![Result2](assets/misto.png)

[中文版-README](README_CN.md)

## VERY IMPORTANT
!!! Please update the ComfyUI suite to fix the tensor mismatch problem.
!!! Please do not use Auto CFG with our KSampler; it produces very bad results.
!!! Sensitive to strength and prompt: be careful with your prompt, and try 0.5 as the starting ControlNet strength.
!!! A new example workflow has been added to the workflow folder; start with it.

## Summary by TheMisto.ai @Shenzhen, China
This is a ControlNet network designed for any lineart or outline sketches, compatible with Flux1.dev. The ControlNet model parameters are approximately 1.4B. This model is not compatible with XLabs loaders and samplers; please use the TheMisto.ai Flux ControlNet ComfyUI suite.

This is a flow-matching Flux-dev model that uses a scalable Transformer module as the backbone of this ControlNet. We've implemented a dual-stream Transformer structure, which enhances alignment and expressiveness for various types of lineart and outline conditions without increasing inference time. The model has also been trained for alignment with both the T5 and clip-l text encoders, ensuring balanced performance between conditioning images and text prompts. For more details on the Flux.dev model structure, visit: https://huggingface.co/black-forest-labs/FLUX.1-dev

This ControlNet is compatible with Flux1.dev's fp16/fp8 models and with other models quantized from Flux1.dev. ByteDance 8/16-step distilled models have not been tested.
- The example workflow uses the flux1-dev-Q4_K_S.gguf quantized model.
- Generation quality: Flux1.dev(fp16) >> Flux1.dev(fp8) >> other quantized models
- Generation speed: Flux1.dev(fp16) <<< Flux1.dev(fp8) <<< other quantized models

## Performance
### Performance Across Different Sizes and Scenarios
Tested in various common scenarios such as industrial design, architecture, interior design, animation, games, and photography. Make sure to craft your prompts well: precision is more important than length!
Performance examples are shown below:
![Result2](assets/result2.jpg)

### Performance with Different Lineart or Scribble Preprocessors
Test parameters:
- Prompt: "Hyper-realistic 3D render of a classic Formula 1 race car, bright red with Marlboro and Agip logos, number 1, large black tires, dramatic studio lighting, dark moody background, reflective floor, cinematic atmosphere, Octane render style, high detail"
- controlnet_strength: 0.65-0.8 (recommended: Anyline with 0.6-0.7)
- steps: 30
- guidance: 4.0
- The quality of generated images is positively correlated with prompt quality. The best controlnet_strength may vary for different types of lineart and outlines, so experiment with the settings!

![Result1](assets/result1.jpg)

### Recommended Settings
- Image resolution: 720px or above on the short edge
- controlnet strength: 0.6~0.85 (adjust as needed)
- guidance: 3.0~5.0 (adjust as needed)
- steps: 30 or more

## Huggingface: [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)

## Usage
- Download the model from [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
- Place the model in the ComfyUI\models\TheMisto_model\ directory
  - The directory is created automatically the first time you run the TheMisto.ai Flux ControlNet ComfyUI suite
- Run using ComfyUI; an example workflow can be found in the workflow folder
- Note: the height and width of the conditioning image must be divisible by 16, or an error will occur.

### ComfyUI
![ComfyUI-workflow](assets/comfyui.png)

## Training Details
Under the scaling law, the Transformer structure has a significant impact on training time and compute (higher compute cost, longer training time). The training cost of MistoLine_Flux1_dev is several times that of MistoLineSDXL. We conducted extensive ablation experiments to balance performance with training cost. Training was done on A100-80GB GPUs with bf16 mixed precision on the Flux1.dev series models.
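The divisible-by-16 requirement for the conditioning image (see the Usage note above) can be met by scaling the short edge and then rounding both dimensions down to multiples of 16. A minimal sketch of the size computation (a hypothetical helper, independent of the suite's own preprocessing):

```python
def fit_to_multiple_of_16(height: int, width: int, short_edge: int = 720) -> tuple[int, int]:
    """Scale so the short edge is `short_edge`, then round both
    dimensions down to multiples of 16 (the size a center-crop of
    the resized image should target)."""
    scale = short_edge / min(height, width)
    new_h, new_w = int(height * scale), int(width * scale)
    return (new_h // 16) * 16, (new_w // 16) * 16

# e.g. fit_to_multiple_of_16(1440, 2880) -> (720, 1440)
```

The resulting size respects the recommended 720px short edge while guaranteeing the 16-pixel alignment the loader expects.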
Apart from Lora, consumer-grade GPUs are basically unsuitable for this training. In our experiments with larger-parameter models, multi-GPU, multi-node parallel training was required, which is costly. If we reach 50,000 stars, we will open-source a technical report with more training details.

## License
- Aligned with the FLUX.1 [dev] Non-Commercial License
- This ComfyUI node falls under ComfyUI's license
- This model is for research and educational purposes only and may not be used for any form of commercial purpose.

## Business Cooperation
For any custom model training, commercial cooperation, AI application development, or other business collaboration matters, please contact us.
- *Business:* info@themisto.ai

## Media
### International:
Website: https://themisto.ai/
Discord: https://discord.gg/fTyDB2CU
X: https://x.com/AiThemisto79359
### Mainland China:
*Website:* https://themisto.ai/
*WeChat Official Account:* TheMisto AI (Shenzhen Mixed Tuple Technology Co., Ltd.)
*Xiaohongshu:* TheMisto.ai (Xiaohongshu ID: 4211745997)

================================================
FILE: README_CN.md
================================================
![Intro Image](assets/open_source.png)

## VERY IMPORTANT
!!! Please update the ComfyUI suite to the latest version; the tensor mismatch problem has been fixed.
!!! Do not use Auto CFG with the KSampler; it can produce very bad results.
!!! The model is very sensitive to ControlNet strength and to the prompt; write your prompt carefully and start trying from a ControlNet strength of 0.5.
!!! A new example workflow has been added to the ./workflow folder; start with it.

![Intro Image](assets/example1.jpg)

## Summary by TheMisto.ai @Shenzhen, China
This is a ControlNet network for any lineart or outline sketches, for use with Flux1.dev; the ControlNet has approximately 1.4B parameters. The model is not compatible with XLabs loaders and samplers; please use the TheMisto.ai Flux ControlNet ComfyUI suite.

This is a flow-matching Flux-dev model that uses a scalable Transformer module as the ControlNet backbone. This time we used a dual-stream Transformer structure, which gives better expressiveness and alignment for different types of lineart and outline conditions without increasing inference time. Text alignment for both the T5 and clip-l text encoders was also trained accordingly, so the model does not respond only to the conditioning image while losing prompt alignment; it aims for good performance on both conditioning images and text. For the Flux.dev model structure, see: https://huggingface.co/black-forest-labs/FLUX.1-dev

This ControlNet works with Flux1.dev fp16/fp8 and other models quantized from Flux1.dev. ByteDance 8/16-step distilled models have not been tested.
- The example workflow uses the flux1-dev-Q4_K_S.gguf quantized model.
- Generation quality: Flux1.dev(fp16) >> Flux1.dev(fp8) >> other quantized models
- Generation speed: Flux1.dev(fp16) <<< Flux1.dev(fp8) <<< other quantized models

### Performance
#### Different Sizes and Scenarios
Tested on common scenarios such as industrial design, architecture, interior design, anime, games, and photos. Write your prompt well: precision matters, not length!
Results:
![Result2](assets/result2.jpg)

#### Different Lineart or Scribble Preprocessors
Test parameters:
- Prompt: "Hyper-realistic 3D render of a classic Formula 1 race car, bright red with Marlboro and Agip logos, number 1, large black tires, dramatic studio lighting, dark moody background, reflective floor, cinematic atmosphere, Octane render style, high detail"
- controlnet_strength: 0.65-0.8 (recommended: Anyline with 0.6-0.7)
- steps: 30
- guidance: 4.0
- Generation quality is positively correlated with prompt quality; different types of lineart and outlines may need different controlnet_strength values, so experiment!

![Result1](assets/result1.jpg)

### Recommended Settings
- Image resolution: short edge of 720px or above
- controlnet strength: 0.6~0.85 (adjust as needed)
- guidance: 3.0~5.0 (adjust as needed)
- steps: 30 or more

### Huggingface: [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)

## Usage
- Download the model from [MistoLine_Flux.dev_v1](https://huggingface.co/TheMistoAI/MistoLine_Flux.dev)
- Put the model in the ComfyUI\models\TheMisto_model\ directory
  - The directory is created automatically the first time you run the TheMisto.ai Flux ControlNet ComfyUI suite
- Run with ComfyUI; an example workflow is provided in the workflow folder
- Note: the height and width of the conditioning image must be divisible by 16, or an error will occur

### ComfyUI
![ComfyUI-workflow](assets/comfyui.png)

## Training Details
Under the scaling law, the Transformer structure has a huge impact on training time and compute (more compute, longer training); the training cost of MistoLine_Flux1_dev is several times that of MistoLineSDXL. We ran extensive ablation experiments to balance quality against training cost.
Training used A100-80GB GPUs with bf16 mixed precision on the Flux1.dev series models; apart from Lora, consumer-grade GPUs are essentially out of the question for this training.
Our experiments with larger-parameter models required multi-GPU, multi-node parallel training, which is costly.
If we reach 50,000 stars, we will open-source a technical report with more training details.

## License
- Aligned with the FLUX.1 [dev] Non-Commercial License
- This ComfyUI node falls under ComfyUI's license
- This model is for research and study only and may not be used for any form of commercial purpose

## Business Cooperation
For any custom model training, commercial cooperation, AI application development, or other business collaboration matters, please contact us.
- Business: info@themisto.ai
- Investment: investment@themisto.ai

We also welcome investors to get in touch for more information.
- Business email: info@themisto.ai
- Investor relations: investment@themisto.ai

## WIP
- Flux1.dev-MistoCN-collection
- Flux1.dev-Misto-IPAdapter

Your star is what powers our open-source work!

## One more thing
![Product](assets/misto.png)
We will soon launch our own product: an extremely simple, easy-to-use multimodal AI creative app - [Misto]. With the simplest and most inspiring experience, we want to rekindle everyone's desire to create. Creativity within reach, expanding the boundaries of imagination, letting unlimited inspiration empower super individuals!
Supported platforms: all platforms

## Media
### International:
Website: https://themisto.ai/
Discord: https://discord.gg/fTyDB2CU
X: https://x.com/AiThemisto79359
### Mainland China:
Website: https://themisto.ai/
WeChat Official Account: TheMisto AI (Shenzhen Mixed Tuple Technology Co., Ltd.)
Xiaohongshu: TheMisto.ai (Xiaohongshu ID: 4211745997)

================================================
FILE: __init__.py
================================================
from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS

__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]

================================================
FILE: modules/misto_controlnet.py
================================================
import torch
from diffusers.utils import is_torch_version
from torch import Tensor, nn
from einops import rearrange
from typing import Any, Dict, Tuple, Union

from .utils import EmbedND, MLPEmbedder, DoubleStreamBlock, SingleStreamBlock, timestep_embedding


def zero_module(module):
    for p in module.parameters():
        nn.init.zeros_(p)
    return module


class CondDownsamplBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(16, 16, 1),
            nn.SiLU(),
            zero_module(nn.Conv2d(16, 16, 3, padding=1)),
        )

    def forward(self, x):
        return self.encoder(x)


class EnhanceControlnet(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()
        nn.init.eye_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return self.act(self.linear(x))


class MistoControlNetFluxDev(nn.Module):
    _supports_gradient_checkpointing = True

    def __init__(
        self,
        in_channels=64,
        vec_in_dim=768,
        context_in_dim=4096,
        hidden_size=3072,
        num_heads=24,
        num_transformer=2,
        num_single_transformer=2,
        guidance_embed=True,
    ):
        super().__init__()
        self.out_channels = in_channels
        self.axes_dim = [16, 56, 56]
        self.theta = 10_000
        self.guidance_embed = guidance_embed
        if hidden_size % num_heads != 0:
            raise ValueError(f"Hidden size {hidden_size} must be divisible by num_heads {num_heads}")
        pe_dim = hidden_size // num_heads
        if sum(self.axes_dim) != pe_dim:
            raise ValueError(f"Got {self.axes_dim} but expected positional dim {pe_dim}")
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.pe_embedder = EmbedND(dim=pe_dim, theta=self.theta, axes_dim=self.axes_dim)
        self.img_in = nn.Linear(in_channels, self.hidden_size, bias=True)
        self.txt_in = nn.Linear(context_in_dim, self.hidden_size)
        self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
        self.vector_in = MLPEmbedder(vec_in_dim, self.hidden_size)
        self.guidance_in = (
            MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if guidance_embed else nn.Identity()
        )
        self.pos_embed_input = nn.Linear(in_channels, self.hidden_size, bias=True)
        self.gradient_checkpointing = False

        self.double_blocks = nn.ModuleList(
            [
                DoubleStreamBlock(
                    self.hidden_size,
                    self.num_heads,
                    mlp_ratio=4.0,
                    qkv_bias=True,
                )
                for _ in range(num_transformer)
            ]
        )
        self.single_blocks = nn.ModuleList(
            [
                SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=4.0)
                for _ in range(num_single_transformer)
            ]
        )

        # ControlNet blocks
        self.controlnet_blocks = nn.ModuleList([])
        for _ in range(num_transformer):
            controlnet_block = EnhanceControlnet(self.hidden_size)
            controlnet_block = zero_module(controlnet_block)
            self.controlnet_blocks.append(controlnet_block)

        # single controlnet blocks
        self.single_controlnet_blocks = nn.ModuleList([])
        for _ in range(num_single_transformer):
            controlnet_block = EnhanceControlnet(self.hidden_size)
            controlnet_block = zero_module(controlnet_block)
            self.single_controlnet_blocks.append(controlnet_block)

        # Input processing
        self.input_cond_block = CondDownsamplBlock()

    def _set_gradient_checkpointing(self, module, value=False):
        if hasattr(module, "gradient_checkpointing"):
            module.gradient_checkpointing = value

    @property
    def attn_processors(self):
        # set recursively
        processors = {}

        def fn_recursive_add_processors(name: str, module: torch.nn.Module, processors):
            if hasattr(module, "set_processor"):
                processors[f"{name}.processor"] = module.processor
            for sub_name, child in module.named_children():
                fn_recursive_add_processors(f"{name}.{sub_name}", child, processors)
            return processors

        for name, module in self.named_children():
            fn_recursive_add_processors(name, module, processors)
        return processors

    def set_attn_processor(self, processor):
        r"""
        Sets the attention processor to use to compute attention.

        Parameters:
            processor (`dict` of `AttentionProcessor` or only `AttentionProcessor`):
                The instantiated processor class or a dictionary of processor classes that will be set as the
                processor for **all** `Attention` layers. If `processor` is a dict, the key needs to define the path
                to the corresponding cross attention processor. This is strongly recommended when setting trainable
                attention processors.
        """
        count = len(self.attn_processors.keys())
        if isinstance(processor, dict) and len(processor) != count:
            raise ValueError(
                f"A dict of processors was passed, but the number of processors {len(processor)} does not match the"
                f" number of attention layers: {count}. Please make sure to pass {count} processor classes."
            )

        def fn_recursive_attn_processor(name: str, module: torch.nn.Module, processor):
            if hasattr(module, "set_processor"):
                if not isinstance(processor, dict):
                    module.set_processor(processor)
                else:
                    module.set_processor(processor.pop(f"{name}.processor"))
            for sub_name, child in module.named_children():
                fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)

        for name, module in self.named_children():
            fn_recursive_attn_processor(name, module, processor)

    def forward(
        self,
        img: Tensor,
        img_ids: Tensor,
        controlnet_cond: Tensor,
        txt: Tensor,
        txt_ids: Tensor,
        timesteps: Tensor,
        y: Tensor,
        guidance: Tensor | None = None,
    ) -> Tuple[Tensor, Tensor]:
        if img.ndim != 3 or txt.ndim != 3:
            raise ValueError("Input img and txt tensors must have 3 dimensions.")

        # running on sequences img
        img = self.img_in(img)
        controlnet_cond = self.input_cond_block(controlnet_cond)
        controlnet_cond = rearrange(controlnet_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
        controlnet_cond = self.pos_embed_input(controlnet_cond)
        img = img + controlnet_cond
        vec = self.time_in(timestep_embedding(timesteps, 256))
        if self.guidance_embed:
            if guidance is None:
                raise ValueError("Didn't get guidance strength for guidance distilled model.")
            vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
        vec = vec + self.vector_in(y)
        txt = self.txt_in(txt)
        ids = torch.cat((txt_ids, img_ids), dim=1)
        pe = self.pe_embedder(ids)

        block_res_samples = ()
        for block in self.double_blocks:
            if self.training and self.gradient_checkpointing:

                def create_custom_forward(module, return_dict=None):
                    def custom_forward(*inputs):
                        if return_dict is not None:
                            return module(*inputs, return_dict=return_dict)
                        else:
                            return module(*inputs)

                    return custom_forward

                ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
                img, txt = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(block),
                    img,
                    txt,
                    vec,
                    pe,
                    **ckpt_kwargs,
                )
            else:
                img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
            block_res_samples = block_res_samples + (img,)

        img = torch.cat((txt, torch.zeros_like(img)), 1)

        single_block_res_samples = ()
        for index, block in enumerate(self.single_blocks):
            if self.training and self.gradient_checkpointing:

                def create_custom_forward(module, return_dict=None):
                    def custom_forward(*inputs):
                        if return_dict is not None:
                            return module(*inputs, return_dict=return_dict)
                        else:
                            return module(*inputs)

                    return custom_forward

                ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
                img = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(block),
                    img,
                    vec,
                    pe,
                    **ckpt_kwargs,
                )
            else:
                img = block(img, vec=vec, pe=pe)
            single_block_res_samples = single_block_res_samples + (img,)

        controlnet_block_res_samples = ()
        for block_res_sample, controlnet_block in zip(block_res_samples, self.controlnet_blocks):
            block_res_sample = controlnet_block(block_res_sample)
            controlnet_block_res_samples = controlnet_block_res_samples + (block_res_sample,)

        single_controlnet_block_res_samples = ()
        for single_block_res_sample, single_controlnet_block in zip(single_block_res_samples, self.single_controlnet_blocks):
            single_block_res_sample = single_controlnet_block(single_block_res_sample)
            single_controlnet_block_res_samples = single_controlnet_block_res_samples + (single_block_res_sample,)

        return controlnet_block_res_samples, single_controlnet_block_res_samples

================================================
FILE: modules/utils.py
================================================
import math
from dataclasses import dataclass
from typing import Callable, Dict, Any

import torch
from einops import rearrange
from torch import Tensor, nn
from tqdm import tqdm


def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
    q, k = apply_rope(q, k, pe)
    x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    x = rearrange(x, "B H L D -> B L (H D)")
    return x


def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
    assert dim % 2 == 0
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
    omega = 1.0 / (theta**scale)
    out = torch.einsum("...n,d->...nd", pos, omega)
    out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
    out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
    return out.float()


def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
    xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
    xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
    return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)


class EmbedND(nn.Module):
    def __init__(self, dim: int, theta: int, axes_dim: list[int]):
        super().__init__()
        self.dim = dim
        self.theta = theta
        self.axes_dim = axes_dim

    def forward(self, ids: Tensor) -> Tensor:
        n_axes = ids.shape[-1]
        emb = torch.cat(
            [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
            dim=-3,
        )
        return emb.unsqueeze(1)


def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
    """
    Create sinusoidal timestep embeddings.

    :param t: a 1-D Tensor of N indices, one per batch element. These may be fractional.
    :param dim: the dimension of the output.
    :param max_period: controls the minimum frequency of the embeddings.
    :return: an (N, D) Tensor of positional embeddings.
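    The cos/sin formula below can be sanity-checked with a pure-Python stand-in
    (an illustrative helper, not part of this module):

    ```python
    import math

    def sinusoidal_embedding(t, dim, max_period=10000.0, time_factor=1000.0):
        # Stand-in for timestep_embedding, for a single scalar t and even dim
        # (the Tensor version also zero-pads odd dims). Returns
        # [cos(t'*f_0), ..., cos(t'*f_{h-1}), sin(t'*f_0), ..., sin(t'*f_{h-1})]
        # where t' = time_factor * t and f_i = exp(-log(max_period) * i / h), h = dim // 2.
        t = time_factor * t
        half = dim // 2
        freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
        return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

    # at t = 0 every cosine is 1 and every sine is 0:
    # sinusoidal_embedding(0.0, 8) == [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    ```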
""" t = time_factor * t half = dim // 2 freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to( t.device ) args = t[:, None].float() * freqs[None] embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1) if dim % 2: embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1) if torch.is_floating_point(t): embedding = embedding.to(t) return embedding class MLPEmbedder(nn.Module): def __init__(self, in_dim: int, hidden_dim: int): super().__init__() self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True) self.silu = nn.SiLU() self.out_layer = nn.Linear(hidden_dim, hidden_dim, bias=True) def forward(self, x: Tensor) -> Tensor: return self.out_layer(self.silu(self.in_layer(x))) class RMSNorm(torch.nn.Module): def __init__(self, dim: int): super().__init__() self.scale = nn.Parameter(torch.ones(dim)) def forward(self, x: Tensor): x_dtype = x.dtype x = x.float() rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6) return (x * rrms).to(dtype=x_dtype) * self.scale class QKNorm(torch.nn.Module): def __init__(self, dim: int): super().__init__() self.query_norm = RMSNorm(dim) self.key_norm = RMSNorm(dim) def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple[Tensor, Tensor]: q = self.query_norm(q) k = self.key_norm(k) return q.to(v), k.to(v) class SelfAttention(nn.Module): def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False): super().__init__() self.num_heads = num_heads head_dim = dim // num_heads self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) self.norm = QKNorm(head_dim) self.proj = nn.Linear(dim, dim) def forward(self, x: Tensor, pe: Tensor) -> Tensor: qkv = self.qkv(x) q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads) q, k = self.norm(q, k, v) x = attention(q, k, v, pe=pe) x = self.proj(x) return x @dataclass class ModulationOut: shift: Tensor scale: Tensor gate: Tensor class Modulation(nn.Module): def __init__(self, dim: int, 
double: bool): super().__init__() self.is_double = double self.multiplier = 6 if double else 3 self.lin = nn.Linear(dim, self.multiplier * dim, bias=True) def forward(self, vec: Tensor) -> tuple[ModulationOut, ModulationOut | None]: out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1) return ( ModulationOut(*out[:3]), ModulationOut(*out[3:]) if self.is_double else None, ) class DoubleStreamBlock(nn.Module): def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False): super().__init__() mlp_hidden_dim = int(hidden_size * mlp_ratio) self.num_heads = num_heads self.hidden_size = hidden_size self.img_mod = Modulation(hidden_size, double=True) self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias) self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.img_mlp = nn.Sequential( nn.Linear(hidden_size, mlp_hidden_dim, bias=True), nn.GELU(approximate="tanh"), nn.Linear(mlp_hidden_dim, hidden_size, bias=True), ) self.txt_mod = Modulation(hidden_size, double=True) self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias) self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.txt_mlp = nn.Sequential( nn.Linear(hidden_size, mlp_hidden_dim, bias=True), nn.GELU(approximate="tanh"), nn.Linear(mlp_hidden_dim, hidden_size, bias=True), ) def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor) -> tuple[Tensor, Tensor]: img_mod1, img_mod2 = self.img_mod(vec) txt_mod1, txt_mod2 = self.txt_mod(vec) # prepare image for attention img_modulated = self.img_norm1(img) img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift img_qkv = self.img_attn.qkv(img_modulated) img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L 
D", K=3, H=self.num_heads) img_q, img_k = self.img_attn.norm(img_q, img_k, img_v) # prepare txt for attention txt_modulated = self.txt_norm1(txt) txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift txt_qkv = self.txt_attn.qkv(txt_modulated) txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads) txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v) # run actual attention q = torch.cat((txt_q, img_q), dim=2) k = torch.cat((txt_k, img_k), dim=2) v = torch.cat((txt_v, img_v), dim=2) attn = attention(q, k, v, pe=pe) txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :] # calculate the img bloks img = img + img_mod1.gate * self.img_attn.proj(img_attn) img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift) # calculate the txt bloks txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn) txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift) return img, txt class SingleStreamBlock(nn.Module): """ A DiT block with parallel linear layers as described in https://arxiv.org/abs/2302.05442 and adapted modulation interface. 
""" def __init__( self, hidden_size: int, num_heads: int, mlp_ratio: float = 4.0, qk_scale: float | None = None, ): super().__init__() self.hidden_dim = hidden_size self.num_heads = num_heads head_dim = hidden_size // num_heads self.scale = qk_scale or head_dim**-0.5 self.mlp_hidden_dim = int(hidden_size * mlp_ratio) # qkv and mlp_in self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim) # proj and mlp_out self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size) self.norm = QKNorm(head_dim) self.hidden_size = hidden_size self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.mlp_act = nn.GELU(approximate="tanh") self.modulation = Modulation(hidden_size, double=False) def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor: mod, _ = self.modulation(vec) x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1) q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads) q, k = self.norm(q, k, v) # compute attention attn = attention(q, k, v, pe=pe) # compute activation in mlp stream, cat again and run second linear layer output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2)) return x + mod.gate * output class LastLayer(nn.Module): def __init__(self, hidden_size: int, patch_size: int, out_channels: int): super().__init__() self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6) self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True) self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True)) def forward(self, x: Tensor, vec: Tensor) -> Tensor: shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1) x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :] x = self.linear(x) return x def get_noise( num_samples: int, height: int, width: int, device: torch.device, dtype: 
torch.dtype,
    seed: int,
):
    return torch.randn(
        num_samples,
        16,
        # allow for packing
        2 * math.ceil(height / 16),
        2 * math.ceil(width / 16),
        device=device,
        dtype=dtype,
        generator=torch.Generator(device=device).manual_seed(seed),
    )


def time_shift(mu: float, sigma: float, t: Tensor):
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)


def get_lin_function(
    x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15
) -> Callable[[float], float]:
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return lambda x: m * x + b


def get_schedule(
    num_steps: int,
    image_seq_len: int,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
    shift: bool = True,
) -> list[float]:
    # extra step for zero
    timesteps = torch.linspace(1, 0, num_steps + 1)

    # shifting the schedule to favor high timesteps for higher signal images
    if shift:
        # estimate mu by linear interpolation between two reference points
        mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
        timesteps = time_shift(mu, 1.0, timesteps)

    return timesteps.tolist()


def unpack(x: Tensor, height: int, width: int) -> Tensor:
    return rearrange(
        x,
        "b (h w) (c ph pw) -> b c (h ph) (w pw)",
        h=math.ceil(height / 16),
        w=math.ceil(width / 16),
        ph=2,
        pw=2,
    )


def forward_mistoCN(
    model,
    img: Tensor,
    img_ids: Tensor,
    txt: Tensor,
    txt_ids: Tensor,
    timesteps: Tensor,
    y: Tensor,
    block_controlnet_hidden_states=None,
    single_controlnet_hidden_states=None,
    guidance: Tensor | None = None,
):
    if img.ndim != 3 or txt.ndim != 3:
        raise ValueError("Input img and txt tensors must have 3 dimensions.")

    # running on sequences img
    img = model.img_in(img)
    vec = model.time_in(timestep_embedding(timesteps, 256))
    vec = vec + model.guidance_in(timestep_embedding(guidance, 256))
    vec = vec + model.vector_in(y)
    txt = model.txt_in(txt)

    ids = torch.cat((txt_ids, img_ids), dim=1)
    pe = model.pe_embedder(ids)

    for index_block, block in enumerate(model.double_blocks):
        img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
        # controlnet residual
        if block_controlnet_hidden_states is not None:
            if len(block_controlnet_hidden_states) == 1:
                img = img + block_controlnet_hidden_states[0]
            else:
                img = img + block_controlnet_hidden_states[index_block % 2]

    img = torch.cat((txt, img), 1)
    for index_block, block in enumerate(model.single_blocks):
        img = block(img, vec=vec, pe=pe)
        # controlnet residual
        if single_controlnet_hidden_states is not None:
            if len(single_controlnet_hidden_states) == 1:
                img = img + single_controlnet_hidden_states[0]
            else:
                img = img + single_controlnet_hidden_states[index_block % 2]
    img = img[:, txt.shape[1]:, ...]

    img = model.final_layer(img, vec)  # (N, T, patch_size ** 2 * out_channels)
    return img


def denoise_controlnet(
    pbar,
    model,
    controlnet,
    # model input
    img: Tensor,
    img_ids: Tensor,
    txt: Tensor,
    txt_ids: Tensor,
    vec: Tensor,
    controlnet_cond,
    neg_txt,
    neg_txt_ids,
    neg_vec,
    # sampling parameters
    timesteps: list[float],
    guidance: float = 4.0,
    controlnet_strength=1.0,
):
    controlnet.to(img.device, dtype=img.dtype)
    img_ids = img_ids.to(img.device, dtype=img.dtype)
    controlnet_cond = controlnet_cond.to(img.device, dtype=img.dtype)
    txt = txt.to(img.device, dtype=img.dtype)
    txt_ids = txt_ids.to(img.device, dtype=img.dtype)
    vec = vec.to(img.device, dtype=img.dtype)
    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)

    for t_curr, t_prev in tqdm(zip(timesteps[:-1], timesteps[1:]), desc="Sampling", total=len(timesteps) - 1):
        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
        block_res_samples, single_block_res_samples = controlnet(
            img=img,
            img_ids=img_ids,
            controlnet_cond=controlnet_cond,
            txt=txt,
            txt_ids=txt_ids,
            y=vec,
            timesteps=t_vec,
            guidance=guidance_vec,
        )
        pred = forward_mistoCN(
            model=model,
            img=img,
            img_ids=img_ids,
            txt=txt,
            txt_ids=txt_ids,
            y=vec,
            timesteps=t_vec,
            guidance=guidance_vec,
            block_controlnet_hidden_states=[i * controlnet_strength for i in block_res_samples],
            single_controlnet_hidden_states=[i * controlnet_strength for i in single_block_res_samples],
        )
        # negative (unconditional) pass
        neg_block_res_samples, neg_single_block_res_samples = controlnet(
            img=img,
            img_ids=img_ids,
            controlnet_cond=controlnet_cond,
            txt=neg_txt,
            txt_ids=neg_txt_ids,
            y=neg_vec,
            timesteps=t_vec,
            guidance=guidance_vec,
        )
        neg_pred = forward_mistoCN(
            model=model,
            img=img,
            img_ids=img_ids,
            txt=neg_txt,
            txt_ids=neg_txt_ids,
            y=neg_vec,
            timesteps=t_vec,
            guidance=guidance_vec,
            block_controlnet_hidden_states=[i * controlnet_strength for i in neg_block_res_samples],
            single_controlnet_hidden_states=[i * controlnet_strength for i in neg_single_block_res_samples],
        )
        # blend conditional and negative predictions, then take an Euler step
        f = 0.85
        pred = neg_pred + f * (pred - neg_pred)
        img = img + (t_prev - t_curr) * pred
        pbar.update(1)

    return img



================================================
FILE: nodes.py
================================================
import torch
import os
import comfy.model_management
from comfy.utils import ProgressBar
import folder_paths
import numpy as np
from safetensors.torch import load_file
from einops import rearrange, repeat
from .modules.misto_controlnet import MistoControlNetFluxDev
from .modules.utils import get_schedule, get_noise, denoise_controlnet, unpack
import torch.nn.functional as F

dir_TheMistoModel = os.path.join(folder_paths.models_dir, "TheMisto_model")
os.makedirs(dir_TheMistoModel, exist_ok=True)
folder_paths.folder_names_and_paths["TheMisto_model"] = ([dir_TheMistoModel], folder_paths.supported_pt_extensions)


class LATENT_PROCESSOR_COMFY:
    def __init__(self):
        self.scale_factor = 0.3611
        self.shift_factor = 0.1159
        self.latent_rgb_factors = [
            [-0.0404, 0.0159, 0.0609],
            [0.0043, 0.0298, 0.0850],
            [0.0328, -0.0749, -0.0503],
            [-0.0245, 0.0085, 0.0549],
            [0.0966, 0.0894, 0.0530],
            [0.0035, 0.0399, 0.0123],
            [0.0583, 0.1184, 0.1262],
            [-0.0191, -0.0206, -0.0306],
            [-0.0324, 0.0055, 0.1001],
            [0.0955, 0.0659, -0.0545],
            [-0.0504, 0.0231, -0.0013],
            [0.0500, -0.0008, -0.0088],
            [0.0982, 0.0941, 0.0976],
            [-0.1233, -0.0280, -0.0897],
            [-0.0005, -0.0530, -0.0020],
            [-0.1273, -0.0932, -0.0680],
        ]

    def __call__(self, x):
        return (x / self.scale_factor) + self.shift_factor

    def go_back(self, x):
        return (x - self.shift_factor) * self.scale_factor


MAX_RESOLUTION = 16384


def prepare_sampling(t5_emb, clip_emb, img, batch_size):
    bs, c, h, w = img.shape
    bs = batch_size
    img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
    if img.shape[0] == 1 and bs > 1:
        img = repeat(img, "1 ... -> bs ...", bs=bs)

    img_ids = torch.zeros(h // 2, w // 2, 3)
    img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
    img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
    img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)

    if t5_emb.shape[0] == 1 and bs > 1:
        t5_emb = repeat(t5_emb, "1 ... -> bs ...", bs=bs)
    t5_emb_ids = torch.zeros(bs, t5_emb.shape[1], 3)

    if clip_emb.shape[0] == 1 and bs > 1:
        clip_emb = repeat(clip_emb, "1 ... -> bs ...", bs=bs)

    return {
        "img": img,
        "img_ids": img_ids.to(img.device, dtype=img.dtype),
        "txt": t5_emb.to(img.device, dtype=img.dtype),
        "txt_ids": t5_emb_ids.to(img.device, dtype=img.dtype),
        "vec": clip_emb.to(img.device, dtype=img.dtype),
    }


def load_misto_transformer_cn(device):
    with torch.device(device):
        controlnet = MistoControlNetFluxDev(
            in_channels=64,
            vec_in_dim=768,
            context_in_dim=4096,
            hidden_size=3072,
            num_heads=24,
            num_transformer=3,
            num_single_transformer=2,
            guidance_embed=True,
        )
    return controlnet


def img_preprocessor(image, res):
    _, _, h, w = image.shape
    scale = res / min(h, w)
    new_h, new_w = int(h * scale), int(w * scale)
    resized = F.interpolate(image, size=(new_h, new_w), mode='bilinear', align_corners=False)
    # center-crop to multiples of 16 so the latent packs cleanly into 2x2 patches
    crop_h = int((new_h // 16) * 16)
    crop_w = int((new_w // 16) * 16)
    start_h = (new_h - crop_h) // 2
    start_w = (new_w - crop_w) // 2
    cropped = resized[:, :, start_h:start_h + crop_h, start_w:start_w + crop_w]
    return cropped


class LoadMistoFluxControlNet:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "model_name": (folder_paths.get_filename_list("TheMisto_model"),)
        }}

    RETURN_TYPES = ("MistoFluxControlNet",)
    RETURN_NAMES = ("ControlNet",)
    FUNCTION = "load_model"
    CATEGORY = "TheMistoAINodes"

    def load_model(self, model_name):
        device = comfy.model_management.get_torch_device()
        misto_cn = load_misto_transformer_cn(device=device)
        ckpt_path = os.path.join(dir_TheMistoModel, model_name)
        if '.bin' in model_name:
            state_dict = torch.load(ckpt_path, map_location='cpu')
        else:
            state_dict = load_file(ckpt_path)
        missing_keys, unexpected_keys = misto_cn.load_state_dict(state_dict, strict=False)
        misto_cn.eval()
        print(missing_keys, unexpected_keys)
        return (misto_cn,)


class ApplyMistoFluxControlNet:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "controlnet": ("MistoFluxControlNet",),
            "image": ("IMAGE",),
            "resolution": ("INT", {"default": 960, "min": 512, "max": 4096}),
            "strength": ("FLOAT", {"default": 0.85, "min": 0.0, "max": 2.0, "step": 0.01}),
        }}

    RETURN_TYPES = ("ControlNetCondition", "IMAGE")
    RETURN_NAMES = ("controlnet_condition", "cond_image")
    FUNCTION = "embedding"
    CATEGORY = "TheMistoAINodes"

    def embedding(self, controlnet, image, resolution, strength):
        # rescale from [0, 1] to [-1, 1] and move channels first
        cond_img = torch.from_numpy((np.array(image) * 2) - 1)
        cond_img = cond_img.permute(0, 3, 1, 2)
        res_img = img_preprocessor(image=cond_img, res=resolution)
        out_img = res_img.permute(0, 2, 3, 1)
        out_img = (out_img + 1) / 2
        cond_out = {
            "img": res_img,
            "controlnet_strength": strength,
            "model": controlnet,
        }
        return (cond_out, out_img)


class KSamplerTheMisto:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
                "ae": ("VAE",),
                "positive": ("CONDITIONING",),
                "negative": ("CONDITIONING",),
                "controlnet_condition": ("ControlNetCondition", {"default": None}),
                "batch_size": ("INT", {"default": 1, "min": 1, "max": 100}),
                "guidance": ("FLOAT", {"default": 3.5, "min": 0.1, "max": 30}),
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xffffffffffffffff}),
                "steps": ("INT", {"default": 20, "min": 1, "max": 100}),
            },
        }

    RETURN_TYPES = ("IMAGE",)
    RETURN_NAMES = ("image",)
    FUNCTION = "sampling"
    CATEGORY = "TheMistoAINodes"

    def sampling(self, model, ae, positive, negative, controlnet_condition, batch_size, guidance, seed, steps):
        # device, dtype and progress bar
        device = comfy.model_management.get_torch_device()
        dtype_model = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        pbar = ProgressBar(steps + 10)
        pbar.update(1)

        # model
        comfy.model_management.load_model_gpu(model)
        flux_model = model.model.diffusion_model
        pbar.update(3)

        # controlnet condition
        cn_model = controlnet_condition['model']
        cond_img = controlnet_condition['img'].to(torch.bfloat16).to(device)
        cn_strength = controlnet_condition['controlnet_strength']
        bc, c, h, w = cond_img.shape
        height = (h // 16) * 16
        width = (w // 16) * 16
        pbar.update(2)

        with torch.no_grad():
            # set scheduler
            timesteps = get_schedule(
                steps,
                (width // 8) * (height // 8) // 4,
                shift=True,
            )
            x = get_noise(1, height, width, device=device, dtype=dtype_model, seed=seed)
            p_inp_cond = prepare_sampling(positive[0][0], positive[0][1]['pooled_output'], img=x, batch_size=batch_size)
            n_inp_cond = prepare_sampling(negative[0][0], negative[0][1]['pooled_output'], img=x, batch_size=batch_size)
            pbar.update(1)

            # denoise
            x = denoise_controlnet(
                pbar=pbar,
                model=flux_model,
                **p_inp_cond,
                controlnet=cn_model,
                timesteps=timesteps,
                guidance=guidance,
                controlnet_cond=cond_img,
                controlnet_strength=cn_strength,
                neg_txt=n_inp_cond['txt'],
                neg_txt_ids=n_inp_cond['txt_ids'],
                neg_vec=n_inp_cond['vec'],
            )
            x = unpack(x.float(), height, width)
            lat_processor = LATENT_PROCESSOR_COMFY()
            x = lat_processor(x)
            pbar.update(1)

        return (ae.decode(x),)


NODE_CLASS_MAPPINGS = {
    "LoadTheMistoFluxControlNet": LoadMistoFluxControlNet,
    "ApplyTheMistoFluxControlNet": ApplyMistoFluxControlNet,
    "KSamplerTheMisto": KSamplerTheMisto,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "LoadTheMistoFluxControlNet": "Load MistoCN-Flux.dev",
    "ApplyTheMistoFluxControlNet": "Apply MistoCN-Flux.dev",
    "KSamplerTheMisto": "KSampler for MistoCN-Flux.dev",
}
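Editor's note: the timestep schedule used by `KSamplerTheMisto` (via `get_schedule` / `time_shift` / `get_lin_function` in `modules/utils.py`) can be hard to follow from tensor code alone. Below is a minimal, torch-free sketch of the same logic for illustration; the real implementation operates on torch tensors, and the explicit `t <= 0` guard here stands in for torch's division-by-zero behavior:

```python
import math


def get_lin_function(x1=256.0, y1=0.5, x2=4096.0, y2=1.15):
    # Linear map from image sequence length to the shift parameter mu:
    # short sequences get mu near base_shift, long ones near max_shift.
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return lambda x: m * x + b


def time_shift(mu, sigma, t):
    # Warp a timestep t in (0, 1] toward 1 as mu grows; t = 0 stays 0.
    if t <= 0.0:
        return 0.0
    return math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0) ** sigma)


def get_schedule(num_steps, image_seq_len, base_shift=0.5, max_shift=1.15, shift=True):
    # num_steps + 1 entries so the sampling loop can pair (t_curr, t_prev)
    # and finish exactly at t = 0.
    timesteps = [1.0 - i / num_steps for i in range(num_steps + 1)]
    if shift:
        # estimate mu by linear interpolation between two reference points
        mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
        timesteps = [time_shift(mu, 1.0, t) for t in timesteps]
    return timesteps
```

Because `time_shift` is monotonically increasing in `t`, the schedule stays strictly decreasing from 1 to 0; larger images (longer latent sequences) get a larger `mu`, which concentrates steps at high noise levels.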
================================================ FILE: workflows/example_workflow.json ================================================ { "last_node_id": 40, "last_link_id": 44, "nodes": [ { "id": 13, "type": "VAELoader", "pos": { "0": 60, "1": 327, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 315, "1": 58 }, "flags": {}, "order": 0, "mode": 0, "inputs": [], "outputs": [ { "name": "VAE", "type": "VAE", "links": [ 19 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "VAELoader" }, "widgets_values": [ "ae.safetensors" ] }, { "id": 10, "type": "DualCLIPLoader", "pos": { "0": 58, "1": 27, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 348.2605285644531, "1": 106 }, "flags": {}, "order": 1, "mode": 0, "inputs": [], "outputs": [ { "name": "CLIP", "type": "CLIP", "links": [ 24, 26 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "DualCLIPLoader" }, "widgets_values": [ "clip_l.safetensors", "t5xxl_fp8_e4m3fn.safetensors", "flux" ] }, { "id": 36, "type": "AnyLineArtPreprocessor_aux", "pos": { "0": 250, "1": 569, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 315, "1": 178 }, "flags": {}, "order": 11, "mode": 0, "inputs": [ { "name": "image", "type": "IMAGE", "link": 38 } ], "outputs": [ { "name": "image", "type": "IMAGE", "links": [ 37 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "AnyLineArtPreprocessor_aux" }, "widgets_values": [ "lineart_standard", 960, 0, 1, 36, 1 ] }, { "id": 17, "type": "LoadTheMistoFluxControlNet", "pos": { "0": 54, "1": 445, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 392, "1": 60 }, "flags": {}, "order": 2, "mode": 0, "inputs": [], "outputs": [ { "name": "ControlNet", "type": "MistoFluxControlNet", "links": [ 40 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "LoadTheMistoFluxControlNet" }, "widgets_values": [ 
"mistoline_flux.dev_v1.safetensors" ] }, { "id": 37, "type": "LoadImage", "pos": { "0": -92, "1": 568, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": [ 315, 314 ], "flags": {}, "order": 3, "mode": 0, "inputs": [], "outputs": [ { "name": "IMAGE", "type": "IMAGE", "links": [ 38 ], "shape": 3, "slot_index": 0 }, { "name": "MASK", "type": "MASK", "links": null, "shape": 3 } ], "properties": { "Node name for S&R": "LoadImage" }, "widgets_values": [ "image--73-.png", "image" ] }, { "id": 26, "type": "CLIPTextEncodeFlux", "pos": { "0": 819, "1": 29, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 400, "1": 200 }, "flags": {}, "order": 9, "mode": 0, "inputs": [ { "name": "clip", "type": "CLIP", "link": 24 } ], "outputs": [ { "name": "CONDITIONING", "type": "CONDITIONING", "links": [ 25 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "CLIPTextEncodeFlux" }, "widgets_values": [ "Model in modern attire with metallic accessories, in an old factory setting, Metallic sheen, Full-frame mirrorless, 35mm lens, f/2.8 aperture, ISO 500, off-camera flash", "Model in modern attire with metallic accessories, in an old factory setting, Metallic sheen, Full-frame mirrorless, 35mm lens, f/2.8 aperture, ISO 500, off-camera flash", 3.5 ] }, { "id": 4, "type": "CheckpointLoaderSimple", "pos": { "0": 64, "1": 179, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 315, "1": 98 }, "flags": {}, "order": 4, "mode": 0, "inputs": [], "outputs": [ { "name": "MODEL", "type": "MODEL", "links": [ 12 ], "slot_index": 0 }, { "name": "CLIP", "type": "CLIP", "links": [], "slot_index": 1 }, { "name": "VAE", "type": "VAE", "links": [], "slot_index": 2 } ], "properties": { "Node name for S&R": "CheckpointLoaderSimple" }, "widgets_values": [ "flux1-dev-fp8.safetensors" ] }, { "id": 28, "type": "CLIPTextEncodeFlux", "pos": { "0": 823, "1": 273, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 
0, "9": 0 }, "size": { "0": 399.6443786621094, "1": 156.8167266845703 }, "flags": {}, "order": 10, "mode": 0, "inputs": [ { "name": "clip", "type": "CLIP", "link": 26 } ], "outputs": [ { "name": "CONDITIONING", "type": "CONDITIONING", "links": [ 27 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "CLIPTextEncodeFlux" }, "widgets_values": [ "out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.", "out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.", 3.5 ] }, { "id": 21, "type": "Note", "pos": { "0": 587, "1": 286, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 210, "1": 122.03421020507812 }, "flags": {}, "order": 5, "mode": 0, "inputs": [], "outputs": [], "properties": { "text": "" }, "widgets_values": [ "Make sure to craft your prompts well—precision is more important than length, otherwise the result could be very bad.\n\nPrompt非常重要,请用AI或者自己写一个好的prompt,不然效果会很糟糕。" ], "color": "#432", "bgcolor": "#653" }, { "id": 39, "type": "Note", "pos": { "0": 774, "1": 629, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": [ 210, 122.03421020507812 ], "flags": {}, "order": 6, "mode": 
0, "inputs": [], "outputs": [], "properties": { "text": "" }, "widgets_values": [ "Very sensitive to strength, please start with about 0.5 and gradually increase\n\n对strength非常敏感,请从0.5开始尝试逐步增加。" ], "color": "#432", "bgcolor": "#653" }, { "id": 38, "type": "PreviewImage", "pos": { "0": 1015, "1": 479, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": [ 332.628470953563, 319.48089726588853 ], "flags": {}, "order": 15, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 41 } ], "outputs": [], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 35, "type": "ApplyTheMistoFluxControlNet", "pos": { "0": 593, "1": 482, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 393, "1": 102 }, "flags": {}, "order": 13, "mode": 0, "inputs": [ { "name": "controlnet", "type": "MistoFluxControlNet", "link": 40 }, { "name": "image", "type": "IMAGE", "link": 37 } ], "outputs": [ { "name": "controlnet_condition", "type": "ControlNetCondition", "links": [ 39 ], "shape": 3, "slot_index": 0 }, { "name": "cond_image", "type": "IMAGE", "links": [ 41 ], "shape": 3, "slot_index": 1 } ], "properties": { "Node name for S&R": "ApplyTheMistoFluxControlNet" }, "widgets_values": [ 960, 0.5 ] }, { "id": 15, "type": "LoraLoaderModelOnly", "pos": { "0": 498, "1": -118, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 380.00384521484375, "1": 82 }, "flags": {}, "order": 12, "mode": 0, "inputs": [ { "name": "model", "type": "MODEL", "link": 12 } ], "outputs": [ { "name": "MODEL", "type": "MODEL", "links": [ 44 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "LoraLoaderModelOnly" }, "widgets_values": [ "Hyper-FLUX.1-dev-16steps-lora.safetensors", 0.13 ] }, { "id": 19, "type": "KSamplerTheMisto", "pos": { "0": 1251, "1": -121, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 330, "1": 234 }, "flags": {}, "order": 14, "mode": 0, "inputs": 
[ { "name": "model", "type": "MODEL", "link": 44 }, { "name": "ae", "type": "VAE", "link": 19 }, { "name": "positive", "type": "CONDITIONING", "link": 25 }, { "name": "negative", "type": "CONDITIONING", "link": 27 }, { "name": "controlnet_condition", "type": "ControlNetCondition", "link": 39 } ], "outputs": [ { "name": "image", "type": "IMAGE", "links": [ 20 ], "slot_index": 0, "shape": 3 } ], "properties": { "Node name for S&R": "KSamplerTheMisto" }, "widgets_values": [ 1, 4, 1039574509141705, "randomize", 16 ] }, { "id": 22, "type": "PreviewImage", "pos": { "0": 1614, "1": -128, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": [ 461.88277135920316, 499.946617357832 ], "flags": {}, "order": 16, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 20 } ], "outputs": [], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 20, "type": "Note", "pos": { "0": 586, "1": 55, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": { "0": 210, "1": 122.03421020507812 }, "flags": {}, "order": 7, "mode": 0, "inputs": [], "outputs": [], "properties": { "text": "" }, "widgets_values": [ "Make sure to craft your prompts well—precision is more important than length, otherwise the result could be very bad.\n\nPrompt非常重要,请用AI或者自己写一个好的prompt,不然效果会很糟糕。" ], "color": "#432", "bgcolor": "#653" }, { "id": 40, "type": "Note", "pos": { "0": 1258, "1": -232, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0 }, "size": [ 210, 67.63623725377664 ], "flags": {}, "order": 8, "mode": 0, "inputs": [], "outputs": [], "properties": { "text": "" }, "widgets_values": [ "no autoCFG\n\n不要使用自动CFG" ], "color": "#432", "bgcolor": "#653" } ], "links": [ [ 12, 4, 0, 15, 0, "MODEL" ], [ 19, 13, 0, 19, 1, "VAE" ], [ 20, 19, 0, 22, 0, "IMAGE" ], [ 24, 10, 0, 26, 0, "CLIP" ], [ 25, 26, 0, 19, 2, "CONDITIONING" ], [ 26, 10, 0, 28, 0, "CLIP" ], [ 27, 28, 0, 19, 3, "CONDITIONING" ], [ 37, 36, 0, 35, 1, "IMAGE" ], [ 38, 37, 0, 36, 0, 
"IMAGE" ], [ 39, 35, 0, 19, 4, "ControlNetCondition" ], [ 40, 17, 0, 35, 0, "MistoFluxControlNet" ], [ 41, 35, 1, 38, 0, "IMAGE" ], [ 44, 15, 0, 19, 0, "MODEL" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 0.9090909090909091, "offset": [ 384.31753255840897, 385.7533316712783 ] } }, "version": 0.4 }
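Editor's note: the core update inside `denoise_controlnet`'s sampling loop — a fixed-factor blend of the conditional and negative predictions followed by a rectified-flow Euler step — can be isolated as a small sketch. Plain-Python lists stand in for tensors here, and `blend_guidance` / `euler_step` are illustrative names, not functions from this repository:

```python
def blend_guidance(pred, neg_pred, f=0.85):
    # Linear interpolation between the negative-prompt prediction and the
    # conditional one: f = 0 returns neg_pred, f = 1 returns pred.
    # denoise_controlnet hard-codes f = 0.85.
    return [n + f * (p - n) for p, n in zip(pred, neg_pred)]


def euler_step(img, pred, t_curr, t_prev):
    # One Euler step of the flow ODE; the schedule is decreasing, so
    # t_prev < t_curr and the step size (t_prev - t_curr) is negative.
    return [x + (t_prev - t_curr) * v for x, v in zip(img, pred)]
```

With `f < 1` this pulls the conditional prediction toward the negative one, which is why the README warns against stacking automatic CFG on top of this sampler.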