Full Code of KlingAIResearch/LivePortrait for AI

main 49784e879821 cached

145 files

752.1 KB

198.4k tokens

718 symbols

1 requests

Download .txt

Showing preview only (795K chars total). Download the full file or copy to clipboard to get everything.

Repository: KlingAIResearch/LivePortrait
Branch: main
Commit: 49784e879821
Files: 145
Total size: 752.1 KB

Directory structure:
gitextract_l1jm0s2t/

├── .gitignore
├── .vscode/
│   └── settings.json
├── LICENSE
├── app.py
├── app_animals.py
├── assets/
│   ├── .gitignore
│   ├── docs/
│   │   ├── changelog/
│   │   │   ├── 2024-07-10.md
│   │   │   ├── 2024-07-19.md
│   │   │   ├── 2024-07-24.md
│   │   │   ├── 2024-08-02.md
│   │   │   ├── 2024-08-05.md
│   │   │   ├── 2024-08-06.md
│   │   │   ├── 2024-08-19.md
│   │   │   └── 2025-01-01.md
│   │   ├── directory-structure.md
│   │   ├── how-to-install-ffmpeg.md
│   │   └── speed.md
│   ├── examples/
│   │   └── driving/
│   │       ├── aggrieved.pkl
│   │       ├── d1.pkl
│   │       ├── d2.pkl
│   │       ├── d5.pkl
│   │       ├── d7.pkl
│   │       ├── d8.pkl
│   │       ├── laugh.pkl
│   │       ├── open_lip.pkl
│   │       ├── shake_face.pkl
│   │       ├── shy.pkl
│   │       ├── talking.pkl
│   │       └── wink.pkl
│   └── gradio/
│       ├── gradio_description_animate_clear.md
│       ├── gradio_description_animation.md
│       ├── gradio_description_retargeting.md
│       ├── gradio_description_retargeting_video.md
│       ├── gradio_description_upload.md
│       ├── gradio_description_upload_animal.md
│       └── gradio_title.md
├── inference.py
├── inference_animals.py
├── pretrained_weights/
│   └── .gitkeep
├── readme.md
├── readme_zh_cn.md
├── requirements.txt
├── requirements_base.txt
├── requirements_macOS.txt
├── speed.py
└── src/
    ├── config/
    │   ├── __init__.py
    │   ├── argument_config.py
    │   ├── base_config.py
    │   ├── crop_config.py
    │   ├── inference_config.py
    │   └── models.yaml
    ├── gradio_pipeline.py
    ├── live_portrait_pipeline.py
    ├── live_portrait_pipeline_animal.py
    ├── live_portrait_wrapper.py
    ├── modules/
    │   ├── __init__.py
    │   ├── appearance_feature_extractor.py
    │   ├── convnextv2.py
    │   ├── dense_motion.py
    │   ├── motion_extractor.py
    │   ├── spade_generator.py
    │   ├── stitching_retargeting_network.py
    │   ├── util.py
    │   └── warping_network.py
    └── utils/
        ├── __init__.py
        ├── animal_landmark_runner.py
        ├── camera.py
        ├── check_windows_port.py
        ├── crop.py
        ├── cropper.py
        ├── dependencies/
        │   ├── XPose/
        │   │   ├── config_model/
        │   │   │   ├── UniPose_SwinT.py
        │   │   │   └── coco_transformer.py
        │   │   ├── models/
        │   │   │   ├── UniPose/
        │   │   │   │   ├── __init__.py
        │   │   │   │   ├── attention.py
        │   │   │   │   ├── backbone.py
        │   │   │   │   ├── deformable_transformer.py
        │   │   │   │   ├── fuse_modules.py
        │   │   │   │   ├── mask_generate.py
        │   │   │   │   ├── ops/
        │   │   │   │   │   ├── functions/
        │   │   │   │   │   │   ├── __init__.py
        │   │   │   │   │   │   └── ms_deform_attn_func.py
        │   │   │   │   │   ├── modules/
        │   │   │   │   │   │   ├── __init__.py
        │   │   │   │   │   │   ├── ms_deform_attn.py
        │   │   │   │   │   │   └── ms_deform_attn_key_aware.py
        │   │   │   │   │   ├── setup.py
        │   │   │   │   │   ├── src/
        │   │   │   │   │   │   ├── cpu/
        │   │   │   │   │   │   │   ├── ms_deform_attn_cpu.cpp
        │   │   │   │   │   │   │   └── ms_deform_attn_cpu.h
        │   │   │   │   │   │   ├── cuda/
        │   │   │   │   │   │   │   ├── ms_deform_attn_cuda.cu
        │   │   │   │   │   │   │   ├── ms_deform_attn_cuda.h
        │   │   │   │   │   │   │   └── ms_deform_im2col_cuda.cuh
        │   │   │   │   │   │   ├── ms_deform_attn.h
        │   │   │   │   │   │   └── vision.cpp
        │   │   │   │   │   └── test.py
        │   │   │   │   ├── position_encoding.py
        │   │   │   │   ├── swin_transformer.py
        │   │   │   │   ├── transformer_deformable.py
        │   │   │   │   ├── transformer_vanilla.py
        │   │   │   │   ├── unipose.py
        │   │   │   │   └── utils.py
        │   │   │   ├── __init__.py
        │   │   │   └── registry.py
        │   │   ├── predefined_keypoints.py
        │   │   ├── transforms.py
        │   │   └── util/
        │   │       ├── addict.py
        │   │       ├── box_ops.py
        │   │       ├── config.py
        │   │       ├── keypoint_ops.py
        │   │       └── misc.py
        │   └── insightface/
        │       ├── __init__.py
        │       ├── app/
        │       │   ├── __init__.py
        │       │   ├── common.py
        │       │   └── face_analysis.py
        │       ├── data/
        │       │   ├── __init__.py
        │       │   ├── image.py
        │       │   ├── objects/
        │       │   │   └── meanshape_68.pkl
        │       │   ├── pickle_object.py
        │       │   └── rec_builder.py
        │       ├── model_zoo/
        │       │   ├── __init__.py
        │       │   ├── arcface_onnx.py
        │       │   ├── attribute.py
        │       │   ├── inswapper.py
        │       │   ├── landmark.py
        │       │   ├── model_store.py
        │       │   ├── model_zoo.py
        │       │   ├── retinaface.py
        │       │   └── scrfd.py
        │       └── utils/
        │           ├── __init__.py
        │           ├── constant.py
        │           ├── download.py
        │           ├── face_align.py
        │           ├── filesystem.py
        │           ├── storage.py
        │           └── transform.py
        ├── face_analysis_diy.py
        ├── filter.py
        ├── helper.py
        ├── human_landmark_runner.py
        ├── io.py
        ├── resources/
        │   ├── clip_embedding_68.pkl
        │   ├── clip_embedding_9.pkl
        │   └── lip_array.pkl
        ├── retargeting_utils.py
        ├── rprint.py
        ├── timer.py
        ├── video.py
        └── viz.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
**/*.py[cod]
*$py.class

# Model weights
**/*.pth
**/*.onnx

pretrained_weights/*.md
pretrained_weights/docs
pretrained_weights/liveportrait
pretrained_weights/liveportrait_animals

# Ipython notebook
*.ipynb

# Temporary files or benchmark resources
animations/*
tmp/*
.vscode/launch.json
**/*.DS_Store
gradio_temp/**

# Windows dependencies
ffmpeg/
LivePortrait_env/

# XPose build files
src/utils/dependencies/XPose/models/UniPose/ops/build
src/utils/dependencies/XPose/models/UniPose/ops/dist
src/utils/dependencies/XPose/models/UniPose/ops/MultiScaleDeformableAttention.egg-info


================================================
FILE: .vscode/settings.json
================================================
{
    "[python]": {
        "editor.tabSize": 4
    },
    "files.eol": "\n",
    "files.insertFinalNewline": true,
    "files.trimFinalNewlines": true,
    "files.trimTrailingWhitespace": true,
    "files.exclude": {
        "**/.git": true,
        "**/.svn": true,
        "**/.hg": true,
        "**/CVS": true,
        "**/.DS_Store": true,
        "**/Thumbs.db": true,
        "**/*.crswap": true,
        "**/__pycache__": true
    }
}


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2024 Kuaishou Visual Generation and Interaction Center

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

The code of InsightFace is released under the MIT License.
The models of InsightFace are for non-commercial research purposes only.

If you want to use the LivePortrait project for commercial purposes, you 
should remove and replace InsightFace’s detection models to fully comply with 
the MIT license.


================================================
FILE: app.py
================================================
# coding: utf-8

"""
The entrance of the gradio for human
"""

import os
import tyro
import subprocess
import gradio as gr
import os.path as osp
from src.utils.helper import load_description
from src.gradio_pipeline import GradioPipeline
from src.config.crop_config import CropConfig
from src.config.argument_config import ArgumentConfig
from src.config.inference_config import InferenceConfig


def partial_fields(target_class, kwargs):
    return target_class(**{k: v for k, v in kwargs.items() if hasattr(target_class, k)})


def fast_check_ffmpeg():
    try:
        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
        return True
    except:
        return False


# set tyro theme
tyro.extras.set_accent_color("bright_cyan")
args = tyro.cli(ArgumentConfig)

ffmpeg_dir = os.path.join(os.getcwd(), "ffmpeg")
if osp.exists(ffmpeg_dir):
    os.environ["PATH"] += (os.pathsep + ffmpeg_dir)

if not fast_check_ffmpeg():
    raise ImportError(
        "FFmpeg is not installed. Please install FFmpeg (including ffmpeg and ffprobe) before running this script. https://ffmpeg.org/download.html"
    )
# specify configs for inference
inference_cfg = partial_fields(InferenceConfig, args.__dict__)  # use attribute of args to initial InferenceConfig
crop_cfg = partial_fields(CropConfig, args.__dict__)  # use attribute of args to initial CropConfig
# global_tab_selection = None

gradio_pipeline = GradioPipeline(
    inference_cfg=inference_cfg,
    crop_cfg=crop_cfg,
    args=args
)

if args.gradio_temp_dir not in (None, ''):
    os.environ["GRADIO_TEMP_DIR"] = args.gradio_temp_dir
    os.makedirs(args.gradio_temp_dir, exist_ok=True)


def gpu_wrapped_execute_video(*args, **kwargs):
    return gradio_pipeline.execute_video(*args, **kwargs)


def gpu_wrapped_execute_image_retargeting(*args, **kwargs):
    return gradio_pipeline.execute_image_retargeting(*args, **kwargs)


def gpu_wrapped_execute_video_retargeting(*args, **kwargs):
    return gradio_pipeline.execute_video_retargeting(*args, **kwargs)


def reset_sliders(*args, **kwargs):
    return 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5, True, True


# assets
title_md = "assets/gradio/gradio_title.md"
example_portrait_dir = "assets/examples/source"
example_video_dir = "assets/examples/driving"
data_examples_i2v = [
    [osp.join(example_portrait_dir, "s9.jpg"), osp.join(example_video_dir, "d0.mp4"), True, True, True, False],
    [osp.join(example_portrait_dir, "s6.jpg"), osp.join(example_video_dir, "d0.mp4"), True, True, True, False],
    [osp.join(example_portrait_dir, "s10.jpg"), osp.join(example_video_dir, "d0.mp4"), True, True, True, False],
    [osp.join(example_portrait_dir, "s5.jpg"), osp.join(example_video_dir, "d18.mp4"), True, True, True, False],
    [osp.join(example_portrait_dir, "s7.jpg"), osp.join(example_video_dir, "d19.mp4"), True, True, True, False],
    [osp.join(example_portrait_dir, "s2.jpg"), osp.join(example_video_dir, "d13.mp4"), True, True, True, True],
]
data_examples_v2v = [
    [osp.join(example_portrait_dir, "s13.mp4"), osp.join(example_video_dir, "d0.mp4"), True, True, True, False, 3e-7],
    # [osp.join(example_portrait_dir, "s14.mp4"), osp.join(example_video_dir, "d18.mp4"), True, True, True, False, False, 3e-7],
    # [osp.join(example_portrait_dir, "s15.mp4"), osp.join(example_video_dir, "d19.mp4"), True, True, True, False, False, 3e-7],
    [osp.join(example_portrait_dir, "s18.mp4"), osp.join(example_video_dir, "d6.mp4"), True, True, True, False, 3e-7],
    # [osp.join(example_portrait_dir, "s19.mp4"), osp.join(example_video_dir, "d6.mp4"), True, True, True, False, False, 3e-7],
    [osp.join(example_portrait_dir, "s20.mp4"), osp.join(example_video_dir, "d0.mp4"), True, True, True, False, 3e-7],
]
#################### interface logic ####################

# Define components first
retargeting_source_scale = gr.Number(minimum=1.8, maximum=3.2, value=2.5, step=0.05, label="crop scale")
video_retargeting_source_scale = gr.Number(minimum=1.8, maximum=3.2, value=2.3, step=0.05, label="crop scale")
driving_smooth_observation_variance_retargeting = gr.Number(value=3e-6, label="motion smooth strength", minimum=1e-11, maximum=1e-2, step=1e-8)
video_retargeting_silence = gr.Checkbox(value=False, label="keeping the lip silent")
eye_retargeting_slider = gr.Slider(minimum=0, maximum=0.8, step=0.01, label="target eyes-open ratio")
lip_retargeting_slider = gr.Slider(minimum=0, maximum=0.8, step=0.01, label="target lip-open ratio")
video_lip_retargeting_slider = gr.Slider(minimum=0, maximum=0.8, step=0.01, label="target lip-open ratio")
head_pitch_slider = gr.Slider(minimum=-15.0, maximum=15.0, value=0, step=1, label="relative pitch")
head_yaw_slider = gr.Slider(minimum=-25, maximum=25, value=0, step=1, label="relative yaw")
head_roll_slider = gr.Slider(minimum=-15.0, maximum=15.0, value=0, step=1, label="relative roll")
mov_x = gr.Slider(minimum=-0.19, maximum=0.19, value=0.0, step=0.01, label="x-axis movement")
mov_y = gr.Slider(minimum=-0.19, maximum=0.19, value=0.0, step=0.01, label="y-axis movement")
mov_z = gr.Slider(minimum=0.9, maximum=1.2, value=1.0, step=0.01, label="z-axis movement")
lip_variation_zero = gr.Slider(minimum=-0.09, maximum=0.09, value=0, step=0.01, label="pouting")
lip_variation_one = gr.Slider(minimum=-20.0, maximum=15.0, value=0, step=0.01, label="pursing 😐")
lip_variation_two = gr.Slider(minimum=0.0, maximum=15.0, value=0, step=0.01, label="grin 😁")
lip_variation_three = gr.Slider(minimum=-90.0, maximum=120.0, value=0, step=1.0, label="lip close <-> open")
smile = gr.Slider(minimum=-0.3, maximum=1.3, value=0, step=0.01, label="smile 😄")
wink = gr.Slider(minimum=0, maximum=39, value=0, step=0.01, label="wink 😉")
eyebrow = gr.Slider(minimum=-30, maximum=30, value=0, step=0.01, label="eyebrow 🤨")
eyeball_direction_x = gr.Slider(minimum=-30.0, maximum=30.0, value=0, step=0.01, label="eye gaze (horizontal) 👀")
eyeball_direction_y = gr.Slider(minimum=-63.0, maximum=63.0, value=0, step=0.01, label="eye gaze (vertical) 🙄")
retargeting_input_image = gr.Image(type="filepath")
retargeting_input_video = gr.Video()
output_image = gr.Image(type="numpy")
output_image_paste_back = gr.Image(type="numpy")
retargeting_output_image = gr.Image(type="numpy")
retargeting_output_image_paste_back = gr.Image(type="numpy")
output_video = gr.Video(autoplay=False)
output_video_paste_back = gr.Video(autoplay=False)

with gr.Blocks(theme=gr.themes.Soft(font=[gr.themes.GoogleFont("Plus Jakarta Sans")])) as demo:
    gr.HTML(load_description(title_md))

    gr.Markdown(load_description("assets/gradio/gradio_description_upload.md"))
    with gr.Row():
        with gr.Column():
            with gr.Tabs():
                with gr.TabItem("🖼️ Source Image") as tab_image:
                    with gr.Accordion(open=True, label="Source Image"):
                        source_image_input = gr.Image(type="filepath")
                        gr.Examples(
                            examples=[
                                [osp.join(example_portrait_dir, "s9.jpg")],
                                [osp.join(example_portrait_dir, "s6.jpg")],
                                [osp.join(example_portrait_dir, "s10.jpg")],
                                [osp.join(example_portrait_dir, "s5.jpg")],
                                [osp.join(example_portrait_dir, "s7.jpg")],
                                [osp.join(example_portrait_dir, "s12.jpg")],
                                [osp.join(example_portrait_dir, "s22.jpg")],
                                [osp.join(example_portrait_dir, "s23.jpg")],
                            ],
                            inputs=[source_image_input],
                            cache_examples=False,
                        )

                with gr.TabItem("🎞️ Source Video") as tab_video:
                    with gr.Accordion(open=True, label="Source Video"):
                        source_video_input = gr.Video()
                        gr.Examples(
                            examples=[
                                [osp.join(example_portrait_dir, "s13.mp4")],
                                # [osp.join(example_portrait_dir, "s14.mp4")],
                                # [osp.join(example_portrait_dir, "s15.mp4")],
                                [osp.join(example_portrait_dir, "s18.mp4")],
                                # [osp.join(example_portrait_dir, "s19.mp4")],
                                [osp.join(example_portrait_dir, "s20.mp4")],
                            ],
                            inputs=[source_video_input],
                            cache_examples=False,
                        )

                tab_selection = gr.Textbox(visible=False)
                tab_image.select(lambda: "Image", None, tab_selection)
                tab_video.select(lambda: "Video", None, tab_selection)
            with gr.Accordion(open=True, label="Cropping Options for Source Image or Video"):
                with gr.Row():
                    flag_do_crop_input = gr.Checkbox(value=True, label="do crop (source)")
                    scale = gr.Number(value=2.3, label="source crop scale", minimum=1.8, maximum=3.2, step=0.05)
                    vx_ratio = gr.Number(value=0.0, label="source crop x", minimum=-0.5, maximum=0.5, step=0.01)
                    vy_ratio = gr.Number(value=-0.125, label="source crop y", minimum=-0.5, maximum=0.5, step=0.01)

        with gr.Column():
            with gr.Tabs():
                with gr.TabItem("🎞️ Driving Video") as v_tab_video:
                    with gr.Accordion(open=True, label="Driving Video"):
                        driving_video_input = gr.Video()
                        gr.Examples(
                            examples=[
                                [osp.join(example_video_dir, "d0.mp4")],
                                [osp.join(example_video_dir, "d18.mp4")],
                                [osp.join(example_video_dir, "d19.mp4")],
                                [osp.join(example_video_dir, "d14.mp4")],
                                [osp.join(example_video_dir, "d6.mp4")],
                                [osp.join(example_video_dir, "d20.mp4")],
                            ],
                            inputs=[driving_video_input],
                            cache_examples=False,
                        )
                with gr.TabItem("🖼️ Driving Image") as v_tab_image:
                    with gr.Accordion(open=True, label="Driving Image"):
                        driving_image_input = gr.Image(type="filepath")
                        gr.Examples(
                            examples=[
                                [osp.join(example_video_dir, "d30.jpg")],
                                [osp.join(example_video_dir, "d9.jpg")],
                                [osp.join(example_video_dir, "d19.jpg")],
                                [osp.join(example_video_dir, "d8.jpg")],
                                [osp.join(example_video_dir, "d12.jpg")],
                                [osp.join(example_video_dir, "d38.jpg")],
                            ],
                            inputs=[driving_image_input],
                            cache_examples=False,
                        )

                with gr.TabItem("📁 Driving Pickle") as v_tab_pickle:
                    with gr.Accordion(open=True, label="Driving Pickle"):
                        driving_video_pickle_input = gr.File(type="filepath", file_types=[".pkl"])
                        gr.Examples(
                            examples=[
                                [osp.join(example_video_dir, "d1.pkl")],
                                [osp.join(example_video_dir, "d2.pkl")],
                                [osp.join(example_video_dir, "d5.pkl")],
                                [osp.join(example_video_dir, "d7.pkl")],
                                [osp.join(example_video_dir, "d8.pkl")],
                            ],
                            inputs=[driving_video_pickle_input],
                            cache_examples=False,
                        )

                v_tab_selection = gr.Textbox(visible=False)
                v_tab_video.select(lambda: "Video", None, v_tab_selection)
                v_tab_image.select(lambda: "Image", None, v_tab_selection)
                v_tab_pickle.select(lambda: "Pickle", None, v_tab_selection)
            # with gr.Accordion(open=False, label="Animation Instructions"):
                # gr.Markdown(load_description("assets/gradio/gradio_description_animation.md"))
            with gr.Accordion(open=True, label="Cropping Options for Driving Video"):
                with gr.Row():
                    flag_crop_driving_video_input = gr.Checkbox(value=False, label="do crop (driving)")
                    scale_crop_driving_video = gr.Number(value=2.2, label="driving crop scale", minimum=1.8, maximum=3.2, step=0.05)
                    vx_ratio_crop_driving_video = gr.Number(value=0.0, label="driving crop x", minimum=-0.5, maximum=0.5, step=0.01)
                    vy_ratio_crop_driving_video = gr.Number(value=-0.1, label="driving crop y", minimum=-0.5, maximum=0.5, step=0.01)

    with gr.Row():
        with gr.Accordion(open=True, label="Animation Options"):
            with gr.Row():
                flag_normalize_lip = gr.Checkbox(value=False, label="normalize lip")
                flag_relative_input = gr.Checkbox(value=True, label="relative motion")
                flag_remap_input = gr.Checkbox(value=True, label="paste-back")
                flag_stitching_input = gr.Checkbox(value=True, label="stitching")
                animation_region = gr.Radio(["exp", "pose", "lip", "eyes", "all"], value="all", label="animation region")
                driving_option_input = gr.Radio(['expression-friendly', 'pose-friendly'], value="expression-friendly", label="driving option (i2v)")
                driving_multiplier = gr.Number(value=1.0, label="driving multiplier (i2v)", minimum=0.0, maximum=2.0, step=0.02)
                driving_smooth_observation_variance = gr.Number(value=3e-7, label="motion smooth strength (v2v)", minimum=1e-11, maximum=1e-2, step=1e-8)

    gr.Markdown(load_description("assets/gradio/gradio_description_animate_clear.md"))
    with gr.Row():
        process_button_animation = gr.Button("🚀 Animate", variant="primary")
    with gr.Row():
        with gr.Column():
            output_video_i2v = gr.Video(autoplay=False, label="The animated video in the original image space")
        with gr.Column():
            output_video_concat_i2v = gr.Video(autoplay=False, label="The animated video")
    with gr.Row():
        with gr.Column():
            output_image_i2i = gr.Image(type="numpy", label="The animated image in the original image space", visible=False)
        with gr.Column():
            output_image_concat_i2i = gr.Image(type="numpy", label="The animated image", visible=False)
    with gr.Row():
        process_button_reset = gr.ClearButton([source_image_input, source_video_input, driving_video_pickle_input, driving_video_input, driving_image_input, output_video_i2v, output_video_concat_i2v, output_image_i2i, output_image_concat_i2i], value="🧹 Clear")

    with gr.Row():
        # Examples
        gr.Markdown("## You could also choose the examples below by one click ⬇️")
    with gr.Row():
        with gr.Tabs():
            with gr.TabItem("🖼️ Portrait Animation"):
                gr.Examples(
                    examples=data_examples_i2v,
                    fn=gpu_wrapped_execute_video,
                    inputs=[
                        source_image_input,
                        driving_video_input,
                        flag_relative_input,
                        flag_do_crop_input,
                        flag_remap_input,
                        flag_crop_driving_video_input,
                    ],
                    outputs=[output_image, output_image_paste_back],
                    examples_per_page=len(data_examples_i2v),
                    cache_examples=False,
                )
            with gr.TabItem("🎞️ Portrait Video Editing"):
                gr.Examples(
                    examples=data_examples_v2v,
                    fn=gpu_wrapped_execute_video,
                    inputs=[
                        source_video_input,
                        driving_video_input,
                        flag_relative_input,
                        flag_do_crop_input,
                        flag_remap_input,
                        flag_crop_driving_video_input,
                        driving_smooth_observation_variance,
                    ],
                    outputs=[output_image, output_image_paste_back],
                    examples_per_page=len(data_examples_v2v),
                    cache_examples=False,
                )

    # Retargeting Image
    gr.Markdown(load_description("assets/gradio/gradio_description_retargeting.md"), visible=True)
    with gr.Row(visible=True):
        flag_do_crop_input_retargeting_image = gr.Checkbox(value=True, label="do crop (source)")
        flag_stitching_retargeting_input = gr.Checkbox(value=True, label="stitching")
        retargeting_source_scale.render()
        eye_retargeting_slider.render()
        lip_retargeting_slider.render()
    with gr.Row(visible=True):
        with gr.Column():
            with gr.Accordion(open=True, label="Facial movement sliders"):
                with gr.Row(visible=True):
                    head_pitch_slider.render()
                    head_yaw_slider.render()
                    head_roll_slider.render()
                with gr.Row(visible=True):
                    mov_x.render()
                    mov_y.render()
                    mov_z.render()
        with gr.Column():
            with gr.Accordion(open=True, label="Facial expression sliders"):
                with gr.Row(visible=True):
                    lip_variation_zero.render()
                    lip_variation_one.render()
                    lip_variation_two.render()
                with gr.Row(visible=True):
                    lip_variation_three.render()
                    smile.render()
                    wink.render()
                with gr.Row(visible=True):
                    eyebrow.render()
                    eyeball_direction_x.render()
                    eyeball_direction_y.render()
    with gr.Row(visible=True):
        reset_button = gr.Button("🔄 Reset")
        reset_button.click(
            fn=reset_sliders,
            inputs=None,
            outputs=[
                head_pitch_slider, head_yaw_slider, head_roll_slider, mov_x, mov_y, mov_z,
                lip_variation_zero, lip_variation_one, lip_variation_two, lip_variation_three, smile, wink, eyebrow, eyeball_direction_x, eyeball_direction_y,
                retargeting_source_scale, flag_stitching_retargeting_input, flag_do_crop_input_retargeting_image
            ]
        )
    with gr.Row(visible=True):
        with gr.Column():
            with gr.Accordion(open=True, label="Retargeting Image Input"):
                retargeting_input_image.render()
                gr.Examples(
                    examples=[
                        [osp.join(example_portrait_dir, "s9.jpg")],
                        [osp.join(example_portrait_dir, "s6.jpg")],
                        [osp.join(example_portrait_dir, "s10.jpg")],
                        [osp.join(example_portrait_dir, "s5.jpg")],
                        [osp.join(example_portrait_dir, "s7.jpg")],
                        [osp.join(example_portrait_dir, "s12.jpg")],
                        [osp.join(example_portrait_dir, "s22.jpg")],
                        # [osp.join(example_portrait_dir, "s23.jpg")],
                        [osp.join(example_portrait_dir, "s42.jpg")],
                    ],
                    inputs=[retargeting_input_image],
                    cache_examples=False,
                )
        with gr.Column():
            with gr.Accordion(open=True, label="Retargeting Result"):
                retargeting_output_image.render()
        with gr.Column():
            with gr.Accordion(open=True, label="Paste-back Result"):
                retargeting_output_image_paste_back.render()
    with gr.Row(visible=True):
        process_button_reset_retargeting = gr.ClearButton(
            [
                retargeting_input_image,
                retargeting_output_image,
                retargeting_output_image_paste_back,
            ],
            value="🧹 Clear"
        )

    # Retargeting Video
    gr.Markdown(load_description("assets/gradio/gradio_description_retargeting_video.md"), visible=True)
    with gr.Row(visible=True):
        flag_do_crop_input_retargeting_video = gr.Checkbox(value=True, label="do crop (source)")
        video_retargeting_source_scale.render()
        video_lip_retargeting_slider.render()
        driving_smooth_observation_variance_retargeting.render()
        video_retargeting_silence.render()
    with gr.Row(visible=True):
        process_button_retargeting_video = gr.Button("🚗 Retargeting Video", variant="primary")
    with gr.Row(visible=True):
        with gr.Column():
            with gr.Accordion(open=True, label="Retargeting Video Input"):
                retargeting_input_video.render()
                gr.Examples(
                    examples=[
                        [osp.join(example_portrait_dir, "s13.mp4")],
                        # [osp.join(example_portrait_dir, "s18.mp4")],
                        # [osp.join(example_portrait_dir, "s20.mp4")],
                        [osp.join(example_portrait_dir, "s29.mp4")],
                        [osp.join(example_portrait_dir, "s32.mp4")],
                        [osp.join(example_video_dir, "d3.mp4")],
                    ],
                    inputs=[retargeting_input_video],
                    cache_examples=False,
                )
        with gr.Column():
            with gr.Accordion(open=True, label="Retargeting Result"):
                output_video.render()
        with gr.Column():
            with gr.Accordion(open=True, label="Paste-back Result"):
                output_video_paste_back.render()
    with gr.Row(visible=True):
        process_button_reset_retargeting = gr.ClearButton(
            [
                video_lip_retargeting_slider,
                retargeting_input_video,
                output_video,
                output_video_paste_back
            ],
            value="🧹 Clear"
        )

    # binding functions for buttons
    process_button_animation.click(
        fn=gpu_wrapped_execute_video,
        inputs=[
            source_image_input,
            source_video_input,
            driving_video_input,
            driving_image_input,
            driving_video_pickle_input,
            flag_normalize_lip,
            flag_relative_input,
            flag_do_crop_input,
            flag_remap_input,
            flag_stitching_input,
            animation_region,
            driving_option_input,
            driving_multiplier,
            flag_crop_driving_video_input,
            scale,
            vx_ratio,
            vy_ratio,
            scale_crop_driving_video,
            vx_ratio_crop_driving_video,
            vy_ratio_crop_driving_video,
            driving_smooth_observation_variance,
            tab_selection,
            v_tab_selection,
        ],
        outputs=[output_video_i2v, output_video_i2v, output_video_concat_i2v, output_video_concat_i2v, output_image_i2i, output_image_i2i, output_image_concat_i2i, output_image_concat_i2i],
        show_progress=True
    )


    retargeting_input_image.change(
        fn=gradio_pipeline.init_retargeting_image,
        inputs=[retargeting_source_scale, eye_retargeting_slider, lip_retargeting_slider, retargeting_input_image],
        outputs=[eye_retargeting_slider, lip_retargeting_slider]
    )

    sliders = [eye_retargeting_slider, lip_retargeting_slider, head_pitch_slider, head_yaw_slider, head_roll_slider, mov_x, mov_y, mov_z, lip_variation_zero, lip_variation_one, lip_variation_two, lip_variation_three, smile, wink, eyebrow, eyeball_direction_x, eyeball_direction_y]
    for slider in sliders:
        # NOTE: gradio >= 4.0.0 may cause slow response
        slider.change(
            fn=gpu_wrapped_execute_image_retargeting,
            inputs=[
                eye_retargeting_slider, lip_retargeting_slider, head_pitch_slider, head_yaw_slider, head_roll_slider, mov_x, mov_y, mov_z,
                lip_variation_zero, lip_variation_one, lip_variation_two, lip_variation_three, smile, wink, eyebrow, eyeball_direction_x, eyeball_direction_y,
                retargeting_input_image, retargeting_source_scale, flag_stitching_retargeting_input, flag_do_crop_input_retargeting_image
            ],
            outputs=[retargeting_output_image, retargeting_output_image_paste_back],
        )

    process_button_retargeting_video.click(
        fn=gpu_wrapped_execute_video_retargeting,
        inputs=[video_lip_retargeting_slider, retargeting_input_video, video_retargeting_source_scale, driving_smooth_observation_variance_retargeting, video_retargeting_silence, flag_do_crop_input_retargeting_video],
        outputs=[output_video, output_video_paste_back],
        show_progress=True
    )

demo.launch(
    server_port=args.server_port,
    share=args.share,
    server_name=args.server_name
)


================================================
FILE: app_animals.py
================================================
# coding: utf-8

"""
The entrance of the gradio for animal
"""

import os
import tyro
import subprocess
import gradio as gr
import os.path as osp
from src.utils.helper import load_description
from src.gradio_pipeline import GradioPipelineAnimal
from src.config.crop_config import CropConfig
from src.config.argument_config import ArgumentConfig
from src.config.inference_config import InferenceConfig


def partial_fields(target_class, kwargs):
    return target_class(**{k: v for k, v in kwargs.items() if hasattr(target_class, k)})


def fast_check_ffmpeg():
    try:
        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
        return True
    except:
        return False


# set tyro theme
tyro.extras.set_accent_color("bright_cyan")
args = tyro.cli(ArgumentConfig)

ffmpeg_dir = os.path.join(os.getcwd(), "ffmpeg")
if osp.exists(ffmpeg_dir):
    os.environ["PATH"] += (os.pathsep + ffmpeg_dir)

if not fast_check_ffmpeg():
    raise ImportError(
        "FFmpeg is not installed. Please install FFmpeg (including ffmpeg and ffprobe) before running this script. https://ffmpeg.org/download.html"
    )
# specify configs for inference
inference_cfg = partial_fields(InferenceConfig, args.__dict__)  # use attribute of args to initial InferenceConfig
crop_cfg = partial_fields(CropConfig, args.__dict__)  # use attribute of args to initial CropConfig

gradio_pipeline_animal: GradioPipelineAnimal = GradioPipelineAnimal(
    inference_cfg=inference_cfg,
    crop_cfg=crop_cfg,
    args=args
)

if args.gradio_temp_dir not in (None, ''):
    os.environ["GRADIO_TEMP_DIR"] = args.gradio_temp_dir
    os.makedirs(args.gradio_temp_dir, exist_ok=True)

def gpu_wrapped_execute_video(*args, **kwargs):
    return gradio_pipeline_animal.execute_video(*args, **kwargs)


# assets
title_md = "assets/gradio/gradio_title.md"
example_portrait_dir = "assets/examples/source"
example_video_dir = "assets/examples/driving"
data_examples_i2v = [
    [osp.join(example_portrait_dir, "s41.jpg"), osp.join(example_video_dir, "d3.mp4"), True, False, False, False],
    [osp.join(example_portrait_dir, "s40.jpg"), osp.join(example_video_dir, "d6.mp4"), True, False, False, False],
    [osp.join(example_portrait_dir, "s25.jpg"), osp.join(example_video_dir, "d19.mp4"), True, False, False, False],
]
data_examples_i2v_pickle = [
    [osp.join(example_portrait_dir, "s25.jpg"), osp.join(example_video_dir, "wink.pkl"), True, False, False, False],
    [osp.join(example_portrait_dir, "s40.jpg"), osp.join(example_video_dir, "talking.pkl"), True, False, False, False],
    [osp.join(example_portrait_dir, "s41.jpg"), osp.join(example_video_dir, "aggrieved.pkl"), True, False, False, False],
]
#################### interface logic ####################

# Define components first
output_image = gr.Image(type="numpy")
output_image_paste_back = gr.Image(type="numpy")
output_video_i2v = gr.Video(autoplay=False)
output_video_concat_i2v = gr.Video(autoplay=False)
output_video_i2v_gif = gr.Image(type="numpy")


with gr.Blocks(theme=gr.themes.Soft(font=[gr.themes.GoogleFont("Plus Jakarta Sans")])) as demo:
    gr.HTML(load_description(title_md))

    gr.Markdown(load_description("assets/gradio/gradio_description_upload_animal.md"))
    with gr.Row():
        with gr.Column():
            with gr.Accordion(open=True, label="🐱 Source Animal Image"):
                source_image_input = gr.Image(type="filepath")
                gr.Examples(
                    examples=[
                        [osp.join(example_portrait_dir, "s25.jpg")],
                        [osp.join(example_portrait_dir, "s30.jpg")],
                        [osp.join(example_portrait_dir, "s31.jpg")],
                        [osp.join(example_portrait_dir, "s32.jpg")],
                        [osp.join(example_portrait_dir, "s33.jpg")],
                        [osp.join(example_portrait_dir, "s39.jpg")],
                        [osp.join(example_portrait_dir, "s40.jpg")],
                        [osp.join(example_portrait_dir, "s41.jpg")],
                        [osp.join(example_portrait_dir, "s38.jpg")],
                        [osp.join(example_portrait_dir, "s36.jpg")],
                    ],
                    inputs=[source_image_input],
                    cache_examples=False,
                )

            with gr.Accordion(open=True, label="Cropping Options for Source Image"):
                with gr.Row():
                    flag_do_crop_input = gr.Checkbox(value=True, label="do crop (source)")
                    scale = gr.Number(value=2.3, label="source crop scale", minimum=1.8, maximum=3.2, step=0.05)
                    vx_ratio = gr.Number(value=0.0, label="source crop x", minimum=-0.5, maximum=0.5, step=0.01)
                    vy_ratio = gr.Number(value=-0.125, label="source crop y", minimum=-0.5, maximum=0.5, step=0.01)

        with gr.Column():
            with gr.Tabs():
                with gr.TabItem("📁 Driving Pickle") as tab_pickle:
                    with gr.Accordion(open=True, label="Driving Pickle"):
                        driving_video_pickle_input = gr.File()
                        gr.Examples(
                            examples=[
                                [osp.join(example_video_dir, "wink.pkl")],
                                [osp.join(example_video_dir, "shy.pkl")],
                                [osp.join(example_video_dir, "aggrieved.pkl")],
                                [osp.join(example_video_dir, "open_lip.pkl")],
                                [osp.join(example_video_dir, "laugh.pkl")],
                                [osp.join(example_video_dir, "talking.pkl")],
                                [osp.join(example_video_dir, "shake_face.pkl")],
                            ],
                            inputs=[driving_video_pickle_input],
                            cache_examples=False,
                        )
                with gr.TabItem("🎞️ Driving Video") as tab_video:
                    with gr.Accordion(open=True, label="Driving Video"):
                        driving_video_input = gr.Video()
                        gr.Examples(
                            examples=[
                                # [osp.join(example_video_dir, "d0.mp4")],
                                # [osp.join(example_video_dir, "d18.mp4")],
                                [osp.join(example_video_dir, "d19.mp4")],
                                [osp.join(example_video_dir, "d14.mp4")],
                                [osp.join(example_video_dir, "d6.mp4")],
                                [osp.join(example_video_dir, "d3.mp4")],
                            ],
                            inputs=[driving_video_input],
                            cache_examples=False,
                        )

                    tab_selection = gr.Textbox(visible=False)
                    tab_pickle.select(lambda: "Pickle", None, tab_selection)
                    tab_video.select(lambda: "Video", None, tab_selection)
            with gr.Accordion(open=True, label="Cropping Options for Driving Video"):
                with gr.Row():
                    flag_crop_driving_video_input = gr.Checkbox(value=False, label="do crop (driving)")
                    scale_crop_driving_video = gr.Number(value=2.2, label="driving crop scale", minimum=1.8, maximum=3.2, step=0.05)
                    vx_ratio_crop_driving_video = gr.Number(value=0.0, label="driving crop x", minimum=-0.5, maximum=0.5, step=0.01)
                    vy_ratio_crop_driving_video = gr.Number(value=-0.1, label="driving crop y", minimum=-0.5, maximum=0.5, step=0.01)

    with gr.Row():
        with gr.Accordion(open=False, label="Animation Options"):
            with gr.Row():
                flag_stitching = gr.Checkbox(value=False, label="stitching (not recommended)")
                flag_remap_input = gr.Checkbox(value=False, label="paste-back (not recommended)")
                driving_multiplier = gr.Number(value=1.0, label="driving multiplier", minimum=0.0, maximum=2.0, step=0.02)

    gr.Markdown(load_description("assets/gradio/gradio_description_animate_clear.md"))
    with gr.Row():
        process_button_animation = gr.Button("🚀 Animate", variant="primary")
    with gr.Row():
        with gr.Column():
            with gr.Accordion(open=True, label="The animated video in the cropped image space"):
                output_video_i2v.render()
        with gr.Column():
            with gr.Accordion(open=True, label="The animated gif in the cropped image space"):
                output_video_i2v_gif.render()
        with gr.Column():
            with gr.Accordion(open=True, label="The animated video"):
                output_video_concat_i2v.render()
    with gr.Row():
        process_button_reset = gr.ClearButton([source_image_input, driving_video_input, output_video_i2v, output_video_concat_i2v, output_video_i2v_gif], value="🧹 Clear")

    with gr.Row():
        # Examples
        gr.Markdown("## You could also choose the examples below by one click ⬇️")
    with gr.Row():
        with gr.Tabs():
            with gr.TabItem("📁 Driving Pickle") as tab_video:
                gr.Examples(
                    examples=data_examples_i2v_pickle,
                    fn=gpu_wrapped_execute_video,
                    inputs=[
                        source_image_input,
                        driving_video_pickle_input,
                        flag_do_crop_input,
                        flag_stitching,
                        flag_remap_input,
                        flag_crop_driving_video_input,
                    ],
                    outputs=[output_image, output_image_paste_back, output_video_i2v_gif],
                    examples_per_page=len(data_examples_i2v_pickle),
                    cache_examples=False,
                )
            with gr.TabItem("🎞️ Driving Video") as tab_video:
                gr.Examples(
                    examples=data_examples_i2v,
                    fn=gpu_wrapped_execute_video,
                    inputs=[
                        source_image_input,
                        driving_video_input,
                        flag_do_crop_input,
                        flag_stitching,
                        flag_remap_input,
                        flag_crop_driving_video_input,
                    ],
                    outputs=[output_image, output_image_paste_back, output_video_i2v_gif],
                    examples_per_page=len(data_examples_i2v),
                    cache_examples=False,
                )

    process_button_animation.click(
        fn=gpu_wrapped_execute_video,
        inputs=[
            source_image_input,
            driving_video_input,
            driving_video_pickle_input,
            flag_do_crop_input,
            flag_remap_input,
            driving_multiplier,
            flag_stitching,
            flag_crop_driving_video_input,
            scale,
            vx_ratio,
            vy_ratio,
            scale_crop_driving_video,
            vx_ratio_crop_driving_video,
            vy_ratio_crop_driving_video,
            tab_selection,
        ],
        outputs=[output_video_i2v, output_video_concat_i2v, output_video_i2v_gif],
        show_progress=True
    )

demo.launch(
    server_port=args.server_port,
    share=args.share,
    server_name=args.server_name
)


================================================
FILE: assets/.gitignore
================================================
examples/driving/*.pkl
examples/driving/*_crop.mp4


================================================
FILE: assets/docs/changelog/2024-07-10.md
================================================
## 2024/07/10

**First, thank you all for your attention, support, sharing, and contributions to LivePortrait!** ❤️
The popularity of LivePortrait has exceeded our expectations. If you encounter any issues or other problems and we do not respond promptly, please accept our apologies. We are still actively updating and improving this repository.

### Updates

- <strong>Audio and video concatenating: </strong> If the driving video contains audio, it will automatically be included in the generated video. Additionally, the generated video will maintain the same FPS as the driving video. If you run LivePortrait on Windows, you need to install `ffprobe` and `ffmpeg` exe, see issue [#94](https://github.com/KlingTeam/LivePortrait/issues/94).

- <strong>Driving video auto-cropping: </strong> Implemented automatic cropping for driving videos by tracking facial landmarks and calculating a global cropping box with a 1:1 aspect ratio. Alternatively, you can crop using video editing software or other tools to achieve a 1:1 ratio. Auto-cropping is not enbaled by default, you can specify it by `--flag_crop_driving_video`.

- <strong>Motion template making: </strong> Added the ability to create motion templates to protect privacy. The motion template is a `.pkl` file that only contains the motions of the driving video. Theoretically, it is impossible to reconstruct the original face from the template. These motion templates can be used to generate videos without needing the original driving video. By default, the motion template will be generated and saved as a `.pkl` file with the same name as the driving video, e.g., `d0.mp4` -> `d0.pkl`. Once generated, you can specify it using the `-d` or `--driving` option.


### About driving video

- For a guide on using your own driving video, see the [driving video auto-cropping](https://github.com/KlingTeam/LivePortrait/tree/main?tab=readme-ov-file#driving-video-auto-cropping) section.


### Others

- If you encounter a black box problem, disable half-precision inference by using `--no_flag_use_half_precision`, reported by issue [#40](https://github.com/KlingTeam/LivePortrait/issues/40), [#48](https://github.com/KlingTeam/LivePortrait/issues/48), [#62](https://github.com/KlingTeam/LivePortrait/issues/62).


================================================
FILE: assets/docs/changelog/2024-07-19.md
================================================
## 2024/07/19

**Once again, we would like to express our heartfelt gratitude for your love, attention, and support for LivePortrait! 🎉**
We are excited to announce the release of an implementation of Portrait Video Editing (aka v2v) today! Special thanks to the hard work of the LivePortrait team: [Dingyun Zhang](https://github.com/Mystery099), [Zhizhou Zhong](https://github.com/zzzweakman), and [Jianzhu Guo](https://github.com/cleardusk).

### Updates

- <strong>Portrait video editing (v2v):</strong> Implemented a version of Portrait Video Editing (aka v2v). Ensure you have `pykalman` package installed, which has been added in [`requirements_base.txt`](../../../requirements_base.txt). You can specify the source video using the `-s` or `--source` option,  adjust the temporal smoothness of motion with `--driving_smooth_observation_variance`, enable head pose motion transfer with `--flag_video_editing_head_rotation`, and ensure the eye-open scalar of each source frame matches the first source frame before animation with `--flag_source_video_eye_retargeting`.

- <strong>More options in Gradio:</strong> We have upgraded the Gradio interface and added more options. These include `Cropping Options for Source Image or Video` and `Cropping Options for Driving Video`, providing greater flexibility and control.

<p align="center">
  <img src="../LivePortrait-Gradio-2024-07-19.jpg" alt="LivePortrait" width="800px">
  <br>
  The Gradio Interface for LivePortrait
</p>


### Community Contributions

- **ONNX/TensorRT Versions of LivePortrait:** Explore optimized versions of LivePortrait for faster performance:
  - [FasterLivePortrait](https://github.com/warmshao/FasterLivePortrait) by [warmshao](https://github.com/warmshao) ([#150](https://github.com/KlingTeam/LivePortrait/issues/150))
  - [Efficient-Live-Portrait](https://github.com/aihacker111/Efficient-Live-Portrait) by [aihacker111](https://github.com/aihacker111/Efficient-Live-Portrait) ([#126](https://github.com/KlingTeam/LivePortrait/issues/126), [#142](https://github.com/KlingTeam/LivePortrait/issues/142))
- **LivePortrait with [X-Pose](https://github.com/IDEA-Research/X-Pose) Detection:** Check out [LivePortrait](https://github.com/ShiJiaying/LivePortrait) by [ShiJiaying](https://github.com/ShiJiaying) for enhanced detection capabilities using X-pose, see [#119](https://github.com/KlingTeam/LivePortrait/issues/119).


================================================
FILE: assets/docs/changelog/2024-07-24.md
================================================
## 2024/07/24

### Updates

- **Portrait pose editing:** You can change the `relative pitch`, `relative yaw`, and `relative roll` in the Gradio interface to adjust the pose of the source portrait.
- **Detection threshold:** We have added a `--det_thresh` argument with a default value of 0.15 to increase recall, meaning more types of faces (e.g., monkeys, human-like) will be detected. You can set it to other values, e.g., 0.5, by using `python app.py --det_thresh 0.5`.

<p align="center">
  <img src="../pose-edit-2024-07-24.jpg" alt="LivePortrait" width="960px">
  <br>
  Pose Editing in the Gradio Interface
</p>


================================================
FILE: assets/docs/changelog/2024-08-02.md
================================================
## 2024/08/02

<table class="center" style="width: 80%; margin-left: auto; margin-right: auto;">
<tr>
    <td style="text-align: center"><b>Animals Singing Dance Monkey 🎤</b></td>
</tr>

<tr>
    <td style="border: none; text-align: center;">
        <video controls loop src="https://github.com/user-attachments/assets/38d5b6e5-d29b-458d-9f2c-4dd52546cb41" muted="false" style="width: 60%;"></video>
    </td>
</tr>
</table>


🎉 We are excited to announce the release of a new version featuring animals mode, along with several other updates. Special thanks to the dedicated efforts of the LivePortrait team. 💪 We also provided an one-click installer for Windows users, checkout the details [here](./2024-08-05.md).

### Updates on Animals mode
We are pleased to announce the release of the animals mode, which is fine-tuned on approximately 230K frames of various animals (mostly cats and dogs). The trained weights have been updated in the `liveportrait_animals` subdirectory, available on [HuggingFace](https://huggingface.co/KlingTeam/LivePortrait/tree/main/) or [Google Drive](https://drive.google.com/drive/u/0/folders/1UtKgzKjFAOmZkhNK-OYT0caJ_w2XAnib). You should [download the weights](https://github.com/KlingTeam/LivePortrait?tab=readme-ov-file#2-download-pretrained-weights) before running. There are two ways to run this mode.

> Please note that we have not trained the stitching and retargeting modules for the animals model due to several technical issues. _This may be addressed in future updates._ Therefore, we recommend **disabling stitching by setting the `--no_flag_stitching`** option when running the model. Additionally, `paste-back` is also not recommended.

#### Install X-Pose
We have chosen [X-Pose](https://github.com/IDEA-Research/X-Pose) as the keypoints detector for animals. This relies on `transformers==4.22.0` and `pillow>=10.2.0` (which are already updated in `requirements.txt`) and requires building an OP named `MultiScaleDeformableAttention`.

Refer to the [PyTorch installation](https://github.com/KlingTeam/LivePortrait?tab=readme-ov-file#for-linux-or-windows-users) for Linux and Windows users.


Next, build the OP `MultiScaleDeformableAttention` by running:
```bash
cd src/utils/dependencies/XPose/models/UniPose/ops
python setup.py build install
cd - # this returns to the previous directory
```

To run the model, use the `inference_animals.py` script:
```bash
python inference_animals.py -s assets/examples/source/s39.jpg -d assets/examples/driving/wink.pkl --no_flag_stitching --driving_multiplier 1.75
```

Alternatively, you can use Gradio for a more user-friendly interface. Launch it with:
```bash
python app_animals.py # --server_port 8889 --server_name "0.0.0.0" --share
```

> [!WARNING]
> [X-Pose](https://github.com/IDEA-Research/X-Pose) is only for Non-commercial Scientific Research Purposes, you should remove and replace it with other detectors if you use it for commercial purposes.

### Updates on Humans mode

- **Driving Options**: We have introduced an `expression-friendly` driving option to **reduce head wobbling**, now set as the default. While it may be less effective with large head poses, you can also select the `pose-friendly` option, which is the same as the previous version. This can be set using `--driving_option` or selected in the Gradio interface. Additionally, we added a `--driving_multiplier` option to adjust driving intensity, with a default value of 1, which can also be set in the Gradio interface.

- **Retargeting Video in Gradio**: We have implemented a video retargeting feature. You can specify a `target lip-open ratio` to adjust the mouth movement in the source video. For instance, setting it to 0 will close the mouth in the source video 🤐.

### Others

- [**Poe supports LivePortrait**](https://poe.com/LivePortrait). Check out the news on [X](https://x.com/poe_platform/status/1816136105781256260).
- [ComfyUI-LivePortraitKJ](https://github.com/kijai/ComfyUI-LivePortraitKJ) (1.1K 🌟) now includes MediaPipe as an alternative to InsightFace, ensuring the license remains under MIT and Apache 2.0.
- [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait) features real-time portrait pose/expression editing and animation, and is registered with ComfyUI-Manager.



**Below are some screenshots of the new features and improvements:**

| ![The Gradio Interface of Animals Mode](../animals-mode-gradio-2024-08-02.jpg) |
|:---:|
| **The Gradio Interface of Animals Mode** |

| ![Driving Options and Multiplier](../driving-option-multiplier-2024-08-02.jpg) |
|:---:|
| **Driving Options and Multiplier** |

| ![The Feature of Retargeting Video](../retargeting-video-2024-08-02.jpg) |
|:---:|
| **The Feature of Retargeting Video** |


================================================
FILE: assets/docs/changelog/2024-08-05.md
================================================
## One-click Windows Installer

### Download the installer from HuggingFace
```bash
# !pip install -U "huggingface_hub[cli]"
huggingface-cli download cleardusk/LivePortrait-Windows LivePortrait-Windows-v20240806.zip --local-dir ./
```

If you cannot access to Huggingface, you can use [hf-mirror](https://hf-mirror.com/) to download:
```bash
# !pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download cleardusk/LivePortrait-Windows LivePortrait-Windows-v20240806.zip --local-dir ./
```

Alternatively, you can manually download it from the [HuggingFace](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240806.zip) page.

Then, simply unzip the package `LivePortrait-Windows-v20240806.zip` and double-click `run_windows_human.bat` for the Humans mode, or `run_windows_animal.bat` for the **Animals mode**.


================================================
FILE: assets/docs/changelog/2024-08-06.md
================================================
## Precise Portrait Editing

Inspired by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait) ([@PowerHouseMan](https://github.com/PowerHouseMan)), we have implemented a version of Precise Portrait Editing in the Gradio interface. With each adjustment of the slider, the edited image updates in real-time. You can click the `🔄 Reset` button to reset all slider parameters. However, the performance may not be as fast as the ComfyUI plugin.

<p align="center">
  <img src="../editing-portrait-2024-08-06.jpg" alt="LivePortrait" width="960px">
  <br>
  Preciese Portrait Editing in the Gradio Interface
</p>


================================================
FILE: assets/docs/changelog/2024-08-19.md
================================================
## Image Driven and Regional Control

<p align="center">
  <img src="../image-driven-image-2024-08-19.jpg" alt="LivePortrait" width="512px">
  <br>
  <strong>Image Drives an Image</strong>
</p>

You can now **use an image as a driving signal** to drive the source image or video! Additionally, we **have refined the driving options to support expressions, pose, lips, eyes, or all** (all is consistent with the previous default method), which we name it regional control. The control is becoming more and more precise! 🎯

> Please note that image-based driving or regional control may not perform well in certain cases. Feel free to try different options, and be patient. 😊

> [!Note]
> We recognize that the project now offers more options, which have become increasingly complex, but due to our limited team capacity and resources, we haven’t fully documented them yet. We ask for your understanding and will work to improve the documentation over time. Contributions via PRs are welcome! If anyone is considering donating or sponsoring, feel free to leave a message in the GitHub Issues or Discussions. We will set up a payment account to reward the team members or support additional efforts in maintaining the project. 💖


### CLI Usage
It's very simple to use an image as a driving reference. Just set the `-d` argument to the driving image:

```bash
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d30.jpg
```

To change the `animation_region` option, you can use the `--animation_region` argument to `exp`, `pose`, `lip`, `eyes`, or `all`. For example, to only drive the lip region, you can run by:

```bash
# only driving the lip region
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d0.mp4 --animation_region lip
```

### Gradio Interface

<p align="center">
  <img src="../image-driven-portrait-animation-2024-08-19.jpg" alt="LivePortrait" width="960px">
  <br>
  <strong>Image-driven Portrait Animation and Regional Control</strong>
</p>

### More Detailed Explanation

**flag_relative_motion**:
When using an image as the driving input, setting `--flag_relative_motion` to true will apply the motion deformation between the driving image and its canonical form. If set to false, the absolute motion of the driving image is used, which may amplify expression driving strength but could also cause identity leakage. This option corresponds to the `relative motion` toggle in the Gradio interface. Additionally, if both source and driving inputs are images, the output will be an image. If the source is a video and the driving input is an image, the output will be a video, with each frame driven by the image's motion. The Gradio interface automatically saves and displays the output in the appropriate format.

**animation_region**:
This argument offers five options:

- `exp`: Only the expression of the driving input influences the source.
- `pose`: Only the head pose drives the source.
- `lip`: Only lip movement drives the source.
- `eyes`: Only eye movement drives the source.
- `all`: All motions from the driving input are applied.

You can also select these options directly in the Gradio interface.

**Editing the Lip Region of the Source Video to a Neutral Expression**:
In response to requests for a more neutral lip region in the `Retargeting Video` of the Gradio interface, we've added a `keeping the lip silent` option. When selected, the animated video's lip region will adopt a neutral expression. However, this may cause inter-frame jitter or identity leakage, as it uses a mode similar to absolute driving. Note that the neutral expression may sometimes feature a slightly open mouth.

**Others**:
When both source and driving inputs are videos, the output motion may be a blend of both, due to the default setting of `--flag_relative_motion`. This option uses relative driving, where the motion offset of the current driving frame relative to the first driving frame is added to the source frame's motion. In contrast, `--no_flag_relative_motion` applies the driving frame's motion directly as the final driving motion.

For CLI usage, to retain only the driving video's motion in the output, use:
```bash
python inference.py --no_flag_relative_motion
```
In the Gradio interface, simply uncheck the relative motion option. Note that absolute driving may cause jitter or identity leakage in the animated video.


================================================
FILE: assets/docs/changelog/2025-01-01.md
================================================
## 2025/01/01

**We’re thrilled that cats 🐱 are now speaking and singing across the internet!**  🎶

In this update, we’ve improved the [Animals model](https://huggingface.co/KlingTeam/LivePortrait/tree/main/liveportrait_animals/base_models_v1.1) with more data. While you might notice only a slight improvement for cats (if at all 😼), dogs have gotten a slightly better upgrade. For example, the model is now better at recognizing their mouths instead of mistaking them for noses. 🐶

<table class="center" style="width: 80%; margin-left: auto; margin-right: auto;">
<tr>
    <td style="text-align: center"><b>Before vs. After (v1.1)</b></td>
</tr>

<tr>
    <td style="border: none; text-align: center;">
        <video controls loop src="https://github.com/user-attachments/assets/59fc09b9-6cb7-4265-833f-eebb27ed9511" muted="false" style="width: 60%;"></video>
    </td>
</tr>
</table>


The new version (v1.1) Animals Model has been updated on [HuggingFace](https://huggingface.co/KlingTeam/LivePortrait/tree/main/liveportrait_animals/base_models_v1.1). The new version is enabled by default.

> [!IMPORTANT]
> Note: Make sure to update your weights to use the new version.

If you prefer to use the original version, simply modify the configuration in [inference_config.py](../../../src/config/inference_config.py#L29)
```python
version_animals = "" # old version
# version_animals = "_v1.1" # new (v1.1) version
```


================================================
FILE: assets/docs/directory-structure.md
================================================
## The directory structure of `pretrained_weights`

```text
pretrained_weights
├── insightface
│   └── models
│       └── buffalo_l
│           ├── 2d106det.onnx
│           └── det_10g.onnx
├── liveportrait
│   ├── base_models
│   │   ├── appearance_feature_extractor.pth
│   │   ├── motion_extractor.pth
│   │   ├── spade_generator.pth
│   │   └── warping_module.pth
│   ├── landmark.onnx
│   └── retargeting_models
│       └── stitching_retargeting_module.pth
└── liveportrait_animals
    ├── base_models
    │   ├── appearance_feature_extractor.pth
    │   ├── motion_extractor.pth
    │   ├── spade_generator.pth
    │   └── warping_module.pth
    ├── retargeting_models
    │   └── stitching_retargeting_module.pth
    └── xpose.pth
```


================================================
FILE: assets/docs/how-to-install-ffmpeg.md
================================================
## Install FFmpeg

Make sure you have `ffmpeg` and `ffprobe` installed on your system. If you don't have them installed, follow the instructions below.

> [!Note]
> The installation is copied from [SoVITS](https://github.com/RVC-Boss/GPT-SoVITS) 🤗

### Conda Users

```bash
conda install ffmpeg
```

### Ubuntu/Debian Users

```bash
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
```

### Windows Users

Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.

### MacOS Users
```bash
brew install ffmpeg
```


================================================
FILE: assets/docs/speed.md
================================================
### Speed

Below are the results of inferring one frame on an RTX 4090 GPU using the native PyTorch framework with `torch.compile`:

| Model                             | Parameters(M) | Model Size(MB) | Inference(ms) |
|-----------------------------------|:-------------:|:--------------:|:-------------:|
| Appearance Feature Extractor      |     0.84      |       3.3      |     0.82      |
| Motion Extractor                  |     28.12     |       108      |     0.84      |
| Spade Generator                   |     55.37     |       212      |     7.59      |
| Warping Module                    |     45.53     |       174      |     5.21      |
| Stitching and Retargeting Modules |     0.23      |       2.3      |     0.31      |

*Note: The values for the Stitching and Retargeting Modules represent the combined parameter counts and total inference time of three sequential MLP networks.*


================================================
FILE: assets/gradio/gradio_description_animate_clear.md
================================================
<div style="font-size: 1.2em; text-align: center;">
    Step 3: Click the <strong>🚀 Animate</strong> button below to generate, or click <strong>🧹 Clear</strong> to erase the results
</div>
<!-- <div style="font-size: 1.1em; text-align: center;">
    <strong style="color: red;">Note:</strong>  If both <strong>Source Image</strong> and <strong>Video</strong> are uploaded, the <strong>Source Image</strong> will be used. Please click the <strong>🧹 Clear</strong> button, then re-upload the <strong>Source Image</strong> or <strong>Video</strong>.
</div> -->


================================================
FILE: assets/gradio/gradio_description_animation.md
================================================
<span style="font-size: 1.2em;">🔥 To animate the source image or video with the driving video, please follow these steps:</span>
<div style="font-size: 1.2em; margin-left: 20px;">
1. In the <strong>Animation Options for Source Image or Video</strong> section, we recommend enabling the <code>do crop (source)</code> option if faces occupy a small portion of your source image or video.
</div>
<div style="font-size: 1.2em; margin-left: 20px;">
2. In the <strong>Animation Options for Driving Video</strong> section, the <code>relative head rotation</code> and <code>smooth strength</code> options only take effect if the source input is a video.
</div>
<div style="font-size: 1.2em; margin-left: 20px;">
3. Press the <strong>🚀 Animate</strong> button and wait for a moment. Your animated video will appear in the result block. This may take a few moments. If the input is a source video, the length of the animated video is the minimum of the length of the source video and the driving video.
</div>
<div style="font-size: 1.2em; margin-left: 20px;">
4. If you want to upload your own driving video, <strong>the best practice</strong>:

 - Crop it to a 1:1 aspect ratio (e.g., 512x512 or 256x256 pixels), or enable auto-driving by checking `do crop (driving video)`.
 - Focus on the head area, similar to the example videos.
 - Minimize shoulder movement.
 - Make sure the first frame of driving video is a frontal face with **neutral expression**.

</div>


================================================
FILE: assets/gradio/gradio_description_retargeting.md
================================================
<br>

<!-- ## Retargeting -->
<!-- <span style="font-size: 1.2em;">🔥 To edit the eyes and lip open ratio of the source portrait, drag the sliders and click the <strong>🚗 Retargeting</strong> button. You can try running it multiple times. <strong>😊 Set both ratios to 0.8 to see what's going on!</strong> </span> -->


<div style="display: flex; justify-content: center; align-items: center; text-align: center; font-size: 1.2em;">
  <div>
    <h2>Retargeting and Editing Portraits</h2>
    <p>Upload a source portrait, and the <code>eyes-open ratio</code> and <code>lip-open ratio</code> will be auto-calculated. Adjust the sliders to see instant edits. Feel free to experiment! 🎨</p>
    <strong>😊 Set both target eyes-open and lip-open ratios to 0.8 to see what's going on!</strong></p>
  </div>
</div>


================================================
FILE: assets/gradio/gradio_description_retargeting_video.md
================================================
<br>
<div style="display: flex; justify-content: center; align-items: center; text-align: center; font-size: 1.2em;">
  <div>
    <h2>Retargeting Video</h2>
    <p>Upload a Source Video as Retargeting Input, then drag the sliders and click the <strong>🚗 Retargeting Video</strong> button. You can try running it multiple times.
    <br>
    <strong>🤐 Set target lip-open ratio to 0 to see what's going on!</strong></p>
  </div>
</div>


================================================
FILE: assets/gradio/gradio_description_upload.md
================================================
<br>
<div style="font-size: 1.2em; display: flex; justify-content: space-between;">
    <div style="flex: 1; text-align: center; margin-right: 20px;">
        <div style="display: inline-block;">
            Step 1: Upload a <strong>Source Image</strong> or <strong>Video</strong> (any aspect ratio) ⬇️
        </div>
        <div style="display: inline-block; font-size: 0.8em;">
            <strong>Note:</strong> Better if Source Video has <strong>the same FPS</strong> as the Driving Video.
        </div>
    </div>
    <div style="flex: 1; text-align: center; margin-left: 20px;">
        <div style="display: inline-block;">
            Step 2: Upload a <strong>Driving Video</strong> (any aspect ratio) ⬇️
        </div>
        <div style="display: inline-block; font-size: 0.8em;">
            <strong>Tips:</strong> Focus on the head, minimize shoulder movement, <strong>neutral expression</strong> in first frame.
        </div>
    </div>
</div>


================================================
FILE: assets/gradio/gradio_description_upload_animal.md
================================================
<br>
<div style="font-size: 1.2em; display: flex; justify-content: space-between;">
    <div style="flex: 1; text-align: center; margin-right: 20px;">
        <div style="display: inline-block;">
            Step 1: Upload a <strong>Source Animal Image</strong> (any aspect ratio) ⬇️
        </div>
    </div>
    <div style="flex: 1; text-align: center; margin-left: 20px;">
        <div style="display: inline-block;">
            Step 2: Upload a <strong>Driving Pickle</strong> or <strong>Driving Video</strong> (any aspect ratio) ⬇️
        </div>
        <div style="display: inline-block; font-size: 0.8em;">
            <strong>Tips:</strong> Focus on the head, minimize shoulder movement, <strong>neutral expression</strong> in first frame.
        </div>
    </div>
</div>


================================================
FILE: assets/gradio/gradio_title.md
================================================
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
  <div>
    <h1>LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control</h1>
    <!-- <span>Add mimics and lip sync to your static portrait driven by a video</span> -->
    <!-- <span>Efficient Portrait Animation with Stitching and Retargeting Control</span> -->
    <!-- <br> -->
    <div style="display: flex; justify-content: center; align-items: center; text-align: center;">
      <a href="https://arxiv.org/pdf/2407.03168"><img src="https://img.shields.io/badge/arXiv-2407.03168-red"></a>
      &nbsp;
      <a href="https://liveportrait.github.io"><img src="https://img.shields.io/badge/Project_Page-LivePortrait-green" alt="Project Page"></a>
      &nbsp;
      <a href='https://huggingface.co/spaces/KlingTeam/LivePortrait'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
      &nbsp;
      <a href="https://github.com/KlingTeam/LivePortrait"><img src="https://img.shields.io/badge/Github-Code-blue"></a>
      &nbsp;
      <a href="https://github.com/KlingTeam/LivePortrait"><img src="https://img.shields.io/github/stars/KlingTeam/LivePortrait
      "></a>
    </div>
  </div>
</div>


================================================
FILE: inference.py
================================================
# coding: utf-8

"""
The entrance of humans
"""

import os
import os.path as osp
import tyro
import subprocess
from src.config.argument_config import ArgumentConfig
from src.config.inference_config import InferenceConfig
from src.config.crop_config import CropConfig
from src.live_portrait_pipeline import LivePortraitPipeline


def partial_fields(target_class, kwargs):
    return target_class(**{k: v for k, v in kwargs.items() if hasattr(target_class, k)})


def fast_check_ffmpeg():
    try:
        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
        return True
    except:
        return False


def fast_check_args(args: ArgumentConfig):
    if not osp.exists(args.source):
        raise FileNotFoundError(f"source info not found: {args.source}")
    if not osp.exists(args.driving):
        raise FileNotFoundError(f"driving info not found: {args.driving}")


def main():
    # set tyro theme
    tyro.extras.set_accent_color("bright_cyan")
    args = tyro.cli(ArgumentConfig)

    ffmpeg_dir = os.path.join(os.getcwd(), "ffmpeg")
    if osp.exists(ffmpeg_dir):
        os.environ["PATH"] += (os.pathsep + ffmpeg_dir)

    if not fast_check_ffmpeg():
        raise ImportError(
            "FFmpeg is not installed. Please install FFmpeg (including ffmpeg and ffprobe) before running this script. https://ffmpeg.org/download.html"
        )

    fast_check_args(args)

    # specify configs for inference
    inference_cfg = partial_fields(InferenceConfig, args.__dict__)
    crop_cfg = partial_fields(CropConfig, args.__dict__)

    live_portrait_pipeline = LivePortraitPipeline(
        inference_cfg=inference_cfg,
        crop_cfg=crop_cfg
    )

    # run
    live_portrait_pipeline.execute(args)


if __name__ == "__main__":
    main()


================================================
FILE: inference_animals.py
================================================
# coding: utf-8

"""
The entrance of animal
"""

import os
import os.path as osp
import tyro
import subprocess
from src.config.argument_config import ArgumentConfig
from src.config.inference_config import InferenceConfig
from src.config.crop_config import CropConfig
from src.live_portrait_pipeline_animal import LivePortraitPipelineAnimal


def partial_fields(target_class, kwargs):
    return target_class(**{k: v for k, v in kwargs.items() if hasattr(target_class, k)})


def fast_check_ffmpeg():
    try:
        subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
        return True
    except:
        return False


def fast_check_args(args: ArgumentConfig):
    if not osp.exists(args.source):
        raise FileNotFoundError(f"source info not found: {args.source}")
    if not osp.exists(args.driving):
        raise FileNotFoundError(f"driving info not found: {args.driving}")


def main():
    # set tyro theme
    tyro.extras.set_accent_color("bright_cyan")
    args = tyro.cli(ArgumentConfig)

    ffmpeg_dir = os.path.join(os.getcwd(), "ffmpeg")
    if osp.exists(ffmpeg_dir):
        os.environ["PATH"] += (os.pathsep + ffmpeg_dir)

    if not fast_check_ffmpeg():
        raise ImportError(
            "FFmpeg is not installed. Please install FFmpeg (including ffmpeg and ffprobe) before running this script. https://ffmpeg.org/download.html"
        )

    fast_check_args(args)

    # specify configs for inference
    inference_cfg = partial_fields(InferenceConfig, args.__dict__)
    crop_cfg = partial_fields(CropConfig, args.__dict__)

    live_portrait_pipeline_animal = LivePortraitPipelineAnimal(
        inference_cfg=inference_cfg,
        crop_cfg=crop_cfg
    )

    # run
    live_portrait_pipeline_animal.execute(args)


if __name__ == "__main__":
    main()


================================================
FILE: pretrained_weights/.gitkeep
================================================


================================================
FILE: readme.md
================================================
<h1 align="center">LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control</h1>

<div align='center'>
    <a href='https://github.com/cleardusk' target='_blank'><strong>Jianzhu Guo</strong></a><sup> 1*†</sup>&emsp;
    <a href='https://github.com/Mystery099' target='_blank'><strong>Dingyun Zhang</strong></a><sup> 1,2*</sup>&emsp;
    <a href='https://github.com/KwaiVGI' target='_blank'><strong>Xiaoqiang Liu</strong></a><sup> 1</sup>&emsp;
    <a href='https://github.com/zzzweakman' target='_blank'><strong>Zhizhou Zhong</strong></a><sup> 1,3</sup>&emsp;
    <a href='https://scholar.google.com.hk/citations?user=_8k1ubAAAAAJ' target='_blank'><strong>Yuan Zhang</strong></a><sup> 1</sup>&emsp;
</div>

<div align='center'>
    <a href='https://scholar.google.com/citations?user=P6MraaYAAAAJ' target='_blank'><strong>Pengfei Wan</strong></a><sup> 1</sup>&emsp;
    <a href='https://openreview.net/profile?id=~Di_ZHANG3' target='_blank'><strong>Di Zhang</strong></a><sup> 1</sup>&emsp;
</div>

<div align='center'>
    <sup>1 </sup>Kuaishou Technology&emsp; <sup>2 </sup>University of Science and Technology of China&emsp; <sup>3 </sup>Fudan University&emsp;
</div>
<div align='center'>
    <small><sup>*</sup> Equal contributions</small>
    <small><sup>†</sup> Project lead</small>
</div>
<br>

<!-- ===== LivePortrait – Quick Start & Links ===== -->
<div align="center">

  <!-- 🚀 Quick Start buttons -->
  <p>
    <a href="https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240829.zip" target="_blank"><img src="https://img.shields.io/badge/🖥 Windows Installer-v20240829-00BFFF?style=for-the-badge&logo=windows&logoColor=white" alt="Windows one-click installer"></a>&nbsp;
    <a href="https://huggingface.co/spaces/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/badge/🌐 Try Online Demo-FF6F00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace online demo"></a>
  </p>

  <!-- 📄 Paper / project / GitHub stats -->
  <p>
    <a href="https://arxiv.org/pdf/2407.03168" target="_blank"><img src="https://img.shields.io/badge/arXiv-LivePortrait-red" alt="arXiv link"></a>&nbsp;
    <a href="https://liveportrait.github.io" target="_blank"><img src="https://img.shields.io/badge/Project-Homepage-green" alt="project homepage"></a>&nbsp;
    <a href="https://huggingface.co/spaces/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/badge/🤗 Hugging Face-Spaces-blue" alt="HF space"></a>&nbsp;
    <a href="https://hellogithub.com/repository/bed652ef02154dd7a434e0720125639e" target="_blank"><img src="https://abroad.hellogithub.com/v1/widgets/recommend.svg?rid=bed652ef02154dd7a434e0720125639e&claim_uid=XyBT2K9QJ7RZhej&theme=small" alt="Featured by HelloGitHub"></a>&nbsp;
    <a href="https://github.com/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/github/stars/KlingTeam/LivePortrait?style=social" alt="GitHub stars"></a>
  </p>

  <!-- 🌏 Language switch -->
  <p><strong>English</strong> | <a href="./readme_zh_cn.md"><strong>简体中文</strong></a></p>

  <!-- 🎬 Showcase GIF -->
  <p><img src="./assets/docs/showcase2.gif" alt="LivePortrait showcase GIF"></p>
  <p>🔥 For more results, visit our <a href="https://liveportrait.github.io/" target="_blank"><strong>homepage</strong></a> 🔥</p>

</div>
<!-- ===== /LivePortrait ===== -->


## 🔥 Updates
- **`2025/06/01`**: 🌍 Over the past year, **LivePortrait** has 🚀 become an efficient portrait-animation (humans, cats and dogs) solution adopted by major video platforms—Kuaishou, Douyin, Jianying, WeChat Channels—as well as numerous startups and creators. 🎉
- **`2025/01/01`**: 🐶 We updated a new version of the Animals model with more data, see [**here**](./assets/docs/changelog/2025-01-01.md).
- **`2024/10/18`**: ❗ We have updated the versions of the `transformers` and `gradio` libraries to avoid security vulnerabilities. Details [here](https://github.com/KlingTeam/LivePortrait/pull/421/files).
- **`2024/08/29`**: 📦 We update the Windows [one-click installer](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240829.zip) and support auto-updates, see [changelog](https://huggingface.co/cleardusk/LivePortrait-Windows#20240829).
- **`2024/08/19`**: 🖼️ We support **image driven mode** and **regional control**. For details, see [**here**](./assets/docs/changelog/2024-08-19.md).
- **`2024/08/06`**: 🎨 We support **precise portrait editing** in the Gradio interface, inspired by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait). See [**here**](./assets/docs/changelog/2024-08-06.md).
- **`2024/08/05`**: 📦 Windows users can now download the [one-click installer](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240806.zip) for Humans mode and **Animals mode** now! For details, see [**here**](./assets/docs/changelog/2024-08-05.md).
- **`2024/08/02`**: 😸 We released a version of the **Animals model**, along with several other updates and improvements. Check out the details [**here**](./assets/docs/changelog/2024-08-02.md)!
- **`2024/07/25`**: 📦 Windows users can now download the package from [HuggingFace](https://huggingface.co/cleardusk/LivePortrait-Windows/tree/main). Simply unzip and double-click `run_windows.bat` to enjoy!
- **`2024/07/24`**: 🎨 We support pose editing for source portraits in the Gradio interface. We’ve also lowered the default detection threshold to increase recall. [Have fun](assets/docs/changelog/2024-07-24.md)!
- **`2024/07/19`**: ✨ We support 🎞️ **portrait video editing (aka v2v)**! More to see [here](assets/docs/changelog/2024-07-19.md).
- **`2024/07/17`**: 🍎 We support macOS with Apple Silicon, modified from [jeethu](https://github.com/jeethu)'s PR [#143](https://github.com/KlingTeam/LivePortrait/pull/143).
- **`2024/07/10`**: 💪 We support audio and video concatenating, driving video auto-cropping, and template making to protect privacy. More to see [here](assets/docs/changelog/2024-07-10.md).
- **`2024/07/09`**: 🤗 We released the [HuggingFace Space](https://huggingface.co/spaces/KlingTeam/LivePortrait), thanks to the HF team and [Gradio](https://github.com/gradio-app/gradio)!
- **`2024/07/04`**: 😊 We released the initial version of the inference code and models. Continuous updates, stay tuned!
- **`2024/07/04`**: 🔥 We released the [homepage](https://liveportrait.github.io) and technical report on [arXiv](https://arxiv.org/pdf/2407.03168).



## Introduction 📖
This repo, named **LivePortrait**, contains the official PyTorch implementation of our paper [LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control](https://arxiv.org/pdf/2407.03168).
We are actively updating and improving this repository. If you find any bugs or have suggestions, welcome to raise issues or submit pull requests (PR) 💖.

## Getting Started 🏁
### 1. Clone the code and prepare the environment 🛠️

> [!Note]
> Make sure your system has [`git`](https://git-scm.com/), [`conda`](https://anaconda.org/anaconda/conda), and [`FFmpeg`](https://ffmpeg.org/download.html) installed. For details on FFmpeg installation, see [**how to install FFmpeg**](assets/docs/how-to-install-ffmpeg.md).

```bash
git clone https://github.com/KlingTeam/LivePortrait
cd LivePortrait

# create env using conda
conda create -n LivePortrait python=3.10
conda activate LivePortrait
```

#### For Linux 🐧 or Windows 🪟 Users
[X-Pose](https://github.com/IDEA-Research/X-Pose), required by Animals mode, is a dependency that needs to be installed. The step of `Check your CUDA versions` is **optional** if you only want to run Humans mode.

<details>
  <summary>Check your CUDA versions</summary>

  Firstly, check your current CUDA version by:
  ```bash
  nvcc -V # example versions: 11.1, 11.8, 12.1, etc.
  ```

  Then, install the corresponding torch version. Here are examples for different CUDA versions. Visit the [PyTorch Official Website](https://pytorch.org/get-started/previous-versions) for installation commands if your CUDA version is not listed:
  ```bash
  # for CUDA 11.1
  pip install torch==1.10.1+cu111 torchvision==0.11.2 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
  # for CUDA 11.8
  pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
  # for CUDA 12.1
  pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
  # ...
  ```

  **Note**: On Windows systems, some higher versions of CUDA (such as 12.4, 12.6, etc.) may lead to unknown issues. You may consider downgrading CUDA to version 11.8 for stability. See the [downgrade guide](https://github.com/dimitribarbot/sd-webui-live-portrait/blob/main/assets/docs/how-to-install-xpose.md#cuda-toolkit-118) by [@dimitribarbot](https://github.com/dimitribarbot).
</details>


Finally, install the remaining dependencies:
```bash
pip install -r requirements.txt
```

#### For macOS  with Apple Silicon Users
The [X-Pose](https://github.com/IDEA-Research/X-Pose) dependency does not support macOS, so you can skip its installation. While Humans mode works as usual, Animals mode is not supported. Use the provided requirements file for macOS with Apple Silicon:
```bash
# for macOS with Apple Silicon users
pip install -r requirements_macOS.txt
```

### 2. Download pretrained weights 📥

The easiest way to download the pretrained weights is from HuggingFace:
```bash
# !pip install -U "huggingface_hub[cli]"
huggingface-cli download KlingTeam/LivePortrait --local-dir pretrained_weights --exclude "*.git*" "README.md" "docs"
```

If you cannot access to Huggingface, you can use [hf-mirror](https://hf-mirror.com/) to download:
```bash
# !pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download KlingTeam/LivePortrait --local-dir pretrained_weights --exclude "*.git*" "README.md" "docs"
```

Alternatively, you can download all pretrained weights from [Google Drive](https://drive.google.com/drive/folders/1UtKgzKjFAOmZkhNK-OYT0caJ_w2XAnib) or [Baidu Yun](https://pan.baidu.com/s/1MGctWmNla_vZxDbEp2Dtzw?pwd=z5cn). Unzip and place them in `./pretrained_weights`.

Ensuring the directory structure is as or contains [**this**](assets/docs/directory-structure.md).

### 3. Inference 🚀

#### Fast hands-on (humans) 👤
```bash
# For Linux and Windows users
python inference.py

# For macOS users with Apple Silicon (Intel is not tested). NOTE: this maybe 20x slower than RTX 4090
PYTORCH_ENABLE_MPS_FALLBACK=1 python inference.py
```

If the script runs successfully, you will get an output mp4 file named `animations/s6--d0_concat.mp4`. This file includes the following results: driving video, input image or video, and generated result.

<p align="center">
  <img src="./assets/docs/inference.gif" alt="image">
</p>

Or, you can change the input by specifying the `-s` and `-d` arguments:

```bash
# source input is an image
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d0.mp4

# source input is a video ✨
python inference.py -s assets/examples/source/s13.mp4 -d assets/examples/driving/d0.mp4

# more options to see
python inference.py -h
```

#### Fast hands-on (animals) 🐱🐶
Animals mode is ONLY tested on Linux and Windows with NVIDIA GPU.

You need to build an OP named `MultiScaleDeformableAttention` first (refer to the <a href="#for-linux--or-windows--users">Check your CUDA versions</a> if needed), which is used by [X-Pose](https://github.com/IDEA-Research/X-Pose), a general keypoint detection framework.

```bash
cd src/utils/dependencies/XPose/models/UniPose/ops
python setup.py build install
cd - # equal to cd ../../../../../../../
```

Then
```bash
python inference_animals.py -s assets/examples/source/s39.jpg -d assets/examples/driving/wink.pkl --driving_multiplier 1.75 --no_flag_stitching
```
If the script runs successfully, you will get an output mp4 file named `animations/s39--wink_concat.mp4`.
<p align="center">
  <img src="./assets/docs/inference-animals.gif" alt="image">
</p>

#### Driving video auto-cropping 📢📢📢
> [!IMPORTANT]
> To use your own driving video, we **recommend**: ⬇️
> - Crop it to a **1:1** aspect ratio (e.g., 512x512 or 256x256 pixels), or enable auto-cropping by `--flag_crop_driving_video`.
> - Focus on the head area, similar to the example videos.
> - Minimize shoulder movement.
> - Make sure the first frame of driving video is a frontal face with **neutral expression**.

Below is an auto-cropping case by `--flag_crop_driving_video`:
```bash
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d13.mp4 --flag_crop_driving_video
```

If you find the results of auto-cropping is not well, you can modify the `--scale_crop_driving_video`, `--vy_ratio_crop_driving_video` options to adjust the scale and offset, or do it manually.

#### Motion template making
You can also use the auto-generated motion template files ending with `.pkl` to speed up inference, and **protect privacy**, such as:
```bash
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d5.pkl # portrait animation
python inference.py -s assets/examples/source/s13.mp4 -d assets/examples/driving/d5.pkl # portrait video editing
```

### 4. Gradio interface 🤗

We also provide a Gradio <a href='https://github.com/gradio-app/gradio'><img src='https://img.shields.io/github/stars/gradio-app/gradio'></a> interface for a better experience, just run by:

```bash
# For Linux and Windows users (and macOS with Intel??)
python app.py # humans mode

# For macOS with Apple Silicon users, Intel not supported, this maybe 20x slower than RTX 4090
PYTORCH_ENABLE_MPS_FALLBACK=1 python app.py # humans mode
```

We also provide a Gradio interface of animals mode, which is only tested on Linux with NVIDIA GPU:
```bash
python app_animals.py # animals mode 🐱🐶
```

You can specify the `--server_port`, `--share`, `--server_name` arguments to satisfy your needs!

🚀 We also provide an acceleration option `--flag_do_torch_compile`. The first-time inference triggers an optimization process (about one minute), making subsequent inferences 20-30% faster. Performance gains may vary with different CUDA versions.
```bash
# enable torch.compile for faster inference
python app.py --flag_do_torch_compile
```
**Note**: This method is not supported on Windows and macOS.

**Or, try it out effortlessly on [HuggingFace](https://huggingface.co/spaces/KlingTeam/LivePortrait) 🤗**

### 5. Inference speed evaluation 🚀🚀🚀
We have also provided a script to evaluate the inference speed of each module:

```bash
# For NVIDIA GPU
python speed.py
```

The results are [**here**](./assets/docs/speed.md).

## Community Resources 🤗

Discover the invaluable resources contributed by our community to enhance your LivePortrait experience.


### Community-developed Projects

| Repo | Description | Author / Links |
|------|------|--------|
| [**ditto-talkinghead**](https://github.com/antgroup/ditto-talkinghead) | Real-time audio-driven talking head. | [ArXiv](https://arxiv.org/abs/2411.19509), [Homepage](https://digital-avatar.github.io/ai/Ditto/)  |
| [**FasterLivePortrait**](https://github.com/warmshao/FasterLivePortrait) | Faster real-time version using TensorRT. | [@warmshao](https://github.com/warmshao) |
| [**AdvancedLivePortrait-WebUI**](https://github.com/jhj0517/AdvancedLivePortrait-WebUI) | Dedicated gradio based WebUI started from [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait). | [@jhj0517](https://github.com/jhj0517) |
| [**FacePoke**](https://github.com/jbilcke-hf/FacePoke) | A real-time head transformation app, controlled by your mouse! | [@jbilcke-hf](https://github.com/jbilcke-hf) |
| [**FaceFusion**](https://github.com/facefusion/facefusion) | FaceFusion 3.0 integregates LivePortrait as `expression_restorer` and `face_editor` processors. | [@henryruhs](https://github.com/henryruhs) |
| [**sd-webui-live-portrait**](https://github.com/dimitribarbot/sd-webui-live-portrait) | WebUI extension of LivePortrait, adding atab to the original Stable Diffusion WebUI to benefit from LivePortrait features. | [@dimitribarbot](https://github.com/dimitribarbot) |
| [**ComfyUI-LivePortraitKJ**](https://github.com/kijai/ComfyUI-LivePortraitKJ) | A ComfyUI node to use LivePortrait, with MediaPipe as as an alternative to Insightface. | [@kijai](https://github.com/kijai) |
| [**ComfyUI-AdvancedLivePortrait**](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait) | A faster ComfyUI node with real-time preview that has inspired many other community-developed tools and projects. | [@PowerHouseMan](https://github.com/PowerHouseMan) |
| [**comfyui-liveportrait**](https://github.com/shadowcz007/comfyui-liveportrait) | A ComfyUI node to use LivePortrait, supporting multi-faces, expression interpolation etc, with a [tutorial](https://www.bilibili.com/video/BV1JW421R7sP). | [@shadowcz007](https://github.com/shadowcz007) |

### Playgrounds, 🤗 HuggingFace Spaces and Others
- [FacePoke Space](https://huggingface.co/spaces/jbilcke-hf/FacePoke)
- [Expression Editor Space](https://huggingface.co/spaces/fffiloni/expression-editor)
- [Expression Editor Replicate](https://replicate.com/fofr/expression-editor)
- [Face Control Realtime Demo](https://fal.ai/demos/face-control) on FAL
- [Replicate Playground](https://replicate.com/fofr/live-portrait)
- Nuke can use LivePortrait through CompyUI node, details [here](https://x.com/bilawalsidhu/status/1837349806475276338)
- LivePortrait lives on [Poe](https://poe.com/LivePortrait)

### Video Tutorials
- [Workflow of LivePortrait Video to Video](https://youtu.be/xfzK_6cTs58?si=aYjgypeJBkhc46VL) by [@curiousrefuge](https://www.youtube.com/@curiousrefuge)
- [Google Colab tutorial](https://youtu.be/59Y9ePAXTp0?si=KzEWhklBlporW7D8) by [@Planet Ai](https://www.youtube.com/@planetai217)
- [Paper reading](https://youtu.be/fD0P6UWSu8I?si=Vn5wxUa8qSu1jv4l) by [@TwoMinutePapers](https://www.youtube.com/@TwoMinutePapers)
- [ComfyUI Advanced LivePortrait](https://youtu.be/q0Vf-ZZsbzI?si=nbs3npleH-dVCt28) by [TutoView](https://www.youtube.com/@TutoView)
- [LivePortarit exploration](https://www.youtube.com/watch?v=vsvlbTEqgXQ) and [A deep dive into LivePortrait](https://youtu.be/cucaEEDYmsw?si=AtPaDWc5G-a4E8dD) by [TheoreticallyMedia](https://www.youtube.com/@TheoreticallyMedia)
- [LivePortrait hands-on tutorial](https://www.youtube.com/watch?v=uyjSTAOY7yI) by [@AI Search](https://www.youtube.com/@theAIsearch)
- [ComfyUI tutorial](https://www.youtube.com/watch?v=8-IcDDmiUMM) by [@Sebastian Kamph](https://www.youtube.com/@sebastiankamph)
- A [tutorial](https://www.bilibili.com/video/BV1cf421i7Ly) on BiliBili

And so MANY amazing contributions from our community, too many to list them all 💖

## Acknowledgements 💐
We would like to thank the contributors of [FOMM](https://github.com/AliaksandrSiarohin/first-order-model), [Open Facevid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis), [SPADE](https://github.com/NVlabs/SPADE), [InsightFace](https://github.com/deepinsight/insightface) and [X-Pose](https://github.com/IDEA-Research/X-Pose) repositories, for their open research and contributions.

## Ethics Considerations 🛡️
Portrait animation technologies come with social risks, particularly the potential for misuse in creating deepfakes. To mitigate these risks, it’s crucial to follow ethical guidelines and adopt responsible usage practices. At present, the synthesized results contain visual artifacts that may help in detecting deepfakes. Please note that we do not assume any legal responsibility for the use of the results generated by this project.

## Citation 💖
If you find LivePortrait useful for your project or research, welcome to 🌟 this repo and cite our work using the following BibTeX:
```bibtex
@article{guo2024liveportrait,
  title   = {LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control},
  author  = {Guo, Jianzhu and Zhang, Dingyun and Liu, Xiaoqiang and Zhong, Zhizhou and Zhang, Yuan and Wan, Pengfei and Zhang, Di},
  journal = {arXiv preprint arXiv:2407.03168},
  year    = {2024}
}
```

*Long live in arXiv.*

## Contact 📧
[**Jianzhu Guo (郭建珠)**](https://guojianzhu.com); **guojianzhu1994@gmail.com**

## Star History 🌟
<details>
  <summary>Click to view Star chart</summary>
  <p align="center">
    <a href="https://www.star-history.com/#KlingAIResearch/LivePortrait&Timeline" target="_blank">
      <picture>
        <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline&theme=dark" />
        <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline" />
        <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline" width="90%" />
      </picture>
    </a>
  </p>
</details>


================================================
FILE: readme_zh_cn.md
================================================
<h1 align="center">LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control</h1>

<!-- ===== 作者信息 ===== -->
<div align='center'>
  <a href='https://github.com/cleardusk' target='_blank'><strong>郭建珠</strong></a><sup> 1*†</sup>&emsp;
  <a href='https://github.com/Mystery099' target='_blank'><strong>张丁芸</strong></a><sup> 1,2*</sup>&emsp;
  <a href='https://github.com/KwaiVGI' target='_blank'><strong>刘晓强</strong></a><sup> 1</sup>&emsp;
  <a href='https://github.com/zzzweakman' target='_blank'><strong>钟智舟</strong></a><sup> 1,3</sup>&emsp;
  <a href='https://scholar.google.com.hk/citations?user=_8k1ubAAAAAJ' target='_blank'><strong>张渊</strong></a><sup> 1</sup>&emsp;
  <a href='https://scholar.google.com/citations?user=P6MraaYAAAAJ' target='_blank'><strong>万鹏飞</strong></a><sup> 1</sup>&emsp;
  <a href='https://openreview.net/profile?id=~Di_ZHANG3' target='_blank'><strong>张迪</strong></a><sup> 1</sup>&emsp;
</div>


<div align='center'>
  <sup>1 </sup>快手科技&emsp; <sup>2 </sup>中国科学技术大学&emsp; <sup>3 </sup>复旦大学&emsp;
</div>
<div align='center'>
  <small><sup>*</sup> Equal contributions</small>
  <small><sup>†</sup> Project lead</small>
</div>
<br>

<!-- ===== LivePortrait – 快速上手 & 链接 ===== -->
<div align="center">

  <!-- 🚀 快速体验按钮 -->
  <p>
    <a href="https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240829.zip" target="_blank"><img src="https://img.shields.io/badge/🖥 Windows 一键安装-v20240829-00BFFF?style=for-the-badge&logo=windows&logoColor=white" alt="Windows 一键安装包"></a>&nbsp;
    <a href="https://huggingface.co/spaces/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/badge/🌐 在线体验-FF6F00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace 在线体验"></a>
  </p>

  <!-- 📄 论文 / 项目 / Star 等徽章 -->
  <p>
    <a href="https://arxiv.org/pdf/2407.03168" target="_blank"><img src="https://img.shields.io/badge/arXiv-LivePortrait-red" alt="arXiv link"></a>&nbsp;
    <a href="https://liveportrait.github.io" target="_blank"><img src="https://img.shields.io/badge/Project-Homepage-green" alt="project homepage"></a>&nbsp;
    <a href="https://huggingface.co/spaces/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/badge/🤗 Hugging Face-Spaces-blue" alt="HF space"></a>&nbsp;
    <a href="https://hellogithub.com/repository/bed652ef02154dd7a434e0720125639e" target="_blank"><img src="https://abroad.hellogithub.com/v1/widgets/recommend.svg?rid=bed652ef02154dd7a434e0720125639e&claim_uid=XyBT2K9QJ7RZhej&theme=small" alt="Featured by HelloGitHub"></a>&nbsp;
    <a href="https://github.com/KlingTeam/LivePortrait" target="_blank"><img src="https://img.shields.io/github/stars/KlingTeam/LivePortrait?style=social" alt="GitHub stars"></a>
  </p>

  <!-- 🌏 语言切换 -->
  <p><a href="./readme.md"><strong>English</strong></a> | <strong>简体中文</strong></p>

  <!-- 🎬 Showcase GIF -->
  <p><img src="./assets/docs/showcase2.gif" alt="LivePortrait 效果展示 GIF"></p>
  <p>🔥 更多效果，请访问我们的 <a href="https://liveportrait.github.io/" target="_blank"><strong>主页</strong></a> 🔥</p>

</div>
<!-- ===== /LivePortrait ===== -->


## 🔥 更新日志
- **`2025/06/01`**：🌍 过去一年里，LivePortrait 🚀 已成为高效的人像与宠物（猫狗）动画解决方案，被快手、抖音、剪映、视频号等主流视频平台，以及众多初创公司和创作者所采用。🎉
- **`2025/01/01`**：🐶 我们更新了一版动物模型（使用了更多动物数据），具体查看[**这里**](./assets/docs/changelog/2025-01-01.md).
- **`2024/10/18`**：❗ 我们更新了`transformers`，`gradio`库的版本避免安全漏洞，具体查看[这里](https://github.com/KlingTeam/LivePortrait/pull/421/files).
- **`2024/08/29`**：📦 我们更新了Windows[一键安装程序](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240829.zip)并支持自动更新, 详情建[这里](https://huggingface.co/cleardusk/LivePortrait-Windows#20240829)。
- **`2024/08/19`**：🖼️ 我们支持了**图像驱动模式**和**区域控制**。详情请见[**这里**](./assets/docs/changelog/2024-08-19.md)。
- **`2024/08/06`**：🎨 我们在Gradio界面支持**精确的人像编辑**, 受到[ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait)启发。详见[**这里**](./assets/docs/changelog/2024-08-06.md)。
- **`2024/08/05`**：📦 Windows用户现在可以下载[一键安装程序](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240806.zip)，支持**人类模式**和**动物模式**！详情见[**这里**](./assets/docs/changelog/2024-08-05.md)。
- **`2024/08/02`**：😸 我们发布了**动物模型**版本，以及其他一些更新和改进。查看详情[**这里**](./assets/docs/changelog/2024-08-02.md)！
- **`2024/07/25`**：📦 Windows用户现在可以从 [HuggingFace](https://huggingface.co/cleardusk/LivePortrait-Windows/tree/main) 或 [百度云](https://pan.baidu.com/s/1FWsWqKe0eNfXrwjEhhCqlw?pwd=86q2) 下载软件包。解压并双击`run_windows.bat`即可享受！
- **`2024/07/24`**：🎨 我们在Gradio界面支持源人像的姿势编辑。我们还降低了默认检测阈值以增加召回率。[玩得开心](assets/docs/changelog/2024-07-24.md)！
- **`2024/07/19`**：✨ 我们支持🎞️ **人像视频编辑（aka v2v）**！更多信息见[**这里**](assets/docs/changelog/2024-07-19.md)。
- **`2024/07/17`**：🍎 我们支持macOS搭载Apple Silicon，修改来自 [jeethu](https://github.com/jeethu) 的PR [#143](https://github.com/KlingTeam/LivePortrait/pull/143) 。
- **`2024/07/10`**：💪我们支持音频和视频拼接、驱动视频自动裁剪以及制作模板以保护隐私。更多信息见[这里](assets/docs/changelog/2024-07-10.md)。
- **`2024/07/09`**：🤗 我们发布了[HuggingFace Space](https://huggingface.co/spaces/KlingTeam/LivePortrait)，感谢HF团队和[Gradio](https://github.com/gradio-app/gradio)！
- **`2024/07/04`**：😊 我们发布了初始版本的推理代码和模型。持续更新，敬请关注！
- **`2024/07/04`**：🔥 我们发布了[主页](https://liveportrait.github.io)和在[arXiv](https://arxiv.org/pdf/2407.03168)上的技术报告。



## 介绍 📖
此仓库名为**LivePortrait**，包含我们论文（[LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control](https://arxiv.org/pdf/2407.03168)）的官方PyTorch实现。 我们正在积极更新和改进此仓库。如果您发现任何错误或有建议，欢迎提出问题或提交合并请求💖。

## 上手指南 🏁
### 1. 克隆代码和安装运行环境 🛠️

> [!Note]
> 确保您的系统已安装[`git`](https://git-scm.com/)、[`conda`](https://anaconda.org/anaconda/conda)和[`FFmpeg`](https://ffmpeg.org/download.html)。有关FFmpeg安装的详细信息，见[**如何安装FFmpeg**](assets/docs/how-to-install-ffmpeg.md)。

```bash
git clone https://github.com/KlingTeam/LivePortrait
cd LivePortrait

# 使用conda创建环境
conda create -n LivePortrait python=3.10
conda activate LivePortrait
```

#### 对于Linux或Windows用户

[X-Pose](https://github.com/IDEA-Research/X-Pose)需要您的`torch`版本与CUDA版本兼容。

首先，通过以下命令检查您当前的CUDA版本：

```bash
nvcc -V # example versions: 11.1, 11.8, 12.1, etc.
```

然后，安装相应版本的torch。以下是不同CUDA版本的示例。如果您的CUDA版本未列出，请访问[PyTorch官方网站](https://pytorch.org/get-started/previous-versions)获取安装命令：
```bash
# for CUDA 11.1
pip install torch==1.10.1+cu111 torchvision==0.11.2 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
# for CUDA 11.8
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
# for CUDA 12.1
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
# ...
```

**注意**：在Windows系统上，一些过高版本的CUDA(12.4、12.6等)可能会导致未知的问题，您可以考虑降低您的CUDA版本到11.8，这是我们测试的一个较为稳定的版本。降级方法可以参考 [@dimitribarbot](https://github.com/dimitribarbot)
提供的[文档](https://github.com/dimitribarbot/sd-webui-live-portrait/blob/main/assets/docs/how-to-install-xpose.md#cuda-toolkit-118).

最后，安装其余依赖项：

```bash
pip install -r requirements.txt
```

#### 对于搭载Apple Silicon的macOS用户

[X-Pose](https://github.com/IDEA-Research/X-Pose)依赖项不支持macOS，因此您可以跳过其安装。人类模式照常工作，但不支持动物模式。使用为搭载Apple Silicon的macOS提供的requirements文件：

```bash
# 对于搭载Apple Silicon的macOS用户
pip install -r requirements_macOS.txt
```

### 2. 下载预训练权重(Pretrained weights) 📥

从HuggingFace下载预训练权重的最简单方法是：
```bash
# !pip install -U "huggingface_hub[cli]"
huggingface-cli download KlingTeam/LivePortrait --local-dir pretrained_weights --exclude "*.git*" "README.md" "docs"
```

若您不能访问HuggingFace平台，你可以访问其镜像网站[hf-mirror](https://hf-mirror.com/)进行下载操作：

```bash
# !pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download KlingTeam/LivePortrait --local-dir pretrained_weights --exclude "*.git*" "README.md" "docs"
```

或者，您可以从[Google Drive](https://drive.google.com/drive/folders/1UtKgzKjFAOmZkhNK-OYT0caJ_w2XAnib)或[百度云](https://pan.baidu.com/s/1MGctWmNla_vZxDbEp2Dtzw?pwd=z5cn)（进行中）下载所有预训练权重。解压并将它们放置在`./pretrained_weights`目录下。

确保目录结构如所示包含[**本仓库该路径**](assets/docs/directory-structure.md)其中展示的内容。

### 3. 推理 🚀

#### 快速上手（人类模型）👤

```bash
# 对于Linux和Windows用户
python inference.py

# 对于搭载Apple Silicon的macOS用户（Intel未测试）。注意：这可能比RTX 4090慢20倍
PYTORCH_ENABLE_MPS_FALLBACK=1 python inference.py
```

如果脚本成功运行，您将得到一个名为`animations/s6--d0_concat.mp4`的输出mp4文件。此文件包含以下结果：驱动视频、输入图像或视频以及生成结果。

<p align="center">
  <img src="./assets/docs/inference.gif" alt="image">
</p>
或者，您可以通过指定`-s`和`-d`参数来更改输入：

```bash
# 源输入是图像
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d0.mp4

# 源输入是视频 ✨
python inference.py -s assets/examples/source/s13.mp4 -d assets/examples/driving/d0.mp4

# 更多选项请见
python inference.py -h
```

#### 快速上手（动物模型） 🐱🐶

动物模式仅在Linux和Windows上经过测试，并且需要NVIDIA GPU。

您需要首先构建一个名为`MultiScaleDeformableAttention`的OP，该OP由[X-Pose](https://github.com/IDEA-Research/X-Pose)使用，这是一个通用的关键点检测框架。

```bash
cd src/utils/dependencies/XPose/models/UniPose/ops
python setup.py build install
cd - # 等同于 cd ../../../../../../../
```

然后执行
```bash
python inference_animals.py -s assets/examples/source/s39.jpg -d assets/examples/driving/wink.pkl --driving_multiplier 1.75 --no_flag_stitching
```
如果脚本成功运行，您将得到一个名为`animations/s39--wink_concat.mp4`的输出mp4文件。
<p align="center">
  <img src="./assets/docs/inference-animals.gif" alt="image">
</p>

#### 驱动视频自动裁剪 📢📢📢

> [!IMPORTANT]
> 使用您自己的驱动视频时，我们**推荐**： ⬇️
>
> - 将其裁剪为**1:1**的宽高比（例如，512x512或256x256像素），或通过`--flag_crop_driving_video`启用自动裁剪。
> - 专注于头部区域，类似于示例视频。
> - 最小化肩部运动。
> - 确保驱动视频的第一帧是具有**中性表情**的正面面部。

以下是通过`--flag_crop_driving_video`自动裁剪的示例：

```bash
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d13.mp4 --flag_crop_driving_video
```

如果自动裁剪的结果不理想，您可以修改`--scale_crop_driving_video`、`--vy_ratio_crop_driving_video`选项来调整比例和偏移，或者手动进行调整。

#### 动作模板制作

您也可以使用以`.pkl`结尾的自动生成的动作模板文件来加快推理速度，并**保护隐私**，例如：
```bash
python inference.py -s assets/examples/source/s9.jpg -d assets/examples/driving/d5.pkl # 人像动画
python inference.py -s assets/examples/source/s13.mp4 -d assets/examples/driving/d5.pkl # 人像视频编辑
```

### 4. Gradio 界面 🤗

我们还提供了Gradio界面 <a href='https://github.com/gradio-app/gradio'><img src='https://img.shields.io/github/stars/gradio-app/gradio'></a>，以获得更好的体验，只需运行：

```bash
# 对于Linux和Windows用户（以及搭载Intel的macOS？？）
python app.py # 人类模型模式

# 对于搭载Apple Silicon的macOS用户，不支持Intel，这可能比RTX 4090慢20倍
PYTORCH_ENABLE_MPS_FALLBACK=1 python app.py # 人类模型模式
```

我们还为动物模式提供了Gradio界面，这仅在Linux上经过NVIDIA GPU测试：
```bash
python app_animals.py # animals mode 🐱🐶
```

您可以指定`--server_port`、`--share`、`--server_name`参数以满足您的需求！

🚀我们还提供了一个加速选项`--flag_do_torch_compile`。第一次推理触发优化过程（约一分钟），使后续推理速度提高20-30%。不同CUDA版本的性能提升可能有所不同。

```bash
# 启用torch.compile以进行更快的推理
python app.py --flag_do_torch_compile
```
**注意**：此方法在Windows和macOS上不受支持。

**或者，在[HuggingFace](https://huggingface.co/spaces/KlingTeam/LivePortrait)上轻松尝试**🤗。

### 5. 推理速度预估 🚀🚀🚀
我们还提供了一个脚本来评估每个模块的推理速度：

```bash
# 对于NVIDIA GPU
python speed.py
```

结果在[**本仓库该文件展示**](./assets/docs/speed.md).

## 社区资源 🤗

### 社区项目

| 仓库 | 描述 | 作者 / 链接 |
|------|------|--------|
| [**ditto-talkinghead**](https://github.com/antgroup/ditto-talkinghead) | 实时音频驱动。 | [论文](https://arxiv.org/abs/2411.19509), [主页](https://digital-avatar.github.io/ai/Ditto/) |
| [**FasterLivePortrait**](https://github.com/warmshao/FasterLivePortrait) | 基于TensorRT加速更快的实时版本。 | [@warmshao](https://github.com/warmshao) |
| [**AdvancedLivePortrait-WebUI**](https://github.com/jhj0517/AdvancedLivePortrait-WebUI) | Dedicated gradio based WebUI started from [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait) | [@jhj0517](https://github.com/jhj0517) |
| [**FacePoke**](https://github.com/jbilcke-hf/FacePoke) | 一个实时的头部姿态表情控制应用，通过鼠标控制！ | [@jbilcke-hf](https://github.com/jbilcke-hf) |
| [**FaceFusion**](https://github.com/facefusion/facefusion) | FaceFusion 3.0 集成了 LivePortrait 作为 `expression_restorer` 和 `face_editor` 处理器。 | [@henryruhs](https://github.com/henryruhs) |
| [**sd-webui-live-portrait**](https://github.com/dimitribarbot/sd-webui-live-portrait) | LivePortrait 的 WebUI 扩展，在原版 Stable Diffusion WebUI 中添加了一个标签以使用 LivePortrait 的功能。 | [@dimitribarbot](https://github.com/dimitribarbot) |
| [**ComfyUI-LivePortraitKJ**](https://github.com/kijai/ComfyUI-LivePortraitKJ) | 一个用于 LivePortrait 的 ComfyUI 节点，使用 MediaPipe 作为 Insightface 的替代方案。 | [@kijai](https://github.com/kijai) |
| [**ComfyUI-AdvancedLivePortrait**](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait) | 一个更快的 ComfyUI 节点，具有实时预览功能，启发了许多社区开发的工具和项目。 | [@PowerHouseMan](https://github.com/PowerHouseMan) |
| [**comfyui-liveportrait**](https://github.com/shadowcz007/comfyui-liveportrait) | 一个用于 LivePortrait 的 ComfyUI 节点，支持多面部、表情插值等功能，并有[教程](https://www.bilibili.com/video/BV1JW421R7sP)。 | [@shadowcz007](https://github.com/shadowcz007) |

### Playgrounds, 🤗 HuggingFace Spaces 以及其它
- [FacePoke Space](https://huggingface.co/spaces/jbilcke-hf/FacePoke)
- [Expression Editor Space](https://huggingface.co/spaces/fffiloni/expression-editor)
- [Expression Editor Replicate](https://replicate.com/fofr/expression-editor)
- [Face Control Realtime Demo](https://fal.ai/demos/face-control) on FAL
- [Replicate Playground](https://replicate.com/fofr/live-portrait)
- Nuke 可以通过 CompyUI 节点使用 LivePortrait，详情见[这里](https://x.com/bilawalsidhu/status/1837349806475276338)
- LivePortrait 在 [Poe](https://poe.com/LivePortrait) 上运行

### 视频教程
- [LivePortrait 视频转视频的工作流程](https://youtu.be/xfzK_6cTs58?si=aYjgypeJBkhc46VL) 由 [@curiousrefuge](https://www.youtube.com/@curiousrefuge) 制作
- [Google Colab 教程](https://youtu.be/59Y9ePAXTp0?si=KzEWhklBlporW7D8) 由 [@Planet Ai](https://www.youtube.com/@planetai217) 制作
- [论文解读](https://youtu.be/fD0P6UWSu8I?si=Vn5wxUa8qSu1jv4l) 由 [@TwoMinutePapers](https://www.youtube.com/@TwoMinutePapers) 制作
- [ComfyUI 高级 LivePortrait 教程](https://youtu.be/q0Vf-ZZsbzI?si=nbs3npleH-dVCt28) 由 [TutoView](https://www.youtube.com/@TutoView) 制作
- [LivePortrait 探索](https://www.youtube.com/watch?v=vsvlbTEqgXQ) 和 [LivePortrait 深入探讨](https://youtu.be/cucaEEDYmsw?si=AtPaDWc5G-a4E8dD) 由 [TheoreticallyMedia](https://www.youtube.com/@TheoreticallyMedia) 制作
- [LivePortrait 实战教程](https://www.youtube.com/watch?v=uyjSTAOY7yI) 由 [@AI Search](https://www.youtube.com/@theAIsearch) 制作
- [ComfyUI 教程](https://www.youtube.com/watch?v=8-IcDDmiUMM) 由 [@Sebastian Kamph](https://www.youtube.com/@sebastiankamph) 制作
- B 站上的[教程](https://www.bilibili.com/video/BV1cf421i7Ly)

还有来自社区的无数令人惊叹的贡献，未能一一列举 💖

## 致谢 💐

我们要感谢[FOMM](https://github.com/AliaksandrSiarohin/first-order-model)、[Open Facevid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis)、[SPADE](https://github.com/NVlabs/SPADE)、[InsightFace](https://github.com/deepinsight/insightface)和[X-Pose](https://github.com/IDEA-Research/X-Pose)仓库的的贡献者，感谢他们的开放研究和贡献。

## 道德考量 🛡️
肖像动画技术伴随着社会风险，特别是在创建深度伪造（deepfakes）时可能被滥用。为了减轻这些风险，遵循道德指南并采取负责任的使用实践至关重要。目前，生成的结果包含一些视觉伪影，这些伪影可能有助于检测深度伪造。请注意，我们不对本项目生成的结果的使用承担任何法律责任。

## 引用 💖

如果您发现LivePortrait对您的研究有用，欢迎引用我们的工作，使用以下BibTeX：

```bibtex
@article{guo2024liveportrait,
  title   = {LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control},
  author  = {Guo, Jianzhu and Zhang, Dingyun and Liu, Xiaoqiang and Zhong, Zhizhou and Zhang, Yuan and Wan, Pengfei and Zhang, Di},
  journal = {arXiv preprint arXiv:2407.03168},
  year    = {2024}
}
```

## 联系方式 📧

[**Jianzhu Guo (郭建珠)**](https://guojianzhu.com); **guojianzhu1994@gmail.com**；

## Star History 🌟
<details>
  <summary>点击展开查看项目 Star 曲线</summary>
  <p align="center">
    <a href="https://www.star-history.com/#KlingAIResearch/LivePortrait&Timeline" target="_blank">
      <picture>
        <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline&theme=dark" />
        <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline" />
        <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=KlingAIResearch/LivePortrait&type=Timeline" width="90%" />
      </picture>
    </a>
  </p>
</details>


================================================
FILE: requirements.txt
================================================
-r requirements_base.txt

onnxruntime-gpu==1.18.0
transformers==4.38.0


================================================
FILE: requirements_base.txt
================================================
numpy==1.26.4
pyyaml==6.0.1
opencv-python==4.10.0.84
scipy==1.13.1
imageio==2.34.2
lmdb==1.4.1
tqdm==4.66.4
rich==13.7.1
ffmpeg-python==0.2.0
onnx==1.16.1
scikit-image==0.24.0
albumentations==1.4.10
matplotlib==3.9.0
imageio-ffmpeg==0.5.1
tyro==0.8.5
gradio==5.1.0
pykalman==0.9.7
pillow>=10.2.0

================================================
FILE: requirements_macOS.txt
================================================
-r requirements_base.txt

--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.3.0
torchvision==0.18.0
torchaudio==2.3.0
onnxruntime-silicon==1.16.3


================================================
FILE: speed.py
================================================
# coding: utf-8

"""
Benchmark the inference speed of each module in LivePortrait.

TODO: heavy GPT style, need to refactor
"""

import torch
torch._dynamo.config.suppress_errors = True  # Suppress errors and fall back to eager execution

import yaml
import time
import numpy as np

from src.utils.helper import load_model, concat_feat
from src.config.inference_config import InferenceConfig


def initialize_inputs(batch_size=1, device_id=0):
    """
    Generate random input tensors and move them to GPU
    """
    feature_3d = torch.randn(batch_size, 32, 16, 64, 64).to(device_id).half()
    kp_source = torch.randn(batch_size, 21, 3).to(device_id).half()
    kp_driving = torch.randn(batch_size, 21, 3).to(device_id).half()
    source_image = torch.randn(batch_size, 3, 256, 256).to(device_id).half()
    generator_input = torch.randn(batch_size, 256, 64, 64).to(device_id).half()
    eye_close_ratio = torch.randn(batch_size, 3).to(device_id).half()
    lip_close_ratio = torch.randn(batch_size, 2).to(device_id).half()
    feat_stitching = concat_feat(kp_source, kp_driving).half()
    feat_eye = concat_feat(kp_source, eye_close_ratio).half()
    feat_lip = concat_feat(kp_source, lip_close_ratio).half()

    inputs = {
        'feature_3d': feature_3d,
        'kp_source': kp_source,
        'kp_driving': kp_driving,
        'source_image': source_image,
        'generator_input': generator_input,
        'feat_stitching': feat_stitching,
        'feat_eye': feat_eye,
        'feat_lip': feat_lip
    }

    return inputs


def load_and_compile_models(cfg, model_config):
    """
    Load and compile models for inference
    """
    appearance_feature_extractor = load_model(cfg.checkpoint_F, model_config, cfg.device_id, 'appearance_feature_extractor')
    motion_extractor = load_model(cfg.checkpoint_M, model_config, cfg.device_id, 'motion_extractor')
    warping_module = load_model(cfg.checkpoint_W, model_config, cfg.device_id, 'warping_module')
    spade_generator = load_model(cfg.checkpoint_G, model_config, cfg.device_id, 'spade_generator')
    stitching_retargeting_module = load_model(cfg.checkpoint_S, model_config, cfg.device_id, 'stitching_retargeting_module')

    models_with_params = [
        ('Appearance Feature Extractor', appearance_feature_extractor),
        ('Motion Extractor', motion_extractor),
        ('Warping Network', warping_module),
        ('SPADE Decoder', spade_generator)
    ]

    compiled_models = {}
    for name, model in models_with_params:
        model = model.half()
        model = torch.compile(model, mode='max-autotune')  # Optimize for inference
        model.eval()  # Switch to evaluation mode
        compiled_models[name] = model

    retargeting_models = ['stitching', 'eye', 'lip']
    for retarget in retargeting_models:
        module = stitching_retargeting_module[retarget].half()
        module = torch.compile(module, mode='max-autotune')  # Optimize for inference
        module.eval()  # Switch to evaluation mode
        stitching_retargeting_module[retarget] = module

    return compiled_models, stitching_retargeting_module


def warm_up_models(compiled_models, stitching_retargeting_module, inputs):
    """
    Warm up models to prepare them for benchmarking
    """
    print("Warm up start!")
    with torch.no_grad():
        for _ in range(10):
            compiled_models['Appearance Feature Extractor'](inputs['source_image'])
            compiled_models['Motion Extractor'](inputs['source_image'])
            compiled_models['Warping Network'](inputs['feature_3d'], inputs['kp_driving'], inputs['kp_source'])
            compiled_models['SPADE Decoder'](inputs['generator_input'])  # Adjust input as required
            stitching_retargeting_module['stitching'](inputs['feat_stitching'])
            stitching_retargeting_module['eye'](inputs['feat_eye'])
            stitching_retargeting_module['lip'](inputs['feat_lip'])
    print("Warm up end!")


def measure_inference_times(compiled_models, stitching_retargeting_module, inputs):
    """
    Measure inference times for each model
    """
    times = {name: [] for name in compiled_models.keys()}
    times['Stitching and Retargeting Modules'] = []

    overall_times = []

    with torch.no_grad():
        for _ in range(100):
            torch.cuda.synchronize()
            overall_start = time.time()

            start = time.time()
            compiled_models['Appearance Feature Extractor'](inputs['source_image'])
            torch.cuda.synchronize()
            times['Appearance Feature Extractor'].append(time.time() - start)

            start = time.time()
            compiled_models['Motion Extractor'](inputs['source_image'])
            torch.cuda.synchronize()
            times['Motion Extractor'].append(time.time() - start)

            start = time.time()
            compiled_models['Warping Network'](inputs['feature_3d'], inputs['kp_driving'], inputs['kp_source'])
            torch.cuda.synchronize()
            times['Warping Network'].append(time.time() - start)

            start = time.time()
            compiled_models['SPADE Decoder'](inputs['generator_input'])  # Adjust input as required
            torch.cuda.synchronize()
            times['SPADE Decoder'].append(time.time() - start)

            start = time.time()
            stitching_retargeting_module['stitching'](inputs['feat_stitching'])
            stitching_retargeting_module['eye'](inputs['feat_eye'])
            stitching_retargeting_module['lip'](inputs['feat_lip'])
            torch.cuda.synchronize()
            times['Stitching and Retargeting Modules'].append(time.time() - start)

            overall_times.append(time.time() - overall_start)

    return times, overall_times


def print_benchmark_results(compiled_models, stitching_retargeting_module, retargeting_models, times, overall_times):
    """
    Print benchmark results with average and standard deviation of inference times
    """
    average_times = {name: np.mean(times[name]) * 1000 for name in times.keys()}
    std_times = {name: np.std(times[name]) * 1000 for name in times.keys()}

    for name, model in compiled_models.items():
        num_params = sum(p.numel() for p in model.parameters())
        num_params_in_millions = num_params / 1e6
        print(f"Number of parameters for {name}: {num_params_in_millions:.2f} M")

    for index, retarget in enumerate(retargeting_models):
        num_params = sum(p.numel() for p in stitching_retargeting_module[retarget].parameters())
        num_params_in_millions = num_params / 1e6
        print(f"Number of parameters for part_{index} in Stitching and Retargeting Modules: {num_params_in_millions:.2f} M")

    for name, avg_time in average_times.items():
        std_time = std_times[name]
        print(f"Average inference time for {name} over 100 runs: {avg_time:.2f} ms (std: {std_time:.2f} ms)")


def main():
    """
    Main function to benchmark speed and model parameters
    """
    # Load configuration
    cfg = InferenceConfig()
    model_config_path = cfg.models_config
    with open(model_config_path, 'r') as file:
        model_config = yaml.safe_load(file)

    # Sample input tensors
    inputs = initialize_inputs(device_id = cfg.device_id)

    # Load and compile models
    compiled_models, stitching_retargeting_module = load_and_compile_models(cfg, model_config)

    # Warm up models
    warm_up_models(compiled_models, stitching_retargeting_module, inputs)

    # Measure inference times
    times, overall_times = measure_inference_times(compiled_models, stitching_retargeting_module, inputs)

    # Print benchmark results
    print_benchmark_results(compiled_models, stitching_retargeting_module, ['stitching', 'eye', 'lip'], times, overall_times)


if __name__ == "__main__":
    main()


================================================
FILE: src/config/__init__.py
================================================


================================================
FILE: src/config/argument_config.py
================================================
# coding: utf-8

"""
All configs for user
"""
from dataclasses import dataclass
import tyro
from typing_extensions import Annotated
from typing import Optional, Literal
from .base_config import PrintableConfig, make_abs_path


@dataclass(repr=False)  # use repr from PrintableConfig
class ArgumentConfig(PrintableConfig):
    ########## input arguments ##########
    source: Annotated[str, tyro.conf.arg(aliases=["-s"])] = make_abs_path('../../assets/examples/source/s0.jpg')  # path to the source portrait (human/animal) or video (human)
    driving:  Annotated[str, tyro.conf.arg(aliases=["-d"])] = make_abs_path('../../assets/examples/driving/d0.mp4')  # path to driving video or template (.pkl format)
    output_dir: Annotated[str, tyro.conf.arg(aliases=["-o"])] = 'animations/'  # directory to save output video

    ########## inference arguments ##########
    flag_use_half_precision: bool = True  # whether to use half precision (FP16). If black boxes appear, it might be due to GPU incompatibility; set to False.
    flag_crop_driving_video: bool = False  # whether to crop the driving video, if the given driving info is a video
    device_id: int = 0  # gpu device id
    flag_force_cpu: bool = False  # force cpu inference, WIP!
    flag_normalize_lip: bool = False  # whether to let the lip to close state before animation, only take effect when flag_eye_retargeting and flag_lip_retargeting is False
    flag_source_video_eye_retargeting: bool = False  # when the input is a source video, whether to let the eye-open scalar of each frame to be the same as the first source frame before the animation, only take effect when flag_eye_retargeting and flag_lip_retargeting is False, may cause the inter-frame jittering
    flag_eye_retargeting: bool = False  # not recommend to be True, WIP; whether to transfer the eyes-open ratio of each driving frame to the source image or the corresponding source frame
    flag_lip_retargeting: bool = False  # not recommend to be True, WIP; whether to transfer the lip-open ratio of each driving frame to the source image or the corresponding source frame
    flag_stitching: bool = True  # recommend to True if head movement is small, False if head movement is large or the source image is an animal
    flag_relative_motion: bool = True # whether to use relative motion
    flag_pasteback: bool = True  # whether to paste-back/stitch the animated face cropping from the face-cropping space to the original image space
    flag_do_crop: bool = True  # whether to crop the source portrait or video to the face-cropping space
    driving_option: Literal["expression-friendly", "pose-friendly"] = "expression-friendly" # "expression-friendly" or "pose-friendly"; "expression-friendly" would adapt the driving motion with the global multiplier, and could be used when the source is a human image
    driving_multiplier: float = 1.0 # be used only when driving_option is "expression-friendly"
    driving_smooth_observation_variance: float = 3e-7  # smooth strength scalar for the animated video when the input is a source video, the larger the number, the smoother the animated video; too much smoothness would result in loss of motion accuracy
    audio_priority: Literal['source', 'driving'] = 'driving'  # whether to use the audio from source or driving video
    animation_region: Literal["exp", "pose", "lip", "eyes", "all"] = "all" # the region where the animation was performed, "exp" means the expression, "pose" means the head pose, "all" means all regions
    ########## source crop arguments ##########
    det_thresh: float = 0.15 # detection threshold
    scale: float = 2.3  # the ratio of face area is smaller if scale is larger
    vx_ratio: float = 0  # the ratio to move the face to left or right in cropping space
    vy_ratio: float = -0.125  # the ratio to move the face to up or down in cropping space
    flag_do_rot: bool = True  # whether to conduct the rotation when flag_do_crop is True
    source_max_dim: int = 1280 # the max dim of height and width of source image or video, you can change it to a larger number, e.g., 1920
    source_division: int = 2 # make sure the height and width of source image or video can be divided by this number

    ########## driving crop arguments ##########
    scale_crop_driving_video: float = 2.2  # scale factor for cropping driving video
    vx_ratio_crop_driving_video: float = 0.  # adjust y offset
    vy_ratio_crop_driving_video: float = -0.1  # adjust x offset

    ########## gradio arguments ##########
    server_port: Annotated[int, tyro.conf.arg(aliases=["-p"])] = 8890  # port for gradio server
    share: bool = False  # whether to share the server to public
    server_name: Optional[str] = "127.0.0.1"  # set the local server name, "0.0.0.0" to broadcast all
    flag_do_torch_compile: bool = False  # whether to use torch.compile to accelerate generation
    gradio_temp_dir: Optional[str] = None  # directory to save gradio temp files


================================================
FILE: src/config/base_config.py
================================================
# coding: utf-8

"""
pretty printing class
"""

from __future__ import annotations
import os.path as osp
from typing import Tuple


def make_abs_path(fn):
    return osp.join(osp.dirname(osp.realpath(__file__)), fn)


class PrintableConfig:  # pylint: disable=too-few-public-methods
    """Printable Config defining str function"""

    def __repr__(self):
        lines = [self.__class__.__name__ + ":"]
        for key, val in vars(self).items():
            if isinstance(val, Tuple):
                flattened_val = "["
                for item in val:
                    flattened_val += str(item) + "\n"
                flattened_val = flattened_val.rstrip("\n")
                val = flattened_val + "]"
            lines += f"{key}: {str(val)}".split("\n")
        return "\n    ".join(lines)


================================================
FILE: src/config/crop_config.py
================================================
# coding: utf-8

"""
parameters used for crop faces
"""

from dataclasses import dataclass

from .base_config import PrintableConfig, make_abs_path


@dataclass(repr=False)  # use repr from PrintableConfig
class CropConfig(PrintableConfig):
    insightface_root: str = make_abs_path("../../pretrained_weights/insightface")
    landmark_ckpt_path: str = make_abs_path("../../pretrained_weights/liveportrait/landmark.onnx")
    xpose_config_file_path: str = make_abs_path("../utils/dependencies/XPose/config_model/UniPose_SwinT.py")
    xpose_embedding_cache_path: str = make_abs_path('../utils/resources/clip_embedding')

    xpose_ckpt_path: str = make_abs_path("../../pretrained_weights/liveportrait_animals/xpose.pth")
    device_id: int = 0  # gpu device id
    flag_force_cpu: bool = False  # force cpu inference, WIP
    det_thresh: float = 0.1 # detection threshold
    ########## source image or video cropping option ##########
    dsize: int = 512  # crop size
    scale: float = 2.3  # scale factor
    vx_ratio: float = 0  # vx ratio
    vy_ratio: float = -0.125  # vy ratio +up, -down
    max_face_num: int = 0  # max face number, 0 mean no limit
    flag_do_rot: bool = True # whether to conduct the rotation when flag_do_crop is True
    animal_face_type: str = "animal_face_9"  # animal_face_68 -> 68 landmark points, animal_face_9 -> 9 landmarks
    ########## driving video auto cropping option ##########
    scale_crop_driving_video: float = 2.2  # 2.0 # scale factor for cropping driving video
    vx_ratio_crop_driving_video: float = 0.0  # adjust y offset
    vy_ratio_crop_driving_video: float = -0.1  # adjust x offset
    direction: str = "large-small"  # direction of cropping


================================================
FILE: src/config/inference_config.py
================================================
# coding: utf-8

"""
config dataclass used for inference
"""

import cv2
from numpy import ndarray
import pickle as pkl
from dataclasses import dataclass, field
from typing import Literal, Tuple
from .base_config import PrintableConfig, make_abs_path

def load_lip_array():
    with open(make_abs_path('../utils/resources/lip_array.pkl'), 'rb') as f:
        return pkl.load(f)

@dataclass(repr=False)  # use repr from PrintableConfig
class InferenceConfig(PrintableConfig):
    # HUMAN MODEL CONFIG, NOT EXPORTED PARAMS
    models_config: str = make_abs_path('./models.yaml')  # portrait animation config
    checkpoint_F: str = make_abs_path('../../pretrained_weights/liveportrait/base_models/appearance_feature_extractor.pth')  # path to checkpoint of F
    checkpoint_M: str = make_abs_path('../../pretrained_weights/liveportrait/base_models/motion_extractor.pth')  # path to checkpoint pf M
    checkpoint_G: str = make_abs_path('../../pretrained_weights/liveportrait/base_models/spade_generator.pth')  # path to checkpoint of G
    checkpoint_W: str = make_abs_path('../../pretrained_weights/liveportrait/base_models/warping_module.pth')  # path to checkpoint of W
    checkpoint_S: str = make_abs_path('../../pretrained_weights/liveportrait/retargeting_models/stitching_retargeting_module.pth')  # path to checkpoint to S and R_eyes, R_lip

    # ANIMAL MODEL CONFIG, NOT EXPORTED PARAMS
    # version_animals = "" # old version
    version_animals = "_v1.1" # new (v1.1) version
    checkpoint_F_animal: str = make_abs_path(f'../../pretrained_weights/liveportrait_animals/base_models{version_animals}/appearance_feature_extractor.pth')  # path to checkpoint of F
    checkpoint_M_animal: str = make_abs_path(f'../../pretrained_weights/liveportrait_animals/base_models{version_animals}/motion_extractor.pth')  # path to checkpoint pf M
    checkpoint_G_animal: str = make_abs_path(f'../../pretrained_weights/liveportrait_animals/base_models{version_animals}/spade_generator.pth')  # path to checkpoint of G
    checkpoint_W_animal: str = make_abs_path(f'../../pretrained_weights/liveportrait_animals/base_models{version_animals}/warping_module.pth')  # path to checkpoint of W
    checkpoint_S_animal: str = make_abs_path('../../pretrained_weights/liveportrait/retargeting_models/stitching_retargeting_module.pth')  # path to checkpoint to S and R_eyes, R_lip, NOTE: use human temporarily!

    # EXPORTED PARAMS
    flag_use_half_precision: bool = True
    flag_crop_driving_video: bool = False
    device_id: int = 0
    flag_normalize_lip: bool = True
    flag_source_video_eye_retargeting: bool = False
    flag_eye_retargeting: bool = False
    flag_lip_retargeting: bool = False
    flag_stitching: bool = True
    flag_relative_motion: bool = True
    flag_pasteback: bool = True
    flag_do_crop: bool = True
    flag_do_rot: bool = True
    flag_force_cpu: bool = False
    flag_do_torch_compile: bool = False
    driving_option: str = "pose-friendly" # "expression-friendly" or "pose-friendly"
    driving_multiplier: float = 1.0
    driving_smooth_observation_variance: float = 3e-7 # smooth strength scalar for the animated video when the input is a source video, the larger the number, the smoother the animated video; too much smoothness would result in loss of motion accuracy
    source_max_dim: int = 1280 # the max dim of height and width of source image or video
    source_division: int = 2 # make sure the height and width of source image or video can be divided by this number
    animation_region: Literal["exp", "pose", "lip", "eyes", "all"] = "all" # the region where the animation was performed, "exp" means the expression, "pose" means the head pose

    # NOT EXPORTED PARAMS
    lip_normalize_threshold: float = 0.03 # threshold for flag_normalize_lip
    source_video_eye_retargeting_threshold: float = 0.18 # threshold for eyes retargeting if the input is a source video
    anchor_frame: int = 0 # TO IMPLEMENT

    input_shape: Tuple[int, int] = (256, 256)  # input shape
    output_format: Literal['mp4', 'gif'] = 'mp4'  # output video format
    crf: int = 15  # crf for output video
    output_fps: int = 25 # default output fps

    mask_crop: ndarray = field(default_factory=lambda: cv2.imread(make_abs_path('../utils/resources/mask_template.png'), cv2.IMREAD_COLOR))
    lip_array: ndarray = field(default_factory=load_lip_array)
    size_gif: int = 256 # default gif size, TO IMPLEMENT


================================================
FILE: src/config/models.yaml
================================================
model_params:
  appearance_feature_extractor_params: # the F in the paper
    image_channel: 3
    block_expansion: 64
    num_down_blocks: 2
    max_features: 512
    reshape_channel: 32
    reshape_depth: 16
    num_resblocks: 6
  motion_extractor_params: # the M in the paper
    num_kp: 21
    backbone: convnextv2_tiny
  warping_module_params: # the W in the paper
    num_kp: 21
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    reshape_channel: 32
    estimate_occlusion_map: True
    dense_motion_params:
      block_expansion: 32
      max_features: 1024
      num_blocks: 5
      reshape_depth: 16
      compress: 4
  spade_generator_params: # the G in the paper
    upscale: 2 # represents upsample factor 256x256 -> 512x512
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
  stitching_retargeting_module_params: # the S in the paper
    stitching:
      input_size: 126 # (21*3)*2
      hidden_sizes: [128, 128, 64]
      output_size: 65 # (21*3)+2(tx,ty)
    lip:
      input_size: 65 # (21*3)+2
      hidden_sizes: [128, 128, 64]
      output_size: 63 # (21*3)
    eye:
      input_size: 66 # (21*3)+3
      hidden_sizes: [256, 256, 128, 128, 64]
      output_size: 63 # (21*3)


================================================
FILE: src/gradio_pipeline.py
================================================
# coding: utf-8

"""
Pipeline for gradio
"""

import os.path as osp
import os
import cv2
from rich.progress import track
import gradio as gr
import numpy as np
import torch

from .config.argument_config import ArgumentConfig
from .live_portrait_pipeline import LivePortraitPipeline
from .live_portrait_pipeline_animal import LivePortraitPipelineAnimal
from .utils.io import load_img_online, load_video, resize_to_limit
from .utils.filter import smooth
from .utils.rprint import rlog as log
from .utils.crop import prepare_paste_back, paste_back
from .utils.camera import get_rotation_matrix
from .utils.video import get_fps, has_audio_stream, concat_frames, images2video, add_audio_to_video
from .utils.helper import is_square_video, mkdir, dct2device, basename
from .utils.retargeting_utils import calc_eye_close_ratio, calc_lip_close_ratio


def update_args(args, user_args):
    """update the args according to user inputs
    """
    for k, v in user_args.items():
        if hasattr(args, k):
            setattr(args, k, v)
    return args


class GradioPipeline(LivePortraitPipeline):
    """gradio for human
    """

    def __init__(self, inference_cfg, crop_cfg, args: ArgumentConfig):
        super().__init__(inference_cfg, crop_cfg)
        # self.live_portrait_wrapper = self.live_portrait_wrapper
        self.args = args

    @torch.no_grad()
    def update_delta_new_eyeball_direction(self, eyeball_direction_x, eyeball_direction_y, delta_new, **kwargs):
        if eyeball_direction_x > 0:
                delta_new[0, 11, 0] += eyeball_direction_x * 0.0007
                delta_new[0, 15, 0] += eyeball_direction_x * 0.001
        else:
            delta_new[0, 11, 0] += eyeball_direction_x * 0.001
            delta_new[0, 15, 0] += eyeball_direction_x * 0.0007

        delta_new[0, 11, 1] += eyeball_direction_y * -0.001
        delta_new[0, 15, 1] += eyeball_direction_y * -0.001
        blink = -eyeball_direction_y / 2.

        delta_new[0, 11, 1] += blink * -0.001
        delta_new[0, 13, 1] += blink * 0.0003
        delta_new[0, 15, 1] += blink * -0.001
        delta_new[0, 16, 1] += blink * 0.0003

        return delta_new

    @torch.no_grad()
    def update_delta_new_smile(self, smile, delta_new, **kwargs):
        delta_new[0, 20, 1] += smile * -0.01
        delta_new[0, 14, 1] += smile * -0.02
        delta_new[0, 17, 1] += smile * 0.0065
        delta_new[0, 17, 2] += smile * 0.003
        delta_new[0, 13, 1] += smile * -0.00275
        delta_new[0, 16, 1] += smile * -0.00275
        delta_new[0, 3, 1] += smile * -0.0035
        delta_new[0, 7, 1] += smile * -0.0035

        return delta_new

    @torch.no_grad()
    def update_delta_new_wink(self, wink, delta_new, **kwargs):
        delta_new[0, 11, 1] += wink * 0.001
        delta_new[0, 13, 1] += wink * -0.0003
        delta_new[0, 17, 0] += wink * 0.0003
        delta_new[0, 17, 1] += wink * 0.0003
        delta_new[0, 3, 1] += wink * -0.0003

        return delta_new

    @torch.no_grad()
    def update_delta_new_eyebrow(self, eyebrow, delta_new, **kwargs):
        if eyebrow > 0:
            delta_new[0, 1, 1] += eyebrow * 0.001
            delta_new[0, 2, 1] += eyebrow * -0.001
        else:
            delta_new[0, 1, 0] += eyebrow * -0.001
            delta_new[0, 2, 0] += eyebrow * 0.001
            delta_new[0, 1, 1] += eyebrow * 0.0003
            delta_new[0, 2, 1] += eyebrow * -0.0003
        return delta_new

    @torch.no_grad()
    def update_delta_new_lip_variation_zero(self, lip_variation_zero, delta_new, **kwargs):
        delta_new[0, 19, 0] += lip_variation_zero

        return delta_new

    @torch.no_grad()
    def update_delta_new_lip_variation_one(self, lip_variation_one, delta_new, **kwargs):
        delta_new[0, 14, 1] += lip_variation_one * 0.001
        delta_new[0, 3, 1] += lip_variation_one * -0.0005
        delta_new[0, 7, 1] += lip_variation_one * -0.0005
        delta_new[0, 17, 2] += lip_variation_one * -0.0005

        return delta_new

    @torch.no_grad()
    def update_delta_new_lip_variation_two(self, lip_variation_two, delta_new, **kwargs):
        delta_new[0, 20, 2] += lip_variation_two * -0.001
        delta_new[0, 20, 1] += lip_variation_two * -0.001
        delta_new[0, 14, 1] += lip_variation_two * -0.001

        return delta_new

    @torch.no_grad()
    def update_delta_new_lip_variation_three(self, lip_variation_three, delta_new, **kwargs):
        delta_new[0, 19, 1] += lip_variation_three * 0.001
        delta_new[0, 19, 2] += lip_variation_three * 0.0001
        delta_new[0, 17, 1] += lip_variation_three * -0.0001

        return delta_new

    @torch.no_grad()
    def update_delta_new_mov_x(self, mov_x, delta_new, **kwargs):
        delta_new[0, 5, 0] += mov_x

        return delta_new

    @torch.no_grad()
    def update_delta_new_mov_y(self, mov_y, delta_new, **kwargs):
        delta_new[0, 5, 1] += mov_y

        return delta_new

    @torch.no_grad()
    def execute_video(
        self,
        input_source_image_path=None,
        input_source_video_path=None,
        input_driving_video_path=None,
        input_driving_image_path=None,
        input_driving_video_pickle_path=None,
        flag_normalize_lip=False,
        flag_relative_input=True,
        flag_do_crop_input=True,
        flag_remap_input=True,
        flag_stitching_input=True,
        animation_region="all",
        driving_option_input="pose-friendly",
        driving_multiplier=1.0,
        flag_crop_driving_video_input=True,
        # flag_video_editing_head_rotation=False,
        scale=2.3,
        vx_ratio=0.0,
        vy_ratio=-0.125,
        scale_crop_driving_video=2.2,
        vx_ratio_crop_driving_video=0.0,
        vy_ratio_crop_driving_video=-0.1,
        driving_smooth_observation_variance=3e-7,
        tab_selection=None,
        v_tab_selection=None
    ):
        """ for video-driven portrait animation or video editing
        """
        if tab_selection == 'Image':
            input_source_path = input_source_image_path
        elif tab_selection == 'Video':
            input_source_path = input_source_video_path
        else:
            input_source_path = input_source_image_path

        if v_tab_selection == 'Video':
            input_driving_path = input_driving_video_path
        elif v_tab_selection == 'Image':
            input_driving_path = input_driving_image_path
        elif v_tab_selection == 'Pickle':
            input_driving_path = input_driving_video_pickle_path
        else:
            input_driving_path = input_driving_video_path

        if input_source_path is not None and input_driving_path is not None:
            if osp.exists(input_driving_path) and v_tab_selection == 'Video' and not flag_crop_driving_video_input and is_square_video(input_driving_path) is False:
                flag_crop_driving_video_input = True
                log("The driving video is not square, it will be cropped to square automatically.")
                gr.Info("The driving video is not square, it will be cropped to square automatically.", duration=2)

            args_user = {
                'source': input_source_path,
                'driving': input_driving_path,
                'flag_normalize_lip' : flag_normalize_lip,
                'flag_relative_motion': flag_relative_input,
                'flag_do_crop': flag_do_crop_input,
                'flag_pasteback': flag_remap_input,
                'flag_stitching': flag_stitching_input,
                'animation_region': animation_region,
                'driving_option': driving_option_input,
                'driving_multiplier': driving_multiplier,
                'flag_crop_driving_video': flag_crop_driving_video_input,
                'scale': scale,
                'vx_ratio': vx_ratio,
                'vy_ratio': vy_ratio,
                'scale_crop_driving_video': scale_crop_driving_video,
                'vx_ratio_crop_driving_video': vx_ratio_crop_driving_video,
                'vy_ratio_crop_driving_video': vy_ratio_crop_driving_video,
                'driving_smooth_observation_variance': driving_smooth_observation_variance,
            }
            # update config from user input
            self.args = update_args(self.args, args_user)
            self.live_portrait_wrapper.update_config(self.args.__dict__)
            self.cropper.update_config(self.args.__dict__)

            output_path, output_path_concat = self.execute(self.args)
            gr.Info("Run successfully!", duration=2)
            if output_path.endswith(".jpg"):
                return gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), output_path, gr.update(visible=True), output_path_concat, gr.update(visible=True)
            else:
                return output_path, gr.update(visible=True), output_path_concat, gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
        else:
            raise gr.Error("Please upload the source portrait or source video, and driving video 🤗🤗🤗", duration=5)

    @torch.no_grad()
    def execute_image_retargeting(
        self,
        input_eye_ratio: float,
        input_lip_ratio: float,
        input_head_pitch_variation: float,
        input_head_yaw_variation: float,
        input_head_roll_variation: float,
        mov_x: float,
        mov_y: float,
        mov_z: float,
        lip_variation_zero: float,
        lip_variation_one: float,
        lip_variation_two: float,
        lip_variation_three: float,
        smile: float,
        wink: float,
        eyebrow: float,
        eyeball_direction_x: float,
        eyeball_direction_y: float,
        input_image,
        retargeting_source_scale: float,
        flag_stitching_retargeting_input=True,
        flag_do_crop_input_retargeting_image=True):
        """ for single image retargeting
        """
        if input_head_pitch_variation is None or input_head_yaw_variation is None or input_head_roll_variation is None:
            raise gr.Error("Invalid relative pose input 💥!", duration=5)
        # disposable feature
        f_s_user, x_s_user, R_s_user, R_d_user, x_s_info, source_lmk_user, crop_M_c2o, mask_ori, img_rgb = \
            self.prepare_retargeting_image(
                input_image, input_head_pitch_variation, input_head_yaw_variation, input_head_roll_variation, retargeting_source_scale, flag_do_crop=flag_do_crop_input_retargeting_image)

        if input_eye_ratio is None or input_lip_ratio is None:
            raise gr.Error("Invalid ratio input 💥!", duration=5)
        else:
            device = self.live_portrait_wrapper.device
            # inference_cfg = self.live_portrait_wrapper.inference_cfg
            x_s_user = x_s_user.to(device)
            f_s_user = f_s_user.to(device)
            R_s_user = R_s_user.to(device)
            R_d_user = R_d_user.to(device)
            mov_x = torch.tensor(mov_x).to(device)
            mov_y = torch.tensor(mov_y).to(device)
            mov_z = torch.tensor(mov_z).to(device)
            eyeball_direction_x = torch.tensor(eyeball_direction_x).to(device)
            eyeball_direction_y = torch.tensor(eyeball_direction_y).to(device)
            smile = torch.tensor(smile).to(device)
            wink = torch.tensor(wink).to(device)
            eyebrow = torch.tensor(eyebrow).to(device)
            lip_variation_zero = torch.tensor(lip_variation_zero).to(device)
            lip_variation_one = torch.tensor(lip_variation_one).to(device)
            lip_variation_two = torch.tensor(lip_variation_two).to(device)
            lip_variation_three = torch.tensor(lip_variation_three).to(device)

            x_c_s = x_s_info['kp'].to(device)
            delta_new = x_s_info['exp'].to(device)
            scale_new = x_s_info['scale'].to(device)
            t_new = x_s_info['t'].to(device)
            R_d_new = (R_d_user @ R_s_user.permute(0, 2, 1)) @ R_s_user

            if eyeball_direction_x != 0 or eyeball_direction_y != 0:
                delta_new = self.update_delta_new_eyeball_direction(eyeball_direction_x, eyeball_direction_y, delta_new)
            if smile != 0:
                delta_new = self.update_delta_new_smile(smile, delta_new)
            if wink != 0:
                delta_new = self.update_delta_new_wink(wink, delta_new)
            if eyebrow != 0:
                delta_new = self.update_delta_new_eyebrow(eyebrow, delta_new)
            if lip_variation_zero != 0:
                delta_new = self.update_delta_new_lip_variation_zero(lip_variation_zero, delta_new)
            if lip_variation_one !=  0:
                delta_new = self.update_delta_new_lip_variation_one(lip_variation_one, delta_new)
            if lip_variation_two != 0:
                delta_new = self.update_delta_new_lip_variation_two(lip_variation_two, delta_new)
            if lip_variation_three != 0:
                delta_new = self.update_delta_new_lip_variation_three(lip_variation_three, delta_new)
            if mov_x != 0:
                delta_new = self.update_delta_new_mov_x(-mov_x, delta_new)
            if mov_y !=0 :
                delta_new = self.update_delta_new_mov_y(mov_y, delta_new)

            x_d_new = mov_z * scale_new * (x_c_s @ R_d_new + delta_new) + t_new
            eyes_delta, lip_delta = None, None
            if input_eye_ratio != self.source_eye_ratio:
                combined_eye_ratio_tensor = self.live_portrait_wrapper.calc_combined_eye_ratio([[float(input_eye_ratio)]], source_lmk_user)
                eyes_delta = self.live_portrait_wrapper.retarget_eye(x_s_user, combined_eye_ratio_tensor)
            if input_lip_ratio != self.source_lip_ratio:
                combined_lip_ratio_tensor = self.live_portrait_wrapper.calc_combined_lip_ratio([[float(input_lip_ratio)]], source_lmk_user)
                lip_delta = self.live_portrait_wrapper.retarget_lip(x_s_user, combined_lip_ratio_tensor)
                print(lip_delta)
            x_d_new = x_d_new + \
                    (eyes_delta if eyes_delta is not None else 0) + \
                    (lip_delta if lip_delta is not None else 0)

            if flag_stitching_retargeting_input:
                x_d_new = self.live_portrait_wrapper.stitching(x_s_user, x_d_new)
            out = self.live_portrait_wrapper.warp_decode(f_s_user, x_s_user, x_d_new)
            out = self.live_portrait_wrapper.parse_output(out['out'])[0]
            if flag_do_crop_input_retargeting_image:
                out_to_ori_blend = paste_back(out, crop_M_c2o, img_rgb, mask_ori)
            else:
                out_to_ori_blend = out
            return out, out_to_ori_blend

    @torch.no_grad()
    def prepare_retargeting_image(
        self,
        input_image,
        input_head_pitch_variation, input_head_yaw_variation, input_head_roll_variation,
        retargeting_source_scale,
        flag_do_crop=True):
        """ for single image retargeting
        """
        if input_image is not None:
            # gr.Info("Upload successfully!", duration=2)
            args_user = {'scale': retargeting_source_scale}
            self.args = update_args(self.args, args_user)
            self.cropper.update_config(self.args.__dict__)
            inference_cfg = self.live_portrait_wrapper.inference_cfg
            ######## process source portrait ########
            img_rgb = load_img_online(input_image, mode='rgb', max_dim=1280, n=2)
            if flag_do_crop:
                crop_info = self.cropper.crop_source_image(img_rgb, self.cropper.crop_cfg)
                I_s = self.live_portrait_wrapper.prepare_source(crop_info['img_crop_256x256'])
                source_lmk_user = crop_info['lmk_crop']
                crop_M_c2o = crop_info['M_c2o']
                mask_ori = prepare_paste_back(inference_cfg.mask_crop, crop_info['M_c2o'], dsize=(img_rgb.shape[1], img_rgb.shape[0]))
            else:
                I_s = self.live_portrait_wrapper.prepare_source(img_rgb)
                source_lmk_user = self.cropper.calc_lmk_from_cropped_image(img_rgb)
                crop_M_c2o = None
                mask_ori = None
            x_s_info = self.live_portrait_wrapper.get_kp_info(I_s)
            x_d_info_user_pitch = x_s_info['pitch'] + input_head_pitch_variation
            x_d_info_user_yaw = x_s_info['yaw'] + input_head_yaw_variation
            x_d_info_user_roll = x_s_info['roll'] + input_head_roll_variation
            R_s_user = get_rotation_matrix(x_s_info['pitch'], x_s_info['yaw'], x_s_info['roll'])
            R_d_user = get_rotation_matrix(x_d_info_user_pitch, x_d_info_user_yaw, x_d_info_user_roll)
            ############################################
            f_s_user = self.live_portrait_wrapper.extract_feature_3d(I_s)
            x_s_user = self.live_portrait_wrapper.transform_keypoint(x_s_info)
            return f_s_user, x_s_user, R_s_user, R_d_user, x_s_info, source_lmk_user, crop_M_c2o, mask_ori, img_rgb
        else:
            raise gr.Error("Please upload a source portrait as the retargeting input 🤗🤗🤗", duration=5)

    @torch.no_grad()
    def init_retargeting_image(self, retargeting_source_scale: float, source_eye_ratio: float, source_lip_ratio:float, input_image = None):
        """ initialize the retargeting slider
        """
        if input_image != None:
            args_user = {'scale': retargeting_source_scale}
            self.args = update_args(self.args, args_user)
            self.cropper.update_config(self.args.__dict__)
            # inference_cfg = self.live_portrait_wrapper.inference_cfg
            ######## process source portrait ########
            img_rgb = load_img_online(input_image, mode='rgb', max_dim=1280, n=16)
            log(f"Load source image from {input_image}.")
            crop_info = self.cropper.crop_source_image(img_rgb, self.cropper.crop_cfg)
            if crop_info is None:
                raise gr.Error("Source portrait NO face detected", duration=2)
            source_eye_ratio = calc_eye_close_ratio(crop_info['lmk_crop'][None])
            source_lip_ratio = calc_lip_close_ratio(crop_info['lmk_crop'][None])
            self.source_eye_ratio = round(float(source_eye_ratio.mean()), 2)
            self.source_lip_ratio = round(float(source_lip_ratio[0][0]), 2)
            log("Calculating eyes-open and lip-open ratios successfully!")
            return self.source_eye_ratio, self.source_lip_ratio
        else:
            return source_eye_ratio, source_lip_ratio

    @torch.no_grad()
    def execute_video_retargeting(self, input_lip_ratio: float, input_video, retargeting_source_scale: float, driving_smooth_observation_variance_retargeting: float, video_retargeting_silence=False, flag_do_crop_input_retargeting_video=True):
        """ retargeting the lip-open ratio of each source frame
        """
        # disposable feature
        device = self.live_portrait_wrapper.device

        if not video_retargeting_silence:
            f_s_user_lst, x_s_user_lst, source_lmk_crop_lst, source_M_c2o_lst, mask_ori_lst, source_rgb_lst, img_crop_256x256_lst, lip_delta_retargeting_lst_smooth, source_fps, n_frames = \
                self.prepare_retargeting_video(input_video, retargeting_source_scale, device, input_lip_ratio, driving_smooth_observation_variance_retargeting, flag_do_crop=flag_do_crop_input_retargeting_video)
            if input_lip_ratio is None:
                raise gr.Error("Invalid ratio input 💥!", duration=5)
            else:
                inference_cfg = self.live_portrait_wrapper.inference_cfg

                I_p_pstbk_lst = None
                if flag_do_crop_input_retargeting_video:
                    I_p_pstbk_lst = []
                I_p_lst = []
                for i in track(range(n_frames), description='Retargeting video...', total=n_frames):
                    x_s_user_i = x_s_user_lst[i].to(device)
                    f_s_user_i = f_s_user_lst[i].to(device)

                    lip_delta_retargeting = lip_delta_retargeting_lst_smooth[i]
                    x_d_i_new = x_s_user_i + lip_delta_retargeting
                    x_d_i_new = self.live_portrait_wrapper.stitching(x_s_user_i, x_d_i_new)
                    out = self.live_portrait_wrapper.warp_decode(f_s_user_i, x_s_user_i, x_d_i_new)
                    I_p_i = self.live_portrait_wrapper.parse_output(out['out'])[0]
                    I_p_lst.append(I_p_i)

                    if flag_do_crop_input_retargeting_video:
                        I_p_pstbk = paste_back(I_p_i, source_M_c2o_lst[i], source_rgb_lst[i], mask_ori_lst[i])
                        I_p_pstbk_lst.append(I_p_pstbk)
        else:
            inference_cfg = self.live_portrait_wrapper.inference_cfg
            f_s_user_lst, x_s_user_lst, x_d_i_new_lst, source_M_c2o_lst, mask_ori_lst, source_rgb_lst, img_crop_256x256_lst, source_fps, n_frames = \
                self.prepare_video_lip_silence(input_video, device, flag_do_crop=flag_do_crop_input_retargeting_video)

            I_p_pstbk_lst = None
            if flag_do_crop_input_retargeting_video:
                I_p_pstbk_lst = []
            I_p_lst = []
            for i in track(range(n_frames), description='Silencing lip...', total=n_frames):
                x_s_user_i = x_s_user_lst[i].to(device)
                f_s_user_i = f_s_user_lst[i].to(device)
                x_d_i_new = x_d_i_new_lst[i]
                x_d_i_new = self.live_portrait_wrapper.stitching(x_s_user_i, x_d_i_new)
                out = self.live_portrait_wrapper.warp_decode(f_s_user_i, x_s_user_i, x_d_i_new)
                I_p_i = self.live_portrait_wrapper.parse_output(out['out'])[0]
                I_p_lst.append(I_p_i)

                if flag_do_crop_input_retargeting_video:
                    I_p_pstbk = paste_back(I_p_i, source_M_c2o_lst[i], source_rgb_lst[i], mask_ori_lst[i])
                    I_p_pstbk_lst.append(I_p_pstbk)

        mkdir(self.args.output_dir)
        flag_source_has_audio = has_audio_stream(input_video)

        ######### build the final concatenation result #########
        # source frame | generation
        frames_concatenated = concat_frames(driving_image_lst=None, source_image_lst=img_crop_256x256_lst, I_p_lst=I_p_lst)
        wfp_concat = osp.join(self.args.output_dir, f'{basename(input_video)}_retargeting_concat.mp4')
        images2video(frames_concatenated, wfp=wfp_concat, fps=source_fps)

        if flag_source_has_audio:
            # final result with concatenation
            wfp_concat_with_audio = osp.join(self.args.output_dir, f'{basename(input_video)}_retargeting_concat_with_audio.mp4')
            add_audio_to_video(wfp_concat, input_video, wfp_concat_with_audio)
            os.replace(wfp_concat_with_audio, wfp_concat)
            log(f"Replace {wfp_concat_with_audio} with {wfp_concat}")

        # save the animated result
        wfp = osp.join(self.args.output_dir, f'{basename(input_video)}_retargeting.mp4')
        if I_p_pstbk_lst is not None and len(I_p_pstbk_lst) > 0:
            images2video(I_p_pstbk_lst, wfp=wfp, fps=source_fps)
        else:
            images2video(I_p_lst, wfp=wfp, fps=source_fps)

        ######### build the final result #########
        if flag_source_has_audio:
            wfp_with_audio = osp.join(self.args.output_dir, f'{basename(input_video)}_retargeting_with_audio.mp4')
            add_audio_to_video(wfp, input_video, wfp_with_audio)
            os.replace(wfp_with_audio, wfp)
            log(f"Replace {wfp_with_audio} with {wfp}")
        gr.Info("Run successfully!", duration=2)
        return wfp_concat, wfp

    @torch.no_grad()
    def prepare_retargeting_video(self, input_video, retargeting_source_scale, device, input_lip_ratio, driving_smooth_observation_variance_retargeting, flag_do_crop=True):
        """ for video retargeting
        """
        if input_video is not None:
            # gr.Info("Upload successfully!", duration=2)
            args_user = {'scale': retargeting_source_scale}
            self.args = update_args(self.args, args_user)
            self.cropper.update_config(self.args.__dict__)
            inference_cfg = self.live_portrait_wrapper.inference_cfg
            ######## process source video ########
            source_rgb_lst = load_video(input_video)
            source_rgb_lst = [resize_to_limit(img, inference_cfg.source_max_dim, inference_cfg.source_division) for img in source_rgb_lst]
            source_fps = int(get_fps(input_video))
            n_frames = len(source_rgb_lst)
            log(f"Load source video from {input_video}. FPS is {source_fps}")

            if flag_do_crop:
                ret_s = self.cropper.crop_source_video(source_rgb_lst, self.cropper.crop_cfg)
                log(f'Source video is cropped, {len(ret_s["frame_crop_lst"])} frames are processed.')
                if len(ret_s["frame_crop_lst"]) != n_frames:
                    n_frames = min(len(source_rgb_lst), len(ret_s["frame_crop_lst"]))
                img_crop_256x256_lst, source_lmk_crop_lst, source_M_c2o_lst = ret_s['frame_crop_lst'], ret_s['lmk_crop_lst'], ret_s['M_c2o_lst']
                mask_ori_lst = [prepare_paste_back(inference_cfg.mask_crop, source_M_c2o, dsize=(source_rgb_lst[0].shape[1], source_rgb_lst[0].shape[0])) for source_M_c2o in source_M_c2o_lst]
            else:
                source_lmk_crop_lst = self.cropper.calc_lmks_from_cropped_video(source_rgb_lst)
                img_crop_256x256_lst = [cv2.resize(_, (256, 256)) for _ in source_rgb_lst]  # force to resize to 256x256
                source_M_c2o_lst, mask_ori_lst = None, None

            c_s_eyes_lst, c_s_lip_lst = self.live_portrait_wrapper.calc_ratio(source_lmk_crop_lst)
            # save the motion template
            I_s_lst = self.live_portrait_wrapper.prepare_videos(img_crop_256x256_lst)
            source_template_dct = self.make_motion_template(I_s_lst, c_s_eyes_lst, c_s_lip_lst, output_fps=source_fps)

            c_d_lip_retargeting = [input_lip_ratio]
            f_s_user_lst, x_s_user_lst, lip_delta_retargeting_lst = [], [], []
            for i in track(range(n_frames), description='Preparing retargeting video...', total=n_frames):
                x_s_info = source_template_dct['motion'][i]
                x_s_info = dct2device(x_s_info, device)
                x_s_user = x_s_info['x_s']

                source_lmk = source_lmk_crop_lst[i]
                img_crop_256x256 = img_crop_256x256_lst[i]
                I_s = I_s_lst[i]
                f_s_user = self.live_portrait_wrapper.extract_feature_3d(I_s)

                combined_lip_ratio_tensor_retargeting = self.live_portrait_wrapper.calc_combined_lip_ratio(c_d_lip_retargeting, source_lmk)
                lip_delta_retargeting = self.live_portrait_wrapper.retarget_lip(x_s_user, combined_lip_ratio_tensor_retargeting)
                f_s_user_lst.append(f_s_user); x_s_user_lst.append(x_s_user); lip_delta_retargeting_lst.append(lip_delta_retargeting.cpu().numpy().astype(np.float32))
            lip_delta_retargeting_lst_smooth = smooth(lip_delta_retargeting_lst, lip_delta_retargeting_lst[0].shape, device, driving_smooth_observation_variance_retargeting)

            return f_s_user_lst, x_s_user_lst, source_lmk_crop_lst, source_M_c2o_lst, mask_ori_lst, source_rgb_lst, img_crop_256x256_lst, lip_delta_retargeting_lst_smooth, source_fps, n_frames
        else:
            # when press the clear button, go here
            raise gr.Error("Please upload a source video as the retargeting input 🤗🤗🤗", duration=5)

    @torch.no_grad()
    def prepare_video_lip_silence(self, input_video, device, flag_do_crop=True):
        """ for keeping lips in the source video silent
        """
        if input_video is not None:
            inference_cfg = self.live_portrait_wrapper.inference_cfg
            ######## process source video ########
            source_rgb_lst = load_video(input_video)
            source_rgb_lst = [resize_to_limit(img, inference_cfg.source_max_dim, inference_cfg.source_division) for img in source_rgb_lst]
            source_fps = int(get_fps(input_video))
            n_frames = len(source_rgb_lst)
            log(f"Load source video from {input_video}. FPS is {source_fps}")

            if flag_do_crop:
                ret_s = self.cropper.crop_source_video(source_rgb_lst, self.cropper.crop_cfg)
                log(f'Source video is cropped, {len(ret_s["frame_crop_lst"])} frames are processed.')
                if len(ret_s["frame_crop_lst"]) != n_frames:
                    n_frames = min(len(source_rgb_lst), len(ret_s["frame_crop_lst"]))
                img_crop_256x256_lst, source_lmk_crop_lst, source_M_c2o_lst = ret_s['frame_crop_lst'], ret_s['lmk_crop_lst'], ret_s['M_c2o_lst']
                mask_ori_lst = [prepare_paste_back(inference_cfg.mask_crop, source_M_c2o, dsize=(source_rgb_lst[0].shape[1], source_rgb_lst[0].shape[0])) for source_M_c2o in source_M_c2o_lst]
            else:
                source_lmk_crop_lst = self.cropper.calc_lmks_from_cropped_video(source_rgb_lst)
                img_crop_256x256_lst = [cv2.resize(_, (256, 256)) for _ in source_rgb_lst]  # force to resize to 256x256
                source_M_c2o_lst, mask_ori_lst = None, None

            c_s_eyes_lst, c_s_lip_lst = self.live_portrait_wrapper.calc_ratio(source_lmk_crop_lst)
            # save the motion template
            I_s_lst = self.live_portrait_wrapper.prepare_videos(img_crop_256x256_lst)
            source_template_dct = self.make_motion_template(I_s_lst, c_s_eyes_lst, c_s_lip_lst, output_fps=source_fps)

            f_s_user_lst, x_s_user_lst, x_d_i_new_lst = [], [], []
            for i in track(range(n_frames), description='Preparing silencing lip...', total=n_frames):
                x_s_info = source_template_dct['motion'][i]
                x_s_info = dct2device(x_s_info, device)
                scale_s = x_s_info['scale']
                x_s_user = x_s_info['x_s']
                x_c_s = x_s_info['kp']
                R_s = x_s_info['R']
                t_s = x_s_info['t']
                delta_new = torch.zeros_like(x_s_info['exp']) + torch.from_numpy(inference_cfg.lip_array).to(dtype=torch.float32, device=device)
                for eyes_idx in [11, 13, 15, 16, 18]:
                    delta_new[:, eyes_idx, :] = x_s_info['exp'][:, eyes_idx, :]
                source_lmk = source_lmk_crop_lst[i]
                img_crop_256x256 = img_crop_256x256_lst[i]
                I_s = I_s_lst[i]
                f_s_user = self.live_portrait_wrapper.extract_feature_3d(I_s)
                x_d_i_new = scale_s * (x_c_s @ R_s + delta_new) + t_s
                f_s_user_lst.append(f_s_user); x_s_user_lst.append(x_s_user); x_d_i_new_lst.append(x_d_i_new)
            return f_s_user_lst, x_s_user_lst, x_d_i_new_lst, source_M_c2o_lst, mask_ori_lst, source_rgb_lst, img_crop_256x256_lst, source_fps, n_frames
        else:
            # when press the clear button, go here
            raise gr.Error("Please upload a source video as the input 🤗🤗🤗", duration=5)

class GradioPipelineAnimal(LivePortraitPipelineAnimal):
    """gradio for animal
    """
    def __init__(self, inference_cfg, crop_cfg, args: ArgumentConfig):
        inference_cfg.flag_crop_driving_video = True # ensure the face_analysis_wrapper is enabled
        super().__init__(inference_cfg, crop_cfg)
        # self.live_portrait_wrapper_animal = self.live_portrait_wrapper_animal
        self.args = args

    @torch.no_grad()
    def execute_video(
        self,
        input_source_image_path=None,
        input_driving_video_path=None,
        input_driving_video_pickle_path=None,
        flag_do_crop_input=False,
        flag_remap_input=False,
        driving_multiplier=1.0,
        flag_stitching=False,
        flag_crop_driving_video_input=False,
        scale=2.3,
        vx_ratio=0.0,
        vy_ratio=-0.125,
        scale_crop_driving_video=2.2,
        vx_ratio_crop_driving_video=0.0,
        vy_ratio_crop_driving_video=-0.1,
        tab_selection=None,
    ):
        """ for video-driven potrait animation
        """
        input_source_path = input_source_image_path

        if tab_selection == 'Video':
            input_driving_path = input_driving_video_path
        elif tab_selection == 'Pickle':
            input_driving_path = input_driving_video_pickle_path
        else:
            input_driving_path = input_driving_video_pickle_path

        if input_source_path is not None and input_driving_path is not None:
            if osp.exists(input_driving_path) and tab_selection == 'Video' and is_square_video(input_driving_path) is False:
                flag_crop_driving_video_input = True
                log("The driving video is not square, it will be cropped to square automatically.")
                gr.Info("The driving video is not square, it will be cropped to square automatically.", duration=2)

            args_user = {
                'source': input_source_path,
                'driving': input_driving_path,
                'flag_do_crop': flag_do_crop_input,
                'flag_pasteback': flag_remap_input,
                'driving_multiplier': driving_multiplier,
                'flag_stitching': flag_stitching,
                'flag_crop_driving_video': flag_crop_driving_video_input,
                'scale': scale,
                'vx_ratio': vx_ratio,
                'vy_ratio': vy_ratio,
                'scale_crop_driving_video': scale_crop_driving_video,
                'vx_ratio_crop_driving_video': vx_ratio_crop_driving_video,
                'vy_ratio_crop_driving_video': vy_ratio_crop_driving_video,
            }
            # update config from user input
            self.args = update_args(self.args, args_user)
            self.live_portrait_wrapper_animal.update_config(self.args.__dict__)
            self.cropper.update_config(self.args.__dict__)
            # video driven animation
            video_path, video_path_concat, video_gif_path = self.execute(self.args)
            gr.Info("Run successfully!", duration=2)
            return video_path, video_path_concat, video_gif_path
        else:
            raise gr.Error("Please upload the source animal image, and driving video 🤗🤗🤗", duration=5)


================================================
FILE: src/live_portrait_pipeline.py
================================================
# coding: utf-8

"""
Pipeline of LivePortrait (Human)
"""

import torch
torch.backends.cudnn.benchmark = True # disable CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR warning

import cv2; cv2.setNumThreads(0); cv2.ocl.setUseOpenCL(False)
import numpy as np
import os
import os.path as osp
from rich.progress import track

from .config.argument_config import ArgumentConfig
from .config.inference_config import InferenceConfig
from .config.crop_config import CropConfig
from .utils.cropper import Cropper
from .utils.camera import get_rotation_matrix
from .utils.video import images2video, concat_frames, get_fps, add_audio_to_video, has_audio_stream
from .utils.crop import prepare_paste_back, paste_back
from .utils.io import load_image_rgb, load_video, resize_to_limit, dump, load
from .utils.helper import mkdir, basename, dct2device, is_video, is_template, remove_suffix, is_image, is_square_video, calc_motion_multiplier
from .utils.filter import smooth
from .utils.rprint import rlog as log
# from .utils.viz import viz_lmk
from .live_portrait_wrapper import LivePortraitWrapper


def make_abs_path(fn):
    return osp.join(osp.dirname(osp.realpath(__file__)), fn)


class LivePortraitPipeline(object):

    def __init__(self, inference_cfg: InferenceConfig, crop_cfg: CropConfig):
        self.live_portrait_wrapper: LivePortraitWrapper = LivePortraitWrapper(inference_cfg=inference_cfg)
        self.cropper: Cropper = Cropper(crop_cfg=crop_cfg)

    def make_motion_template(self, I_lst, c_eyes_lst, c_lip_lst, **kwargs):
        n_frames = I_lst.shape[0]
        template_dct = {
            'n_frames': n_frames,
            'output_fps': kwargs.get('output_fps', 25),
            'motion': [],
            'c_eyes_lst': [],
            'c_lip_lst': [],
        }

        for i in track(range(n_frames), description='Making motion templates...', total=n_frames):
            # collect s, R, δ and t for inference
            I_i = I_lst[i]
            x_i_info = self.live_portrait_wrapper.get_kp_info(I_i)
            x_s = self.live_portrait_wrapper.transform_keypoint(x_i_info)
            R_i = get_rotation_matrix(x_i_info['pitch'], x_i_info['yaw'], x_i_info['roll'])

            item_dct = {
                'scale': x_i_info['scale'].cpu().numpy().astype(np.float32),
                'R': R_i.cpu().numpy().astype(np.float32),
                'exp': x_i_info['exp'].cpu().numpy().astype(np.float32),
                't': x_i_info['t'].cpu().numpy().astype(np.float32),
                'kp': x_i_info['kp'].cpu().numpy().astype(np.float32),
                'x_s': x_s.cpu().numpy().astype(np.float32),
            }

            template_dct['motion'].append(item_dct)

            c_eyes = c_eyes_lst[i].astype(np.float32)
            template_dct['c_eyes_lst'].append(c_eyes)

            c_lip = c_lip_lst[i].astype(np.float32)
            template_dct['c_lip_lst'].append(c_lip)

        return template_dct

    def execute(self, args: ArgumentConfig):
        # for convenience
        inf_cfg = self.live_portrait_wrapper.inference_cfg
        device = self.live_portrait_wrapper.device
        crop_cfg = self.cropper.crop_cfg

        ######## load source input ########
        flag_is_source_video = False
        source_fps = None
        if is_image(args.source):
            flag_is_source_video = False
            img_rgb = load_image_rgb(args.source)
            img_rgb = resize_to_limit(img_rgb, inf_cfg.source_max_dim, inf_cfg.source_division)
            log(f"Load source image from {args.source}")
            source_rgb_lst = [img_rgb]
        elif is_video(args.source):
            flag_is_source_video = True
            source_rgb_lst = load_video(args.source)
            source_rgb_lst = [resize_to_limit(img, inf_cfg.source_max_dim, inf_cfg.source_division) for img in source_rgb_lst]
            source_fps = int(get_fps(args.source))
            log(f"Load source video from {args.source}, FPS is {source_fps}")
        else:  # source input is an unknown format
            raise Exception(f"Unknown source format: {args.source}")

        ######## process driving info ########
        flag_load_from_template = is_template(args.driving)
        driving_rgb_crop_256x256_lst = None
        wfp_template = None

        if flag_load_from_template:
            # NOTE: load from template, it is fast, but the cropping video is None
            log(f"Load from template: {args.driving}, NOT the video, so the cropping video and audio are both NULL.", style='bold green')
            driving_template_dct = load(args.driving)
            c_d_eyes_lst = driving_template_dct['c_eyes_lst'] if 'c_eyes_lst' in driving_template_dct.keys() else driving_template_dct['c_d_eyes_lst'] # compatible with previous keys
            c_d_lip_lst = driving_template_dct['c_lip_lst'] if 'c_lip_lst' in driving_template_dct.keys() else driving_template_dct['c_d_lip_lst']
            driving_n_frames = driving_template_dct['n_frames']
            flag_is_driving_video = True if driving_n_frames > 1 else False
            if flag_is_source_video and flag_is_driving_video:
                n_frames = min(len(source_rgb_lst), driving_n_frames)  # minimum number as the number of the animated frames
            elif flag_is_source_video and not flag_is_driving_video:
                n_frames = len(source_rgb_lst)
            else:
                n_frames = driving_n_frames

            # set output_fps
            output_fps = driving_template_dct.get('output_fps', inf_cfg.output_fps)
            log(f'The FPS of template: {output_fps}')

            if args.flag_crop_driving_video:
                log("Warning: flag_crop_driving_video is True, but the driving info is a template, so it is ignored.")

        elif osp.exists(args.driving):
            if is_video(args.driving):
                flag_is_driving_video = True
                # load from video file, AND make motion template
                output_fps = int(get_fps(args.driving))
                log(f"Load driving video from: {args.driving}, FPS is {output_fps}")
                driving_rgb_lst = load_video(args.driving)
            elif is_image(args.driving):
                flag_is_driving_video = False
                driving_img_rgb = load_image_rgb(args.driving)
                output_fps = 25
                log(f"Load driving image from {args.driving}")
                driving_rgb_lst = [driving_img_rgb]
            else:
                raise Exception(f"{args.driving} is not a supported type!")
            ######## make motion template ########
            log("Start making driving motion template...")
            driving_n_frames = len(driving_rgb_lst)
            if flag_is_source_video and flag_is_driving_video:
                n_frames = min(len(source_rgb_lst), driving_n_frames)  # minimum number as the number of the animated frames
                driving_rgb_lst = driving_rgb_lst[:n_frames]
            elif flag_is_source_video and not flag_is_driving_video:
                n_frames = len(source_rgb_lst)
            else:
                n_frames = driving_n_frames
            if inf_cfg.flag_crop_driving_video or (not is_square_video(args.driving)):
                ret_d = self.cropper.crop_driving_video(driving_rgb_lst)
                log(f'Driving video is cropped, {len(ret_d["frame_crop_lst"])} frames are processed.')
                if len(ret_d["frame_crop_lst"]) is not n_frames and flag_is_driving_video:
                    n_frames = min(n_frames, len(ret_d["frame_crop_lst"]))
                driving_rgb_crop_lst, driving_lmk_crop_lst = ret_d['frame_crop_lst'], ret_d['lmk_crop_lst']
                driving_rgb_crop_256x256_lst = [cv2.resize(_, (256, 256)) for _ in driving_rgb_crop_lst]
            else:
                driving_lmk_crop_lst = self.cropper.calc_lmks_from_cropped_video(driving_rgb_lst)
                driving_rgb_crop_256x256_lst = [cv2.resize(_, (256, 256)) for _ in driving_rgb_lst]  # force to resize to 256x256
            #######################################

            c_d_eyes_lst, c_d_lip_lst = self.live_portrait_wrapper.calc_ratio(driving_lmk_crop_lst)
            # save the motion template
            I_d_lst = self.live_portrait_wrapper.prepare_videos(driving_rgb_crop_256x256_lst)
            driving_template_dct = self.make_motion_template(I_d_lst, c_d_eyes_lst, c_d_lip_lst, output_fps=output_fps)

            wfp_template = remove_suffix(args.driving) + '.pkl'
            dump(wfp_template, driving_template_dct)
            log(f"Dump motion template to {wfp_template}")
        else:
            raise Exception(f"{args.driving} does not exist!")
        if not flag_is_driving_video:
            c_d_eyes_lst = c_d_eyes_lst*n_frames
            c_d_lip_lst = c_d_lip_lst*n_frames

        ######## prepare for pasteback ########
        I_p_pstbk_lst = None
        if inf_cfg.flag_pasteback and inf_cfg.flag_do_crop and inf_cfg.flag_stitching:
            I_p_pstbk_lst = []
            log("Prepared pasteback mask done.")

        I_p_lst = []
        R_d_0, x_d_0_info = None, None
        flag_normalize_lip = inf_cfg.flag_normalize_lip  # not overwrite
        flag_source_video_eye_retargeting = inf_cfg.flag_source_video_eye_retargeting  # not overwrite
        lip_delta_before_animation, eye_delta_before_animation = None, None

        ######## process source info ########
        if flag_is_source_video:
            log(f"Start making source motion template...")

            source_rgb_lst = source_rgb_lst[:n_frames]
            if inf_cfg.flag_do_crop:
                ret_s = self.cropper.crop_source_video(source_rgb_lst, crop_cfg)
                log(f'Source video is cropped, {len(ret_s["frame_crop_lst"])} frames are processed.')
                if len(ret_s["frame_crop_lst"]) is not n_frames:
                    n_frames = min(n_frames, len(ret_s["frame_crop_lst"]))
                img_crop_256x256_lst, source_lmk_crop_lst, source_M_c2o_lst = ret_s['frame_crop_lst'], ret_s['lmk_crop_lst'], ret_s['M_c2o_lst']
            else:
                source_lmk_crop_lst = self.cropper.calc_lmks_from_cropped_video(source_rgb_lst)
                img_crop_256x256_lst = [cv2.resize(_, (256, 256)) for _ in source_rgb_lst]  # force to resize to 256x256

            c_s_eyes_lst, c_s_lip_lst = self.live_portrait_wrapper.calc_ratio(source_lmk_crop_lst)
            # save the motion template
            I_s_lst = self.live_portrait_wrapper.prepare_videos(img_crop_256x256_lst)
            source_template_dct = self.make_motion_template(I_s_lst, c_s_eyes_lst, c_s_lip_lst, output_fps=source_fps)

            key_r = 'R' if 'R' in driving_template_dct['motion'][0].keys() else 'R_d'  # compatible with previous keys
            if inf_cfg.flag_relative_motion:
                if flag_is_driving_video:
                    x_d_exp_lst = [source_template_dct['motion'][i]['exp'] + driving_template_dct['motion'][i]['exp'] - driving_template_dct['motion'][0]['exp'] for i in range(n_frames)]
                    x_d_exp_lst_smooth = smooth(x_d_exp_lst, source_template_dct['motion'][0]['exp'].shape, device, inf_cfg.driving_smooth_observation_variance)
                else:
                    x_d_exp_lst = [source_template_dct['motion'][i]['exp'] + (driving_template_dct['motion'][0]['exp'] - inf_cfg.lip_array) for i in range(n_frames)]
                    x_d_exp_lst_smooth = [torch.tensor(x_d_exp[0], dtype=torch.float32, device=device) for x_d_exp in x_d_exp_lst]
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    if flag_is_driving_video:
                        x_d_r_lst = [(np.dot(driving_template_dct['motion'][i][key_r], driving_template_dct['motion'][0][key_r].transpose(0, 2, 1))) @ source_template_dct['motion'][i]['R'] for i in range(n_frames)]
                        x_d_r_lst_smooth = smooth(x_d_r_lst, source_template_dct['motion'][0]['R'].shape, device, inf_cfg.driving_smooth_observation_variance)
                    else:
                        x_d_r_lst = [source_template_dct['motion'][i]['R'] for i in range(n_frames)]
                        x_d_r_lst_smooth = [torch.tensor(x_d_r[0], dtype=torch.float32, device=device) for x_d_r in x_d_r_lst]
            else:
                if flag_is_driving_video:
                    x_d_exp_lst = [driving_template_dct['motion'][i]['exp'] for i in range(n_frames)]
                    x_d_exp_lst_smooth = smooth(x_d_exp_lst, source_template_dct['motion'][0]['exp'].shape, device, inf_cfg.driving_smooth_observation_variance)
                else:
                    x_d_exp_lst = [driving_template_dct['motion'][0]['exp']]
                    x_d_exp_lst_smooth = [torch.tensor(x_d_exp[0], dtype=torch.float32, device=device) for x_d_exp in x_d_exp_lst]*n_frames
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    if flag_is_driving_video:
                        x_d_r_lst = [driving_template_dct['motion'][i][key_r] for i in range(n_frames)]
                        x_d_r_lst_smooth = smooth(x_d_r_lst, source_template_dct['motion'][0]['R'].shape, device, inf_cfg.driving_smooth_observation_variance)
                    else:
                        x_d_r_lst = [driving_template_dct['motion'][0][key_r]]
                        x_d_r_lst_smooth = [torch.tensor(x_d_r[0], dtype=torch.float32, device=device) for x_d_r in x_d_r_lst]*n_frames

        else:  # if the input is a source image, process it only once
            if inf_cfg.flag_do_crop:
                crop_info = self.cropper.crop_source_image(source_rgb_lst[0], crop_cfg)
                if crop_info is None:
                    raise Exception("No face detected in the source image!")
                source_lmk = crop_info['lmk_crop']
                img_crop_256x256 = crop_info['img_crop_256x256']
            else:
                source_lmk = self.cropper.calc_lmk_from_cropped_image(source_rgb_lst[0])
                img_crop_256x256 = cv2.resize(source_rgb_lst[0], (256, 256))  # force to resize to 256x256
            I_s = self.live_portrait_wrapper.prepare_source(img_crop_256x256)
            x_s_info = self.live_portrait_wrapper.get_kp_info(I_s)
            x_c_s = x_s_info['kp']
            R_s = get_rotation_matrix(x_s_info['pitch'], x_s_info['yaw'], x_s_info['roll'])
            f_s = self.live_portrait_wrapper.extract_feature_3d(I_s)
            x_s = self.live_portrait_wrapper.transform_keypoint(x_s_info)

            # let lip-open scalar to be 0 at first
            if flag_normalize_lip and inf_cfg.flag_relative_motion and source_lmk is not None:
                c_d_lip_before_animation = [0.]
                combined_lip_ratio_tensor_before_animation = self.live_portrait_wrapper.calc_combined_lip_ratio(c_d_lip_before_animation, source_lmk)
                if combined_lip_ratio_tensor_before_animation[0][0] >= inf_cfg.lip_normalize_threshold:
                    lip_delta_before_animation = self.live_portrait_wrapper.retarget_lip(x_s, combined_lip_ratio_tensor_before_animation)

            if inf_cfg.flag_pasteback and inf_cfg.flag_do_crop and inf_cfg.flag_stitching:
                mask_ori_float = prepare_paste_back(inf_cfg.mask_crop, crop_info['M_c2o'], dsize=(source_rgb_lst[0].shape[1], source_rgb_lst[0].shape[0]))

        ######## animate ########
        if flag_is_driving_video or (flag_is_source_video and not flag_is_driving_video):
            log(f"The animated video consists of {n_frames} frames.")
        else:
            log(f"The output of image-driven portrait animation is an image.")
        for i in track(range(n_frames), description='🚀Animating...', total=n_frames):
            if flag_is_source_video:  # source video
                x_s_info = source_template_dct['motion'][i]
                x_s_info = dct2device(x_s_info, device)

                source_lmk = source_lmk_crop_lst[i]
                img_crop_256x256 = img_crop_256x256_lst[i]
                I_s = I_s_lst[i]
                f_s = self.live_portrait_wrapper.extract_feature_3d(I_s)

                x_c_s = x_s_info['kp']
                R_s = x_s_info['R']
                x_s =x_s_info['x_s']

                # let lip-open scalar to be 0 at first if the input is a video
                if flag_normalize_lip and inf_cfg.flag_relative_motion and source_lmk is not None:
                    c_d_lip_before_animation = [0.]
                    combined_lip_ratio_tensor_before_animation = self.live_portrait_wrapper.calc_combined_lip_ratio(c_d_lip_before_animation, source_lmk)
                    if combined_lip_ratio_tensor_before_animation[0][0] >= inf_cfg.lip_normalize_threshold:
                        lip_delta_before_animation = self.live_portrait_wrapper.retarget_lip(x_s, combined_lip_ratio_tensor_before_animation)
                    else:
                        lip_delta_before_animation = None

                # let eye-open scalar to be the same as the first frame if the latter is eye-open state
                if flag_source_video_eye_retargeting and source_lmk is not None:
                    if i == 0:
                        combined_eye_ratio_tensor_frame_zero = c_s_eyes_lst[0]
                        c_d_eye_before_animation_frame_zero = [[combined_eye_ratio_tensor_frame_zero[0][:2].mean()]]
                        if c_d_eye_before_animation_frame_zero[0][0] < inf_cfg.source_video_eye_retargeting_threshold:
                            c_d_eye_before_animation_frame_zero = [[0.39]]
                    combined_eye_ratio_tensor_before_animation = self.live_portrait_wrapper.calc_combined_eye_ratio(c_d_eye_before_animation_frame_zero, source_lmk)
                    eye_delta_before_animation = self.live_portrait_wrapper.retarget_eye(x_s, combined_eye_ratio_tensor_before_animation)

                if inf_cfg.flag_pasteback and inf_cfg.flag_do_crop and inf_cfg.flag_stitching:  # prepare for paste back
                    mask_ori_float = prepare_paste_back(inf_cfg.mask_crop, source_M_c2o_lst[i], dsize=(source_rgb_lst[i].shape[1], source_rgb_lst[i].shape[0]))
            if flag_is_source_video and not flag_is_driving_video:
                x_d_i_info = driving_template_dct['motion'][0]
            else:
                x_d_i_info = driving_template_dct['motion'][i]
            x_d_i_info = dct2device(x_d_i_info, device)
            R_d_i = x_d_i_info['R'] if 'R' in x_d_i_info.keys() else x_d_i_info['R_d']  # compatible with previous keys

            if i == 0:  # cache the first frame
                R_d_0 = R_d_i
                x_d_0_info = x_d_i_info.copy()

            delta_new = x_s_info['exp'].clone()
            if inf_cfg.flag_relative_motion:
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    R_new = x_d_r_lst_smooth[i] if flag_is_source_video else (R_d_i @ R_d_0.permute(0, 2, 1)) @ R_s
                else:
                    R_new = R_s
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "exp":
                    if flag_is_source_video:
                        for idx in [1,2,6,11,12,13,14,15,16,17,18,19,20]:
                            delta_new[:, idx, :] = x_d_exp_lst_smooth[i][idx, :]
                        delta_new[:, 3:5, 1] = x_d_exp_lst_smooth[i][3:5, 1]
                        delta_new[:, 5, 2] = x_d_exp_lst_smooth[i][5, 2]
                        delta_new[:, 8, 2] = x_d_exp_lst_smooth[i][8, 2]
                        delta_new[:, 9, 1:] = x_d_exp_lst_smooth[i][9, 1:]
                    else:
                        if flag_is_driving_video:
                            delta_new = x_s_info['exp'] + (x_d_i_info['exp'] - x_d_0_info['exp'])
                        else:
                            delta_new = x_s_info['exp'] + (x_d_i_info['exp'] - torch.from_numpy(inf_cfg.lip_array).to(dtype=torch.float32, device=device))
                elif inf_cfg.animation_region == "lip":
                    for lip_idx in [6, 12, 14, 17, 19, 20]:
                        if flag_is_source_video:
                            delta_new[:, lip_idx, :] = x_d_exp_lst_smooth[i][lip_idx, :]
                        elif flag_is_driving_video:
                            delta_new[:, lip_idx, :] = (x_s_info['exp'] + (x_d_i_info['exp'] - x_d_0_info['exp']))[:, lip_idx, :]
                        else:
                            delta_new[:, lip_idx, :] = (x_s_info['exp'] + (x_d_i_info['exp'] - torch.from_numpy(inf_cfg.lip_array).to(dtype=torch.float32, device=device)))[:, lip_idx, :]
                elif inf_cfg.animation_region == "eyes":
                    for eyes_idx in [11, 13, 15, 16, 18]:
                        if flag_is_source_video:
                            delta_new[:, eyes_idx, :] = x_d_exp_lst_smooth[i][eyes_idx, :]
                        elif flag_is_driving_video:
                            delta_new[:, eyes_idx, :] = (x_s_info['exp'] + (x_d_i_info['exp'] - x_d_0_info['exp']))[:, eyes_idx, :]
                        else:
                            delta_new[:, eyes_idx, :] = (x_s_info['exp'] + (x_d_i_info['exp'] - 0))[:, eyes_idx, :]
                if inf_cfg.animation_region == "all":
                    scale_new = x_s_info['scale'] if flag_is_source_video else x_s_info['scale'] * (x_d_i_info['scale'] / x_d_0_info['scale'])
                else:
                    scale_new = x_s_info['scale']
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    t_new = x_s_info['t'] if flag_is_source_video else x_s_info['t'] + (x_d_i_info['t'] - x_d_0_info['t'])
                else:
                    t_new = x_s_info['t']
            else:
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    R_new = x_d_r_lst_smooth[i] if flag_is_source_video else R_d_i
                else:
                    R_new = R_s
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "exp":
                    for idx in [1,2,6,11,12,13,14,15,16,17,18,19,20]:
                        delta_new[:, idx, :] = x_d_exp_lst_smooth[i][idx, :] if flag_is_source_video else x_d_i_info['exp'][:, idx, :]
                    delta_new[:, 3:5, 1] = x_d_exp_lst_smooth[i][3:5, 1] if flag_is_source_video else x_d_i_info['exp'][:, 3:5, 1]
                    delta_new[:, 5, 2] = x_d_exp_lst_smooth[i][5, 2] if flag_is_source_video else x_d_i_info['exp'][:, 5, 2]
                    delta_new[:, 8, 2] = x_d_exp_lst_smooth[i][8, 2] if flag_is_source_video else x_d_i_info['exp'][:, 8, 2]
                    delta_new[:, 9, 1:] = x_d_exp_lst_smooth[i][9, 1:] if flag_is_source_video else x_d_i_info['exp'][:, 9, 1:]
                elif inf_cfg.animation_region == "lip":
                    for lip_idx in [6, 12, 14, 17, 19, 20]:
                        delta_new[:, lip_idx, :] = x_d_exp_lst_smooth[i][lip_idx, :] if flag_is_source_video else x_d_i_info['exp'][:, lip_idx, :]
                elif inf_cfg.animation_region == "eyes":
                    for eyes_idx in [11, 13, 15, 16, 18]:
                        delta_new[:, eyes_idx, :] = x_d_exp_lst_smooth[i][eyes_idx, :] if flag_is_source_video else x_d_i_info['exp'][:, eyes_idx, :]
                scale_new = x_s_info['scale']
                if inf_cfg.animation_region == "all" or inf_cfg.animation_region == "pose":
                    t_new = x_d_i_info['t']
                else:
                    t_new = x_s_info['t']

            t_new[..., 2].fill_(0)  # zero tz
            x_d_i_new = scale_new * (x_c_s @ R_new + delta_new) + t_new

            if inf_cfg.flag_relative_motion and inf_cfg.driving_option == "expression-friendly" and not flag_is_source_video and flag_is_driving_video:
                if i == 0:
                    x_d_0_new = x_d_i_new
                    motion_multiplier = calc_motion_multiplier(x_s, x_d_0_new)
                    # motion_multiplier *= inf_cfg.driving_multiplier
                x_d_diff = (x_d_i_new - x_d_0_new) * motion_multiplier
                x_d_i_new = x_d_diff + x_s

            # Algorithm 1:
            if not inf_cfg.flag_stitching and not inf_cfg.flag_eye_retargeting and not inf_cfg.flag_lip_retargeting:
                # without stitching or retargeting
                if flag_normalize_lip and lip_delta_before_animation is not None:
                    x_d_i_new += lip_delta_before_animation
                if flag_source_video_eye_retargeting and eye_delta_before_animation is not None:
                    x_d_i_new += eye_delta_before_animation
                else:
                    pass
            elif inf_cfg.flag_stitching and not inf_cfg.flag_eye_retargeting and not inf_cfg.flag_lip_retargeting:
                # with stitching and without retargeting
                if flag_normalize_lip and lip_delta_before_animation is not None:
                    x_d_i_new = self.live_portrait_wrapper.stitching(x_s, x_d_i_new) + lip_delta_before_animation
                else:
                    x_d_i_new = self.live_portrait_wrapper.stitching(x_s, x_d_i_new)
                if flag_source_video_eye_retargeting and eye_delta_before_animation is not None:
                    x_d_i_new += eye_delta_before_animation
            else:
                eyes_delta, lip_delta = None, None
                if inf_cfg.flag_eye_retargeting and source_lmk is not None:
                    c_d_eyes_i = c_d_eyes_lst[i]
                    combined_eye_ratio_tensor = self.live_portrait_wrapper.calc_combined_eye_ratio(c_d_eyes_i, source_lmk)
                    # ∆_eyes,i = R_eyes(x_s; c_s,eyes, c_d,eyes,i)
                    eyes_delta = self.live_portrait_wrapper.retarget_eye(x_s, combined_eye_ratio_tensor)
                if inf_cfg.flag_lip_retargeting and source_lmk is not None:
                    c_d_lip_i = c_d_lip_lst[i]
                    combined_lip_ratio_tensor = self.live_portrait_wrapper.calc_combined_lip_ratio(c_d_lip_i, source_lmk)
                    # ∆_lip,i = R_lip(x_s; c_s,lip, c_d,lip,i)
                    lip_delta = self.live_portrait_wrapper.retarget_lip(x_s, combined_lip_ratio_tensor)

                if inf_cfg.flag_relative_motion:  # use x_s
                    x_d_i_new = x_s + \
                        (eyes_delta if eyes_delta is not None else 0) + \
                        (lip_delta if lip_delta is not None else 0)
                else:  # use x_d,i
                    x_d_i_new = x_d_i_new + \
                        (eyes_delta if eyes_delta is not None else 0) + \
                        (lip_delta if lip_delta is not None else 0)

                if inf_cfg.flag_stitching:
                    x_d_i_new = self.live_portrait_wrapper.stitching(x_s, x_d_i_new)

            x_d_i_new = x_s + (x_d_i_new - x_s) * inf_cfg.driving_multiplier
            out = self.live_portrait_wrapper.warp_decode(f_s, x_s, x_d_i_new)
            I_p_i = self.live_portrait_wrapper.parse_output(out['out'])[0]
            I_p_lst.append(I_p_i)

            if inf_cfg.flag_pasteback and inf_cfg.flag_do_crop and inf_cfg.flag_stitching:
                # TODO: the paste back procedure is slow, considering optimize it using multi-threading or GPU
                if flag_is_source_video:
                    I_p_pstbk = paste_back(I_p_i, source_M_c2o_lst[i], source_rgb_lst[i], mask_ori_float)
                else:
                    I_p_pstbk = paste_back(I_p_i, crop_info['M_c2o'], source_rgb_lst[0], mask_ori_float)
                I_p_pstbk_lst.append(I_p_pstbk)

        mkdir(args.output_dir)
        wfp_concat = None
        ######### build the final concatenation result #########
        # driving frame | source frame | generation
        if flag_is_source_video and flag_is_driving_video:
            frames_concatenated = concat_frames(driving_rgb_crop_256x256_lst, img_crop_256x256_lst, I_p_lst)
        elif flag_is_source_video and not flag_is_driving_video:
            if flag_load_from_template:
                frames_concatenated = concat_frames(driving_rgb_crop_256x256_lst, img_crop_256x256_lst, I_p_lst)
            else:
                frames_concatenated = concat_frames(driving_rgb_crop_256x256_lst*n_frames, img_crop_256x256_lst, I_p_lst)
        else:
            frames_concatenated = concat_frames(driving_rgb_crop_256x256_lst, [img_crop_256x256], I_p_lst)

        if flag_is_driving_video or (flag_is_source_video and not flag_is_driving_video):
            flag_source_has_audio = flag_is_source_video and has_audio_stream(args.source)
            flag_driving_has_audio = (not flag_load_from_template) and has_audio_stream(args.driving)

            wfp_concat = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}_concat.mp4')

            # NOTE: update output fps
            output_fps = source_fps if flag_is_source_video else output_fps
            images2video(frames_concatenated, wfp=wfp_concat, fps=output_fps)

            if flag_source_has_audio or flag_driving_has_audio:
                # final result with concatenation
                wfp_concat_with_audio = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}_concat_with_audio.mp4')
                audio_from_which_video = args.driving if ((flag_driving_has_audio and args.audio_priority == 'driving') or (not flag_source_has_audio)) else args.source
                log(f"Audio is selected from {audio_from_which_video}, concat mode")
                add_audio_to_video(wfp_concat, audio_from_which_video, wfp_concat_with_audio)
                os.replace(wfp_concat_with_audio, wfp_concat)
                log(f"Replace {wfp_concat_with_audio} with {wfp_concat}")

            # save the animated result
            wfp = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}.mp4')
            if I_p_pstbk_lst is not None and len(I_p_pstbk_lst) > 0:
                images2video(I_p_pstbk_lst, wfp=wfp, fps=output_fps)
            else:
                images2video(I_p_lst, wfp=wfp, fps=output_fps)

            ######### build the final result #########
            if flag_source_has_audio or flag_driving_has_audio:
                wfp_with_audio = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}_with_audio.mp4')
                audio_from_which_video = args.driving if ((flag_driving_has_audio and args.audio_priority == 'driving') or (not flag_source_has_audio)) else args.source
                log(f"Audio is selected from {audio_from_which_video}")
                add_audio_to_video(wfp, audio_from_which_video, wfp_with_audio)
                os.replace(wfp_with_audio, wfp)
                log(f"Replace {wfp_with_audio} with {wfp}")

            # final log
            if wfp_template not in (None, ''):
                log(f'Animated template: {wfp_template}, you can specify `-d` argument with this template path next time to avoid cropping video, motion making and protecting privacy.', style='bold green')
            log(f'Animated video: {wfp}')
            log(f'Animated video with concat: {wfp_concat}')
        else:
            wfp_concat = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}_concat.jpg')
            cv2.imwrite(wfp_concat, frames_concatenated[0][..., ::-1])
            wfp = osp.join(args.output_dir, f'{basename(args.source)}--{basename(args.driving)}.jpg')
            if I_p_pstbk_lst is not None and len(I_p_pstbk_lst) > 0:
                cv2.imwrite(wfp, I_p_pstbk_lst[0][..., ::-1])
            else:
                cv2.imwrite(wfp, frames_concatenated[0][..., ::-1])
            # final log
            log(f'Animated image: {wfp}')
            log(f'Animated image with concat: {wfp_concat}')

        return wfp, wfp_concat


================================================
FILE: src/live_portrait_pipeline_animal.py
================================================
# coding: utf-8

"""
Pipeline of LivePortrait (Animal)
"""

import warnings
warnings.filterwarnings("ignore", message="torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument.")
warnings.filterwarnings("ignore", message="torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly.")
warnings.fil

Download .txt

gitextract_l1jm0s2t/

├── .gitignore
├── .vscode/
│   └── settings.json
├── LICENSE
├── app.py
├── app_animals.py
├── assets/
│   ├── .gitignore
│   ├── docs/
│   │   ├── changelog/
│   │   │   ├── 2024-07-10.md
│   │   │   ├── 2024-07-19.md
│   │   │   ├── 2024-07-24.md
│   │   │   ├── 2024-08-02.md
│   │   │   ├── 2024-08-05.md
│   │   │   ├── 2024-08-06.md
│   │   │   ├── 2024-08-19.md
│   │   │   └── 2025-01-01.md
│   │   ├── directory-structure.md
│   │   ├── how-to-install-ffmpeg.md
│   │   └── speed.md
│   ├── examples/
│   │   └── driving/
│   │       ├── aggrieved.pkl
│   │       ├── d1.pkl
│   │       ├── d2.pkl
│   │       ├── d5.pkl
│   │       ├── d7.pkl
│   │       ├── d8.pkl
│   │       ├── laugh.pkl
│   │       ├── open_lip.pkl
│   │       ├── shake_face.pkl
│   │       ├── shy.pkl
│   │       ├── talking.pkl
│   │       └── wink.pkl
│   └── gradio/
│       ├── gradio_description_animate_clear.md
│       ├── gradio_description_animation.md
│       ├── gradio_description_retargeting.md
│       ├── gradio_description_retargeting_video.md
│       ├── gradio_description_upload.md
│       ├── gradio_description_upload_animal.md
│       └── gradio_title.md
├── inference.py
├── inference_animals.py
├── pretrained_weights/
│   └── .gitkeep
├── readme.md
├── readme_zh_cn.md
├── requirements.txt
├── requirements_base.txt
├── requirements_macOS.txt
├── speed.py
└── src/
    ├── config/
    │   ├── __init__.py
    │   ├── argument_config.py
    │   ├── base_config.py
    │   ├── crop_config.py
    │   ├── inference_config.py
    │   └── models.yaml
    ├── gradio_pipeline.py
    ├── live_portrait_pipeline.py
    ├── live_portrait_pipeline_animal.py
    ├── live_portrait_wrapper.py
    ├── modules/
    │   ├── __init__.py
    │   ├── appearance_feature_extractor.py
    │   ├── convnextv2.py
    │   ├── dense_motion.py
    │   ├── motion_extractor.py
    │   ├── spade_generator.py
    │   ├── stitching_retargeting_network.py
    │   ├── util.py
    │   └── warping_network.py
    └── utils/
        ├── __init__.py
        ├── animal_landmark_runner.py
        ├── camera.py
        ├── check_windows_port.py
        ├── crop.py
        ├── cropper.py
        ├── dependencies/
        │   ├── XPose/
        │   │   ├── config_model/
        │   │   │   ├── UniPose_SwinT.py
        │   │   │   └── coco_transformer.py
        │   │   ├── models/
        │   │   │   ├── UniPose/
        │   │   │   │   ├── __init__.py
        │   │   │   │   ├── attention.py
        │   │   │   │   ├── backbone.py
        │   │   │   │   ├── deformable_transformer.py
        │   │   │   │   ├── fuse_modules.py
        │   │   │   │   ├── mask_generate.py
        │   │   │   │   ├── ops/
        │   │   │   │   │   ├── functions/
        │   │   │   │   │   │   ├── __init__.py
        │   │   │   │   │   │   └── ms_deform_attn_func.py
        │   │   │   │   │   ├── modules/
        │   │   │   │   │   │   ├── __init__.py
        │   │   │   │   │   │   ├── ms_deform_attn.py
        │   │   │   │   │   │   └── ms_deform_attn_key_aware.py
        │   │   │   │   │   ├── setup.py
        │   │   │   │   │   ├── src/
        │   │   │   │   │   │   ├── cpu/
        │   │   │   │   │   │   │   ├── ms_deform_attn_cpu.cpp
        │   │   │   │   │   │   │   └── ms_deform_attn_cpu.h
        │   │   │   │   │   │   ├── cuda/
        │   │   │   │   │   │   │   ├── ms_deform_attn_cuda.cu
        │   │   │   │   │   │   │   ├── ms_deform_attn_cuda.h
        │   │   │   │   │   │   │   └── ms_deform_im2col_cuda.cuh
        │   │   │   │   │   │   ├── ms_deform_attn.h
        │   │   │   │   │   │   └── vision.cpp
        │   │   │   │   │   └── test.py
        │   │   │   │   ├── position_encoding.py
        │   │   │   │   ├── swin_transformer.py
        │   │   │   │   ├── transformer_deformable.py
        │   │   │   │   ├── transformer_vanilla.py
        │   │   │   │   ├── unipose.py
        │   │   │   │   └── utils.py
        │   │   │   ├── __init__.py
        │   │   │   └── registry.py
        │   │   ├── predefined_keypoints.py
        │   │   ├── transforms.py
        │   │   └── util/
        │   │       ├── addict.py
        │   │       ├── box_ops.py
        │   │       ├── config.py
        │   │       ├── keypoint_ops.py
        │   │       └── misc.py
        │   └── insightface/
        │       ├── __init__.py
        │       ├── app/
        │       │   ├── __init__.py
        │       │   ├── common.py
        │       │   └── face_analysis.py
        │       ├── data/
        │       │   ├── __init__.py
        │       │   ├── image.py
        │       │   ├── objects/
        │       │   │   └── meanshape_68.pkl
        │       │   ├── pickle_object.py
        │       │   └── rec_builder.py
        │       ├── model_zoo/
        │       │   ├── __init__.py
        │       │   ├── arcface_onnx.py
        │       │   ├── attribute.py
        │       │   ├── inswapper.py
        │       │   ├── landmark.py
        │       │   ├── model_store.py
        │       │   ├── model_zoo.py
        │       │   ├── retinaface.py
        │       │   └── scrfd.py
        │       └── utils/
        │           ├── __init__.py
        │           ├── constant.py
        │           ├── download.py
        │           ├── face_align.py
        │           ├── filesystem.py
        │           ├── storage.py
        │           └── transform.py
        ├── face_analysis_diy.py
        ├── filter.py
        ├── helper.py
        ├── human_landmark_runner.py
        ├── io.py
        ├── resources/
        │   ├── clip_embedding_68.pkl
        │   ├── clip_embedding_9.pkl
        │   └── lip_array.pkl
        ├── retargeting_utils.py
        ├── rprint.py
        ├── timer.py
        ├── video.py
        └── viz.py

Download .txt

SYMBOL INDEX (718 symbols across 79 files)

FILE: app.py
  function partial_fields (line 19) | def partial_fields(target_class, kwargs):
  function fast_check_ffmpeg (line 23) | def fast_check_ffmpeg():
  function gpu_wrapped_execute_video (line 59) | def gpu_wrapped_execute_video(*args, **kwargs):
  function gpu_wrapped_execute_image_retargeting (line 63) | def gpu_wrapped_execute_image_retargeting(*args, **kwargs):
  function gpu_wrapped_execute_video_retargeting (line 67) | def gpu_wrapped_execute_video_retargeting(*args, **kwargs):
  function reset_sliders (line 71) | def reset_sliders(*args, **kwargs):

FILE: app_animals.py
  function partial_fields (line 19) | def partial_fields(target_class, kwargs):
  function fast_check_ffmpeg (line 23) | def fast_check_ffmpeg():
  function gpu_wrapped_execute_video (line 57) | def gpu_wrapped_execute_video(*args, **kwargs):

FILE: inference.py
  function partial_fields (line 17) | def partial_fields(target_class, kwargs):
  function fast_check_ffmpeg (line 21) | def fast_check_ffmpeg():
  function fast_check_args (line 29) | def fast_check_args(args: ArgumentConfig):
  function main (line 36) | def main():

FILE: inference_animals.py
  function partial_fields (line 17) | def partial_fields(target_class, kwargs):
  function fast_check_ffmpeg (line 21) | def fast_check_ffmpeg():
  function fast_check_args (line 29) | def fast_check_args(args: ArgumentConfig):
  function main (line 36) | def main():

FILE: speed.py
  function initialize_inputs (line 20) | def initialize_inputs(batch_size=1, device_id=0):
  function load_and_compile_models (line 49) | def load_and_compile_models(cfg, model_config):
  function warm_up_models (line 83) | def warm_up_models(compiled_models, stitching_retargeting_module, inputs):
  function measure_inference_times (line 100) | def measure_inference_times(compiled_models, stitching_retargeting_modul...
  function print_benchmark_results (line 146) | def print_benchmark_results(compiled_models, stitching_retargeting_modul...
  function main (line 168) | def main():

FILE: src/config/argument_config.py
  class ArgumentConfig (line 14) | class ArgumentConfig(PrintableConfig):

FILE: src/config/base_config.py
  function make_abs_path (line 12) | def make_abs_path(fn):
  class PrintableConfig (line 16) | class PrintableConfig:  # pylint: disable=too-few-public-methods
    method __repr__ (line 19) | def __repr__(self):

FILE: src/config/crop_config.py
  class CropConfig (line 13) | class CropConfig(PrintableConfig):

FILE: src/config/inference_config.py
  function load_lip_array (line 14) | def load_lip_array():
  class InferenceConfig (line 19) | class InferenceConfig(PrintableConfig):

FILE: src/gradio_pipeline.py
  function update_args (line 28) | def update_args(args, user_args):
  class GradioPipeline (line 37) | class GradioPipeline(LivePortraitPipeline):
    method __init__ (line 41) | def __init__(self, inference_cfg, crop_cfg, args: ArgumentConfig):
    method update_delta_new_eyeball_direction (line 47) | def update_delta_new_eyeball_direction(self, eyeball_direction_x, eyeb...
    method update_delta_new_smile (line 67) | def update_delta_new_smile(self, smile, delta_new, **kwargs):
    method update_delta_new_wink (line 80) | def update_delta_new_wink(self, wink, delta_new, **kwargs):
    method update_delta_new_eyebrow (line 90) | def update_delta_new_eyebrow(self, eyebrow, delta_new, **kwargs):
    method update_delta_new_lip_variation_zero (line 102) | def update_delta_new_lip_variation_zero(self, lip_variation_zero, delt...
    method update_delta_new_lip_variation_one (line 108) | def update_delta_new_lip_variation_one(self, lip_variation_one, delta_...
    method update_delta_new_lip_variation_two (line 117) | def update_delta_new_lip_variation_two(self, lip_variation_two, delta_...
    method update_delta_new_lip_variation_three (line 125) | def update_delta_new_lip_variation_three(self, lip_variation_three, de...
    method update_delta_new_mov_x (line 133) | def update_delta_new_mov_x(self, mov_x, delta_new, **kwargs):
    method update_delta_new_mov_y (line 139) | def update_delta_new_mov_y(self, mov_y, delta_new, **kwargs):
    method execute_video (line 145) | def execute_video(
    method execute_image_retargeting (line 231) | def execute_image_retargeting(
    method prepare_retargeting_image (line 336) | def prepare_retargeting_image(
    method init_retargeting_image (line 377) | def init_retargeting_image(self, retargeting_source_scale: float, sour...
    method execute_video_retargeting (line 401) | def execute_video_retargeting(self, input_lip_ratio: float, input_vide...
    method prepare_retargeting_video (line 488) | def prepare_retargeting_video(self, input_video, retargeting_source_sc...
    method prepare_video_lip_silence (line 544) | def prepare_video_lip_silence(self, input_video, device, flag_do_crop=...
  class GradioPipelineAnimal (line 596) | class GradioPipelineAnimal(LivePortraitPipelineAnimal):
    method __init__ (line 599) | def __init__(self, inference_cfg, crop_cfg, args: ArgumentConfig):
    method execute_video (line 606) | def execute_video(

FILE: src/live_portrait_pipeline.py
  function make_abs_path (line 31) | def make_abs_path(fn):
  class LivePortraitPipeline (line 35) | class LivePortraitPipeline(object):
    method __init__ (line 37) | def __init__(self, inference_cfg: InferenceConfig, crop_cfg: CropConfig):
    method make_motion_template (line 41) | def make_motion_template(self, I_lst, c_eyes_lst, c_lip_lst, **kwargs):
    method execute (line 77) | def execute(self, args: ArgumentConfig):

FILE: src/live_portrait_pipeline_animal.py
  function make_abs_path (line 35) | def make_abs_path(fn):
  class LivePortraitPipelineAnimal (line 38) | class LivePortraitPipelineAnimal(object):
    method __init__ (line 40) | def __init__(self, inference_cfg: InferenceConfig, crop_cfg: CropConfig):
    method make_motion_template (line 44) | def make_motion_template(self, I_lst, **kwargs):
    method execute (line 69) | def execute(self, args: ArgumentConfig):

FILE: src/live_portrait_wrapper.py
  class LivePortraitWrapper (line 22) | class LivePortraitWrapper(object):
    method __init__ (line 27) | def __init__(self, inference_cfg: InferenceConfig):
    method inference_ctx (line 70) | def inference_ctx(self):
    method update_config (line 78) | def update_config(self, user_args):
    method prepare_source (line 83) | def prepare_source(self, img: np.ndarray) -> torch.Tensor:
    method prepare_videos (line 104) | def prepare_videos(self, imgs) -> torch.Tensor:
    method extract_feature_3d (line 122) | def extract_feature_3d(self, x: torch.Tensor) -> torch.Tensor:
    method get_kp_info (line 131) | def get_kp_info(self, x: torch.Tensor, **kwargs) -> dict:
    method get_pose_dct (line 157) | def get_pose_dct(self, kp_info: dict) -> dict:
    method get_fs_and_kp_info (line 165) | def get_fs_and_kp_info(self, source_prepared, driving_first_frame):
    method transform_keypoint (line 184) | def transform_keypoint(self, kp_info: dict):
    method retarget_eye (line 214) | def retarget_eye(self, kp_source: torch.Tensor, eye_close_ratio: torch...
    method retarget_lip (line 227) | def retarget_lip(self, kp_source: torch.Tensor, lip_close_ratio: torch...
    method stitch (line 240) | def stitch(self, kp_source: torch.Tensor, kp_driving: torch.Tensor) ->...
    method stitching (line 253) | def stitching(self, kp_source: torch.Tensor, kp_driving: torch.Tensor)...
    method warp_decode (line 276) | def warp_decode(self, feature_3d: torch.Tensor, kp_source: torch.Tenso...
    method parse_output (line 300) | def parse_output(self, out: torch.Tensor) -> np.ndarray:
    method calc_ratio (line 310) | def calc_ratio(self, lmk_lst):
    method calc_combined_eye_ratio (line 320) | def calc_combined_eye_ratio(self, c_d_eyes_i, source_lmk):
    method calc_combined_lip_ratio (line 328) | def calc_combined_lip_ratio(self, c_d_lip_i, source_lmk):
  class LivePortraitWrapperAnimal (line 337) | class LivePortraitWrapperAnimal(LivePortraitWrapper):
    method __init__ (line 341) | def __init__(self, inference_cfg: InferenceConfig):

FILE: src/modules/appearance_feature_extractor.py
  class AppearanceFeatureExtractor (line 12) | class AppearanceFeatureExtractor(nn.Module):
    method __init__ (line 14) | def __init__(self, image_channel, block_expansion, num_down_blocks, ma...
    method forward (line 38) | def forward(self, source_image):

FILE: src/modules/convnextv2.py
  class Block (line 15) | class Block(nn.Module):
    method __init__ (line 23) | def __init__(self, dim, drop_path=0.):
    method forward (line 33) | def forward(self, x):
  class ConvNeXtV2 (line 48) | class ConvNeXtV2(nn.Module):
    method __init__ (line 60) | def __init__(
    method _init_weights (line 108) | def _init_weights(self, m):
    method forward_features (line 113) | def forward_features(self, x):
    method forward (line 119) | def forward(self, x):
  function convnextv2_tiny (line 147) | def convnextv2_tiny(**kwargs):

FILE: src/modules/dense_motion.py
  class DenseMotionNetwork (line 13) | class DenseMotionNetwork(nn.Module):
    method __init__ (line 14) | def __init__(self, block_expansion, num_blocks, max_features, num_kp, ...
    method create_sparse_motions (line 29) | def create_sparse_motions(self, feature, kp_driving, kp_source):
    method create_deformed_feature (line 45) | def create_deformed_feature(self, feature, sparse_motions):
    method create_heatmap_representations (line 55) | def create_heatmap_representations(self, feature, kp_driving, kp_source):
    method forward (line 67) | def forward(self, feature, kp_driving, kp_source):

FILE: src/modules/motion_extractor.py
  class MotionExtractor (line 18) | class MotionExtractor(nn.Module):
    method __init__ (line 19) | def __init__(self, **kwargs):
    method load_pretrained (line 26) | def load_pretrained(self, init_path: str):
    method forward (line 33) | def forward(self, x):

FILE: src/modules/spade_generator.py
  class SPADEDecoder (line 13) | class SPADEDecoder(nn.Module):
    method __init__ (line 14) | def __init__(self, upscale=1, max_features=256, block_expansion=64, ou...
    method forward (line 41) | def forward(self, feature):

FILE: src/modules/stitching_retargeting_network.py
  class StitchingRetargetingNetwork (line 18) | class StitchingRetargetingNetwork(nn.Module):
    method __init__ (line 19) | def __init__(self, input_size, hidden_sizes, output_size):
    method initialize_weights_to_zero (line 31) | def initialize_weights_to_zero(self):
    method forward (line 37) | def forward(self, x):

FILE: src/modules/util.py
  function kp2gaussian (line 17) | def kp2gaussian(kp, spatial_size, kp_variance):
  function make_coordinate_grid (line 41) | def make_coordinate_grid(spatial_size, ref, **kwargs):
  class ConvT2d (line 61) | class ConvT2d(nn.Module):
    method __init__ (line 66) | def __init__(self, in_features, out_features, kernel_size=3, stride=2,...
    method forward (line 73) | def forward(self, x):
  class ResBlock3d (line 80) | class ResBlock3d(nn.Module):
    method __init__ (line 85) | def __init__(self, in_features, kernel_size, padding):
    method forward (line 92) | def forward(self, x):
  class UpBlock3d (line 103) | class UpBlock3d(nn.Module):
    method __init__ (line 108) | def __init__(self, in_features, out_features, kernel_size=3, padding=1...
    method forward (line 115) | def forward(self, x):
  class DownBlock2d (line 123) | class DownBlock2d(nn.Module):
    method __init__ (line 128) | def __init__(self, in_features, out_features, kernel_size=3, padding=1...
    method forward (line 134) | def forward(self, x):
  class DownBlock3d (line 142) | class DownBlock3d(nn.Module):
    method __init__ (line 147) | def __init__(self, in_features, out_features, kernel_size=3, padding=1...
    method forward (line 158) | def forward(self, x):
  class SameBlock2d (line 166) | class SameBlock2d(nn.Module):
    method __init__ (line 171) | def __init__(self, in_features, out_features, groups=1, kernel_size=3,...
    method forward (line 180) | def forward(self, x):
  class Encoder (line 187) | class Encoder(nn.Module):
    method __init__ (line 192) | def __init__(self, block_expansion, in_features, num_blocks=3, max_fea...
    method forward (line 200) | def forward(self, x):
  class Decoder (line 207) | class Decoder(nn.Module):
    method __init__ (line 212) | def __init__(self, block_expansion, in_features, num_blocks=3, max_fea...
    method forward (line 228) | def forward(self, x):
  class Hourglass (line 240) | class Hourglass(nn.Module):
    method __init__ (line 245) | def __init__(self, block_expansion, in_features, num_blocks=3, max_fea...
    method forward (line 251) | def forward(self, x):
  class SPADE (line 255) | class SPADE(nn.Module):
    method __init__ (line 256) | def __init__(self, norm_nc, label_nc):
    method forward (line 268) | def forward(self, x, segmap):
  class SPADEResnetBlock (line 278) | class SPADEResnetBlock(nn.Module):
    method __init__ (line 279) | def __init__(self, fin, fout, norm_G, label_nc, use_se=False, dilation...
    method forward (line 302) | def forward(self, x, seg1):
    method shortcut (line 309) | def shortcut(self, x, seg1):
    method actvn (line 316) | def actvn(self, x):
  function filter_state_dict (line 320) | def filter_state_dict(state_dict, remove_name='fc'):
  class GRN (line 329) | class GRN(nn.Module):
    method __init__ (line 333) | def __init__(self, dim):
    method forward (line 338) | def forward(self, x):
  class LayerNorm (line 344) | class LayerNorm(nn.Module):
    method __init__ (line 351) | def __init__(self, normalized_shape, eps=1e-6, data_format="channels_l...
    method forward (line 361) | def forward(self, x):
  function _no_grad_trunc_normal_ (line 372) | def _no_grad_trunc_normal_(tensor, mean, std, a, b):
  function drop_path (line 408) | def drop_path(x, drop_prob=0., training=False, scale_by_keep=True):
  class DropPath (line 428) | class DropPath(nn.Module):
    method __init__ (line 432) | def __init__(self, drop_prob=None, scale_by_keep=True):
    method forward (line 437) | def forward(self, x):
  function trunc_normal_ (line 441) | def trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.):
  function _ntuple (line 445) | def _ntuple(n):

FILE: src/modules/warping_network.py
  class WarpingNetwork (line 14) | class WarpingNetwork(nn.Module):
    method __init__ (line 15) | def __init__(
    method deform_input (line 46) | def deform_input(self, inp, deformation):
    method forward (line 49) | def forward(self, feature_3d, kp_driving, kp_source):

FILE: src/utils/animal_landmark_runner.py
  class XPoseRunner (line 25) | class XPoseRunner(object):
    method __init__ (line 26) | def __init__(self, model_config_path, model_checkpoint_path, embedding...
    method load_animal_model (line 42) | def load_animal_model(self, model_config_path, model_checkpoint_path, ...
    method load_image (line 51) | def load_image(self, input_image):
    method get_unipose_output (line 61) | def get_unipose_output(self, image, instance_text_prompt, keypoint_tex...
    method run (line 106) | def run(self, input_image, instance_text_prompt, keypoint_text_example...
    method warmup (line 131) | def warmup(self):

FILE: src/utils/camera.py
  function headpose_pred_to_degree (line 14) | def headpose_pred_to_degree(pred):
  function get_rotation_matrix (line 31) | def get_rotation_matrix(pitch_, yaw_, roll_):

FILE: src/utils/crop.py
  function make_abs_path (line 16) | def make_abs_path(fn):
  function _transform_img (line 19) | def _transform_img(img, M, dsize, flags=CV2_INTERP, borderMode=None):
  function _transform_pts (line 36) | def _transform_pts(pts, M):
  function parse_pt2_from_pt101 (line 45) | def parse_pt2_from_pt101(pt101, use_lip=True):
  function parse_pt2_from_pt106 (line 63) | def parse_pt2_from_pt106(pt106, use_lip=True):
  function parse_pt2_from_pt203 (line 80) | def parse_pt2_from_pt203(pt203, use_lip=True):
  function parse_pt2_from_pt68 (line 96) | def parse_pt2_from_pt68(pt68, use_lip=True):
  function parse_pt2_from_pt5 (line 123) | def parse_pt2_from_pt5(pt5, use_lip=True):
  function parse_pt2_from_pt9 (line 139) | def parse_pt2_from_pt9(pt9, use_lip=True):
  function parse_pt2_from_pt_x (line 163) | def parse_pt2_from_pt_x(pts, use_lip=True):
  function parse_rect_from_landmark (line 191) | def parse_rect_from_landmark(
  function parse_bbox_from_landmark (line 250) | def parse_bbox_from_landmark(pts, **kwargs):
  function crop_image_by_bbox (line 282) | def crop_image_by_bbox(img, bbox, lmk=None, dsize=512, angle=None, flag_...
  function _estimate_similar_transform_from_pts (line 328) | def _estimate_similar_transform_from_pts(
  function crop_image (line 376) | def crop_image(img, pts: np.ndarray, **kwargs):
  function average_bbox_lst (line 404) | def average_bbox_lst(bbox_lst):
  function prepare_paste_back (line 410) | def prepare_paste_back(mask_crop, crop_M_c2o, dsize):
  function paste_back (line 417) | def paste_back(img_crop, M_c2o, img_ori, mask_ori):

FILE: src/utils/cropper.py
  function make_abs_path (line 24) | def make_abs_path(fn):
  class Trajectory (line 29) | class Trajectory:
  class Cropper (line 41) | class Cropper(object):
    method __init__ (line 42) | def __init__(self, **kwargs) -> None:
    method update_config (line 88) | def update_config(self, user_args):
    method crop_source_image (line 93) | def crop_source_image(self, img_rgb_: np.ndarray, crop_cfg: CropConfig):
    method calc_lmk_from_cropped_image (line 153) | def calc_lmk_from_cropped_image(self, img_rgb_, **kwargs):
    method crop_source_video (line 172) | def crop_source_video(self, source_rgb_lst, crop_cfg: CropConfig, **kw...
    method crop_driving_video (line 225) | def crop_driving_video(self, driving_rgb_lst, **kwargs):
    method calc_lmks_from_cropped_video (line 285) | def calc_lmks_from_cropped_video(self, driving_rgb_crop_lst, **kwargs):

FILE: src/utils/dependencies/XPose/models/UniPose/attention.py
  class MultiheadAttention (line 39) | class MultiheadAttention(Module):
    method __init__ (line 65) | def __init__(self, embed_dim, num_heads, dropout=0., bias=True, add_bi...
    method _reset_parameters (line 91) | def _reset_parameters(self):
    method __setstate__ (line 94) | def __setstate__(self, state):
    method forward (line 101) | def forward(self, query, key, value, key_padding_mask=None,
  function multi_head_attention_forward (line 163) | def multi_head_attention_forward(query: Tensor,

FILE: src/utils/dependencies/XPose/models/UniPose/backbone.py
  class FrozenBatchNorm2d (line 31) | class FrozenBatchNorm2d(torch.nn.Module):
    method __init__ (line 40) | def __init__(self, n):
    method _load_from_state_dict (line 47) | def _load_from_state_dict(
    method forward (line 58) | def forward(self, x):
  class BackboneBase (line 71) | class BackboneBase(nn.Module):
    method __init__ (line 72) | def __init__(
    method forward (line 98) | def forward(self, tensor_list: NestedTensor):
  class Backbone (line 110) | class Backbone(BackboneBase):
    method __init__ (line 113) | def __init__(
  class Joiner (line 137) | class Joiner(nn.Sequential):
    method __init__ (line 138) | def __init__(self, backbone, position_embedding):
    method forward (line 141) | def forward(self, tensor_list: NestedTensor):
  function build_backbone (line 153) | def build_backbone(args):

FILE: src/utils/dependencies/XPose/models/UniPose/deformable_transformer.py
  class DeformableTransformer (line 33) | class DeformableTransformer(nn.Module):
    method __init__ (line 35) | def __init__(self, d_model=256, nhead=8,
    method _reset_parameters (line 280) | def _reset_parameters(self):
    method get_valid_ratio (line 293) | def get_valid_ratio(self, mask):
    method init_ref_points (line 302) | def init_ref_points(self, use_num_queries):
    method forward (line 311) | def forward(self, srcs, masks, refpoint_embed, pos_embeds, tgt, attn_m...
  class TransformerEncoder (line 525) | class TransformerEncoder(nn.Module):
    method __init__ (line 527) | def __init__(self,
    method get_reference_points (line 579) | def get_reference_points(spatial_shapes, valid_ratios, device):
    method forward (line 592) | def forward(self,
  class TransformerDecoder (line 690) | class TransformerDecoder(nn.Module):
    method __init__ (line 692) | def __init__(self, decoder_layer, num_layers, norm=None,
    method forward (line 769) | def forward(self, tgt, memory,
  class DeformableTransformerEncoderLayer (line 938) | class DeformableTransformerEncoderLayer(nn.Module):
    method __init__ (line 939) | def __init__(self,
    method with_pos_embed (line 969) | def with_pos_embed(tensor, pos):
    method forward_ffn (line 972) | def forward_ffn(self, src):
    method forward (line 978) | def forward(self, src, pos, reference_points, spatial_shapes, level_st...
  class DeformableTransformerDecoderLayer (line 996) | class DeformableTransformerDecoderLayer(nn.Module):
    method __init__ (line 997) | def __init__(self, d_model=256, d_ffn=1024,
    method rm_self_attn_modules (line 1041) | def rm_self_attn_modules(self):
    method with_pos_embed (line 1047) | def with_pos_embed(tensor, pos):
    method forward_ffn (line 1050) | def forward_ffn(self, tgt, ipdb_flag=False):
    method forward (line 1059) | def forward(self,
  function _get_clones (line 1135) | def _get_clones(module, N, layer_share=False):
  function build_deformable_transformer (line 1143) | def build_deformable_transformer(args):

FILE: src/utils/dependencies/XPose/models/UniPose/fuse_modules.py
  class FeatureResizer (line 8) | class FeatureResizer(nn.Module):
    method __init__ (line 14) | def __init__(self, input_feat_size, output_feat_size, dropout, do_ln=T...
    method forward (line 22) | def forward(self, encoder_features):
  function l1norm (line 32) | def l1norm(X, dim, eps=1e-8):
  function l2norm (line 40) | def l2norm(X, dim, eps=1e-8):
  function func_attention (line 48) | def func_attention(query, context, smooth=1, raw_feature_norm="softmax",...
  class BiMultiHeadAttention (line 97) | class BiMultiHeadAttention(nn.Module):
    method __init__ (line 98) | def __init__(self, v_dim, l_dim, embed_dim, num_heads, dropout=0.1, cf...
    method _shape (line 127) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method _reset_parameters (line 130) | def _reset_parameters(self):
    method forward (line 144) | def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):
  class BiAttentionBlock (line 240) | class BiAttentionBlock(nn.Module):
    method __init__ (line 241) | def __init__(self, v_dim, l_dim, embed_dim, num_heads, dropout=0.1,
    method forward (line 267) | def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):

FILE: src/utils/dependencies/XPose/models/UniPose/mask_generate.py
  function prepare_for_mask (line 4) | def prepare_for_mask(kpt_mask):
  function post_process (line 40) | def post_process(outputs_class, outputs_coord, dn_meta, aux_loss, _set_a...

FILE: src/utils/dependencies/XPose/models/UniPose/ops/functions/ms_deform_attn_func.py
  class MSDeformAttnFunction (line 21) | class MSDeformAttnFunction(Function):
    method forward (line 23) | def forward(ctx, value, value_spatial_shapes, value_level_start_index,...
    method backward (line 32) | def backward(ctx, grad_output):
  function ms_deform_attn_core_pytorch (line 41) | def ms_deform_attn_core_pytorch(value, value_spatial_shapes, sampling_lo...

FILE: src/utils/dependencies/XPose/models/UniPose/ops/modules/ms_deform_attn.py
  function _is_power_of_2 (line 26) | def _is_power_of_2(n):
  class MSDeformAttn (line 32) | class MSDeformAttn(nn.Module):
    method __init__ (line 33) | def __init__(self, d_model=256, n_levels=4, n_heads=8, n_points=4, use...
    method _reset_parameters (line 66) | def _reset_parameters(self):
    method forward (line 82) | def forward(self, query, reference_points, input_flatten, input_spatia...

FILE: src/utils/dependencies/XPose/models/UniPose/ops/modules/ms_deform_attn_key_aware.py
  function _is_power_of_2 (line 27) | def _is_power_of_2(n):
  class MSDeformAttn (line 33) | class MSDeformAttn(nn.Module):
    method __init__ (line 34) | def __init__(self, d_model=256, n_levels=4, n_heads=8, n_points=4, use...
    method _reset_parameters (line 67) | def _reset_parameters(self):
    method forward (line 83) | def forward(self, query, key, reference_points, input_flatten, input_s...

FILE: src/utils/dependencies/XPose/models/UniPose/ops/setup.py
  function get_extensions (line 23) | def get_extensions():

FILE: src/utils/dependencies/XPose/models/UniPose/ops/src/cpu/ms_deform_attn_cpu.cpp
  function ms_deform_attn_cpu_forward (line 17) | at::Tensor
  function ms_deform_attn_cpu_backward (line 29) | std::vector<at::Tensor>

FILE: src/utils/dependencies/XPose/models/UniPose/ops/src/ms_deform_attn.h
  function im2col_step (line 27) | int im2col_step)

FILE: src/utils/dependencies/XPose/models/UniPose/ops/src/vision.cpp
  function PYBIND11_MODULE (line 13) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {

FILE: src/utils/dependencies/XPose/models/UniPose/ops/test.py
  function check_forward_equal_with_pytorch_double (line 32) | def check_forward_equal_with_pytorch_double():
  function check_forward_equal_with_pytorch_float (line 48) | def check_forward_equal_with_pytorch_float():
  function check_gradient_numerical (line 63) | def check_gradient_numerical(channels=4, grad_value=True, grad_sampling_...

FILE: src/utils/dependencies/XPose/models/UniPose/position_encoding.py
  class PositionEmbeddingSine (line 24) | class PositionEmbeddingSine(nn.Module):
    method __init__ (line 29) | def __init__(self, num_pos_feats=64, temperature=10000, normalize=Fals...
    method forward (line 40) | def forward(self, tensor_list: NestedTensor):
  class PositionEmbeddingSineHW (line 66) | class PositionEmbeddingSineHW(nn.Module):
    method __init__ (line 71) | def __init__(self, num_pos_feats=64, temperatureH=10000, temperatureW=...
    method forward (line 83) | def forward(self, tensor_list: NestedTensor):
  class PositionEmbeddingLearned (line 114) | class PositionEmbeddingLearned(nn.Module):
    method __init__ (line 118) | def __init__(self, num_pos_feats=256):
    method reset_parameters (line 124) | def reset_parameters(self):
    method forward (line 128) | def forward(self, tensor_list: NestedTensor):
  function build_position_encoding (line 142) | def build_position_encoding(args):

FILE: src/utils/dependencies/XPose/models/UniPose/swin_transformer.py
  class Mlp (line 14) | class Mlp(nn.Module):
    method __init__ (line 17) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 26) | def forward(self, x):
  function window_partition (line 35) | def window_partition(x, window_size):
  function window_reverse (line 49) | def window_reverse(windows, window_size, H, W):
  class WindowAttention (line 65) | class WindowAttention(nn.Module):
    method __init__ (line 78) | def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scal...
    method forward (line 112) | def forward(self, x, mask=None):
  class SwinTransformerBlock (line 146) | class SwinTransformerBlock(nn.Module):
    method __init__ (line 163) | def __init__(self, dim, num_heads, window_size=7, shift_size=0,
    method forward (line 187) | def forward(self, x, mask_matrix):
  class PatchMerging (line 246) | class PatchMerging(nn.Module):
    method __init__ (line 252) | def __init__(self, dim, norm_layer=nn.LayerNorm):
    method forward (line 258) | def forward(self, x, H, W):
  class BasicLayer (line 287) | class BasicLayer(nn.Module):
    method __init__ (line 305) | def __init__(self,
    method forward (line 347) | def forward(self, x, H, W):
  class PatchEmbed (line 389) | class PatchEmbed(nn.Module):
    method __init__ (line 398) | def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=...
    method forward (line 412) | def forward(self, x):
  class SwinTransformer (line 431) | class SwinTransformer(nn.Module):
    method __init__ (line 460) | def __init__(self,
    method _freeze_stages (line 553) | def _freeze_stages(self):
    method forward_raw (line 572) | def forward_raw(self, x):
    method forward (line 605) | def forward(self, tensor_list: NestedTensor):
    method train (line 648) | def train(self, mode=True):
  function build_swin_transformer (line 655) | def build_swin_transformer(modelname, pretrain_img_size, **kw):

FILE: src/utils/dependencies/XPose/models/UniPose/transformer_deformable.py
  class DeformableTransformer (line 25) | class DeformableTransformer(nn.Module):
    method __init__ (line 26) | def __init__(self, d_model=256, nhead=8,
    method _reset_parameters (line 68) | def _reset_parameters(self):
    method get_proposal_pos_embed (line 80) | def get_proposal_pos_embed(self, proposals):
    method gen_encoder_output_proposals (line 95) | def gen_encoder_output_proposals(self, memory, memory_padding_mask, sp...
    method get_valid_ratio (line 127) | def get_valid_ratio(self, mask):
    method forward (line 136) | def forward(self, srcs, masks, pos_embeds, query_embed=None):
  class DeformableTransformerEncoderLayer (line 215) | class DeformableTransformerEncoderLayer(nn.Module):
    method __init__ (line 216) | def __init__(self,
    method with_pos_embed (line 249) | def with_pos_embed(tensor, pos):
    method forward_ffn (line 252) | def forward_ffn(self, src):
    method forward (line 258) | def forward(self, src, pos, reference_points, spatial_shapes, level_st...
  class DeformableTransformerEncoder (line 275) | class DeformableTransformerEncoder(nn.Module):
    method __init__ (line 276) | def __init__(self, encoder_layer, num_layers, norm=None):
    method get_reference_points (line 287) | def get_reference_points(spatial_shapes, valid_ratios, device):
    method forward (line 301) | def forward(self, src, spatial_shapes, level_start_index, valid_ratios...
  class DeformableTransformerDecoderLayer (line 327) | class DeformableTransformerDecoderLayer(nn.Module):
    method __init__ (line 328) | def __init__(self, d_model=256, d_ffn=1024,
    method rm_self_attn_modules (line 374) | def rm_self_attn_modules(self):
    method with_pos_embed (line 381) | def with_pos_embed(tensor, pos):
    method forward_ffn (line 384) | def forward_ffn(self, tgt):
    method forward_sa (line 390) | def forward_sa(self,
    method forward_ca (line 436) | def forward_ca(self,
    method forward (line 473) | def forward(self,
  class DeformableTransformerDecoder (line 515) | class DeformableTransformerDecoder(nn.Module):
    method __init__ (line 516) | def __init__(self, decoder_layer, num_layers, return_intermediate=Fals...
    method forward (line 533) | def forward(self, tgt, reference_points, src, src_spatial_shapes,
  function _get_clones (line 574) | def _get_clones(module, N):
  function build_deforamble_transformer (line 578) | def build_deforamble_transformer(args):

FILE: src/utils/dependencies/XPose/models/UniPose/transformer_vanilla.py
  class TextTransformer (line 18) | class TextTransformer(nn.Module):
    method __init__ (line 19) | def __init__(self, num_layers, d_model=256, nheads=8, dim_feedforward=...
    method forward (line 31) | def forward(self, memory_text:torch.Tensor, text_attention_mask:torch....
  class TransformerEncoderLayer (line 58) | class TransformerEncoderLayer(nn.Module):
    method __init__ (line 59) | def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, ...
    method with_pos_embed (line 76) | def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    method forward (line 79) | def forward(

FILE: src/utils/dependencies/XPose/models/UniPose/unipose.py
  class UniPose (line 26) | class UniPose(nn.Module):
    method __init__ (line 29) | def __init__(self, backbone, transformer, num_classes, num_queries,
    method open_set_transfer_init (line 242) | def open_set_transfer_init(self):
    method _reset_parameters (line 269) | def _reset_parameters(self):
    method init_ref_points (line 275) | def init_ref_points(self, use_num_queries):
    method forward (line 303) | def forward(self, samples: NestedTensor, targets: List = None, **kw):
  function build_unipose (line 492) | def build_unipose(args):
  class ContrastiveAssign (line 573) | class ContrastiveAssign(nn.Module):
    method __init__ (line 574) | def __init__(self, project=False, cal_bias=None, max_text_len=256):
    method forward (line 586) | def forward(self, x, text_dict):

FILE: src/utils/dependencies/XPose/models/UniPose/utils.py
  function _get_clones (line 18) | def _get_clones(module, N, layer_share=False):
  function get_sine_pos_embed (line 26) | def get_sine_pos_embed(
  function gen_encoder_output_proposals (line 58) | def gen_encoder_output_proposals(memory: Tensor, memory_padding_mask: Te...
  class RandomBoxPerturber (line 116) | class RandomBoxPerturber():
    method __init__ (line 117) | def __init__(self, x_noise_scale=0.2, y_noise_scale=0.2, w_noise_scale...
    method __call__ (line 120) | def __call__(self, refanchors: Tensor) -> Tensor:
  function sigmoid_focal_loss (line 131) | def sigmoid_focal_loss(inputs, targets, num_boxes, alpha: float = 0.25, ...
  class MLP (line 162) | class MLP(nn.Module):
    method __init__ (line 165) | def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
    method forward (line 171) | def forward(self, x):
  function _get_activation_fn (line 177) | def _get_activation_fn(activation, d_model=256, batch_dim=0):
  function gen_sineembed_for_position (line 193) | def gen_sineembed_for_position(pos_tensor):
  function oks_overlaps (line 222) | def oks_overlaps(kpt_preds, kpt_gts, kpt_valids, kpt_areas, sigmas):
  function oks_loss (line 243) | def oks_loss(pred,
  class OKSLoss (line 271) | class OKSLoss(nn.Module):
    method __init__ (line 282) | def __init__(self,
    method forward (line 304) | def forward(self,

FILE: src/utils/dependencies/XPose/models/__init__.py
  function build_model (line 9) | def build_model(args):

FILE: src/utils/dependencies/XPose/models/registry.py
  class Registry (line 12) | class Registry(object):
    method __init__ (line 14) | def __init__(self, name):
    method __repr__ (line 18) | def __repr__(self):
    method __len__ (line 23) | def __len__(self):
    method name (line 27) | def name(self):
    method module_dict (line 31) | def module_dict(self):
    method get (line 34) | def get(self, key):
    method registe_with_name (line 37) | def registe_with_name(self, module_name=None, force=False):
    method register (line 40) | def register(self, module_build_function, module_name=None, force=False):

FILE: src/utils/dependencies/XPose/transforms.py
  function crop (line 19) | def crop(image, target, region):
  function hflip (line 83) | def hflip(image, target):
  function resize (line 148) | def resize(image, target, size, max_size=None):
  function pad (line 213) | def pad(image, target, padding):
  class ResizeDebug (line 226) | class ResizeDebug(object):
    method __init__ (line 227) | def __init__(self, size):
    method __call__ (line 230) | def __call__(self, img, target):
  class RandomCrop (line 234) | class RandomCrop(object):
    method __init__ (line 235) | def __init__(self, size):
    method __call__ (line 238) | def __call__(self, img, target):
  class RandomSizeCrop (line 243) | class RandomSizeCrop(object):
    method __init__ (line 244) | def __init__(self, min_size: int, max_size: int, respect_boxes: bool =...
    method __call__ (line 251) | def __call__(self, img: PIL.Image.Image, target: dict):
  class CenterCrop (line 265) | class CenterCrop(object):
    method __init__ (line 266) | def __init__(self, size):
    method __call__ (line 269) | def __call__(self, img, target):
  class RandomHorizontalFlip (line 277) | class RandomHorizontalFlip(object):
    method __init__ (line 278) | def __init__(self, p=0.5):
    method __call__ (line 281) | def __call__(self, img, target):
  class RandomResize (line 287) | class RandomResize(object):
    method __init__ (line 288) | def __init__(self, sizes, max_size=None):
    method __call__ (line 293) | def __call__(self, img, target=None):
  class RandomPad (line 298) | class RandomPad(object):
    method __init__ (line 299) | def __init__(self, max_pad):
    method __call__ (line 302) | def __call__(self, img, target):
  class RandomSelect (line 308) | class RandomSelect(object):
    method __init__ (line 313) | def __init__(self, transforms1, transforms2, p=0.5):
    method __call__ (line 318) | def __call__(self, img, target):
  class ToTensor (line 324) | class ToTensor(object):
    method __call__ (line 325) | def __call__(self, img, target):
  class RandomErasing (line 329) | class RandomErasing(object):
    method __init__ (line 331) | def __init__(self, *args, **kwargs):
    method __call__ (line 334) | def __call__(self, img, target):
  class Normalize (line 338) | class Normalize(object):
    method __init__ (line 339) | def __init__(self, mean, std):
    method __call__ (line 343) | def __call__(self, image, target=None):
  class Compose (line 379) | class Compose(object):
    method __init__ (line 380) | def __init__(self, transforms):
    method __call__ (line 383) | def __call__(self, image, target):
    method __repr__ (line 388) | def __repr__(self):

FILE: src/utils/dependencies/XPose/util/addict.py
  class Dict (line 4) | class Dict(dict):
    method __init__ (line 6) | def __init__(__self, *args, **kwargs):
    method __setattr__ (line 25) | def __setattr__(self, name, value):
    method __setitem__ (line 32) | def __setitem__(self, name, value):
    method __add__ (line 49) | def __add__(self, other):
    method _hook (line 59) | def _hook(cls, item):
    method __getattr__ (line 66) | def __getattr__(self, item):
    method __missing__ (line 69) | def __missing__(self, name):
    method __delattr__ (line 74) | def __delattr__(self, name):
    method to_dict (line 77) | def to_dict(self):
    method copy (line 90) | def copy(self):
    method deepcopy (line 93) | def deepcopy(self):
    method __deepcopy__ (line 96) | def __deepcopy__(self, memo):
    method update (line 103) | def update(self, *args, **kwargs):
    method __getnewargs__ (line 118) | def __getnewargs__(self):
    method __getstate__ (line 121) | def __getstate__(self):
    method __setstate__ (line 124) | def __setstate__(self, state):
    method __or__ (line 127) | def __or__(self, other):
    method __ror__ (line 134) | def __ror__(self, other):
    method __ior__ (line 141) | def __ior__(self, other):
    method setdefault (line 145) | def setdefault(self, key, default=None):
    method freeze (line 152) | def freeze(self, shouldFreeze=True):
    method unfreeze (line 158) | def unfreeze(self):

FILE: src/utils/dependencies/XPose/util/box_ops.py
  function box_cxcywh_to_xyxy (line 9) | def box_cxcywh_to_xyxy(x):
  function box_xyxy_to_cxcywh (line 16) | def box_xyxy_to_cxcywh(x):
  function box_iou (line 24) | def box_iou(boxes1, boxes2):
  function generalized_box_iou (line 41) | def generalized_box_iou(boxes1, boxes2):
  function box_iou_pairwise (line 69) | def box_iou_pairwise(boxes1, boxes2):
  function generalized_box_iou_pairwise (line 85) | def generalized_box_iou_pairwise(boxes1, boxes2):
  function masks_to_boxes (line 109) | def masks_to_boxes(masks):

FILE: src/utils/dependencies/XPose/util/config.py
  function check_file_exist (line 19) | def check_file_exist(filename, msg_tmpl='file "{}" does not exist'):
  class ConfigDict (line 23) | class ConfigDict(Dict):
    method __missing__ (line 25) | def __missing__(self, name):
    method __getattr__ (line 28) | def __getattr__(self, name):
  class Config (line 41) | class Config(object):
    method _validate_py_syntax (line 66) | def _validate_py_syntax(filename):
    method _file2dict (line 76) | def _file2dict(filename):
    method _merge_a_into_b (line 143) | def _merge_a_into_b(a, b):
    method fromfile (line 187) | def fromfile(filename):
    method __init__ (line 192) | def __init__(self, cfg_dict=None, cfg_text=None, filename=None):
    method filename (line 215) | def filename(self):
    method text (line 219) | def text(self):
    method pretty_text (line 223) | def pretty_text(self):
    method __repr__ (line 310) | def __repr__(self):
    method __len__ (line 313) | def __len__(self):
    method __getattr__ (line 316) | def __getattr__(self, name):
    method __getitem__ (line 329) | def __getitem__(self, name):
    method __setattr__ (line 332) | def __setattr__(self, name, value):
    method __setitem__ (line 337) | def __setitem__(self, name, value):
    method __iter__ (line 342) | def __iter__(self):
    method dump (line 345) | def dump(self, file=None):
    method merge_from_dict (line 353) | def merge_from_dict(self, options):
    method __setstate__ (line 385) | def __setstate__(self, state):
    method copy (line 389) | def copy(self):
    method deepcopy (line 392) | def deepcopy(self):
  class DictAction (line 396) | class DictAction(Action):
    method _parse_int_float_bool (line 404) | def _parse_int_float_bool(val):
    method __call__ (line 419) | def __call__(self, parser, namespace, values, option_string=None):

FILE: src/utils/dependencies/XPose/util/keypoint_ops.py
  function keypoint_xyxyzz_to_xyzxyz (line 3) | def keypoint_xyxyzz_to_xyzxyz(keypoints: torch.Tensor):
  function keypoint_xyzxyz_to_xyxyzz (line 18) | def keypoint_xyzxyz_to_xyxyzz(keypoints: torch.Tensor):

FILE: src/utils/dependencies/XPose/util/misc.py
  class SmoothedValue (line 34) | class SmoothedValue(object):
    method __init__ (line 39) | def __init__(self, window_size=20, fmt=None):
    method update (line 47) | def update(self, value, n=1):
    method synchronize_between_processes (line 52) | def synchronize_between_processes(self):
    method median (line 66) | def median(self):
    method avg (line 73) | def avg(self):
    method global_avg (line 78) | def global_avg(self):
    method max (line 86) | def max(self):
    method value (line 90) | def value(self):
    method __str__ (line 93) | def __str__(self):
  function _get_global_gloo_group (line 102) | def _get_global_gloo_group():
  function all_gather_cpu (line 113) | def all_gather_cpu(data):
  function all_gather (line 171) | def all_gather(data):
  function reduce_dict (line 220) | def reduce_dict(input_dict, average=True):
  class MetricLogger (line 247) | class MetricLogger(object):
    method __init__ (line 248) | def __init__(self, delimiter="\t"):
    method update (line 252) | def update(self, **kwargs):
    method __getattr__ (line 259) | def __getattr__(self, attr):
    method __str__ (line 267) | def __str__(self):
    method synchronize_between_processes (line 278) | def synchronize_between_processes(self):
    method add_meter (line 282) | def add_meter(self, name, meter):
    method log_every (line 285) | def log_every(self, iterable, print_freq, header=None, logger=None):
  function get_sha (line 346) | def get_sha():
  function collate_fn (line 366) | def collate_fn(batch):
  function _max_by_axis (line 373) | def _max_by_axis(the_list):
  class NestedTensor (line 382) | class NestedTensor(object):
    method __init__ (line 383) | def __init__(self, tensors, mask: Optional[Tensor]):
    method imgsize (line 395) | def imgsize(self):
    method to (line 404) | def to(self, device):
    method to_img_list_single (line 415) | def to_img_list_single(self, tensor, mask):
    method to_img_list (line 422) | def to_img_list(self):
    method device (line 439) | def device(self):
    method decompose (line 442) | def decompose(self):
    method __repr__ (line 445) | def __repr__(self):
    method shape (line 449) | def shape(self):
  function nested_tensor_from_tensor_list (line 456) | def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
  function _onnx_nested_tensor_from_tensor_list (line 484) | def _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> N...
  function setup_for_distributed (line 512) | def setup_for_distributed(is_master):
  function is_dist_avail_and_initialized (line 527) | def is_dist_avail_and_initialized():
  function get_world_size (line 535) | def get_world_size():
  function get_rank (line 541) | def get_rank():
  function is_main_process (line 547) | def is_main_process():
  function save_on_master (line 551) | def save_on_master(*args, **kwargs):
  function init_distributed_mode (line 555) | def init_distributed_mode(args):
  function accuracy (line 621) | def accuracy(output, target, topk=(1,)):
  function accuracy_onehot (line 639) | def accuracy_onehot(pred, gt):
  function interpolate (line 654) | def interpolate(input, size=None, scale_factor=None, mode="nearest", ali...
  class color_sys (line 675) | class color_sys():
    method __init__ (line 676) | def __init__(self, num_colors) -> None:
    method __call__ (line 686) | def __call__(self, idx):
  function inverse_sigmoid (line 689) | def inverse_sigmoid(x, eps=1e-3):
  function clean_state_dict (line 695) | def clean_state_dict(state_dict):

FILE: src/utils/dependencies/insightface/app/common.py
  class Face (line 5) | class Face(dict):
    method __init__ (line 7) | def __init__(self, d=None, **kwargs):
    method __setattr__ (line 19) | def __setattr__(self, name, value):
    method __getattr__ (line 30) | def __getattr__(self, name):
    method embedding_norm (line 34) | def embedding_norm(self):
    method normed_embedding (line 40) | def normed_embedding(self):
    method sex (line 46) | def sex(self):

FILE: src/utils/dependencies/insightface/app/face_analysis.py
  class FaceAnalysis (line 25) | class FaceAnalysis:
    method __init__ (line 26) | def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', allowe...
    method prepare (line 49) | def prepare(self, ctx_id, det_thresh=0.5, det_size=(640, 640)):
    method get (line 60) | def get(self, img, max_num=0):
    method draw_on (line 81) | def draw_on(self, img, faces):

FILE: src/utils/dependencies/insightface/data/image.py
  class ImageCache (line 6) | class ImageCache:
  function get_image (line 9) | def get_image(name, to_rgb=False):

FILE: src/utils/dependencies/insightface/data/pickle_object.py
  function get_object (line 7) | def get_object(name):

FILE: src/utils/dependencies/insightface/data/rec_builder.py
  class RecBuilder (line 9) | class RecBuilder():
    method __init__ (line 10) | def __init__(self, path, image_size=(112, 112)):
    method add (line 23) | def add(self, imgs):
    method add_image (line 44) | def add_image(self, img, label):
    method close (line 64) | def close(self):

FILE: src/utils/dependencies/insightface/model_zoo/arcface_onnx.py
  class ArcFaceONNX (line 19) | class ArcFaceONNX:
    method __init__ (line 20) | def __init__(self, model_file=None, session=None):
    method prepare (line 61) | def prepare(self, ctx_id, **kwargs):
    method get (line 65) | def get(self, img, face):
    method compute_sim (line 70) | def compute_sim(self, feat1, feat2):
    method get_feat (line 77) | def get_feat(self, imgs):
    method forward (line 87) | def forward(self, batch_data):

FILE: src/utils/dependencies/insightface/model_zoo/attribute.py
  class Attribute (line 19) | class Attribute:
    method __init__ (line 20) | def __init__(self, model_file=None, session=None):
    method prepare (line 68) | def prepare(self, ctx_id, **kwargs):
    method get (line 72) | def get(self, img, face):

FILE: src/utils/dependencies/insightface/model_zoo/inswapper.py
  class INSwapper (line 12) | class INSwapper():
    method __init__ (line 13) | def __init__(self, model_file=None, session=None):
    method forward (line 41) | def forward(self, img, latent):
    method get (line 46) | def get(self, img, target_face, source_face, paste_back=True):

FILE: src/utils/dependencies/insightface/model_zoo/landmark.py
  class Landmark (line 21) | class Landmark:
    method __init__ (line 22) | def __init__(self, model_file=None, session=None):
    method prepare (line 76) | def prepare(self, ctx_id, **kwargs):
    method get (line 80) | def get(self, img, face):

FILE: src/utils/dependencies/insightface/model_zoo/model_store.py
  function short_hash (line 29) | def short_hash(name):
  function find_params_file (line 36) | def find_params_file(dir_path):
  function get_model_file (line 46) | def get_model_file(name, root=os.path.join('~', '.insightface', 'models')):

FILE: src/utils/dependencies/insightface/model_zoo/model_zoo.py
  class PickableInferenceSession (line 22) | class PickableInferenceSession(onnxruntime.InferenceSession):
    method __init__ (line 24) | def __init__(self, model_path, **kwargs):
    method __getstate__ (line 28) | def __getstate__(self):
    method __setstate__ (line 31) | def __setstate__(self, values):
  class ModelRouter (line 35) | class ModelRouter:
    method __init__ (line 36) | def __init__(self, onnx_file):
    method get_model (line 39) | def get_model(self, **kwargs):
  function find_onnx_file (line 61) | def find_onnx_file(dir_path):
  function get_default_providers (line 70) | def get_default_providers():
  function get_default_provider_options (line 73) | def get_default_provider_options():
  function get_model (line 76) | def get_model(name, **kwargs):

FILE: src/utils/dependencies/insightface/model_zoo/retinaface.py
  function softmax (line 17) | def softmax(z):
  function distance2bbox (line 26) | def distance2bbox(points, distance, max_shape=None):
  function distance2kps (line 49) | def distance2kps(points, distance, max_shape=None):
  class RetinaFace (line 72) | class RetinaFace:
    method __init__ (line 73) | def __init__(self, model_file=None, session=None):
    method _init_vars (line 87) | def _init_vars(self):
    method prepare (line 130) | def prepare(self, ctx_id, **kwargs):
    method forward (line 146) | def forward(self, img, threshold):
    method detect (line 207) | def detect(self, img, input_size = None, max_num=0, metric='default'):
    method nms (line 262) | def nms(self, dets):
  function get_retinaface (line 292) | def get_retinaface(name, download=False, root='~/.insightface/models', *...

FILE: src/utils/dependencies/insightface/model_zoo/scrfd.py
  function softmax (line 17) | def softmax(z):
  function distance2bbox (line 26) | def distance2bbox(points, distance, max_shape=None):
  function distance2kps (line 49) | def distance2kps(points, distance, max_shape=None):
  class SCRFD (line 72) | class SCRFD:
    method __init__ (line 73) | def __init__(self, model_file=None, session=None):
    method _init_vars (line 88) | def _init_vars(self):
    method prepare (line 133) | def prepare(self, ctx_id, **kwargs):
    method forward (line 149) | def forward(self, img, threshold):
    method detect (line 220) | def detect(self, img, input_size = None, max_num=0, metric='default'):
    method nms (line 275) | def nms(self, dets):
  function get_scrfd (line 305) | def get_scrfd(name, download=False, root='~/.insightface/models', **kwar...
  function scrfd_2p5gkps (line 315) | def scrfd_2p5gkps(**kwargs):

FILE: src/utils/dependencies/insightface/utils/download.py
  function check_sha1 (line 10) | def check_sha1(filename, sha1_hash):
  function download_file (line 36) | def download_file(url, path=None, overwrite=False, sha1_hash=None):

FILE: src/utils/dependencies/insightface/utils/face_align.py
  function estimate_norm (line 11) | def estimate_norm(lmk, image_size=112,mode='arcface'):
  function norm_crop (line 27) | def norm_crop(img, landmark, image_size=112, mode='arcface'):
  function norm_crop2 (line 32) | def norm_crop2(img, landmark, image_size=112, mode='arcface'):
  function square_crop (line 37) | def square_crop(im, S):
  function transform (line 52) | def transform(data, center, output_size, scale, rotation):
  function trans_points2d (line 71) | def trans_points2d(pts, M):
  function trans_points3d (line 83) | def trans_points3d(pts, M):
  function trans_points (line 98) | def trans_points(pts, M):

FILE: src/utils/dependencies/insightface/utils/filesystem.py
  function get_model_dir (line 9) | def get_model_dir(name, root='~/.insightface'):
  function makedirs (line 14) | def makedirs(path):
  function try_import (line 30) | def try_import(package, message=None):
  function try_import_cv2 (line 54) | def try_import_cv2():
  function try_import_mmcv (line 68) | def try_import_mmcv():
  function try_import_rarfile (line 82) | def try_import_rarfile():
  function import_try_install (line 96) | def import_try_install(package, extern_url=None):
  function try_import_dali (line 142) | def try_import_dali():

FILE: src/utils/dependencies/insightface/utils/storage.py
  function download (line 9) | def download(sub_dir, name, force=False, root='~/.insightface'):
  function ensure_available (line 27) | def ensure_available(sub_dir, name, root='~/.insightface'):
  function download_onnx (line 30) | def download_onnx(sub_dir, model_file, force=False, root='~/.insightface...

FILE: src/utils/dependencies/insightface/utils/transform.py
  function transform (line 7) | def transform(data, center, output_size, scale, rotation):
  function trans_points2d (line 26) | def trans_points2d(pts, M):
  function trans_points3d (line 38) | def trans_points3d(pts, M):
  function trans_points (line 53) | def trans_points(pts, M):
  function estimate_affine_matrix_3d23d (line 59) | def estimate_affine_matrix_3d23d(X, Y):
  function P2sRt (line 71) | def P2sRt(P):
  function matrix2angle (line 91) | def matrix2angle(R):

FILE: src/utils/face_analysis_diy.py
  function sort_by_direction (line 14) | def sort_by_direction(faces, direction: str = 'large-small', face_center...
  class FaceAnalysisDIY (line 35) | class FaceAnalysisDIY(FaceAnalysis):
    method __init__ (line 36) | def __init__(self, name='buffalo_l', root='~/.insightface', allowed_mo...
    method get (line 41) | def get(self, img_bgr, **kwargs):
    method warmup (line 72) | def warmup(self):

FILE: src/utils/filter.py
  function smooth (line 8) | def smooth(x_d_lst, shape, device, observation_variance=3e-7, process_va...

FILE: src/utils/helper.py
  function tensor_to_numpy (line 23) | def tensor_to_numpy(data: Union[np.ndarray, torch.Tensor]) -> np.ndarray:
  function calc_motion_multiplier (line 29) | def calc_motion_multiplier(
  function suffix (line 44) | def suffix(filename):
  function prefix (line 52) | def prefix(filename):
  function basename (line 60) | def basename(filename):
  function remove_suffix (line 65) | def remove_suffix(filepath):
  function is_image (line 70) | def is_image(file_path):
  function is_video (line 75) | def is_video(file_path):
  function is_template (line 81) | def is_template(file_path):
  function mkdir (line 87) | def mkdir(d, log=False):
  function squeeze_tensor_to_numpy (line 96) | def squeeze_tensor_to_numpy(tensor):
  function dct2device (line 101) | def dct2device(dct: dict, device):
  function concat_feat (line 110) | def concat_feat(kp_source: torch.Tensor, kp_driving: torch.Tensor) -> to...
  function remove_ddp_dumplicate_key (line 124) | def remove_ddp_dumplicate_key(state_dict):
  function load_model (line 131) | def load_model(ckpt_path, model_config, device, model_type):
  function load_description (line 175) | def load_description(fp):
  function is_square_video (line 181) | def is_square_video(video_path):
  function clean_state_dict (line 193) | def clean_state_dict(state_dict):

FILE: src/utils/human_landmark_runner.py
  function make_abs_path (line 13) | def make_abs_path(fn):
  function to_ndarray (line 17) | def to_ndarray(obj):
  class LandmarkRunner (line 26) | class LandmarkRunner(object):
    method __init__ (line 29) | def __init__(self, **kwargs):
    method _run (line 56) | def _run(self, inp):
    method run (line 60) | def run(self, img_rgb: np.ndarray, lmk=None):
    method warmup (line 87) | def warmup(self):

FILE: src/utils/io.py
  function load_image_rgb (line 12) | def load_image_rgb(image_path: str):
  function load_video (line 19) | def load_video(video_info, n_frames=-1):
  function contiguous (line 32) | def contiguous(obj):
  function resize_to_limit (line 38) | def resize_to_limit(img: np.ndarray, max_dim=1920, division=2):
  function load_img_online (line 73) | def load_img_online(obj, mode="bgr", **kwargs):
  function load (line 95) | def load(fp):
  function dump (line 106) | def dump(wfp, obj):

FILE: src/utils/retargeting_utils.py
  function calculate_distance_ratio (line 9) | def calculate_distance_ratio(lmk: np.ndarray, idx1: int, idx2: int, idx3...
  function calc_eye_close_ratio (line 14) | def calc_eye_close_ratio(lmk: np.ndarray, target_eye_ratio: np.ndarray =...
  function calc_lip_close_ratio (line 23) | def calc_lip_close_ratio(lmk: np.ndarray) -> np.ndarray:

FILE: src/utils/timer.py
  class Timer (line 9) | class Timer(object):
    method __init__ (line 12) | def __init__(self):
    method tic (line 18) | def tic(self):
    method toc (line 23) | def toc(self, average=True):
    method clear (line 27) | def clear(self):

FILE: src/utils/video.py
  function exec_cmd (line 21) | def exec_cmd(cmd):
  function images2video (line 25) | def images2video(images, wfp, **kwargs):
  function video2gif (line 50) | def video2gif(video_fp, fps=30, size=256):
  function merge_audio_video (line 67) | def merge_audio_video(video_fp, audio_fp, wfp):
  function blend (line 76) | def blend(img: np.ndarray, mask: np.ndarray, background_color=(255, 255,...
  function concat_frames (line 84) | def concat_frames(driving_image_lst, source_image_lst, I_p_lst):
  class VideoWriter (line 105) | class VideoWriter:
    method __init__ (line 106) | def __init__(self, **kwargs):
    method write (line 122) | def write(self, image):
    method close (line 128) | def close(self):
  function change_video_fps (line 133) | def change_video_fps(input_file, output_file, fps=20, codec='libx264', c...
  function get_fps (line 138) | def get_fps(filepath, default_fps=25):
  function has_audio_stream (line 151) | def has_audio_stream(video_path: str) -> bool:
  function add_audio_to_video (line 189) | def add_audio_to_video(silent_video_path: str, audio_video_path: str, ou...
  function bb_intersection_over_union (line 209) | def bb_intersection_over_union(boxA, boxB):

FILE: src/utils/viz.py
  function viz_lmk (line 6) | def viz_lmk(img_, vps, **kwargs):

Download .json

Condensed preview — 145 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (801K chars).

[
  {
    "path": ".gitignore",
    "chars": 663,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n**/__pycache__/\n*.py[cod]\n**/*.py[cod]\n*$py.class\n\n# Model weights\n"
  },
  {
    "path": ".vscode/settings.json",
    "chars": 444,
    "preview": "{\n    \"[python]\": {\n        \"editor.tabSize\": 4\n    },\n    \"files.eol\": \"\\n\",\n    \"files.insertFinalNewline\": true,\n    "
  },
  {
    "path": "LICENSE",
    "chars": 1415,
    "preview": "MIT License\n\nCopyright (c) 2024 Kuaishou Visual Generation and Interaction Center\n\nPermission is hereby granted, free of"
  },
  {
    "path": "app.py",
    "chars": 25481,
    "preview": "# coding: utf-8\n\n\"\"\"\nThe entrance of the gradio for human\n\"\"\"\n\nimport os\nimport tyro\nimport subprocess\nimport gradio as "
  },
  {
    "path": "app_animals.py",
    "chars": 11356,
    "preview": "# coding: utf-8\n\n\"\"\"\nThe entrance of the gradio for animal\n\"\"\"\n\nimport os\nimport tyro\nimport subprocess\nimport gradio as"
  },
  {
    "path": "assets/.gitignore",
    "chars": 51,
    "preview": "examples/driving/*.pkl\nexamples/driving/*_crop.mp4\n"
  },
  {
    "path": "assets/docs/changelog/2024-07-10.md",
    "chars": 2272,
    "preview": "## 2024/07/10\n\n**First, thank you all for your attention, support, sharing, and contributions to LivePortrait!** ❤️\nThe "
  },
  {
    "path": "assets/docs/changelog/2024-07-19.md",
    "chars": 2404,
    "preview": "## 2024/07/19\n\n**Once again, we would like to express our heartfelt gratitude for your love, attention, and support for "
  },
  {
    "path": "assets/docs/changelog/2024-07-24.md",
    "chars": 619,
    "preview": "## 2024/07/24\n\n### Updates\n\n- **Portrait pose editing:** You can change the `relative pitch`, `relative yaw`, and `relat"
  },
  {
    "path": "assets/docs/changelog/2024-08-02.md",
    "chars": 4772,
    "preview": "## 2024/08/02\n\n<table class=\"center\" style=\"width: 80%; margin-left: auto; margin-right: auto;\">\n<tr>\n    <td style=\"tex"
  },
  {
    "path": "assets/docs/changelog/2024-08-05.md",
    "chars": 893,
    "preview": "## One-click Windows Installer\n\n### Download the installer from HuggingFace\n```bash\n# !pip install -U \"huggingface_hub[c"
  },
  {
    "path": "assets/docs/changelog/2024-08-06.md",
    "chars": 647,
    "preview": "## Precise Portrait Editing\n\nInspired by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-Advance"
  },
  {
    "path": "assets/docs/changelog/2024-08-19.md",
    "chars": 4409,
    "preview": "## Image Driven and Regional Control\n\n<p align=\"center\">\n  <img src=\"../image-driven-image-2024-08-19.jpg\" alt=\"LivePort"
  },
  {
    "path": "assets/docs/changelog/2025-01-01.md",
    "chars": 1421,
    "preview": "## 2025/01/01\n\n**We’re thrilled that cats 🐱 are now speaking and singing across the internet!**  🎶\n\nIn this update, we’v"
  },
  {
    "path": "assets/docs/directory-structure.md",
    "chars": 743,
    "preview": "## The directory structure of `pretrained_weights`\n\n```text\npretrained_weights\n├── insightface\n│   └── models\n│       └─"
  },
  {
    "path": "assets/docs/how-to-install-ffmpeg.md",
    "chars": 719,
    "preview": "## Install FFmpeg\n\nMake sure you have `ffmpeg` and `ffprobe` installed on your system. If you don't have them installed,"
  },
  {
    "path": "assets/docs/speed.md",
    "chars": 903,
    "preview": "### Speed\n\nBelow are the results of inferring one frame on an RTX 4090 GPU using the native PyTorch framework with `torc"
  },
  {
    "path": "assets/gradio/gradio_description_animate_clear.md",
    "chars": 558,
    "preview": "<div style=\"font-size: 1.2em; text-align: center;\">\n    Step 3: Click the <strong>🚀 Animate</strong> button below to gen"
  },
  {
    "path": "assets/gradio/gradio_description_animation.md",
    "chars": 1457,
    "preview": "<span style=\"font-size: 1.2em;\">🔥 To animate the source image or video with the driving video, please follow these steps"
  },
  {
    "path": "assets/gradio/gradio_description_retargeting.md",
    "chars": 805,
    "preview": "<br>\n\n<!-- ## Retargeting -->\n<!-- <span style=\"font-size: 1.2em;\">🔥 To edit the eyes and lip open ratio of the source p"
  },
  {
    "path": "assets/gradio/gradio_description_retargeting_video.md",
    "chars": 435,
    "preview": "<br>\n<div style=\"display: flex; justify-content: center; align-items: center; text-align: center; font-size: 1.2em;\">\n  "
  },
  {
    "path": "assets/gradio/gradio_description_upload.md",
    "chars": 959,
    "preview": "<br>\n<div style=\"font-size: 1.2em; display: flex; justify-content: space-between;\">\n    <div style=\"flex: 1; text-align:"
  },
  {
    "path": "assets/gradio/gradio_description_upload_animal.md",
    "chars": 783,
    "preview": "<br>\n<div style=\"font-size: 1.2em; display: flex; justify-content: space-between;\">\n    <div style=\"flex: 1; text-align:"
  },
  {
    "path": "assets/gradio/gradio_title.md",
    "chars": 1254,
    "preview": "<div style=\"display: flex; justify-content: center; align-items: center; text-align: center;\">\n  <div>\n    <h1>LivePortr"
  },
  {
    "path": "inference.py",
    "chars": 1779,
    "preview": "# coding: utf-8\n\n\"\"\"\nThe entrance of humans\n\"\"\"\n\nimport os\nimport os.path as osp\nimport tyro\nimport subprocess\nfrom src."
  },
  {
    "path": "inference_animals.py",
    "chars": 1812,
    "preview": "# coding: utf-8\n\n\"\"\"\nThe entrance of animal\n\"\"\"\n\nimport os\nimport os.path as osp\nimport tyro\nimport subprocess\nfrom src."
  },
  {
    "path": "pretrained_weights/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "readme.md",
    "chars": 21154,
    "preview": "<h1 align=\"center\">LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control</h1>\n\n<div align='c"
  },
  {
    "path": "readme_zh_cn.md",
    "chars": 16129,
    "preview": "<h1 align=\"center\">LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control</h1>\n\n<!-- ===== 作者"
  },
  {
    "path": "requirements.txt",
    "chars": 71,
    "preview": "-r requirements_base.txt\n\nonnxruntime-gpu==1.18.0\ntransformers==4.38.0\n"
  },
  {
    "path": "requirements_base.txt",
    "chars": 295,
    "preview": "numpy==1.26.4\npyyaml==6.0.1\nopencv-python==4.10.0.84\nscipy==1.13.1\nimageio==2.34.2\nlmdb==1.4.1\ntqdm==4.66.4\nrich==13.7.1"
  },
  {
    "path": "requirements_macOS.txt",
    "chars": 160,
    "preview": "-r requirements_base.txt\n\n--extra-index-url https://download.pytorch.org/whl/cpu\ntorch==2.3.0\ntorchvision==0.18.0\ntorcha"
  },
  {
    "path": "speed.py",
    "chars": 7810,
    "preview": "# coding: utf-8\n\n\"\"\"\nBenchmark the inference speed of each module in LivePortrait.\n\nTODO: heavy GPT style, need to refac"
  },
  {
    "path": "src/config/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "src/config/argument_config.py",
    "chars": 4969,
    "preview": "# coding: utf-8\n\n\"\"\"\nAll configs for user\n\"\"\"\nfrom dataclasses import dataclass\nimport tyro\nfrom typing_extensions impor"
  },
  {
    "path": "src/config/base_config.py",
    "chars": 802,
    "preview": "# coding: utf-8\n\n\"\"\"\npretty printing class\n\"\"\"\n\nfrom __future__ import annotations\nimport os.path as osp\nfrom typing imp"
  },
  {
    "path": "src/config/crop_config.py",
    "chars": 1703,
    "preview": "# coding: utf-8\n\n\"\"\"\nparameters used for crop faces\n\"\"\"\n\nfrom dataclasses import dataclass\n\nfrom .base_config import Pri"
  },
  {
    "path": "src/config/inference_config.py",
    "chars": 4433,
    "preview": "# coding: utf-8\n\n\"\"\"\nconfig dataclass used for inference\n\"\"\"\n\nimport cv2\nfrom numpy import ndarray\nimport pickle as pkl\n"
  },
  {
    "path": "src/config/models.yaml",
    "chars": 1235,
    "preview": "model_params:\n  appearance_feature_extractor_params: # the F in the paper\n    image_channel: 3\n    block_expansion: 64\n "
  },
  {
    "path": "src/gradio_pipeline.py",
    "chars": 34289,
    "preview": "# coding: utf-8\n\n\"\"\"\nPipeline for gradio\n\"\"\"\n\nimport os.path as osp\nimport os\nimport cv2\nfrom rich.progress import track"
  },
  {
    "path": "src/live_portrait_pipeline.py",
    "chars": 31893,
    "preview": "# coding: utf-8\n\n\"\"\"\nPipeline of LivePortrait (Human)\n\"\"\"\n\nimport torch\ntorch.backends.cudnn.benchmark = True # disable "
  },
  {
    "path": "src/live_portrait_pipeline_animal.py",
    "chars": 11320,
    "preview": "# coding: utf-8\n\n\"\"\"\nPipeline of LivePortrait (Animal)\n\"\"\"\n\nimport warnings\nwarnings.filterwarnings(\"ignore\", message=\"t"
  },
  {
    "path": "src/live_portrait_wrapper.py",
    "chars": 16246,
    "preview": "# coding: utf-8\n\n\"\"\"\nWrappers for LivePortrait core functions\n\"\"\"\n\nimport contextlib\nimport os.path as osp\nimport numpy "
  },
  {
    "path": "src/modules/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "src/modules/appearance_feature_extractor.py",
    "chars": 1974,
    "preview": "# coding: utf-8\n\n\"\"\"\nAppearance extractor(F) defined in paper, which maps the source image s to a 3D appearance feature "
  },
  {
    "path": "src/modules/convnextv2.py",
    "chars": 5126,
    "preview": "# coding: utf-8\n\n\"\"\"\nThis moudle is adapted to the ConvNeXtV2 version for the extraction of implicit keypoints, poses, a"
  },
  {
    "path": "src/modules/dense_motion.py",
    "chars": 5547,
    "preview": "# coding: utf-8\n\n\"\"\"\nThe module that predicting a dense motion from sparse motion representation given by kp_source and "
  },
  {
    "path": "src/modules/motion_extractor.py",
    "chars": 1083,
    "preview": "# coding: utf-8\n\n\"\"\"\nMotion extractor(M), which directly predicts the canonical keypoints, head pose and expression defo"
  },
  {
    "path": "src/modules/spade_generator.py",
    "chars": 2659,
    "preview": "# coding: utf-8\n\n\"\"\"\nSpade decoder(G) defined in the paper, which input the warped feature to generate the animated imag"
  },
  {
    "path": "src/modules/stitching_retargeting_network.py",
    "chars": 1501,
    "preview": "# coding: utf-8\n\n\"\"\"\nStitching module(S) and two retargeting modules(R) defined in the paper.\n\n- The stitching module pa"
  },
  {
    "path": "src/modules/util.py",
    "chars": 15937,
    "preview": "# coding: utf-8\n\n\"\"\"\nThis file defines various neural network modules and utility functions, including convolutional and"
  },
  {
    "path": "src/modules/warping_network.py",
    "chars": 2846,
    "preview": "# coding: utf-8\n\n\"\"\"\nWarping field estimator(W) defined in the paper, which generates a warping field using the implicit"
  },
  {
    "path": "src/utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "src/utils/animal_landmark_runner.py",
    "chars": 6104,
    "preview": "# coding: utf-8\n\n\"\"\"\nface detectoin and alignment using XPose\n\"\"\"\n\nimport os\nimport pickle\nimport torch\nimport numpy as "
  },
  {
    "path": "src/utils/camera.py",
    "chars": 1809,
    "preview": "# coding: utf-8\n\n\"\"\"\nfunctions for processing and transforming 3D facial keypoints\n\"\"\"\n\nimport numpy as np\nimport torch\n"
  },
  {
    "path": "src/utils/check_windows_port.py",
    "chars": 345,
    "preview": "import socket\nimport sys\n\nif len(sys.argv) != 2:\n    print(\"Usage: python check_port.py <port>\")\n    sys.exit(1)\n\nport ="
  },
  {
    "path": "src/utils/crop.py",
    "chars": 13758,
    "preview": "# coding: utf-8\n\n\"\"\"\ncropping function and the related preprocess functions for cropping\n\"\"\"\n\nimport numpy as np\nimport "
  },
  {
    "path": "src/utils/cropper.py",
    "chars": 13021,
    "preview": "# coding: utf-8\n\nimport os.path as osp\nimport torch\nimport numpy as np\nimport cv2; cv2.setNumThreads(0); cv2.ocl.setUseO"
  },
  {
    "path": "src/utils/dependencies/XPose/config_model/UniPose_SwinT.py",
    "chars": 2615,
    "preview": "_base_ = ['coco_transformer.py']\n\nuse_label_enc = True\n\nnum_classes=2\n\nlr = 0.0001\nparam_dict_type = 'default'\nlr_backbo"
  },
  {
    "path": "src/utils/dependencies/XPose/config_model/coco_transformer.py",
    "chars": 209,
    "preview": "data_aug_scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]\ndata_aug_max_size = 1333\ndata_aug_scales2_resi"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/__init__.py",
    "chars": 542,
    "preview": "# ------------------------------------------------------------------------\n# Conditional DETR\n# Copyright (c) 2021 Micro"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/attention.py",
    "chars": 20104,
    "preview": "# ------------------------------------------------------------------------\n# UniPose\n# url: https://github.com/IDEA-Rese"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/backbone.py",
    "chars": 7577,
    "preview": "# ------------------------------------------------------------------------\n# UniPose\n# url: https://github.com/IDEA-Rese"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/deformable_transformer.py",
    "chars": 55732,
    "preview": "# ------------------------------------------------------------------------\n# UniPose\n# url: https://github.com/IDEA-Rese"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/fuse_modules.py",
    "chars": 11325,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n# from timm.models.layers import DropPath\nfrom src.m"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/mask_generate.py",
    "chars": 2001,
    "preview": "import torch\n\n\ndef prepare_for_mask(kpt_mask):\n\n\n    tgt_size2 = 50 * 69\n    attn_mask2 = torch.ones(kpt_mask.shape[0], "
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/functions/__init__.py",
    "chars": 598,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/functions/ms_deform_attn_func.py",
    "chars": 3298,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/modules/__init__.py",
    "chars": 584,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/modules/ms_deform_attn.py",
    "chars": 7551,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/modules/ms_deform_attn_key_aware.py",
    "chars": 7061,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/setup.py",
    "chars": 2596,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/cpu/ms_deform_attn_cpu.cpp",
    "chars": 1256,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/cpu/ms_deform_attn_cpu.h",
    "chars": 1139,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/cuda/ms_deform_attn_cuda.cu",
    "chars": 7321,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/cuda/ms_deform_attn_cuda.h",
    "chars": 1140,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/cuda/ms_deform_im2col_cuda.cuh",
    "chars": 54694,
    "preview": "/*!\n**************************************************************************\n* Deformable DETR\n* Copyright (c) 2020 Se"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/ms_deform_attn.h",
    "chars": 1838,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/src/vision.cpp",
    "chars": 799,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/ops/test.py",
    "chars": 4087,
    "preview": "# ------------------------------------------------------------------------------------------------\n# Deformable DETR\n# C"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/position_encoding.py",
    "chars": 6280,
    "preview": "# ------------------------------------------------------------------------\n# ED-Pose\n# Copyright (c) 2023 IDEA. All Righ"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/swin_transformer.py",
    "chars": 27309,
    "preview": "\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nimport n"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/transformer_deformable.py",
    "chars": 27621,
    "preview": "# ------------------------------------------------------------------------\n# ED-Pose\n# Copyright (c) 2023 IDEA. All Righ"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/transformer_vanilla.py",
    "chars": 3463,
    "preview": "# Copyright (c) Aishwarya Kamath & Nicolas Carion. Licensed under the Apache License 2.0. All Rights Reserved\n# Copyrigh"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/unipose.py",
    "chars": 26812,
    "preview": "# ------------------------------------------------------------------------\n# ED-Pose\n# Copyright (c) 2023 IDEA. All Righ"
  },
  {
    "path": "src/utils/dependencies/XPose/models/UniPose/utils.py",
    "chars": 13897,
    "preview": "# ------------------------------------------------------------------------\n# ED-Pose\n# Copyright (c) 2023 IDEA. All Righ"
  },
  {
    "path": "src/utils/dependencies/XPose/models/__init__.py",
    "chars": 688,
    "preview": "# ------------------------------------------------------------------------\n# ED-Pose\n# Copyright (c) 2023 IDEA. All Righ"
  },
  {
    "path": "src/utils/dependencies/XPose/models/registry.py",
    "chars": 1757,
    "preview": "# -*- coding: utf-8 -*-\n# @Author: Yihao Chen\n# @Date:   2021-08-16 16:03:17\n# @Last Modified by:   Shilong Liu\n# @Last "
  },
  {
    "path": "src/utils/dependencies/XPose/predefined_keypoints.py",
    "chars": 14841,
    "preview": "person = {\"keypoints\":['nose', 'left eye', 'right eye', 'left ear', 'right ear', 'left shoulder', 'right shoulder', 'lef"
  },
  {
    "path": "src/utils/dependencies/XPose/transforms.py",
    "chars": 13961,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nTransforms and data augmentation for both ima"
  },
  {
    "path": "src/utils/dependencies/XPose/util/addict.py",
    "chars": 4901,
    "preview": "import copy\n\n\nclass Dict(dict):\n\n    def __init__(__self, *args, **kwargs):\n        object.__setattr__(__self, '__parent"
  },
  {
    "path": "src/utils/dependencies/XPose/util/box_ops.py",
    "chars": 3925,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nUtilities for bounding box manipulation and G"
  },
  {
    "path": "src/utils/dependencies/XPose/util/config.py",
    "chars": 14218,
    "preview": "# ==========================================================\n# Modified from mmcv\n# ===================================="
  },
  {
    "path": "src/utils/dependencies/XPose/util/keypoint_ops.py",
    "chars": 801,
    "preview": "import torch, os\n\ndef keypoint_xyxyzz_to_xyzxyz(keypoints: torch.Tensor):\n    \"\"\"_summary_\n\n    Args:\n        keypoints "
  },
  {
    "path": "src/utils/dependencies/XPose/util/misc.py",
    "chars": 23272,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nMisc functions, including distributed helpers"
  },
  {
    "path": "src/utils/dependencies/insightface/__init__.py",
    "chars": 399,
    "preview": "# coding: utf-8\n# pylint: disable=wrong-import-position\n\"\"\"InsightFace: A Face Analysis Toolkit.\"\"\"\nfrom __future__ impo"
  },
  {
    "path": "src/utils/dependencies/insightface/app/__init__.py",
    "chars": 29,
    "preview": "from .face_analysis import *\n"
  },
  {
    "path": "src/utils/dependencies/insightface/app/common.py",
    "chars": 1467,
    "preview": "import numpy as np\nfrom numpy.linalg import norm as l2norm\n#from easydict import EasyDict\n\nclass Face(dict):\n\n    def __"
  },
  {
    "path": "src/utils/dependencies/insightface/app/face_analysis.py",
    "chars": 4247,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-05-04\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/data/__init__.py",
    "chars": 67,
    "preview": "from .image import get_image\nfrom .pickle_object import get_object\n"
  },
  {
    "path": "src/utils/dependencies/insightface/data/image.py",
    "chars": 731,
    "preview": "import cv2\nimport os\nimport os.path as osp\nfrom pathlib import Path\n\nclass ImageCache:\n    data = {}\n\ndef get_image(name"
  },
  {
    "path": "src/utils/dependencies/insightface/data/pickle_object.py",
    "chars": 415,
    "preview": "import cv2\nimport os\nimport os.path as osp\nfrom pathlib import Path\nimport pickle\n\ndef get_object(name):\n    objects_dir"
  },
  {
    "path": "src/utils/dependencies/insightface/data/rec_builder.py",
    "chars": 2511,
    "preview": "import pickle\nimport numpy as np\nimport os\nimport os.path as osp\nimport sys\nimport mxnet as mx\n\n\nclass RecBuilder():\n   "
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/__init__.py",
    "chars": 195,
    "preview": "from .model_zoo import get_model\nfrom .arcface_onnx import ArcFaceONNX\nfrom .retinaface import RetinaFace\nfrom .scrfd im"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/arcface_onnx.py",
    "chars": 3111,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-05-04\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/attribute.py",
    "chars": 3351,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-06-19\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/inswapper.py",
    "chars": 5279,
    "preview": "import time\nimport numpy as np\nimport onnxruntime\nimport cv2\nimport onnx\nfrom onnx import numpy_helper\nfrom ..utils impo"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/landmark.py",
    "chars": 4198,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-05-04\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/model_store.py",
    "chars": 3108,
    "preview": "\"\"\"\nThis code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/model_store.py\n\"\"\"\nf"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/model_zoo.py",
    "chars": 3579,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-05-04\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/retinaface.py",
    "chars": 11110,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-09-18\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/model_zoo/scrfd.py",
    "chars": 12852,
    "preview": "# -*- coding: utf-8 -*-\n# @Organization  : insightface.ai\n# @Author        : Jia Guo\n# @Time          : 2021-05-04\n# @Fu"
  },
  {
    "path": "src/utils/dependencies/insightface/utils/__init__.py",
    "chars": 215,
    "preview": "from __future__ import absolute_import\n\nfrom .storage import download, ensure_available, download_onnx\nfrom .filesystem "
  },
  {
    "path": "src/utils/dependencies/insightface/utils/constant.py",
    "chars": 32,
    "preview": "\nDEFAULT_MP_NAME = 'buffalo_l'\n\n"
  },
  {
    "path": "src/utils/dependencies/insightface/utils/download.py",
    "chars": 3360,
    "preview": "\"\"\"\nThis code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/download.py\n\"\"\"\nimport o"
  },
  {
    "path": "src/utils/dependencies/insightface/utils/face_align.py",
    "chars": 3356,
    "preview": "import cv2\nimport numpy as np\nfrom skimage import transform as trans\n\n\narcface_dst = np.array(\n    [[38.2946, 51.6963], "
  },
  {
    "path": "src/utils/dependencies/insightface/utils/filesystem.py",
    "chars": 4250,
    "preview": "\"\"\"\nThis code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/filesystem.py\n\"\"\"\nimport"
  },
  {
    "path": "src/utils/dependencies/insightface/utils/storage.py",
    "chars": 1891,
    "preview": "\nimport os\nimport os.path as osp\nimport zipfile\nfrom .download import download_file\n\nBASE_REPO_URL = 'https://github.com"
  },
  {
    "path": "src/utils/dependencies/insightface/utils/transform.py",
    "chars": 3379,
    "preview": "import cv2\nimport math\nimport numpy as np\nfrom skimage import transform as trans\n\n\ndef transform(data, center, output_si"
  },
  {
    "path": "src/utils/face_analysis_diy.py",
    "chars": 3077,
    "preview": "# coding: utf-8\n\n\"\"\"\nface detectoin and alignment using InsightFace\n\"\"\"\n\nimport numpy as np\nfrom .rprint import rlog as "
  },
  {
    "path": "src/utils/filter.py",
    "chars": 767,
    "preview": "# coding: utf-8\n\nimport torch\nimport numpy as np\nfrom pykalman import KalmanFilter\n\n\ndef smooth(x_d_lst, shape, device, "
  },
  {
    "path": "src/utils/helper.py",
    "chars": 6189,
    "preview": "# coding: utf-8\n\n\"\"\"\nutility functions and classes to handle feature extraction and model loading\n\"\"\"\n\nimport os\nimport "
  },
  {
    "path": "src/utils/human_landmark_runner.py",
    "chars": 2977,
    "preview": "# coding: utf-8\n\nimport os.path as osp\nimport cv2; cv2.setNumThreads(0); cv2.ocl.setUseOpenCL(False)\nimport torch\nimport"
  },
  {
    "path": "src/utils/io.py",
    "chars": 3195,
    "preview": "# coding: utf-8\n\nimport os.path as osp\nimport imageio\nimport numpy as np\nimport pickle\nimport cv2; cv2.setNumThreads(0);"
  },
  {
    "path": "src/utils/retargeting_utils.py",
    "chars": 991,
    "preview": "\n\"\"\"\nFunctions to compute distance ratios between specific pairs of facial landmarks\n\"\"\"\n\nimport numpy as np\n\n\ndef calcu"
  },
  {
    "path": "src/utils/rprint.py",
    "chars": 248,
    "preview": "# coding: utf-8\n\n\"\"\"\ncustom print and log functions \n\"\"\"\n\n__all__ = ['rprint', 'rlog']\n\ntry:\n    from rich.console impor"
  },
  {
    "path": "src/utils/timer.py",
    "chars": 606,
    "preview": "# coding: utf-8\n\n\"\"\"\ntools to measure elapsed time\n\"\"\"\n\nimport time\n\nclass Timer(object):\n    \"\"\"A simple timer.\"\"\"\n\n   "
  },
  {
    "path": "src/utils/video.py",
    "chars": 7339,
    "preview": "# coding: utf-8\n\n\"\"\"\nFunctions for processing video\n\nATTENTION: you need to install ffmpeg and ffprobe in your env!\n\"\"\"\n"
  },
  {
    "path": "src/utils/viz.py",
    "chars": 516,
    "preview": "# coding: utf-8\n\nimport cv2; cv2.setNumThreads(0); cv2.ocl.setUseOpenCL(False)\n\n\ndef viz_lmk(img_, vps, **kwargs):\n    \""
  }
]

// ... and 16 more files (download for full content)

About this extraction

This page contains the full source code of the KlingAIResearch/LivePortrait GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 145 files (752.1 KB), approximately 198.4k tokens, and a symbol index with 718 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo