🔥 HuGe100K - The largest multi-view human dataset with 100,000+ subjects! 🔥
High-resolution • Multi-view • Diverse poses • SMPL-X aligned
Complete the form to get access credentials and download links!
### ⚖️ **License and Attribution**
This dataset includes images derived from the **DeepFashion** dataset, originally provided by MMLAB at The Chinese University of Hong Kong. The use of DeepFashion images in this dataset has been explicitly authorized by the original authors solely for the purpose of creating and distributing this dataset. **Users must not further reproduce, distribute, sell, or commercially exploit any images or derived data originating from DeepFashion.** For any subsequent or separate use of the DeepFashion data, users must directly obtain authorization from MMLAB and comply with the original [DeepFashion License](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html).
---
## 📝 **Citation**
If you find our work helpful, please cite us using the following BibTeX:
```bibtex
@article{zhuang2024idolinstant,
title={IDOL: Instant Photorealistic 3D Human Creation from a Single Image},
author={Yiyu Zhuang and Jiaxi Lv and Hao Wen and Qing Shuai and Ailing Zeng and Hao Zhu and Shifeng Chen and Yujiu Yang and Xun Cao and Wei Liu},
journal={arXiv preprint arXiv:2412.14963},
year={2024},
url={https://arxiv.org/abs/2412.14963},
}
```
## **License**
This project is licensed under the **MIT License**.
- **Permissions**: This license grants permission to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software.
- **Condition**: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
- **Disclaimer**: The software is provided "as is", without warranty of any kind.
For more information, see the full license [here](https://opensource.org/licenses/MIT).
## **Support Our Work** ⭐
If you find our work useful for your research or applications:
- Please ⭐ **star our repository** to help us reach more people
- Consider **citing our paper** in your publications (see [Citation](#citation) section)
- Share our project with others who might benefit from it
Your support helps us continue developing open-source research projects like this one!
## 📚 **Acknowledgments**
This project builds upon several excellent open-source projects:
- [E3Gen](https://github.com/olivia23333/E3Gen): Efficient, Expressive and Editable Avatars Generation
- [SAPIENS](https://github.com/facebookresearch/sapiens): High-resolution visual models for human-centric tasks
- [GeoLRM](https://github.com/alibaba-yuanjing-aigclab/GeoLRM): Large Reconstruction Model for High-Quality 3D Generation
- [3D Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting): Real-Time 3DGS Rendering
We thank all the authors for their contributions to the open-source community.
================================================
FILE: configs/idol_debug.yaml
================================================
debug: True
# code_size: [32, 256, 256]
code_size: [32, 1024, 1024]
model:
# base_learning_rate: 2.0e-04 # yy Need to check
target: lib.SapiensGS_SA_v1
params:
# optimizer add
# use_bf16: true
max_steps: 100_000
warmup_steps: 10_000 #12_000
use_checkpoint: true
lambda_depth_tv: 0.05
lambda_lpips: 0 #2.0
lambda_mse: 20 #1.0
lambda_offset: 1 #offset_weight: 50 mse 20, lpips 0.1
neck_learning_rate: 5e-4
decoder_learning_rate: 5e-4
    output_hidden_states: true # if true, also return hidden states from shallow Sapiens layers for the neck decoder
loss_coef: 0.5
init_iter: 500
scale_weight: 0.01
smplx_path: 'work_dirs/demo_data/Ways_to_Catch_360_clip1.json'
code_reshape: [32, 96, 96]
patch_size: 1
code_activation:
type: tanh
mean: 0.0
std: 0.5
clip_range: 2
grid_size: 64
encoder:
target: lib.models.sapiens.SapiensWrapper_ts
params:
model_path: work_dirs/ckpt/sapiens_1b_epoch_173_torchscript.pt2
layer_num: 40
img_size: [1024, 736]
freeze: True
neck:
target: lib.models.transformer_sa.neck_SA_v3_skip # TODO!! add a self attention version
params:
patch_size: 4 #4,
in_chans: 32 #32, # the uv code dims
        num_patches: 9216 # = 96 * 96; previously 4096
        embed_dim: 1536 # Sapiens latent dim, i.e. the feature extractor's output size (1920 for the sapiens encoder2 variant)
decoder_embed_dim: 128 # 1024
decoder_depth: 2 # 8
decoder_num_heads: 4 #16,
total_num_hidden_states: 12
mlp_ratio: 4.
decoder:
target: lib.models.decoders.UVNDecoder_gender
params:
interp_mode: bilinear
base_layers: [16, 64]
density_layers: [64, 1]
color_layers: [16, 128, 9]
offset_layers: [64, 3]
use_dir_enc: false
dir_layers: [16, 64]
activation: silu
bg_color: 1
sigma_activation: sigmoid
sigmoid_saturation: 0.001
gender: neutral
        is_sub2: true # subdivide so the avatar uses ~100,000 Gaussian points
multires: 0
image_size: [640, 896]
superres: false
focal: 1120
up_cnn_in_channels: 128 # be the same as decoder_embed_dim
reshape_type: VitHead
vithead_param:
in_channels: 128 # be the same as decoder_embed_dim
out_channels: 32
deconv_out_channels: [128, 64]
deconv_kernel_sizes: [4, 4]
conv_out_channels: [128, 128]
conv_kernel_sizes: [3, 3]
fix_sigma: true
dataset:
target: lib.datasets.dataloader.DataModuleFromConfig
params:
batch_size: 1 #16 # 6 for lpips
num_workers: 1 #2
# working when in debug mode
    debug_cache_path: ./processed_data/flux_batch1_5000_test_50_local.npy
train:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
cache_path: [
./processed_data/deepfashion_train_140_local.npy,
./processed_data/flux_batch1_5000_train_140_local.npy
]
specific_observation_num: 5
better_range: true
first_is_front: true
if_include_video_ref_img: true
prob_include_video_ref_img: 0.5
img_res: [640, 896]
validation:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
load_imgs: true
specific_observation_num: 3
better_range: true
first_is_front: true
img_res: [640, 896]
cache_path: [
./processed_data/flux_batch1_5000_test_50_local.npy,
#./processed_data/flux_batch1_5000_val_10_local.npy
]
lightning:
modelcheckpoint:
params:
every_n_train_steps: 4000 #2000
save_top_k: -1
save_last: true
monitor: 'train/loss_mse' # ADD this logging in the wrapper_sa
mode: "min"
filename: 'sample-synData-epoch{epoch:02d}-val_loss{val/loss:.2f}'
callbacks: {}
trainer:
num_sanity_val_steps: 1
accumulate_grad_batches: 1
gradient_clip_val: 10.0
max_steps: 80000
    check_val_every_n_epoch: 1 # run validation once per training epoch
benchmark: true
val_check_interval: 1.0
================================================
FILE: configs/idol_v0.yaml
================================================
debug: True
# code_size: [32, 256, 256]
code_size: [32, 1024, 1024]
model:
# base_learning_rate: 2.0e-04 # yy Need to check
target: lib.SapiensGS_SA_v1
params:
# optimizer add
# use_bf16: true
max_steps: 100_000
warmup_steps: 10_000 #12_000
use_checkpoint: true
lambda_depth_tv: 0.05
lambda_lpips: 10 #2.0
lambda_mse: 20 #1.0
lambda_offset: 1 #offset_weight: 50 mse 20, lpips 0.1
neck_learning_rate: 5e-4
decoder_learning_rate: 5e-4
    output_hidden_states: true # if true, also return hidden states from shallow Sapiens layers for the neck decoder
loss_coef: 0.5
init_iter: 500
scale_weight: 0.01
smplx_path: 'work_dirs/demo_data/Ways_to_Catch_360_clip1.json'
code_reshape: [32, 96, 96]
patch_size: 1
code_activation:
type: tanh
mean: 0.0
std: 0.5
clip_range: 2
grid_size: 64
encoder:
target: lib.models.sapiens.SapiensWrapper_ts
params:
model_path: work_dirs/ckpt/sapiens_1b_epoch_173_torchscript.pt2
layer_num: 40
img_size: [1024, 736]
freeze: True
neck:
target: lib.models.transformer_sa.neck_SA_v3_skip # TODO!! add a self attention version
params:
patch_size: 4 #4,
in_chans: 32 #32, # the uv code dims
        num_patches: 9216 # = 96 * 96; previously 4096
        embed_dim: 1536 # the feature extractor's output size (1920 for the sapiens encoder2 variant)
decoder_embed_dim: 1536 # 1024
decoder_depth: 16 # 8
decoder_num_heads: 16 #16,
total_num_hidden_states: 40
mlp_ratio: 4.
decoder:
target: lib.models.decoders.UVNDecoder_gender
params:
interp_mode: bilinear
base_layers: [16, 64]
density_layers: [64, 1]
color_layers: [16, 128, 9]
offset_layers: [64, 3]
use_dir_enc: false
dir_layers: [16, 64]
activation: silu
bg_color: 1
sigma_activation: sigmoid
sigmoid_saturation: 0.001
gender: neutral
        is_sub2: true # subdivide so the avatar uses ~100,000 Gaussian points
multires: 0
image_size: [640, 896]
superres: false
focal: 1120
up_cnn_in_channels: 1536 # be the same as decoder_embed_dim
reshape_type: VitHead
vithead_param:
in_channels: 1536 # be the same as decoder_embed_dim
out_channels: 32
deconv_out_channels: [512, 512, 512, 256]
deconv_kernel_sizes: [4, 4, 4, 4]
conv_out_channels: [128, 128]
conv_kernel_sizes: [3, 3]
fix_sigma: true
dataset:
target: lib.datasets.dataloader.DataModuleFromConfig
params:
batch_size: 1 #16 # 6 for lpips
num_workers: 2 #2
# working when in debug mode
debug_cache_path: ./processed_data/flux_batch1_5000_test_50_local.npy
train:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
cache_path: [
./processed_data/deepfashion_train_140_local.npy,
./processed_data/flux_batch1_5000_train_140_local.npy
]
specific_observation_num: 5
better_range: true
first_is_front: true
if_include_video_ref_img: true
prob_include_video_ref_img: 0.5
img_res: [640, 896]
validation:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
load_imgs: true
specific_observation_num: 5
better_range: true
first_is_front: true
img_res: [640, 896]
cache_path: [
./processed_data/deepfashion_val_10_local.npy,
./processed_data/flux_batch1_5000_val_10_local.npy
]
lightning:
modelcheckpoint:
params:
every_n_train_steps: 4000 #2000
save_top_k: -1
save_last: true
monitor: 'train/loss_mse' # ADD this logging in the wrapper_sa
mode: "min"
filename: 'sample-synData-epoch{epoch:02d}-val_loss{val/loss:.2f}'
callbacks: {}
trainer:
num_sanity_val_steps: 0
accumulate_grad_batches: 1
gradient_clip_val: 10.0
max_steps: 80000
    check_val_every_n_epoch: 1 # run validation once per training epoch
benchmark: true
================================================
FILE: configs/test_dataset.yaml
================================================
dataset:
target: lib.datasets.dataloader.DataModuleFromConfig
params:
batch_size: 1
num_workers: 2
# working when in debug mode
debug_cache_path: ./processed_data/flux_batch1_5000_test_50_local.npy
train:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
cache_path: [
./processed_data/deepfashion_train_140_local.npy,
./processed_data/flux_batch1_5000_train_140_local.npy
]
specific_observation_num: 5
better_range: true
first_is_front: true
if_include_video_ref_img: true
prob_include_video_ref_img: 0.5
img_res: [640, 896]
validation:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
load_imgs: true
specific_observation_num: 5
better_range: true
first_is_front: true
img_res: [640, 896]
cache_path: [
./processed_data/deepfashion_val_10_local.npy,
./processed_data/flux_batch1_5000_val_10_local.npy
]
test:
target: lib.datasets.AvatarDataset
params:
data_prefix: None
load_imgs: true
specific_observation_num: 5
better_range: true
first_is_front: true
img_res: [640, 896]
cache_path: [
./processed_data/deepfashion_test_50_local.npy,
./processed_data/flux_batch1_5000_test_50_local.npy
]
================================================
FILE: data_processing/prepare_cache.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Data preparation script for DeepFashion video dataset.
This script processes video files and their corresponding parameters,
and splits the dataset into train/val/test sets.
"""
import os
import numpy as np
import argparse
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser(description="Prepare DeepFashion video dataset")
parser.add_argument(
"--video_dir",
type=str,
default="/apdcephfs/private_harriswen/data/deepfashion/",
help="Base directory containing imageX folders"
)
parser.add_argument(
"--output_dir",
type=str,
default="./",
help="Directory to save the processed data"
)
parser.add_argument(
"--prefix",
type=str,
default="DeepFashion",
help="Prefix for the output file names"
)
parser.add_argument(
"--max_videos",
type=int,
default=20000,
help="Maximum number of videos to process (for creating smaller datasets)"
)
return parser.parse_args()
def prepare_dataset(video_dir, output_dir, prefix, max_total_videos=20000):
"""
Prepare the DeepFashion dataset by processing videos and parameters.
Args:
video_dir: Base directory containing imageX folders
output_dir: Directory to save processed data
prefix: Prefix for output filenames
max_total_videos: Maximum number of videos to process (default: 20000)
"""
# Find all imageX subdirectories
image_dirs = []
for item in os.listdir(video_dir):
if item.startswith("image") and os.path.isdir(os.path.join(video_dir, item)):
image_dirs.append(item)
image_dirs.sort()
print(f"Found {len(image_dirs)} image directories: {image_dirs}")
# Collect all video files
all_video_files = []
all_param_files = []
all_dir_names = []
for image_dir in image_dirs:
videos_path = os.path.join(video_dir, image_dir, "videos")
params_path = os.path.join(video_dir, image_dir, "param")
if not os.path.exists(videos_path):
print(f"Warning: Videos directory not found in {image_dir}, skipping.")
continue
if not os.path.exists(params_path):
print(f"Warning: Parameters directory not found in {image_dir}, skipping.")
continue
# Get list of video names in current directory
param_names = os.listdir(params_path)
# filter the files with .npy extension
param_names = [name for name in param_names if name.endswith(".npy")]
for name in param_names:
video_path = os.path.join(videos_path, name.replace(".npy", ".mp4"))
param_path = os.path.join(params_path, name)
# Check if both video and parameter files exist
if not os.path.exists(video_path):
print(f"Warning: Video file not found: {video_path}, skipping.")
continue
if not os.path.exists(param_path):
print(f"Warning: Parameter file not found: {param_path}, skipping.")
continue
# Add to collection only if both files exist
all_video_files.append(video_path)
all_param_files.append(param_path)
all_dir_names.append(image_dir)
total_videos = len(all_video_files)
print(f"Total valid videos found: {total_videos}")
if total_videos == 0:
print("Error: No valid video-parameter pairs found. Please check your data paths.")
return
# Limit number of videos to process
if max_total_videos < total_videos:
# Randomly shuffle and select first max_total_videos
indices = list(range(total_videos))
np.random.shuffle(indices)
indices = indices[:max_total_videos]
all_video_files = [all_video_files[i] for i in indices]
all_param_files = [all_param_files[i] for i in indices]
all_dir_names = [all_dir_names[i] for i in indices]
print(f"Limiting to {max_total_videos} videos")
# Process videos and parameters
scenes = []
processed_count = 0
skipped_count = 0
for video_path, param_path, dir_name in zip(all_video_files, all_param_files, all_dir_names):
processed_count += 1
case_name = os.path.basename(video_path)
print(f"Processing {processed_count}/{len(all_video_files)}: {dir_name}/{case_name}")
try:
# Create scene dictionary
scenes.append(dict(
video_path=video_path,
                image_paths=None,  # only set when frames are stored as an image sequence instead of a video
param_path=param_path,
image_ref=video_path.replace(".mp4", ".jpg")
))
except Exception as e:
print(f"Error processing {video_path}: {e}")
skipped_count += 1
print(f"Total scenes collected: {len(scenes)}")
print(f"Total scenes skipped: {skipped_count}")
if len(scenes) == 0:
print("Error: No scenes could be processed. Please check your data.")
return
    # Split dataset: reserve the last 50 scenes for test and the preceding 10
    # for val (assumes more than 60 scenes; with fewer, everything goes to train)
    total_scenes = len(scenes)
    if total_scenes > 60:
        test_scenes = scenes[-50:]
        val_scenes = scenes[-60:-50]
        train_scenes = scenes[:-60]
    else:
        test_scenes, val_scenes, train_scenes = [], [], scenes
# Save each split
splits = {
"train": train_scenes,
"val": val_scenes,
"test": test_scenes,
"all": scenes
}
# Create output directory
os.makedirs(output_dir, exist_ok=True)
# Save each split to separate file
for split_name, split_data in splits.items():
if not split_data:
continue
cache_path = os.path.join(
output_dir,
f"{prefix}_{split_name}_{len(split_data)}.npy"
)
np.save(cache_path, split_data)
print(f"Saved {split_name} split with {len(split_data)} samples to {cache_path}")
if __name__ == "__main__":
# Parse command line arguments
args = parse_args()
# Prepare and save the dataset
prepare_dataset(args.video_dir, args.output_dir, args.prefix, args.max_videos)
print(f"Done processing {args.video_dir} dataset")
================================================
FILE: data_processing/process_datasets.sh
================================================
#!/bin/bash
# Data processing script for multiple datasets
# This script processes all specified datasets and saves the results to output directories
# Define the list of dataset paths
DATASET_PATHS=(
"/PATH/TO/deepfashion"
"/PATH/TO/flux_batch1_5000"
"/PATH/TO/flux_batch2"
# Add more dataset paths here as needed
)
# Output base directory for processed cache files
OUTPUT_BASE_DIR="./processed_data"
# Maximum videos to process per dataset (set to a smaller number for testing)
# if you want to process all videos, set MAX_VIDEOS to a very large number
MAX_VIDEOS=200
# Process each dataset
for DATASET_PATH in "${DATASET_PATHS[@]}"; do
# Extract dataset name from path (use the last directory name as prefix)
DATASET_NAME=$(basename "$DATASET_PATH")
    # All datasets share one output directory; output files are distinguished by prefix
    OUTPUT_DIR="${OUTPUT_BASE_DIR}"
mkdir -p "$OUTPUT_DIR"
echo "===== Processing ${DATASET_NAME} Dataset ====="
echo "Source: ${DATASET_PATH}"
echo "Destination: ${OUTPUT_DIR}"
# Run the processing script
python data_processing/prepare_cache.py \
--video_dir "${DATASET_PATH}" \
--output_dir "${OUTPUT_DIR}" \
--prefix "${DATASET_NAME}" \
--max_videos "${MAX_VIDEOS}"
# Check if processing was successful
if [ $? -ne 0 ]; then
echo "Error processing ${DATASET_NAME} dataset"
echo "Continuing with next dataset..."
else
echo "Successfully processed ${DATASET_NAME} dataset"
fi
echo "----------------------------------------"
done
echo "===== All datasets processing completed ====="
echo "Results saved to: ${OUTPUT_BASE_DIR}"
# List all processed datasets
echo "Processed datasets:"
for DATASET_PATH in "${DATASET_PATHS[@]}"; do
DATASET_NAME=$(basename "$DATASET_PATH")
echo "- ${DATASET_NAME}: ${OUTPUT_BASE_DIR}/${DATASET_NAME}"
done
================================================
FILE: data_processing/visualize_samples.py
================================================
import torch
import numpy as np
import os
os.environ["PYOPENGL_PLATFORM"] = "osmesa"
import smplx
import trimesh
import pyrender
import imageio
def init_smplx_model():
"""Initialize the SMPL-X model with predefined settings."""
body_model = smplx.SMPLX('PATH_TO_YOUR_SMPLX_FOLDER',
gender="neutral",
create_body_pose=False,
create_betas=False,
create_global_orient=False,
create_transl=False,
create_expression=True,
create_jaw_pose=True,
create_leye_pose=True,
create_reye_pose=True,
create_right_hand_pose=False,
create_left_hand_pose=False,
use_pca=False,
num_pca_comps=12,
num_betas=10,
flat_hand_mean=False)
return body_model
# Load SMPL-X parameters
param_path = "./100samples/Apose/param/Argentina_male_buff_thermal wear_20~30 years old_1573.npy"
param = np.load(param_path, allow_pickle=True).item()
# Extract SMPL-X parameters
smpl_params = param['smpl_params'].reshape(1, -1)
scale, transl, global_orient, pose, betas, left_hand_pose, right_hand_pose, jaw_pose, leye_pose, reye_pose, expression = torch.split(smpl_params, [1, 3, 3, 63, 10, 45, 45, 3, 3, 3, 10], dim=1)
# Initialize SMPL-X model and generate vertices
device = torch.device("cpu")
model = init_smplx_model().to(device)
output = model(global_orient=global_orient, body_pose=pose, betas=betas, left_hand_pose=left_hand_pose,
right_hand_pose=right_hand_pose, jaw_pose=jaw_pose, leye_pose=leye_pose, reye_pose=reye_pose,
expression=expression)
vertices = output.vertices[0].detach().cpu().numpy()
faces = model.faces
# Create a Trimesh and Pyrender mesh
mesh = trimesh.Trimesh(vertices, faces)
mesh_pyrender = pyrender.Mesh.from_trimesh(mesh)
rendered_images_list = []
# Loop through multiple camera views
for idx in range(24):
scene = pyrender.Scene()
scene.add(mesh_pyrender)
# Load and process camera parameters
camera_params = param['poses']
intrinsic_params = camera_params[idx][1] # fx, fy, cx, cy
extrinsic_params = camera_params[idx][0] # R|T
# Set up Pyrender camera
camera = pyrender.IntrinsicsCamera(fx=intrinsic_params[0], fy=intrinsic_params[1],
cx=intrinsic_params[2], cy=intrinsic_params[3])
    # Convert the COLMAP/OpenCV camera convention to Pyrender's OpenGL convention:
    # invert the world-to-camera extrinsics, then flip the Y and Z axes
    extrinsic_params_inv = torch.inverse(extrinsic_params.clone())
    extrinsic_params_inv[:3, 1:3] = -extrinsic_params_inv[:3, 1:3]
    extrinsic_params_inv[3, :3] = 0
# Add camera and lighting
scene.add(camera, pose=extrinsic_params_inv)
light = pyrender.DirectionalLight(color=[1.0, 1.0, 1.0], intensity=10.0)
scene.add(light, pose=extrinsic_params_inv)
# Render the scene
renderer = pyrender.OffscreenRenderer(640, 896)
color, depth = renderer.render(scene)
rendered_images_list.append(color)
renderer.delete()
# Save rendered images as a video
rendered_images = np.stack(rendered_images_list)
imageio.mimwrite('rendered_results.mp4', rendered_images, fps=15)
print("Rendered results saved as rendered_results.mp4")
# Load an existing video and test alignment
video_path = param_path.replace("param", "videos").replace("npy", "mp4")
input_video = imageio.get_reader(video_path)
input_frames = [frame for frame in input_video]
blended_frames = [(0.5 * frame + 0.5 * render_frame).astype(np.uint8) for render_frame, frame in zip(rendered_images, input_frames)]
imageio.mimwrite('aligned_results.mp4', blended_frames, fps=15)
print("Blended video saved as aligned_results.mp4")
================================================
FILE: dataset/README.md
================================================
# 🌟 HuGe100K Dataset Documentation
## 📊 Dataset Overview
HuGe100K is a large-scale multi-view human dataset featuring diverse attributes, high-fidelity appearances, and well-aligned SMPL-X models.
## 📁 File Format and Structure
The dataset is organized with the following structure:
```
HuGe100K/
├── flux_batch1/
│   ├── images[0...9]/        # ten image subfolders, each holding a batch of subjects
│ │ ├── videos/ # Folder for .mp4 and .jpg files
│ │ │ ├── Algeria_female_average_high fashion_50~60 years old_844.jpg
│ │ │ ├── Algeria_female_average_high fashion_50~60 years old_844.mp4
│ │ │ └── ... (more .jpg and .mp4)
│ │ └── param/ # Folder for parameter files (.npy)
│ │ ├── Algeria_female_average_high fashion_50~60 years old_844.npy
│ │ └── ... (more .npy files)
├── flux_batch2/
│ └── ... (similar structure with images[0...9])
├── flux_batch3/
│ └── ... (similar structure with images[0...9])
├── flux_batch4/
│ └── ... (similar structure with images[0...9])
├── flux_batch5/
│ └── ... (similar structure with images[0...9])
├── flux_batch6/
│ └── ... (similar structure with images[0...9])
├── flux_batch7/
│ └── ... (similar structure with images[0...9])
├── flux_batch8/
│ └── ... (similar structure with images[0...9])
├── flux_batch9/
│ └── ... (similar structure with images[0...9])
└── deepfashion/
└── ... (similar structure with images[0...9])
```
Where:
- Each `images[X]` folder contains:
  - `videos/`: reference images (`.jpg`) and generated video files (`.mp4`)
  - `param/`: camera and body pose parameters (`.npy`; see the loading sketch below)
- **flux_batch1 through flux_batch7**: subjects in A-pose
- **flux_batch8 and flux_batch9**: subjects in diverse poses
- **deepfashion**: subjects in A-pose (derived from the DeepFashion dataset)
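As a reference for working with the parameter files, here is a minimal loading sketch. The key names and the SMPL-X parameter layout follow `data_processing/visualize_samples.py`, which also suggests the stored entries are `torch` tensors; treat both as assumptions to verify against your download.

```python
import numpy as np
import torch

# Each .npy file stores a pickled dict for one subject
param = np.load(
    "param/Algeria_female_average_high fashion_50~60 years old_844.npy",
    allow_pickle=True,
).item()

# 'smpl_params' concatenates all SMPL-X parameters into a single vector
smpl_params = param["smpl_params"].reshape(1, -1)
(scale, transl, global_orient, body_pose, betas,
 left_hand_pose, right_hand_pose, jaw_pose,
 leye_pose, reye_pose, expression) = torch.split(
    smpl_params, [1, 3, 3, 63, 10, 45, 45, 3, 3, 3, 10], dim=1)

# 'poses' holds per-view cameras: (extrinsics R|T, intrinsics [fx, fy, cx, cy])
extrinsics, intrinsics = param["poses"][0]
```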
### File Naming Convention
Files follow the naming pattern: `Area_Gender_BodyType_Clothing_Age_ID.extension`
For example:
- `Algeria_female_average_high fashion_50~60 years old_844.jpg`: Reference image of an Algerian female with average build in high fashion clothing
- `Algeria_female_average_high fashion_50~60 years old_844.npy`: Parameter file for the same subject
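Because the six fields are separated by underscores while spaces may appear inside a field, a file name can be parsed with a plain split. The helper below is illustrative only, not a utility shipped with the dataset:

```python
import os

def parse_subject_name(path: str) -> dict:
    """Split 'Area_Gender_BodyType_Clothing_Age_ID.ext' into its six fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    area, gender, body_type, clothing, age, subject_id = stem.split("_")
    return {"area": area, "gender": gender, "body_type": body_type,
            "clothing": clothing, "age": age, "id": subject_id}

print(parse_subject_name(
    "Algeria_female_average_high fashion_50~60 years old_844.jpg"))
# -> {'area': 'Algeria', 'gender': 'female', 'body_type': 'average',
#     'clothing': 'high fashion', 'age': '50~60 years old', 'id': '844'}
```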
### 📸 Sample Visualization