Full Code of ethz-vlg/mvtracker for AI

main ceea8ad2af77 cached

183 files

1.8 MB

515.4k tokens

1221 symbols

1 requests

Download .txt

Showing preview only (1,934K chars total). Download the full file or copy to clipboard to get everything.

Repository: ethz-vlg/mvtracker
Branch: main
Commit: ceea8ad2af77
Files: 183
Total size: 1.8 MB

Directory structure:
gitextract_p_4pogs3/

├── .gitignore
├── README.md
├── configs/
│   ├── eval.yaml
│   ├── experiment/
│   │   ├── mvtracker.yaml
│   │   ├── mvtracker_overfit.yaml
│   │   └── mvtracker_overfit_mini.yaml
│   ├── model/
│   │   ├── copycat.yaml
│   │   ├── cotracker1_offline.yaml
│   │   ├── cotracker1_online.yaml
│   │   ├── cotracker2_offline.yaml
│   │   ├── cotracker2_online.yaml
│   │   ├── cotracker3_offline.yaml
│   │   ├── cotracker3_online.yaml
│   │   ├── default.yaml
│   │   ├── delta.yaml
│   │   ├── locotrack.yaml
│   │   ├── mvtracker.yaml
│   │   ├── scenetracker.yaml
│   │   ├── spatialtrackerv2.yaml
│   │   ├── spatracker_monocular.yaml
│   │   ├── spatracker_monocular_pretrained.yaml
│   │   ├── spatracker_multiview.yaml
│   │   └── tapip3d.yaml
│   └── train.yaml
├── demo.py
├── hubconf.py
├── mvtracker/
│   ├── __init__.py
│   ├── cli/
│   │   ├── __init__.py
│   │   ├── eval.py
│   │   ├── train.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── helpers.py
│   │       ├── pylogger.py
│   │       └── rich_utils.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── dexycb_multiview_dataset.py
│   │   ├── generic_scene_dataset.py
│   │   ├── kubric_multiview_dataset.py
│   │   ├── panoptic_studio_multiview_dataset.py
│   │   ├── tap_vid_datasets.py
│   │   └── utils.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── evaluator_3dpt.py
│   │   └── metrics.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── copycat.py
│   │   │   ├── cotracker2/
│   │   │   │   ├── __init__.py
│   │   │   │   └── blocks.py
│   │   │   ├── dpt/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base_model.py
│   │   │   │   ├── blocks.py
│   │   │   │   ├── midas_net.py
│   │   │   │   ├── models.py
│   │   │   │   ├── transforms.py
│   │   │   │   └── vit.py
│   │   │   ├── dynamic3dgs/
│   │   │   │   ├── LICENSE.md
│   │   │   │   ├── colormap.py
│   │   │   │   ├── export_depths_from_pretrained_checkpoint.py
│   │   │   │   ├── external.py
│   │   │   │   ├── helpers.py
│   │   │   │   ├── merge_tapvid3d_per_camera_annotations.py
│   │   │   │   ├── metadata_dexycb.py
│   │   │   │   ├── metadata_kubric.py
│   │   │   │   ├── reorganize_dexycb.py
│   │   │   │   ├── test.py
│   │   │   │   ├── track_2d.py
│   │   │   │   ├── track_3d.py
│   │   │   │   ├── train.py
│   │   │   │   └── visualize.py
│   │   │   ├── embeddings.py
│   │   │   ├── loftr/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── linear_attention.py
│   │   │   │   └── transformer.py
│   │   │   ├── losses.py
│   │   │   ├── model_utils.py
│   │   │   ├── monocular_baselines.py
│   │   │   ├── mvtracker/
│   │   │   │   ├── __init__.py
│   │   │   │   └── mvtracker.py
│   │   │   ├── ptv3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── model.py
│   │   │   │   └── serialization/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── default.py
│   │   │   │       ├── hilbert.py
│   │   │   │       └── z_order.py
│   │   │   ├── shape-of-motion/
│   │   │   │   ├── .gitignore
│   │   │   │   ├── .gitmodules
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── flow3d/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── configs.py
│   │   │   │   │   ├── data/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   ├── base_dataset.py
│   │   │   │   │   │   ├── casual_dataset.py
│   │   │   │   │   │   ├── colmap.py
│   │   │   │   │   │   ├── iphone_dataset.py
│   │   │   │   │   │   ├── panoptic_dataset.py
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   ├── init_utils.py
│   │   │   │   │   ├── loss_utils.py
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   ├── params.py
│   │   │   │   │   ├── renderer.py
│   │   │   │   │   ├── scene_model.py
│   │   │   │   │   ├── tensor_dataclass.py
│   │   │   │   │   ├── trainer.py
│   │   │   │   │   ├── trajectories.py
│   │   │   │   │   ├── transforms.py
│   │   │   │   │   ├── validator.py
│   │   │   │   │   └── vis/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── playback_panel.py
│   │   │   │   │       ├── render_panel.py
│   │   │   │   │       ├── utils.py
│   │   │   │   │       └── viewer.py
│   │   │   │   └── launch_davis.py
│   │   │   ├── spatracker/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── blocks.py
│   │   │   │   ├── softsplat.py
│   │   │   │   ├── spatracker_monocular.py
│   │   │   │   └── spatracker_multiview.py
│   │   │   ├── vggt/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── heads/
│   │   │   │   │   ├── camera_head.py
│   │   │   │   │   ├── dpt_head.py
│   │   │   │   │   ├── head_act.py
│   │   │   │   │   ├── track_head.py
│   │   │   │   │   ├── track_modules/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   ├── base_track_predictor.py
│   │   │   │   │   │   ├── blocks.py
│   │   │   │   │   │   ├── modules.py
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── layers/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── attention.py
│   │   │   │   │   ├── block.py
│   │   │   │   │   ├── drop_path.py
│   │   │   │   │   ├── layer_scale.py
│   │   │   │   │   ├── mlp.py
│   │   │   │   │   ├── patch_embed.py
│   │   │   │   │   ├── rope.py
│   │   │   │   │   ├── swiglu_ffn.py
│   │   │   │   │   └── vision_transformer.py
│   │   │   │   ├── models/
│   │   │   │   │   ├── aggregator.py
│   │   │   │   │   └── vggt.py
│   │   │   │   └── utils/
│   │   │   │       ├── geometry.py
│   │   │   │       ├── load_fn.py
│   │   │   │       ├── pose_enc.py
│   │   │   │       ├── rotation.py
│   │   │   │       └── visual_track.py
│   │   │   └── vit/
│   │   │       ├── __init__.py
│   │   │       ├── common.py
│   │   │       └── encoder.py
│   │   └── evaluation_predictor_3dpt.py
│   └── utils/
│       ├── __init__.py
│       ├── basic.py
│       ├── eval_utils.py
│       ├── geom.py
│       ├── improc.py
│       ├── misc.py
│       ├── visualizer_mp4.py
│       └── visualizer_rerun.py
├── requirements.full.txt
├── requirements.txt
└── scripts/
    ├── 4ddress_preprocessing.py
    ├── __init__.py
    ├── compare_cdist-topk_against_pointops-knn.py
    ├── dex_ycb_to_neus_format.py
    ├── egoexo4d_preprocessing.py
    ├── estimate_depth_with_duster.py
    ├── hi4d_preprocessing.py
    ├── merge_comparison_mp4s.py
    ├── panoptic_studio_preprocessing.py
    ├── plot_aj_for_varying_depth_noise_levels.py
    ├── plot_aj_for_varying_n_of_views.py
    ├── profiling.md
    ├── selfcap_preprocessing.py
    ├── slurm/
    │   ├── eval.sh
    │   ├── mvtracker-nodepthaugs.sh
    │   ├── mvtracker.sh
    │   ├── spatracker.sh
    │   ├── test_reproducibility.sh
    │   ├── triplane-128.sh
    │   └── triplane-256.sh
    └── summarize_eval_results.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.idea
__pycache__/
*.DS_Store
*.pth
*.pt
*.mp4
*.npy
vis_results/
checkpoints/
logs/
slurm_logs/
submit*
logs*

/running
/datasets
/env.sh
/eular_log
/outputs
/wandb

================================================
FILE: README.md
================================================
<div align="center" style="line-height:1.2; margin:0; padding:0;">
<h1 style="margin-bottom:0em;">Multi-View 3D Point Tracking</h1>

<a href="https://arxiv.org/abs/2508.21060"><img src="https://img.shields.io/badge/arXiv-2508.21060-b31b1b" alt="arXiv"></a>
<a href="https://ethz-vlg.github.io/mvtracker/"><img src="https://img.shields.io/badge/Project%20Page-009688?logo=internetcomputer&logoColor=white" alt="Project Page"></a>
<a href="https://ethz-vlg.github.io/mvtracker/#qualitative-visualization"><img src="https://img.shields.io/badge/Interactive%20Results-673ab7?logo=apachespark&logoColor=white" alt="Interactive Results"></a>
[![](https://img.shields.io/badge/🤗%20Demo-Coming%20soon…-ffcc00)](#)
<br>
[**Frano Rajič**](https://m43.github.io/)<sup>1</sup> · 
[**Haofei Xu**](https://haofeixu.github.io/)<sup>1</sup> · 
[**Marko Mihajlovic**](https://markomih.github.io/)<sup>1</sup> · 
[**Siyuan Li**](https://siyuanliii.github.io/)<sup>1</sup> · 
[**Irem Demir**](https://github.com/iremddemir)<sup>1</sup>  
[**Emircan Gündoğdu**](https://github.com/emircangun)<sup>1</sup> · 
[**Lei Ke**](https://www.kelei.site/)<sup>2</sup> · 
[**Sergey Prokudin**](https://vlg.inf.ethz.ch/team/Dr-Sergey-Prokudin.html)<sup>1,3</sup> · 
[**Marc Pollefeys**](https://people.inf.ethz.ch/marc.pollefeys/)<sup>1,4</sup> · 
[**Siyu Tang**](https://vlg.inf.ethz.ch/team/Prof-Dr-Siyu-Tang.html)<sup>1</sup>
<br>
<sup>1</sup>[ETH Zürich](https://vlg.inf.ethz.ch/) &emsp;
<sup>2</sup>[Carnegie Mellon University](https://www.cmu.edu/) &emsp;
<sup>3</sup>[Balgrist University Hospital](https://www.balgrist.ch/) &emsp;
<sup>4</sup>[Microsoft](https://www.microsoft.com/)
</div>

<p float="left">
  <img alt="selfcap" src="https://github.com/user-attachments/assets/b502d193-c37c-43be-af6c-653b5de7597e" width="48%" /> 
  <img alt="dexycb" src="https://github.com/user-attachments/assets/d14d4c6c-152e-4040-b29b-3da4b7e8b913" width="48%" /> 
  <img alt="4d-dress-stretching" src="https://github.com/user-attachments/assets/f3eabdda-59e1-4032-b345-c4603ea86fc0" width="48%" />
  <img alt="4d-dress-avatarmove" src="https://github.com/user-attachments/assets/3fef9924-84ad-4295-95e2-5b82ae7c3053" width="48%" />
</p>

MVTracker is the first **data-driven multi-view 3D point tracker** for tracking arbitrary 3D points across multiple cameras. It fuses multi-view features into a unified 3D feature point cloud, within which it leverages kNN-based correlation to capture spatiotemporal relationships across views. A transformer then iteratively refines the point tracks, handling occlusions and adapting to varying camera setups without per-sequence optimization.


## Updates

- <ins>August 28, 2025</ins>: Public release.


## Quick Start

This repo was validated on **Python 3.10.12**, **PyTorch 2.3.0** (CUDA 12.1), **cuDNN 8903**, and **gcc 11.3.0**. If you want a fresh minimal environment that runs the Hub demo and `demo.py`:
```bash
conda create -n 3dpt python=3.10.12 -y
conda activate 3dpt
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r https://raw.githubusercontent.com/ethz-vlg/mvtracker/refs/heads/main/requirements.txt

# Optional, speeds up the model
pip install --upgrade --no-build-isolation flash-attn==2.5.8  # Speeds up attention
pip install "git+https://github.com/ethz-vlg/pointcept.git@2082918#subdirectory=libs/pointops"  # Speeds up kNN search; may require gcc 11.3.0: conda install -c conda-forge gcc_linux-64=11.3.0 gxx_linux-64=11.3.0 gcc=11.3.0 gxx=11.3.0
```

With the minimal dependencies in place, you can try MVTracker directly via **PyTorch Hub**:
```python
import torch
import numpy as np
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"
mvtracker = torch.hub.load("ethz-vlg/mvtracker", "mvtracker", pretrained=True, device=device)

# Example input from demo sample (downloaded automatically)
sample = np.load(hf_hub_download("ethz-vlg/mvtracker", "data_sample.npz"))
rgbs = torch.from_numpy(sample["rgbs"]).float()
depths = torch.from_numpy(sample["depths"]).float()
intrs = torch.from_numpy(sample["intrs"]).float()
extrs = torch.from_numpy(sample["extrs"]).float()
query_points = torch.from_numpy(sample["query_points"]).float()

with torch.no_grad():
    results = mvtracker(
        rgbs=rgbs[None].to(device) / 255.0,
        depths=depths[None].to(device),
        intrs=intrs[None].to(device),
        extrs=extrs[None].to(device),
        query_points_3d=query_points[None].to(device),
    )

pred_tracks = results["traj_e"].cpu()  # [T,N,3]
pred_vis = results["vis_e"].cpu()      # [T,N]
print(pred_tracks.shape, pred_vis.shape)
```

Alternatively, you can run our interactive demo:

```bash
python demo.py --rerun save --lightweight
```

By default this saves a lightweight `.rrd` recording (e.g., `mvtracker_demo.rrd`) that you can open in any Rerun viewer. The simplest option is to drag and drop the file into the [online viewer](https://app.rerun.io/version/0.21.0). For the best experience, you can also install Rerun locally (`pip install rerun-sdk==0.21.0; rerun`). Results can be explored interactively in the viewer with WASD/QE navigation, mouse rotation and zoom, and timeline playback controls.

<details>
<summary>[Interactive viewer on a cluster or with GUI support - click to expand]</summary>
  
If you are working on a cluster, you can stream results directly to your laptop by forwarding a port (`ssh -R 9876:localhost:9876 user@cluster`) and then running the demo in streaming mode (`python demo.py --rerun stream`), which sends live data into your local Rerun instance. If you are running the demo locally with GUI support, you can automatically spawn a Rerun window (`python demo.py --rerun spawn`).

</details>


## Installation

You can use a pretrained model directly via **PyTorch Hub** (see Quick Start above), or clone this repo if you want to run our demo, evaluation, or training. We recommend using **PyTorch with CUDA** for best performance. CPU-only runs are possible but very slow.

```bash
git clone https://github.com/ethz-vlg/mvtracker.git
cd mvtracker
```

To extend the conda environment from the Quick Start to support training and evaluation, install the full requirements by running `pip install -r requirements.full.txt`. Baselines based on SpatialTracker V1 also require cupy:
```bash
pip install tensorflow==2.12.1 tensorflow-datasets tensorflow-graphics tensorboard
pip install cupy-cuda12x==12.2.0
python -m cupyx.tools.install_library --cuda 12.x --library cutensor
python -m cupyx.tools.install_library --cuda 12.x --library nccl
python -m cupyx.tools.install_library --cuda 12.x --library cudnn
```


## Datasets

To benchmark multi-view 3D point tracking, we provide preprocessed versions of three datasets:

- **MV-Kubric**: a synthetic training dataset adapted from single-view Kubric into a multi-view setting.  
- **Panoptic Studio**: evaluation benchmark with real-world activities such as basketball, juggling, and toy play (10 sequences).  
- **DexYCB**: evaluation benchmark with real-world hand–object interactions (10 sequences).  

<details>
<summary>[Downloading our preprocessed datasets - click to expand]</summary>
  
You can download and extract them as (~72 GB after extraction):

```bash
# MV-Kubric (simulated + DUSt3R depths)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/kubric-multiview--test.tar.gz -P datasets/
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/kubric-multiview--test--dust3r-depth.tar.gz -P datasets/
tar -xvzf datasets/kubric-multiview--test.tar.gz -C datasets/
tar -xvzf datasets/kubric-multiview--test--dust3r-depth.tar.gz -C datasets/
rm datasets/kubric-multiview*.tar.gz

# Panoptic Studio (optimization-based depth from Dynamic3DGS)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/panoptic-multiview.tar.gz -P datasets/
tar -xvzf datasets/panoptic-multiview.tar.gz -C datasets/
rm datasets/panoptic-multiview.tar.gz

# DexYCB (Kinect + DUSt3R depths)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/dex-ycb-multiview.tar.gz -P datasets/
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/dex-ycb-multiview--dust3r-depth.tar.gz -P datasets/
tar -xvzf datasets/dex-ycb-multiview.tar.gz -C datasets/
tar -xvzf datasets/dex-ycb-multiview--dust3r-depth.tar.gz -C datasets/
rm datasets/dex-ycb-multiview*.tar.gz

# $ du -sch datasets/*
# 31G     kubric-multiview
# 13G     panoptic-multiview
# 29G     dex-ycb-multiview
# 72G     total
```

</details>


<details>
<summary>[Regenerating datasets from scratch - click to expand]</summary>
  
If you wish to regenerate datasets from scratch, we provide scripts with docstrings that explain usage and list the commands we used. For licensing and usage terms, please refer to the original datasets. 
- MV-Kubric data for training and testing can be generated with [ethz-vlg/kubric](https://github.com/ethz-vlg/kubric/blob/multiview-point-tracking/challenges/point_tracking_3d/worker.py).
- DexYCB can be downloaded and labels regenerated using [`scripts/dex_ycb_to_neus_format.py`](./scripts/dex_ycb_to_neus_format.py); note that we have created labels for 10 sequences, but DexYCB is much larger and more labels could be produced if needed.
- Panoptic Studio can be downloaded and labels regenerated using [`scripts/panoptic_studio_preprocessing.py`](./scripts/panoptic_studio_preprocessing.py).
- DUSt3R depths can be produced for any dataset with [`scripts/estimate_depth_with_duster.py`](./scripts/estimate_depth_with_duster.py).
- For unlabeled datasets used only in qualitative experiments, we provide the following preprocessing scripts: [4D-Dress](./scripts/4ddress_preprocessing.py), [Hi4D](./scripts/hi4d_preprocessing.py), [EgoExo4D](./scripts/egoexo4d_preprocessing.py), and [SelfCap](./scripts/selfcap_preprocessing.py).  

</details>

For quick testing, we also release a small **demo sample** (~200 MB):

```bash
python demo.py --random_query_points
```

Our generic loader [`GenericSceneDataset`](./mvtracker/datasets/generic_scene_dataset.py) supports adding new datasets. It can compute depths on the fly with [DUSt3R](https://github.com/naver/dust3r), [VGGT](https://vgg-t.github.io), [MonoFusion](https://imnotprepared.github.io/research/25_DSR/index.html), or [MoGe-2](https://github.com/microsoft/MoGe), and can also estimate camera poses with VGGT.  



## Evaluation

Evaluation is driven by Hydra configs. See [`mvtracker/cli/eval.py`](./mvtracker/cli/eval.py) and [`configs/eval.yaml`](./configs/eval.yaml) for details.

To evaluate MVTracker with our best model, first download the checkpoint from [Hugging Face](https://huggingface.co/ethz-vlg/mvtracker):

```bash
wget https://huggingface.co/ethz-vlg/mvtracker/resolve/main/mvtracker_200000_june2025.pth -P checkpoints/
```

Then run:

```bash
python -m mvtracker.cli.eval \
  experiment_path=logs/mvtracker \
  model=mvtracker \
  datasets.eval.names=[kubric-multiview-v3-views0123] \
  restore_ckpt_path=checkpoints/mvtracker_200000_june2025.pth

# Expected result:
# {
#   "eval_kubric-multiview-v3-views0123/model__ate_visible__dynamic-static-mean": 5.07,
#   "eval_kubric-multiview-v3-views0123/model__average_jaccard__dynamic-static-mean": 81.42,
#   "eval_kubric-multiview-v3-views0123/model__average_pts_within_thresh__dynamic-static-mean": 90.00
# }
```

To evaluate a baseline, e.g. CoTracker3-Online (auto-downloaded checkpoint), run:

```bash
python -m mvtracker.cli.eval experiment_path=logs/cotracker3-online model=cotracker3_online

# Expected result:
# {
#   "eval_panoptic-multiview-views1_7_14_20/model__average_jaccard__any": 74.56
# }
```

For more baselines and dataset setups (e.g. varying camera counts, camera subsets, etc.), see [`scripts/slurm/eval.sh`](./scripts/slurm/eval.sh) for the commands used in our experiments.

<details>
<summary>[Details on evaluation parameters - click to expand]</summary>
  
The evaluation datasets are specified with `datasets.eval.names`. Each name is parsed by the dataset `from_name()` factory (see e.g. [`DexYCBMultiViewDataset.from_name`](./mvtracker/datasets/dexycb_multiview_dataset.py)), which supports modifiers such as `-views`, `-duster`, `-novelviews`, `-removehand`, `-2dpt`, or `-cached`. This makes it easy to select subsets of cameras, enable different depth sources, or ensure deterministic track sampling. The main labeled benchmarks are:
- **Kubric (synthetic)** — e.g. `kubric-multiview-v3-views0123`  
- **Panoptic Studio (real)** — e.g. `panoptic-multiview-views1_7_14_20`  
- **DexYCB (real)** — e.g. `dex-ycb-multiview-views0123`  

For reproducibility of our main results, we also provide *cached* variants of each benchmark, which freeze track selection exactly as used in our paper. Without `-cached`, random seeding ensures reproducibility, but cached versions guarantee identical tracks across environments. The following cached variants are included in the released datasets:
- `kubric-multiview-v3-views0123-cached`  
- `kubric-multiview-v3-duster0123-cached`  
- `panoptic-multiview-views1_7_14_20-cached`  
- `panoptic-multiview-views27_16_14_8-cached`  
- `panoptic-multiview-views1_4_7_11-cached`  
- `dex-ycb-multiview-views0123-cached`  
- `dex-ycb-multiview-duster0123-cached`  

</details>

## Training

To run a small overfitting test that fits into 24 GB GPU RAM:

```bash
python -m mvtracker.cli.train +experiment=mvtracker_overfit_mini
```

For a full-scale MVTracker on an 80 GB GPU:

```bash
python -m mvtracker.cli.train +experiment=mvtracker_overfit
```

## Practical Considerations

<details>
<summary>[Scene normalization - click to expand]</summary>

Performance depends strongly on scene normalization. MVTracker was trained on Kubric with randomized but bounded scales and camera setups. At test time, scenes with very different scales, rotations, or translations must be aligned to this distribution. Our generic loader provides an automatic normalization that assumes the ground plane is parallel to the XY plane. This automatic normalization worked reasonably well for 4D-Dress, Hi4D, EgoExo4D, and SelfCap. For Panoptic and DexYCB, we applied manual similarity transforms, which are encoded in the respective dataloaders. Robust, general-purpose normalization remains an open challenge.  

</details>


<details>
<summary>[Challenges and future directions - click to expand]</summary>

The central challenge in multi-view 3D point tracking is 4D reconstruction: obtaining depth maps that are accurate, temporally consistent, and available in real time, especially under sparse-view setups. MVTracker performs well when sensor depth and camera calibration are provided, but in settings where both must be estimated, errors in reconstruction quickly make tracking unreliable. While learned motion priors help tolerate moderate noise, they cannot replace a robust reconstruction backbone. We believe progress will hinge on methods that jointly solve depth estimation and tracking for mutual refinement, or large-scale foundation models for 4D reconstruction and tracking that fully leverage data and compute. We hope the community will direct future efforts toward this goal.


</details>


## Acknowledgements

Our code builds upon and was inspired by many prior works, including [SpaTracker](https://github.com/henry123-boy/SpaTracker), [CoTracker](https://github.com/facebookresearch/co-tracker), and [DUSt3R](https://github.com/naver/dust3r). We thank the authors for releasing their code and pretrained models. We are also grateful to maintainers of [Rerun](https://rerun.io) for their helpful visualization toolkit.

## Citation

If you find our repository useful, please consider giving it a star ⭐ and citing our work:
```bibtex
@inproceedings{rajic2025mvtracker,
  title     = {Multi-View 3D Point Tracking},
  author    = {Raji{\v{c}}, Frano and Xu, Haofei and Mihajlovic, Marko and Li, Siyuan and Demir, Irem and G{\"u}ndo{\u{g}}du, Emircan and Ke, Lei and Prokudin, Sergey and Pollefeys, Marc and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```


================================================
FILE: configs/eval.yaml
================================================
defaults:
  - train
  - _self_

modes:
  eval_only: true

trainer:
  precision: 32-true

# Optional overrides specific to evaluation runs
datasets:
  eval:
    names: [ "panoptic-multiview-views1_7_14_20" ]
    max_seq_len: 1000
evaluation:
  consume_model_stats: false        # whether to report model stats (which can slow down the forward pass)
  evaluator:
    rerun_viz_indices: null
    forward_pass_log_indices: null
    mp4_track_viz_indices: null

#    rerun_viz_indices: [ 0,1,2 ]
#    forward_pass_log_indices: [ 0,1,2 ]
#    mp4_track_viz_indices: [ 0,1,2 ]

#    rerun_viz_indices: [ 0,3,27, 2,23 ]
#    forward_pass_log_indices: null
#    mp4_track_viz_indices: [ 0,3,27, 2,23 ]

#    rerun_viz_indices: [ 0, 7 ]
#    forward_pass_log_indices: null
#    mp4_track_viz_indices: [ 0, 7 ]

#    rerun_viz_indices: [ 0, 5 ]
#    forward_pass_log_indices: null
#    mp4_track_viz_indices: [ 0, 5 ]

#    rerun_viz_indices: [ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29 ]
#    forward_pass_log_indices: [ 0,1,2,3,4 ]
#    mp4_track_viz_indices: [ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29 ]


================================================
FILE: configs/experiment/mvtracker.yaml
================================================
# @package _global_
defaults:
  - override /model: mvtracker

experiment_path: ./logs/mvtracker


================================================
FILE: configs/experiment/mvtracker_overfit.yaml
================================================
# @package _global_
defaults:
  - override /model: mvtracker

experiment_path: ./logs/debug/mvtracker-overfit

datasets:
  root: ./datasets
  train:
    name: kubric-multiview-v3-views0123-training
    batch_size: 1
    sequence_len: 24
    traj_per_sample: 512
    num_workers: 4
  eval:
    names: [kubric-multiview-v3-views0123-overfit-on-training]
    num_workers: 2
    max_seq_len: 1000

trainer:
  num_steps: 1500
  eval_freq: 500
  viz_freq: 500
  save_ckpt_freq: 500
  augment_train_iters: false

augmentations:
  probability: 1.0
  rgb: false
  depth: false
  cropping: true
  variable_trajpersample: false
  scene_transform: false
  camera_params_noise: false
  variable_depth_type: false
  variable_num_views: false

modes:
  tune_per_scene: true
  dont_validate_at_start: true
  do_initial_static_pretrain: false
  pretrain_only: false
  eval_only: false
  debug: false


================================================
FILE: configs/experiment/mvtracker_overfit_mini.yaml
================================================
# @package _global_
defaults:
  - mvtracker_overfit

experiment_path: ./logs/debug/mvtracker-overfit-mini

datasets:
  train:
    traj_per_sample: 8

model:
  fmaps_dim: 32


================================================
FILE: configs/model/copycat.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.copycat.CopyCat


================================================
FILE: configs/model/cotracker1_offline.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOfflineWrapper
    model_name: cotracker2v1
    grid_size: 10


================================================
FILE: configs/model/cotracker1_online.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOnlineWrapper
    model_name: cotracker2v1_online
    grid_size: 10


================================================
FILE: configs/model/cotracker2_offline.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOfflineWrapper
    model_name: cotracker2
    grid_size: 10


================================================
FILE: configs/model/cotracker2_online.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOnlineWrapper
    model_name: cotracker2_online
    grid_size: 10


================================================
FILE: configs/model/cotracker3_offline.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOfflineWrapper
    model_name: cotracker3_offline
    grid_size: 10


================================================
FILE: configs/model/cotracker3_online.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.CoTrackerOnlineWrapper
    model_name: cotracker3_online
    grid_size: 10


================================================
FILE: configs/model/default.yaml
================================================
# @package _global_
model:
  _target_: ???

trainer:
  train_iters: 4

evaluation:
  eval_iters: 4
  interp_shape: null

  predictor_settings:
    kubric:
      visibility_threshold: 0.9
      grid_size: 0
      n_grids_per_view: 1
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0
    dex_ycb:
      visibility_threshold: 0.9
      grid_size: 0
      n_grids_per_view: 1
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0
    panoptic:
      visibility_threshold: 0.9
      grid_size: 0
      n_grids_per_view: 1
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0
    tapvid2d-davis:
      visibility_threshold: 0.9
      grid_size: 0
      n_grids_per_view: 1
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0
    generic:
      visibility_threshold: 0.9
      grid_size: 0
      n_grids_per_view: 1
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0


================================================
FILE: configs/model/delta.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.DELTAWrapper
    ckpt: checkpoints/densetrack3d.pth
    upsample_factor: 4
    grid_size: 20
    return_2d_track: false


================================================
FILE: configs/model/locotrack.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.LocoTrackWrapper
    model_size: base

evaluation:
  interp_shape: [ 256, 256 ]


================================================
FILE: configs/model/mvtracker.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.mvtracker.mvtracker.MVTracker
  sliding_window_len: 12
  stride: 4
  normalize_scene_in_fwd_pass: false
  fmaps_dim: 128
  add_space_attn: true
  num_heads: 6
  hidden_size: 256
  space_depth: 6
  time_depth: 6
  num_virtual_tracks: 64
  use_flash_attention: true
  corr_n_groups: 1
  corr_n_levels: 4
  corr_neighbors: 16
  corr_add_neighbor_offset: true
  corr_add_neighbor_xyz: false
  corr_filter_invalid_depth: false   # slower, but would make sure points with invalid depth are not considered in corr

evaluation:
  interp_shape: [ 384, 512 ]

  predictor_settings:
    kubric:
      visibility_threshold: 0.5
      grid_size: 4
      local_grid_size: 18
    dex_ycb:
      visibility_threshold: 0.01
      grid_size: 4
      local_grid_size: 18
    panoptic:
      visibility_threshold: 0.01
      grid_size: 6
      local_grid_size: 18
    tapvid2d-davis:
      visibility_threshold: 0.01
      grid_size: 6
      n_grids_per_view: 6
      local_grid_size: 0
      local_extent: 50
      sift_size: 0
      num_uniformly_sampled_pts: 0
    generic:
      visibility_threshold: 0.01
      grid_size: 4
      local_grid_size: 18

trainer:
  precision: bf16-mixed


================================================
FILE: configs/model/scenetracker.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.SceneTrackerWrapper
    ckpt: checkpoints/scenetracker-odyssey-200k.pth
    return_2d_track: false

evaluation:
  interp_shape: [ 384, 512 ]


================================================
FILE: configs/model/spatialtrackerv2.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.SpaTrackerV2Wrapper
    model_type: online  # or offline, whichever is better on a specific dataset
    vo_points: 756

evaluation:
  predictor_settings:
    kubric:
      visibility_threshold: 0.01
    dex_ycb:
      visibility_threshold: 0.01
    panoptic:
      visibility_threshold: 0.01


================================================
FILE: configs/model/spatracker_monocular.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.spatracker.spatracker_monocular.SpaTrackerMultiViewAdapter

  sliding_window_len: 12
  stride: 4
  add_space_attn: true
  num_heads: 8
  hidden_size: 384
  space_depth: 6
  time_depth: 6
  triplane_zres: 128

evaluation:
  interp_shape: [ 512, 512 ]    # This checkpoint was trained on 512x512 Kubric sequences
  predictor_settings:
    kubric:
      visibility_threshold: 0.5
      grid_size: 4
      local_grid_size: 18
    dex_ycb:
      visibility_threshold: 0.5
      grid_size: 0
      local_grid_size: 18
    panoptic:
      visibility_threshold: 0.5
      grid_size: 4
      local_grid_size: 18

#restore_ckpt_path: checkpoints/spatracker_monocular_trained-on-kubric-depth_069800.pth
#restore_ckpt_path: checkpoints/spatracker_monocular_trained-on-duster-depth_090800.pth


================================================
FILE: configs/model/spatracker_monocular_pretrained.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.spatracker.spatracker_monocular.SpaTrackerMultiViewAdapter

  sliding_window_len: 12
  stride: 4
  add_space_attn: true
  num_heads: 8
  hidden_size: 384
  space_depth: 6
  time_depth: 6
  triplane_zres: 128

evaluation:
  interp_shape: [ 384, 512 ]
  predictor_settings:
    kubric:
      visibility_threshold: 0.9
      grid_size: 4
      local_grid_size: 18
    dex_ycb:
      visibility_threshold: 0.9
      grid_size: 4
      local_grid_size: 18
    panoptic:
      visibility_threshold: 0.9
      grid_size: 4
      local_grid_size: 18

#restore_ckpt_path: checkpoints/spatracker_monocular_original-authors-ckpt.pth


================================================
FILE: configs/model/spatracker_multiview.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.spatracker.spatracker_multiview.MultiViewSpaTracker

  sliding_window_len: 12
  stride: 4
  add_space_attn: true
  use_3d_pos_embed: true
  remove_zeromlpflow: true
  concat_triplane_features: true
  num_heads: 8
  hidden_size: 384
  space_depth: 6
  time_depth: 6
  fmaps_dim: 128
  triplane_xres: 128
  triplane_yres: 128
  triplane_zres: 128

evaluation:
  interp_shape: [ 512, 512 ]    # This checkpoint was trained on 512x512 Kubric sequences
  predictor_settings:
    kubric:
      visibility_threshold: 0.5
      grid_size: 4
      local_grid_size: 18
    dex_ycb:
      visibility_threshold: 0.01
      grid_size: 4
      local_grid_size: 18
    panoptic:
      visibility_threshold: 0.01
      grid_size: 4
      local_grid_size: 18

#restore_ckpt_path: checkpoints/spatracker_multiview_trained-on-kubric-depth_100000.pth
#model:
#  triplane_xres: 128
#  triplane_yres: 128
#  triplane_zres: 128

#restore_ckpt_path: checkpoints/spatracker_multiview_trained-on-duster-depth_100000.pth
#model:
#  triplane_xres: 256
#  triplane_yres: 256
#  triplane_zres: 128


================================================
FILE: configs/model/tapip3d.yaml
================================================
# @package _global_
defaults:
  - default

model:
  _target_: mvtracker.models.core.monocular_baselines.MonocularToMultiViewAdapter
  model:
    _target_: mvtracker.models.core.monocular_baselines.TAPIP3DWrapper
    ckpt: checkpoints/tapip3d_final.pth
    num_iters: 6
    grid_size: 8
    resolution_factor: 1 # --> [ 384, 512 ]
#    resolution_factor: 2 # --> [ 543, 724 ]

evaluation:
  interp_shape: [ 384, 512 ] # --> resolution_factor = 1
#  interp_shape: [ 543, 724 ] # --> resolution_factor = 2
  predictor_settings:
    kubric:
      visibility_threshold: 0.01
    dex_ycb:
      visibility_threshold: 0.01
    panoptic:
      visibility_threshold: 0.01


================================================
FILE: configs/train.yaml
================================================
defaults:
  - _self_
  - model: mvtracker

experiment_path: ???                # where to store checkpoints, visualizations, etc.
restore_ckpt_path: null             # resume from checkpoint

# === Datasets ===
datasets:
  root: ./datasets
  train:
    name: kubric-multiview-v3-training
    batch_size: 1
    sequence_len: 24                # frames per sequence
    traj_per_sample: 2048           # number of 3D points/trajectories per sample
    max_videos: null                # takes all training videos by default
    kubric_max_depth: 24
    num_workers: 8
  eval:
    names:
      - panoptic-multiview-views1_7_14_20
      - kubric-multiview-v3-overfit-on-training
      - kubric-multiview-v3-views0123
      - kubric-multiview-v3-duster0123
      - dex-ycb-multiview
      - dex-ycb-multiview-duster0123
    num_workers: 4
    max_seq_len: 1000

# === Trainer Settings ===
trainer:
  num_steps: 200000
  eval_freq: 10000
  viz_freq: 10000
  save_ckpt_freq: 500

  lr: 0.0005
  gamma: 0.8
  wdecay: 0.00001
  anneal_strategy: linear
  grad_clip: 1.0
  precision: 16-mixed               # training precision (e.g., 16-mixed, bf16-mixed or 32-true)
  visibility_loss_weight: 0.1

  augment_train_iters: false
  augment_train_iters_warmup: 2000

# === Evaluation Settings ===
evaluation:
  consume_model_stats: false        # whether to report model stats (which can slow down the forward pass)
  evaluator:
    _target_: mvtracker.evaluation.evaluator_3dpt.Evaluator
    rerun_viz_indices: null
    forward_pass_log_indices: null
    mp4_track_viz_indices: [0]

# === Execution Modes ===
modes:
  debug: false                      # enable for quick iteration
  tune_per_scene: false             # overfit to single scene (debugging)
  validate_at_start: false          # run eval before train starts
  do_initial_static_pretrain: false # run static-only phase first
  pretrain_only: false              # stop after static pretraining
  eval_only: false                  # skip training, just run evaluation

  debugging_hotfix_datapoint_path: null  # path to a dumped datapoint (no need to set debug flag)

# === Reproducibility ===
reproducibility:
  # Note that reproducibility will not work if
  # floating point precision is set to 16-mixed,
  # but with 32 it will. Note also that the number
  # of data loading workers (num_workers) might
  # affect reproducibility as well. The number of
  # GPUs surely affects reproducibility.
  seed: 36
  deterministic: false              # speeds up training at expense of determinism

# === Augmentations ===
augmentations:
  probability: 0.8

  rgb: true
  depth: true
  variable_depth_type: true
  variable_num_views: true

  cropping: true
  cropping_size: [384, 512]
  variable_vggt_crop_size: false
  keep_principal_point_centered: false

  variable_trajpersample: true

  scene_transform: true
  camera_params_noise: true
  normalize_scene_following_vggt: false

# === Logging ===
logging:
  log_wandb: false
  wandb_project: mvtracker-ablation
  tags: ["kubric", "3dpt", "multiview"]

# === Extras ===
extras:
  print_config: true                     # pretty print config tree at the start
  ignore_warnings: false                 # disable python warnings if they annoy you
  enable_faulthandler_traceback: false   # enable traceback dump on timeout for debugging of main process hanging
  faulthandler_traceback_timeout: 600    # timeout in seconds before dumping traceback (e.g. 600 = 10 min)

# === Hydra Settings ===
hydra:
  run:
    dir: ${experiment_path}


================================================
FILE: demo.py
================================================
import argparse
import os
import warnings

import numpy as np
import rerun as rr  # pip install rerun-sdk==0.21.0
import torch
from huggingface_hub import hf_hub_download

from mvtracker.utils.visualizer_rerun import log_pointclouds_to_rerun, log_tracks_to_rerun


def main():
    p = argparse.ArgumentParser()
    p.add_argument(
        "--rerun",
        choices=["save", "spawn", "stream"],
        default="save",
        help=(
            "Whether to save recording to disk, spawn a new Rerun instance, or stream to an existing one. "
            "If 'spawn', make sure a rerun window can be spawned in your environment. "
            "If 'stream', make sure a rerun instance is running at port 9876. "
            "If 'save', the recording will be saved to a `.rrd` file that can be drag-and-dropped into "
            "a running rerun viewer, including the online viewer at https://app.rerun.io/version/0.21.0. "
            "For the online viewer, you want to create low memory-usage recordings with --lightweight."
        ),
    )
    p.add_argument(
        "--lightweight",
        action="store_true",
        help=(
            "Use lightweight rerun logging (less memory usage). This is recommended if you want to "
            "view the recording in the online Rerun viewer at https://app.rerun.io/version/0.21.0."
        ),
    )
    p.add_argument(
        "--random_query_points",
        action="store_true",
        help="Use random query points instead of demo ones.",
    )
    p.add_argument(
        "--rrd",
        default="mvtracker_demo.rrd",
        help=(
            "Path to save a .rrd file if `--rerun save` is used. "
            "Note that rerun prefers recordings to have a .rrd suffix."
        ),
    )
    args = p.parse_args()
    np.random.seed(72)
    torch.manual_seed(72)
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load MVTracker predictor
    mvtracker = torch.hub.load("ethz-vlg/mvtracker", "mvtracker", pretrained=True, device=device)

    # Download demo sample from Hugging Face Hub
    sample_path = hf_hub_download(
        repo_id="ethz-vlg/mvtracker",
        filename="data_sample.npz",
        token=os.getenv("HF_TOKEN"),
        repo_type="model",
    )
    sample = np.load(sample_path)

    rgbs = torch.from_numpy(sample["rgbs"]).float()
    depths = torch.from_numpy(sample["depths"]).float()
    intrs = torch.from_numpy(sample["intrs"]).float()
    extrs = torch.from_numpy(sample["extrs"]).float()
    query_points = torch.from_numpy(sample["query_points"]).float()

    # Optionally, sample random queries in a cylinder of radius 12, height [-1, +10] and replace the demo queries
    if args.random_query_points:
        from mvtracker.models.core.model_utils import init_pointcloud_from_rgbd
        num_queries = 512
        t0 = 0
        xy_radius = 12.0
        z_min, z_max = -1.0, 10.0
        xyz, _ = init_pointcloud_from_rgbd(
            fmaps=rgbs[None],  # [1,V,T,1,H,W], uint8 0–255
            depths=depths[None],  # [1,V,T,1,H,W]
            intrs=intrs[None],  # [1,V,T,3,3]
            extrs=extrs[None],  # [1,V,T,3,4]
            stride=1,
            level=0,
        )
        pts = xyz[t0]  # [V*H*W, 3] at t=0
        assert pts.numel() > 0, "No valid depth points to sample queries from."

        r2 = pts[:, 0] ** 2 + pts[:, 1] ** 2
        mask = (r2 <= xy_radius ** 2) & (pts[:, 2] >= z_min) & (pts[:, 2] <= z_max)
        pool = pts[mask]
        assert pool.shape[0] > 0, "Cylinder mask removed all points; increase radius or z-range."

        idx = torch.randperm(pool.shape[0])[:num_queries]
        pts = pool[idx]
        ts = torch.full((pts.shape[0], 1), float(t0), device=pts.device)
        query_points = torch.cat([ts, pts], dim=1).float()  # (N,4): (t,x,y,z)
        print(f"Sampled {pts.shape[0]} queries from depth at t={t0} within r<={xy_radius}, z∈[{z_min},{z_max}].")

    # Run prediction
    torch.set_float32_matmul_precision("high")
    amp_dtype = torch.bfloat16 if (device == "cuda" and torch.cuda.get_device_capability()[0] >= 8) else torch.float16
    with torch.no_grad(), torch.cuda.amp.autocast(enabled=device == "cuda", dtype=amp_dtype):
        results = mvtracker(
            rgbs=rgbs[None].to(device) / 255.0,
            depths=depths[None].to(device),
            intrs=intrs[None].to(device),
            extrs=extrs[None].to(device),
            query_points_3d=query_points[None].to(device),
        )
    pred_tracks = results["traj_e"].cpu()  # [T,N,3]
    pred_vis = results["vis_e"].cpu()  # [T,N]

    # Visualize results
    rr.init("3dpt", recording_id="v0.16")
    if args.rerun == "stream":
        rr.connect_tcp()
    elif args.rerun == "spawn":
        rr.spawn()
    log_pointclouds_to_rerun(
        dataset_name="demo",
        datapoint_idx=0,
        rgbs=rgbs[None],
        depths=depths[None],
        intrs=intrs[None],
        extrs=extrs[None],
        depths_conf=None,
        conf_thrs=[5.0],
        log_only_confident_pc=False,
        radii=-2.45,
        fps=12,
        bbox_crop=None,
        sphere_radius_crop=12.0,
        sphere_center_crop=np.array([0, 0, 0]),
        log_rgb_image=False,
        log_depthmap_as_image_v1=False,
        log_depthmap_as_image_v2=False,
        log_camera_frustrum=True,
        log_rgb_pointcloud=True,
    )
    log_tracks_to_rerun(
        dataset_name="demo",
        datapoint_idx=0,
        predictor_name="MVTracker",
        gt_trajectories_3d_worldspace=None,
        gt_visibilities_any_view=None,
        query_points_3d=query_points[None],
        pred_trajectories=pred_tracks,
        pred_visibilities=pred_vis,
        per_track_results=None,
        radii_scale=1.0,
        fps=12,
        sphere_radius_crop=12.0,
        sphere_center_crop=np.array([0, 0, 0]),
        log_per_interval_results=False,
        max_tracks_to_log=100 if args.lightweight else None,
        track_batch_size=50,
        method_id=None,
        color_per_method_id=None,
        memory_lightweight_logging=args.lightweight,
    )
    if args.rerun == "save":
        rr.save(args.rrd)
        print(f"Saved Rerun recording to: {os.path.abspath(args.rrd)}")


if __name__ == "__main__":
    warnings.filterwarnings("ignore", message=".*DtypeTensor constructors are no longer.*", module="pointops.query")
    warnings.filterwarnings("ignore", message=".*Plan failed with a cudnnException.*", module="torch.nn.modules.conv")
    main()


================================================
FILE: hubconf.py
================================================
# Copyright (c) ETH VLG.
# Licensed under the terms in the LICENSE file at the root of this repo.

from pathlib import Path
import os
import torch

_WEIGHTS = {
    "mvtracker_main": "hf://ethz-vlg/mvtracker::mvtracker_200000_june2025.pth",
    "mvtracker_cleandepth": "hf://ethz-vlg/mvtracker::mvtracker_200000_june2025_cleandepth.pth",
}


def _load_ckpt(spec: str):
    if spec.startswith("http"):
        return torch.hub.load_state_dict_from_url(spec, map_location="cpu")
    if spec.startswith("hf://"):
        from huggingface_hub import hf_hub_download
        repo_id, filename = spec[len("hf://"):].split("::", 1)
        path = hf_hub_download(repo_id=repo_id, filename=filename, token=os.getenv("HF_TOKEN"))
        return torch.load(path, map_location="cpu")
    path = Path(spec).expanduser().resolve()
    return torch.load(str(path), map_location="cpu")


def _extract_model_state(sd):
    """
    Accept:
      - plain state dict
      - {'state_dict': ...}
      - {'model': ..., 'optimizer': ..., 'scheduler': ..., 'total_steps': ...}
    Returns a clean model state_dict.
    """
    if isinstance(sd, dict):
        if "state_dict" in sd and isinstance(sd["state_dict"], dict):
            sd = sd["state_dict"]
        elif "model" in sd and isinstance(sd["model"], dict):
            sd = sd["model"]
    # Strip optional "model." prefix
    sd = {k.replace("model.", "", 1): v for k, v in sd.items()}
    return sd


def _build_model(**overrides):
    from mvtracker.models.core.mvtracker.mvtracker import MVTracker
    cfg = dict(
        sliding_window_len=12,
        stride=4,
        normalize_scene_in_fwd_pass=False,
        fmaps_dim=128,
        add_space_attn=True,
        num_heads=6,
        hidden_size=256,
        space_depth=6,
        time_depth=6,
        num_virtual_tracks=64,
        use_flash_attention=True,
        corr_n_groups=1,
        corr_n_levels=4,
        corr_neighbors=16,
        corr_add_neighbor_offset=True,
        corr_add_neighbor_xyz=False,
        corr_filter_invalid_depth=False,
    )
    cfg.update(overrides)
    return MVTracker(**cfg)


def _load_into(model, checkpoint_key: str):
    raw = _load_ckpt(_WEIGHTS[checkpoint_key])
    sd = _extract_model_state(raw)
    missing, unexpected = model.load_state_dict(sd, strict=False)
    if unexpected:
        raise RuntimeError(f"Unexpected keys in state_dict: {unexpected}")
    return model


def mvtracker_model(*,
                    pretrained: bool = False,
                    device: str = "cuda",
                    checkpoint: str = "mvtracker_main",
                    **model_kwargs):
    """
    Return a bare MVTracker nn.Module.

    - pretrained=False: random init with model_kwargs.
    - pretrained=True : load from _WEIGHTS[checkpoint], then .eval().
    """
    model = _build_model(**model_kwargs).to(device)
    if pretrained:
        model = _load_into(model, checkpoint)
        model.eval()
    return model


def mvtracker_predictor(*,
                        pretrained: bool = True,
                        device: str = "cuda",
                        checkpoint: str = "mvtracker_main",
                        model_kwargs: dict | None = None,
                        predictor_kwargs: dict | None = None):
    """
    Return EvaluationPredictor wrapped around MVTracker.

    Pass model configuration via `model_kwargs={...}` (matches MVTracker.__init__).
    Pass predictor configuration via `predictor_kwargs={...}`:
      - interp_shape, visibility_threshold, grid_size, n_grids_per_view,
        local_grid_size, local_extent, sift_size, num_uniformly_sampled_pts, n_iters
    """
    from mvtracker.models.evaluation_predictor_3dpt import EvaluationPredictor

    model_kwargs = {} if model_kwargs is None else dict(model_kwargs)
    predictor_kwargs = {} if predictor_kwargs is None else dict(predictor_kwargs)

    predictor_defaults = dict(
        interp_shape=(384, 512),
        visibility_threshold=0.5,
        grid_size=4,
        n_grids_per_view=1,
        local_grid_size=18,
        local_extent=50,
        sift_size=0,
        num_uniformly_sampled_pts=0,
        n_iters=6,
    )
    pk = {**predictor_defaults, **predictor_kwargs}

    model = mvtracker_model(pretrained=pretrained, device=device, checkpoint=checkpoint, **model_kwargs)
    return EvaluationPredictor(multiview_model=model, **pk)


def mvtracker(pretrained: bool = True, device: str = "cuda"):
    """Default public endpoint: predictor with main checkpoint."""
    return mvtracker_predictor(pretrained=pretrained, device=device, checkpoint="mvtracker_main")


def mvtracker_cleandepth(pretrained: bool = True, device: str = "cuda"):
    """Predictor with 'clean depth only' checkpoint."""
    return mvtracker_predictor(pretrained=pretrained, device=device, checkpoint="mvtracker_cleandepth")

================================================
FILE: mvtracker/__init__.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.

# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.


================================================
FILE: mvtracker/cli/__init__.py
================================================


================================================
FILE: mvtracker/cli/eval.py
================================================
import hydra
from omegaconf import DictConfig

from mvtracker.cli.train import main as train_main


@hydra.main(version_base="1.3", config_path="../../configs", config_name="eval")
def main(cfg: DictConfig):
    train_main(cfg)


if __name__ == "__main__":
    main()


================================================
FILE: mvtracker/cli/train.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import torch

torch.set_float32_matmul_precision('high')

from lightning.fabric.wrappers import _unwrap_objects
from mvtracker.datasets.generic_scene_dataset import GenericSceneDataset

from torch.utils.tensorboard import SummaryWriter
import gpustat
import json
import threading
import warnings
from pathlib import Path

import hydra
import numpy as np
import pandas as pd
import torch.optim as optim
import wandb
from lightning.fabric import Fabric
from lightning.fabric.utilities import AttributeDict
from omegaconf import DictConfig, OmegaConf
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
import signal, sys

from mvtracker.datasets import KubricMultiViewDataset
from mvtracker.datasets import TapVidDataset
from mvtracker.datasets import kubric_multiview_dataset
from mvtracker.datasets.dexycb_multiview_dataset import DexYCBMultiViewDataset
from mvtracker.datasets.panoptic_studio_multiview_dataset import PanopticStudioMultiViewDataset
from mvtracker.datasets.utils import collate_fn, dataclass_to_cuda_
from mvtracker.models.core.losses import balanced_ce_loss, sequence_loss_3d
from mvtracker.models.core.model_utils import world_space_to_pixel_xy_and_camera_z, pixel_xy_and_camera_z_to_world_space
from mvtracker.models.evaluation_predictor_3dpt import EvaluationPredictor as EvaluationPredictor3D
from mvtracker.utils.visualizer_mp4 import MultiViewVisualizer, Visualizer
from mvtracker.cli.utils import extras
from mvtracker.cli.utils.helpers import maybe_close_wandb

import logging
import os

import torch
import time
from collections import deque
from torchdata.stateful_dataloader import StatefulDataLoader


def fetch_optimizer(trainer_cfg, model):
    """Create the optimizer and learning rate scheduler"""
    optimizer = optim.AdamW(model.parameters(), lr=trainer_cfg.lr, weight_decay=trainer_cfg.wdecay)
    if trainer_cfg.anneal_strategy in ["linear", "cos"]:
        scheduler = optim.lr_scheduler.OneCycleLR(
            optimizer,
            trainer_cfg.lr,
            trainer_cfg.num_steps + 100,
            pct_start=0.05,
            cycle_momentum=False,
            anneal_strategy=trainer_cfg.anneal_strategy,
        )
    elif trainer_cfg.anneal_strategy == "restarts":
        scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
            optimizer,
            T_0=5000,
            T_mult=1,
            eta_min=trainer_cfg.lr / 1000,
        )

    return optimizer, scheduler


def forward_batch_multi_view(batch, model, cfg, step, train_iters, gamma, save_debug_logs=False, debug_logs_path=''):
    # Per view data
    rgbs = batch.video
    depths = batch.videodepth
    image_features = batch.feats
    intrs = batch.intrs
    extrs = batch.extrs
    gt_trajectories_2d_pixelspace_w_z_cameraspace = batch.trajectory
    gt_visibilities_per_view = batch.visibility
    query_points_3d = batch.query_points_3d

    # Non-per-view data
    gt_trajectories_3d_worldspace = batch.trajectory_3d
    valid_tracks_per_frame = batch.valid
    track_upscaling_factor = batch.track_upscaling_factor

    batch_size, num_views, num_frames, _, height, width = rgbs.shape
    num_points = gt_trajectories_2d_pixelspace_w_z_cameraspace.shape[3]

    # Assert shapes of per-view data
    assert rgbs.shape == (batch_size, num_views, num_frames, 3, height, width)
    assert depths.shape == (batch_size, num_views, num_frames, 1, height, width)
    assert intrs.shape == (batch_size, num_views, num_frames, 3, 3)
    assert extrs.shape == (batch_size, num_views, num_frames, 3, 4)
    assert gt_trajectories_2d_pixelspace_w_z_cameraspace.shape == (batch_size, num_views, num_frames, num_points, 3)
    assert gt_visibilities_per_view.shape == (batch_size, num_views, num_frames, num_points)

    # Assert shapes of non-per-view data
    assert query_points_3d.shape == (batch_size, num_points, 4)
    assert gt_trajectories_3d_worldspace.shape == (batch_size, num_frames, num_points, 3)
    assert valid_tracks_per_frame.shape == (batch_size, num_frames, num_points)

    gt_visibilities_any_view = gt_visibilities_per_view.any(dim=1)
    assert gt_visibilities_any_view.any(dim=1).all(), "All points should be visible at in least one frame."

    for batch_idx in range(batch_size):
        for point_idx in range(num_points):
            t = query_points_3d[batch_idx, point_idx, 0].long().item()
            valid_tracks_per_frame[batch_idx, :t, point_idx] = False

    # Run the model
    results = model(
        rgbs=rgbs,
        depths=depths,
        image_features=image_features,
        query_points=query_points_3d,
        iters=train_iters,
        is_train=True,
        intrs=intrs,
        extrs=extrs,
        save_debug_logs=save_debug_logs,
        debug_logs_path=debug_logs_path,
    )
    pred_trajectories = results["traj_e"]
    pred_visibilities = results["vis_e"]
    vis_predictions = results["train_data"]["vis_predictions"]
    coord_predictions = results["train_data"]["coord_predictions"]
    p_idx_end_list = results["train_data"]["p_idx_end_list"]
    sort_inds = results["train_data"]["sort_inds"]

    # Prepare the ground truth for the loss functions,
    # which expect the data to be in the sliding-window
    vis_gts = []
    traj_gts = []
    valids_gts = []
    query_points_t_min = query_points_3d[:, :, 0].long().min()
    for i, wind_p_idx_end in enumerate(p_idx_end_list):
        gt_visibilities_any_view_sorted = gt_visibilities_any_view[:, :, sort_inds]
        gt_trajectories_3d_worldspace_sorted = gt_trajectories_3d_worldspace[:, :, sort_inds]
        valid_tracks_per_frame_sorted = valid_tracks_per_frame[:, :, sort_inds]
        ind = query_points_t_min + i * (cfg.model.sliding_window_len // 2)
        vis_gts.append(gt_visibilities_any_view_sorted[:, ind: ind + cfg.model.sliding_window_len, :wind_p_idx_end])
        traj_gts.append(
            gt_trajectories_3d_worldspace_sorted[:, ind: ind + cfg.model.sliding_window_len, :wind_p_idx_end])
        valids_gts.append(valid_tracks_per_frame_sorted[:, ind: ind + cfg.model.sliding_window_len, :wind_p_idx_end])

    # Compute the losses
    logging.info(f"[DEBUG] "
                 f"{step=} "
                 f"{track_upscaling_factor=} "
                 f"{coord_predictions[0][0][0, 0, 0]=} "
                 f"{coord_predictions[-1][0][0, 0, 0]=} "
                 f"{vis_predictions[0][0, 0, 0]=} "
                 f"{vis_predictions[-1][0, 0, 0]=}")
    xyz_loss = sequence_loss_3d(coord_predictions, traj_gts, vis_gts, valids_gts, gamma) * track_upscaling_factor
    vis_loss = balanced_ce_loss(vis_predictions, vis_gts, valids_gts)

    # Compute 3DPT metrics
    # eval_3dpt_results_dict = evaluate_3dpt(
    #     gt_tracks=gt_trajectories_3d_worldspace[0].cpu().numpy(),
    #     gt_visibilities=gt_visibilities_any_view[0].cpu().numpy(),
    #     pred_tracks=pred_trajectories[0].detach().cpu().numpy(),
    #     pred_visibilities=(pred_visibilities[0] > 0.5).detach().cpu().numpy(),
    #     evaluation_setting="kubric-multiview",
    #     track_upscaling_factor=track_upscaling_factor,
    #     prefix="train_3dpt",
    #     verbose=False,
    #     query_points=query_points_3d[0].cpu().numpy(),
    # )

    # Invert the intrinsics and extrinsics matrices
    intrs_inv = torch.inverse(intrs.float())
    extrs_square = torch.eye(4).to(extrs.device)[None].repeat(batch_size, num_views, num_frames, 1, 1)
    extrs_square[:, :, :, :3, :] = extrs
    extrs_inv = torch.inverse(extrs_square.float())

    # Project the predictions to pixel space
    pred_trajectories = pred_trajectories[0].detach()
    pred_trajectories_pixel_xy_camera_z_per_view = torch.stack([
        torch.cat(world_space_to_pixel_xy_and_camera_z(
            world_xyz=pred_trajectories,
            intrs=intrs[0, view_idx],
            extrs=extrs[0, view_idx],
        ), dim=-1)
        for view_idx in range(num_views)
    ], dim=0)
    for view_idx in range(num_views):
        pred_trajectories_reproduced = pixel_xy_and_camera_z_to_world_space(
            pixel_xy=pred_trajectories_pixel_xy_camera_z_per_view[view_idx, :, :, :2],
            camera_z=pred_trajectories_pixel_xy_camera_z_per_view[view_idx, :, :, 2:],
            intrs_inv=intrs_inv[0, view_idx],
            extrs_inv=extrs_inv[0, view_idx],
        )
        if not torch.allclose(pred_trajectories_reproduced, pred_trajectories, atol=1):
            warnings.warn(f"Reprojection of the predicted trajectories failed: "
                          f"view_idx={view_idx}, "
                          f"max_diff={torch.max(torch.abs(pred_trajectories_reproduced - pred_trajectories))}")

    logging.info(
        f"{step=}, "
        f"seq={batch.seq_name}, "
        f"{xyz_loss.item()=}, "
        f"{vis_loss.item()=}, "
    )

    output = {
        "flow": {
            "loss": xyz_loss * 1.0,
            "predictions": pred_trajectories_pixel_xy_camera_z_per_view,
            "predictions_worldspace": pred_trajectories,
        },
        "visibility": {
            "loss": vis_loss * cfg.trainer.visibility_loss_weight,
            "predictions": pred_visibilities[0].detach(),
        },
        # "metrics": {
        #     k: v
        #     for k, v in eval_3dpt_results_dict.items()
        #     if "per_track" not in k
        # },
    }
    return output


def run_test_eval(cfg, evaluator, model, dataloaders, writer, step):
    if len(dataloaders) == 0:
        return

    logging.info(f"Eval – GPU usage A: {gpustat.new_query()}")

    log_dir = cfg.experiment_path
    model.eval()
    for ds_name, dataloader in dataloaders:
        if ds_name.startswith("kubric"):
            predictor_settings = cfg.evaluation.predictor_settings["kubric"]
        elif ds_name.startswith("dex-ycb"):
            predictor_settings = cfg.evaluation.predictor_settings["dex_ycb"]
        elif ds_name.startswith("panoptic"):
            predictor_settings = cfg.evaluation.predictor_settings["panoptic"]
        elif ds_name.startswith("tapvid2d-davis"):
            predictor_settings = cfg.evaluation.predictor_settings["tapvid2d-davis"]
        else:
            predictor_settings = cfg.evaluation.predictor_settings["generic"]
            logging.info(f"Using generic predictor settings for dataset with name {ds_name}")

        predictor = EvaluationPredictor3D(
            multiview_model=model,
            interp_shape=cfg.evaluation.interp_shape,
            single_point="single" in ds_name,
            n_iters=cfg.evaluation.eval_iters,
            **predictor_settings
        )

        log_dir_ds = os.path.join(log_dir, f"eval_{ds_name}")
        os.makedirs(log_dir_ds, exist_ok=True)

        if cfg.evaluation.consume_model_stats and hasattr(model, "init_stats"):
            model.init_stats()
        metrics = evaluator.evaluate_sequence(
            model=predictor,
            test_dataloader=dataloader,
            dataset_name=ds_name,
            writer=writer,
            step=step,
            log_dir=log_dir_ds,
        )
        if cfg.evaluation.consume_model_stats and hasattr(model, "consume_stats"):
            model.consume_stats()

        metrics_to_log = {
            k: np.nanmean([v[k] for v in metrics.values() if k in v]).round(2)
            for k in metrics[0].keys()
        }
        for k, v in metrics_to_log.items():
            writer.add_scalar(k, v, step)

        with pd.option_context(
                'display.max_rows', None,
                'display.max_columns', None,
                'display.max_colwidth', None,
                'display.width', None,
        ):
            logging.info(f"Per-sequence Metrics for {ds_name}: {pd.DataFrame(metrics)}")
            logging.info(f"Average metrics for {ds_name}: {json.dumps(metrics_to_log, indent=4)}")

        # Save metrics to csv
        if log_dir_ds is not None:
            df = pd.DataFrame(metrics)
            df = df.T
            assert df.map(lambda x: (len(x) == 1) if isinstance(x, np.ndarray) else True).all().all()
            df = df.map(lambda x: x[0] if isinstance(x, np.ndarray) or isinstance(x, list) else x)
            df.to_csv(f"{log_dir_ds}/step-{step}_metrics.csv")

            df = pd.DataFrame(metrics_to_log, index=["score"])
            df = df.T
            df.to_csv(f"{log_dir_ds}/step-{step}_metrics_avg.csv")
            logging.info(f"Saved metrics to {log_dir_ds}/step-{step}_metrics_avg.csv")
        # logging.info(f"Eval – GPU usage (after {ds_name}): {gpustat.new_query()}")

    # logging.info(f"Eval – GPU usage B: {gpustat.new_query()}")
    del predictor
    del metrics
    # logging.info(f"Eval – GPU usage C: {gpustat.new_query()}")
    torch.cuda.empty_cache()
    # logging.info(f"Eval – GPU usage D: {gpustat.new_query()}")

    model.train()


def augment_train_iters(train_iters: int, current_step: int, warmup_steps: int = 1000) -> int:
    """
    Adaptive iteration scheduler with warmup:
    - During warmup_steps: always return 1
    - After warmup:
        - 10% chance: return 1
        - 15% chance: return random int in [2, train_iters - 1]
        - 75% chance: return train_iters
    """
    if current_step < warmup_steps or train_iters <= 1:
        return 1

    rng = torch.Generator().manual_seed(current_step)
    p = torch.rand(1, generator=rng).item()

    if p < 0.10:
        return 1
    elif p < 0.25 and train_iters > 2:
        mid_candidates = list(range(2, train_iters))
        idx = torch.randint(len(mid_candidates), (1,), generator=rng).item()
        return mid_candidates[idx]
    else:
        return train_iters


@hydra.main(version_base="1.3", config_path="../../configs", config_name="train.yaml")
@maybe_close_wandb
def main(cfg: DictConfig):
    """Main entry point for training.

    :param cfg: DictConfig configuration composed by Hydra.
    :return: Optional[float] with optimized metric value.
    """
    extras(cfg)
    Path(cfg.experiment_path).mkdir(exist_ok=True, parents=True)

    num_nodes = int(os.environ.get("SLURM_JOB_NUM_NODES", 1))
    devices = int(os.environ.get("SLURM_GPUS_PER_NODE", torch.cuda.device_count()))
    logging.info(f"SLURM job num nodes: {num_nodes}")
    logging.info(f"SLURM tasks per node (devices): {devices}")

    from lightning.fabric.strategies import DDPStrategy
    fabric = Fabric(
        num_nodes=num_nodes,
        devices=devices,
        precision=cfg.trainer.precision,
        strategy=DDPStrategy(find_unused_parameters=True),
    )
    fabric.launch()
    fabric.seed_everything(cfg.reproducibility.seed, workers=True)
    if cfg.reproducibility.deterministic:
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.autograd.set_detect_anomaly(True)

    if cfg.logging.get("log_wandb", False) and fabric.global_rank == 0:
        exp_name = cfg.experiment_path.replace("./logs/", "").replace("/", "_").replace("\\", "_")
        wandb.init(
            project=cfg.logging.wandb_project,
            name=exp_name,
            tags=cfg.logging.get("tags", []),
            config=OmegaConf.to_container(cfg, resolve=True),
            sync_tensorboard=True,
        )

    original_numpy = torch.Tensor.numpy

    def patched_numpy(self, *args, **kwargs):
        if self.dtype == torch.bfloat16:
            return original_numpy(self.float(), *args, **kwargs)
        return original_numpy(self, *args, **kwargs)

    torch.Tensor.numpy = patched_numpy

    eval_dataloaders = []
    for dataset_name in cfg.datasets.eval.names:
        if dataset_name.startswith("tapvid2d-davis-"):
            eval_dataset = TapVidDataset.from_name(dataset_name, cfg.datasets.root)
        elif dataset_name.startswith("kubric-multiview-v3-25views"):
            kubric_kwargs = {
                "data_root": os.path.join(cfg.datasets.root, "kubric_multiview_003", "kubric_25_view"),
                "seq_len": 24,
                "traj_per_sample": 200,
                "seed": 72,
                "sample_vis_1st_frame": True,
                "tune_per_scene": False,
                "max_videos": 30,
                "use_duster_depths": False,
                "duster_views": None,
                "clean_duster_depths": False,
                "views_to_return": list(range(20)),
                "novel_views": list(range(20, 25)),
                "num_views": -1,
                "depth_noise_std": 0,
            }
            eval_dataset = KubricMultiViewDataset(**kubric_kwargs)
        elif dataset_name.startswith("kubric-multiview-v3"):
            eval_dataset = KubricMultiViewDataset.from_name(dataset_name, cfg.datasets.root, cfg)
        elif dataset_name.startswith("panoptic-multiview"):
            eval_dataset = PanopticStudioMultiViewDataset.from_name(dataset_name, cfg.datasets.root)
        elif dataset_name.startswith("dex-ycb-multiview"):
            eval_dataset = DexYCBMultiViewDataset.from_name(dataset_name, cfg.datasets.root)
        elif dataset_name == "egoexo4d":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/egoexo4d-processed/maxframes-300_downsample-1_downscale-512/",
                drop_first_n_frames=44,
            )
        elif dataset_name == "4d-dress":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/4d-dress-processed-resized-512-selection",
                use_duster_depths=False,
            )
        elif dataset_name == "hi4d":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/hi4d-processed-resized-512",
                use_duster_depths=False,
                use_vggt_depths_with_aligned_cameras=True,
            )
        elif dataset_name == "selfcap-v1":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-False_startframe-90_maxframes-256_downsample-10_downscale-512/",
                drop_first_n_frames=72,
            )
        elif dataset_name == "selfcap-v2":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-True_startframe-90_maxframes-256_downsample-10_downscale-512/",
                drop_first_n_frames=72,
            )
        elif dataset_name == "selfcap-v3":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-False_startframe-90_maxframes-256_downsample-20_downscale-512/",
                drop_first_n_frames=36,
            )
        elif dataset_name == "selfcap-v4":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-False_startframe-90_maxframes-256_downsample-30_downscale-512/",
                drop_first_n_frames=24,
            )
        elif dataset_name == "selfcap-v5":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-False_startframe-90_maxframes-256_downsample-5_downscale-512/",
                drop_first_n_frames=144,
            )
        elif dataset_name == "selfcap-v6":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-8-seq-False_startframe-90_maxframes-2560_downsample-10_downscale-512/",
                drop_first_n_frames=44,
            )
        elif dataset_name == "selfcap-v7":
            eval_dataset = GenericSceneDataset(
                dataset_dir="datasets/selfcap-processed/numcams-4-seq-False_startframe-90_maxframes-256_downsample-10_downscale-512/",
                drop_first_n_frames=72,
            )
        else:
            raise ValueError(f"Dataset {dataset_name} not supported for evaluation.")
        eval_dataloader = torch.utils.data.DataLoader(
            eval_dataset,
            batch_size=1,
            shuffle=False,
            num_workers=cfg.datasets.eval.num_workers,
            collate_fn=collate_fn,
        )
        eval_dataloaders.append((dataset_name, eval_dataloader))

    # # Let each rank handle a subset of the evaluation dataloaders
    # eval_dataloaders_for_rank = []
    # for idx, (dset_name, dset_loader) in enumerate(eval_dataloaders):
    #     if (idx % fabric.world_size) == fabric.global_rank:
    #         eval_dataloaders_for_rank.append((dset_name, fabric.setup_dataloaders(dset_loader)))
    # eval_dataloaders = eval_dataloaders_for_rank

    train_viz_save_dir = os.path.join(cfg.experiment_path, f"train_{cfg.datasets.train.name}")
    os.makedirs(train_viz_save_dir, exist_ok=True)
    visualizer = MultiViewVisualizer(
        save_dir=train_viz_save_dir,
        pad_value=16,
        fps=12,
        show_first_frame=0,
        tracks_leave_trace=0,
    )

    evaluator = hydra.utils.instantiate(cfg.evaluation.evaluator)

    if cfg.modes.do_initial_static_pretrain and not cfg.modes.eval_only:
        pretraining_datasets = [
            kubric_multiview_dataset.KubricMultiViewDataset(
                data_root=os.path.join(cfg.datasets.root, "kubric_multiview_003", "train"),
                traj_per_sample=cfg.datasets.train.traj_per_sample,
                ratio_dynamic=0.1,
                ratio_very_dynamic=0.0,
                num_views=4,
                enable_cropping_augs=cfg.augmentations.cropping,

                seq_len=seq_len,
                static_cropping=static_cropping,
                max_videos=max_videos,
            )
            for seq_len, static_cropping, max_videos in [
                (12, True, 500),
                (18, True, 500),
                (24, True, 1000),
                (24, False, 2000),
            ]
        ]
        pretraining_dataset = torch.utils.data.ConcatDataset(pretraining_datasets)
        pretraining_dataloader = StatefulDataLoader(
            pretraining_dataset,
            batch_size=cfg.datasets.train.batch_size,
            shuffle=False,
            num_workers=cfg.datasets.train.num_workers,
            pin_memory=True,
            pin_memory_device="cuda",
            collate_fn=collate_fn,
            drop_last=True,
            in_order=cfg.reproducibility.deterministic,
        )
        pretraining_dataloader = fabric.setup_dataloaders(pretraining_dataloader)
    else:
        pretraining_dataloader = None

    if cfg.modes.eval_only:
        train_dataset = None
    elif cfg.datasets.train.name.startswith("kubric-multiview-v3"):
        train_dataset = KubricMultiViewDataset.from_name(cfg.datasets.train.name, cfg.datasets.root, cfg, fabric)
    else:
        raise ValueError(f"Dataset {cfg.datasets.train.name} not supported for training")

    if not cfg.modes.eval_only:
        train_loader = StatefulDataLoader(
            train_dataset,
            batch_size=cfg.datasets.train.batch_size,
            shuffle=True,
            num_workers=cfg.datasets.train.num_workers,
            pin_memory=True,
            collate_fn=collate_fn,
            drop_last=True,
            prefetch_factor=4 if cfg.datasets.train.num_workers > 0 else None,
            in_order=cfg.reproducibility.deterministic,
        )
        # eval_dataloaders += [("kubric-multiview-v3-training", train_loader)]
        train_loader = fabric.setup_dataloaders(train_loader)
        logging.info(f"LEN TRAIN LOADER={len(train_loader)}")
        num_epochs = cfg.trainer.num_steps // len(train_loader) + 1 + (1 if cfg.modes.do_initial_static_pretrain else 0)
        if cfg.modes.do_initial_static_pretrain:
            cfg.trainer.num_steps += len(pretraining_dataloader)
    else:
        train_loader = None
        num_epochs = None

    epoch = -1
    total_steps = 0

    model: nn.Module = hydra.utils.instantiate(cfg.model)
    model.cuda()
    optimizer, scheduler = fetch_optimizer(cfg.trainer, model)
    model, optimizer = fabric.setup(model, optimizer)

    folder_ckpts = [
        f
        for f in os.listdir(cfg.experiment_path)
        if f.endswith(".pth")
           and not os.path.isdir(f)
           and not "final" in f
           and not "unwrap_model" in f
           and not "unwrap_module" in f
    ]
    logging.info(f"Found {len(folder_ckpts)} checkpoints: {folder_ckpts}")
    if len(folder_ckpts) > 0:
        # We can load this checkpoint directly since we have saved it during training
        ckpt_name = sorted(folder_ckpts)[-1]
        experiment_path = os.path.join(cfg.experiment_path, ckpt_name)
        state = AttributeDict(
            model=model,
            optimizer=optimizer,
            scheduler=scheduler,
            total_steps=total_steps,
        )
        logging.info(f"Total steps before loading checkpoint: {total_steps}")
        fabric.load(experiment_path, state)
        total_steps = state.total_steps  # Integers are immutable, so they cannot be changed inplace
        if train_loader is not None:
            epoch = total_steps // len(train_loader) - 1
        logging.info(f"Loaded checkpoint {experiment_path} (total_steps={total_steps})")
        logging.info(f"Total steps after loading checkpoint: {total_steps}")

    elif cfg.restore_ckpt_path is not None:
        restore_ckpt_path = cfg.restore_ckpt_path
        assert restore_ckpt_path.endswith(".pth")
        logging.info(f"Restoring pre-trained weights from {os.path.abspath(restore_ckpt_path)}")
        training_ckpt = "total_steps" in torch.load(restore_ckpt_path)
        if training_ckpt:
            # Loading a checkpoint saved by fabric during training
            logging.info("Trying to load as a training checkpoint...")
            state = AttributeDict(model=model)
            try:
                fabric.load(restore_ckpt_path, state, strict=True)
            except RuntimeError as e:
                logging.warning(f"Failed to load weights with from {restore_ckpt_path} with strict=True: {e}. "
                                f"Trying again with strict=False.")
                fabric.load(restore_ckpt_path, state, strict=False)
            logging.info(f"Loaded checkpoint {restore_ckpt_path}")
        else:
            fabric.load_raw(restore_ckpt_path, model)

    tb_writer = SummaryWriter(log_dir=os.path.join(cfg.experiment_path, f"runs_{fabric.global_rank}"))
    if cfg.modes.eval_only or cfg.modes.validate_at_start:
        run_test_eval(cfg, evaluator, model, eval_dataloaders, tb_writer, total_steps - 1)
        fabric.barrier()
        if cfg.modes.eval_only:
            return

    total_durations = deque()
    dataloader_durations = deque()
    fwd_durations = deque()
    sync_durations = deque()
    bwd_durations = deque()
    timing_log_freq = 100

    def handle_sigterm(signum, frame):
        logging.error(f"Signal {signum} received, saving checkpoint and exiting...")
        ckpt_iter = "0" * (6 - len(str(total_steps))) + str(total_steps)
        save_path = Path(f"{cfg.experiment_path}/model_{ckpt_iter}.pth")
        state = AttributeDict(
            model=model,
            optimizer=optimizer,
            scheduler=scheduler,
            total_steps=total_steps + 1,
        )
        fabric.save(save_path, state)
        logging.info(f"Saved checkpoint to {save_path}. Waiting for all ranks to finish...")
        fabric.barrier()
        logging.info(f"Calling sys.exit(0) now.")
        sys.exit(0)

    signal.signal(signal.SIGUSR1, handle_sigterm)
    signal.signal(signal.SIGTERM, handle_sigterm)
    logging.info(f"Registered signal handlers for SIGUSR1 and SIGTERM.")

    model.train()
    should_keep_training = True if cfg.trainer.num_steps > 0 else False
    total_batches_loaded = 0
    total_batches_failed = 0
    if fabric.global_rank == 0:
        tqdm_total_steps = tqdm(
            total=cfg.trainer.num_steps,
            desc=f"Total Training Progress (rank={fabric.global_rank})",
            unit="batch",
            initial=total_steps,
            position=0,
        )
    threads = []
    had_run_pretraining_epoch = cfg.modes.do_initial_static_pretrain and total_steps > len(pretraining_dataloader)
    logging.info(f"{total_steps=}, {epoch=}/{num_epochs}, {had_run_pretraining_epoch=}")
    while should_keep_training:
        epoch += 1
        i_batch = -1

        if cfg.modes.do_initial_static_pretrain and not had_run_pretraining_epoch:
            had_run_pretraining_epoch = True
            data_iter = iter(pretraining_dataloader)
            n_batches = len(pretraining_dataloader)
        else:
            data_iter = iter(train_loader)
            n_batches = len(train_loader)
        if fabric.global_rank == 0:
            tqdm_epoch = tqdm(total=n_batches, desc=f"Epoch {epoch + 1}/{num_epochs}", unit="batch", position=1)

        while i_batch < n_batches:
            start_time_1 = time.time()
            logging.info(f"Gonna load batch {i_batch + 1}/{n_batches} (rank={fabric.global_rank})")
            try:
                batch = next(data_iter)
            except StopIteration:
                data_iter = iter(train_loader)
                n_batches = len(train_loader)
                batch = next(data_iter)

            batch, gotit = batch
            total_batches_loaded += 1

            if cfg.modes.debugging_hotfix_datapoint_path is not None:
                logging.info(f"Debugging hotfix: loading batch from {cfg.modes.debugging_hotfix_datapoint_path}")
                batch = torch.load(cfg.modes.debugging_hotfix_datapoint_path, map_location="cuda:0")
                logging.info(f"Debugging hotfix: loaded batch {batch.seq_name} "
                             f"with {len(batch.video)} views and {batch.video.shape[2]} frames")

            if not all(gotit):
                total_batches_failed += 1
                logging.info(f"batch is None: "
                             f"failed {total_batches_failed} / {total_batches_loaded} "
                             f"({total_batches_failed / total_batches_loaded * 100:.2f}%) batches")
                continue

            i_batch += 1
            dataclass_to_cuda_(batch)
            assert model.training

            start_time_2 = time.time()
            dataloader_duration = start_time_2 - start_time_1
            logging.info(f"Datapoint: {batch.seq_name} (Waited for {dataloader_duration:>5.2f}s)")

            train_iters = cfg.trainer.train_iters
            if cfg.trainer.augment_train_iters:
                train_iters = augment_train_iters(train_iters, total_steps, cfg.trainer.augment_train_iters_warmup)
            optimizer.zero_grad()

            try:
                output = forward_batch_multi_view(
                    batch=batch,
                    model=model,
                    cfg=cfg,
                    step=total_steps,
                    train_iters=train_iters,
                    gamma=cfg.trainer.gamma,
                    save_debug_logs=(
                            ((total_steps % cfg.trainer.viz_freq) == (cfg.trainer.viz_freq - 1))
                            or (total_steps in [0, 10, 100, cfg.trainer.num_steps - 1])
                    ),
                    debug_logs_path=os.path.join(
                        cfg.experiment_path,
                        f'forward_pass__train_step-{total_steps}_global_rank-{fabric.global_rank}'
                    ),
                )
            except Exception as e:
                logging.critical(f"Forward pass crashed at step {total_steps}: {e}")

                # Save current checkpoint
                save_path = Path(f"{cfg.experiment_path}/test_{total_steps:06d}.pth")
                state = AttributeDict(
                    model=model,
                    optimizer=optimizer,
                    scheduler=scheduler,
                    total_steps=total_steps + 1,
                )
                fabric._strategy.checkpoint_io.save_checkpoint(
                    checkpoint=fabric._strategy._convert_stateful_objects_in_state(_unwrap_objects(state), filter={}),
                    path=save_path,
                )
                logging.info(f"Saved crash checkpoint to {save_path}")

                # Save the batch
                batch_path = Path(f"{cfg.experiment_path}/crash_batch_step_{total_steps:06d}.pt")
                try:
                    torch.save(batch, batch_path)
                    logging.info(f"Saved crashing batch to {batch_path}")
                except Exception as batch_exc:
                    logging.error(f"Failed to save crashing batch as .pt: {batch_exc}")

                raise  # re-raise to crash the job after saving artifacts

            loss = torch.tensor(0.0).cuda()
            for k, v in output.items():
                if k == "metrics":
                    for metric_name, metric_value in v.items():
                        tb_writer.add_scalar(metric_name, metric_value, total_steps)
                elif "loss" in v:
                    loss += v["loss"]
                    tb_writer.add_scalar(f"live_{k}_loss", v["loss"].item(), total_steps)
                else:
                    raise ValueError(f"Unknown key {k} in output")

            start_time_3 = time.time()
            fwd_duration = start_time_3 - start_time_2

            fabric.barrier()

            start_time_4 = time.time()
            sync_duration = start_time_4 - start_time_3

            fabric.backward(loss)
            # Log a limited number of grad + optimizer state pairs, also log current learning rate
            if (total_steps <= 10) or (total_steps % cfg.trainer.viz_freq == 0):
                log_limit = 5
                logged = 0
                prefix = f"[DEBUG] [RANK={fabric.global_rank:03d}]"
                logging.info(f"{prefix} RNG seed: {torch.initial_seed()}")
                logging.info(f"{prefix} Step={total_steps} – Gradients and Optimizer State")
                for name, param in model.named_parameters():
                    if param.grad is not None and param in optimizer.state:
                        state = optimizer.state[param]
                        exp_avg_norm = state['exp_avg'].norm().item() if 'exp_avg' in state else float('nan')
                        exp_avg_sq_norm = state['exp_avg_sq'].norm().item() if 'exp_avg_sq' in state else float('nan')
                        grad_norm = param.grad.norm().item()
                        logging.info(
                            f"{prefix} Param: {name:<60s} | "
                            f"grad_norm={grad_norm:>14.9f} | "
                            f"exp_avg_norm={exp_avg_norm:>14.9f} | "
                            f"exp_avg_sq_norm={exp_avg_sq_norm:>14.9f}"
                        )
                        logged += 1
                        if logged >= log_limit:
                            break
                for name, param in model.named_parameters():
                    if param.grad_fn:
                        print(f"{prefix} {name} grad_fn: {param.grad_fn}")
                logging.info(f"{prefix} LR at step {total_steps}: {scheduler.get_last_lr()}")
            fabric.clip_gradients(model, optimizer, clip_val=cfg.trainer.grad_clip)
            optimizer.step()
            scheduler.step()

            start_time_5 = time.time()
            bwd_duration = start_time_5 - start_time_4

            if fabric.global_rank == 0:
                if (total_steps % cfg.trainer.viz_freq == 0) or (
                        total_steps == cfg.trainer.num_steps - 1) or total_steps in [0, 10, 100]:
                    logging.info(f"Creating training viz logs (rank: {fabric.global_rank}, step: {total_steps})")
                    video = batch.video.clone().cpu()
                    video_depth = batch.videodepth.clone().cpu()
                    gt_viz, vector_colors = visualizer.visualize(
                        video=video,
                        video_depth=video_depth,
                        tracks=batch.trajectory.clone().cpu(),
                        visibility=batch.visibility.clone().cpu(),
                        query_frame=batch.query_points_3d[..., 0].long().clone().cpu(),
                        filename="train_gt_traj",
                        writer=tb_writer,
                        step=total_steps,
                        save_video=False,
                    )
                    pred_viz, _ = visualizer.visualize(
                        video=video,
                        video_depth=video_depth,
                        tracks=output["flow"]["predictions"][None].cpu(),
                        visibility=(output["visibility"]["predictions"][None] > 0.5).cpu(),
                        query_frame=batch.query_points_3d[..., 0].long().clone().cpu(),
                        filename="train_pred_traj",
                        writer=tb_writer,
                        step=total_steps,
                        save_video=False,
                    )
                    viz = torch.cat([gt_viz[..., :gt_viz.shape[-1] // 2], pred_viz], dim=-1)
                    thread = threading.Thread(
                        target=Visualizer.save_video,
                        args=(viz, visualizer.save_dir, f"train", tb_writer, visualizer.fps, total_steps)
                    )
                    thread.start()
                    threads.append(thread)

                if len(output) > 1:
                    tb_writer.add_scalar(f"live_total_loss", loss.item(), total_steps)
                tb_writer.add_scalar(f"learning_rate", optimizer.param_groups[0]["lr"], total_steps)

            if total_steps % cfg.trainer.save_ckpt_freq == 0:
                ckpt_iter = "0" * (6 - len(str(total_steps))) + str(total_steps)
                save_path = Path(f"{cfg.experiment_path}/model_{ckpt_iter}.pth")
                logging.info(f"Saving file {save_path}")
                state = AttributeDict(
                    model=model,
                    optimizer=optimizer,
                    scheduler=scheduler,
                    total_steps=total_steps + 1,
                )
                fabric.save(save_path, state)

            if total_steps % cfg.trainer.eval_freq == 0 and total_steps > 1:
                run_test_eval(cfg, evaluator, model, eval_dataloaders, tb_writer, total_steps)
                fabric.barrier()

            total_steps += 1
            if fabric.global_rank == 0:
                tqdm_epoch.update(1)
                tqdm_total_steps.update(1)
                tqdm_epoch.set_postfix(
                    loss=loss.item(),
                    lr=optimizer.param_groups[0]["lr"],
                    train_iters=cfg.trainer.train_iters,
                    gamma=cfg.trainer.gamma,
                    seq_name=batch.seq_name,
                )

            total_duration = time.time() - start_time_1
            logging.info(
                f"[timing:{total_steps:06d}] "
                f"Total: {total_duration:>6.2f}s | "
                f"Data: {dataloader_duration:>6.2f}s | "
                f"Fwd: {fwd_duration:>6.2f}s | "
                f"Sync: {sync_duration:>6.2f}s | "
                f"Bwd: {bwd_duration:>6.2f}s | "
            )
            if fabric.global_rank == 0:
                dataloader_durations.append(dataloader_duration)
                fwd_durations.append(fwd_duration)
                sync_durations.append(sync_duration)
                bwd_durations.append(bwd_duration)
                total_durations.append(total_duration)

                tb_writer.add_scalar(f"timing/step", total_duration, total_steps)
                tb_writer.add_scalar(f"timing/only_fwd", fwd_durations[-1], total_steps)
                tb_writer.add_scalar(f"timing/only_sync", sync_durations[-1], total_steps)
                tb_writer.add_scalar(f"timing/only_bwd", bwd_durations[-1], total_steps)
                tb_writer.add_scalar(f"timing/only_dataloader", dataloader_duration, total_steps)

                if len(total_durations) >= timing_log_freq:
                    total_durations_np = np.array(total_durations)
                    fwd_durations_np = np.array(fwd_durations)
                    sync_durations_np = np.array(sync_durations)
                    bwd_durations_np = np.array(bwd_durations)
                    dataloader_durations_np = np.array(dataloader_durations)

                    total_duration_mean = np.mean(total_durations_np)
                    fwd_duration_mean = np.mean(fwd_durations_np)
                    sync_duration_mean = np.mean(sync_durations_np)
                    bwd_duration_mean = np.mean(bwd_durations_np)
                    dataloader_duration_mean = np.mean(dataloader_durations_np)

                    total_duration_median = np.median(total_durations_np)
                    fwd_duration_median = np.median(fwd_durations_np)
                    sync_duration_median = np.median(sync_durations_np)
                    bwd_duration_median = np.median(bwd_durations_np)
                    dataloader_duration_median = np.median(dataloader_durations_np)

                    total_duration_std = np.std(total_durations_np)
                    fwd_duration_std = np.std(fwd_durations_np)
                    sync_duration_std = np.std(sync_durations_np)
                    bwd_duration_std = np.std(bwd_durations_np)
                    dataloader_duration_std = np.std(dataloader_durations_np)

                    tb_writer.add_scalar("timing/step_mean", total_duration_mean, total_steps)
                    tb_writer.add_scalar("timing/step_median", total_duration_median, total_steps)
                    tb_writer.add_scalar("timing/only_fwd_mean", fwd_duration_mean, total_steps)
                    tb_writer.add_scalar("timing/only_fwd_median", fwd_duration_median, total_steps)
                    tb_writer.add_scalar("timing/only_sync_mean", sync_duration_mean, total_steps)
                    tb_writer.add_scalar("timing/only_sync_median", sync_duration_median, total_steps)
                    tb_writer.add_scalar("timing/only_bwd_mean", bwd_duration_mean, total_steps)
                    tb_writer.add_scalar("timing/only_bwd_median", bwd_duration_median, total_steps)
                    tb_writer.add_scalar("timing/only_dataloader_mean", dataloader_duration_mean, total_steps)
                    tb_writer.add_scalar("timing/only_dataloader_median", dataloader_duration_median, total_steps)

                    logging.info(
                        f"[timing:total] "
                        f"Mean: {total_duration_mean:>6.2f}s | "
                        f"Median: {total_duration_median:>6.2f}s | "
                        f"Std: {total_duration_std:6.2f}s"
                    )
                    logging.info(
                        f"[timing:fwd]   "
                        f"Mean: {fwd_duration_mean:>6.2f}s | "
                        f"Median: {fwd_duration_median:>6.2f}s | "
                        f"Std: {fwd_duration_std:6.2f}s"
                    )
                    logging.info(
                        f"[timing:sync]  "
                        f"Mean: {sync_duration_mean:>6.2f}s | "
                        f"Median: {sync_duration_median:>6.2f}s | "
                        f"Std: {sync_duration_std:6.2f}s"
                    )
                    logging.info(
                        f"[timing:bwd]   "
                        f"Mean: {bwd_duration_mean:>6.2f}s | "
                        f"Median: {bwd_duration_median:>6.2f}s | "
                        f"Std: {bwd_duration_std:6.2f}s"
                    )
                    logging.info(
                        f"[timing:datal] "
                        f"Mean: {dataloader_duration_mean:>6.2f}s | "
                        f"Median: {dataloader_duration_median:>6.2f}s | "
                        f"Std: {dataloader_duration_std:6.2f}s"
                    )

                    total_durations.clear()
                    fwd_durations.clear()
                    sync_durations.clear()
                    bwd_durations.clear()
                    dataloader_durations.clear()

            if total_steps > cfg.trainer.num_steps:
                should_keep_training = False
                break

        if fabric.global_rank == 0:
            tqdm_epoch.close()

    if fabric.global_rank == 0:
        tqdm_total_steps.close()
    logging.info("FINISHED TRAINING")

    save_path = f"{cfg.experiment_path}/model_final.pth"
    logging.info(f"Saving file {save_path}")
    state = AttributeDict(
        model=model,
        optimizer=optimizer,
        scheduler=scheduler,
        total_steps=total_steps,
    )
    fabric.save(save_path, state)
    run_test_eval(cfg, evaluator, model, eval_dataloaders, tb_writer, total_steps)
    for thread in threads:
        thread.join()
    tb_writer.flush()
    tb_writer.close()
    fabric.barrier()


if __name__ == "__main__":
    main()


================================================
FILE: mvtracker/cli/utils/__init__.py
================================================
from .pylogger import RankedLogger
from .rich_utils import enforce_tags, print_config_tree
from .helpers import extras, get_metric_value, task_wrapper


================================================
FILE: mvtracker/cli/utils/helpers.py
================================================
import faulthandler
import warnings
from functools import wraps
from importlib.util import find_spec
from typing import Any, Callable, Dict, Optional, Tuple

import wandb
from omegaconf import DictConfig

from mvtracker.cli.utils import pylogger, rich_utils

log = pylogger.RankedLogger(__name__, rank_zero_only=True)


def extras(cfg: DictConfig) -> None:
    """Applies optional utilities before the task is started.

    Utilities:
        - Ignoring python warnings
        - Setting tags from command line
        - Rich config printing

    :param cfg: A DictConfig object containing the config tree.
    """
    # return if no `extras` config
    if not cfg.get("extras"):
        log.warning("Extras config not found! <cfg.extras=null>")
        return

    # disable python warnings
    if cfg.extras.get("ignore_warnings"):
        log.info("Disabling python warnings! <cfg.extras.ignore_warnings=True>")
        warnings.filterwarnings("ignore")

    # prompt user to input tags from command line if none are provided in the config
    if cfg.extras.get("enforce_tags"):
        log.info("Enforcing tags! <cfg.extras.enforce_tags=True>")
        rich_utils.enforce_tags(cfg, save_to_file=True)

    # pretty print config tree using Rich library
    if cfg.extras.get("print_config"):
        log.info("Printing config tree with Rich! <cfg.extras.print_config=True>")
        rich_utils.print_config_tree(cfg, print_order=None, resolve=True, save_to_file=True)

    if cfg.extras.get("enable_faulthandler_traceback"):
        log.info("Enabling faulthandler timeouts!")
        faulthandler.dump_traceback_later(timeout=cfg.extras.faulthandler_traceback_timeout, repeat=True)


def task_wrapper(task_func: Callable) -> Callable:
    """Optional decorator that controls the failure behavior when executing the task function.

    This wrapper can be used to:
        - make sure loggers are closed even if the task function raises an exception (prevents multirun failure)
        - save the exception to a `.log` file
        - mark the run as failed with a dedicated file in the `logs/` folder (so we can find and rerun it later)
        - etc. (adjust depending on your needs)

    Example:
    ```
    @utils.task_wrapper
    def train(cfg: DictConfig) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        ...
        return metric_dict, object_dict
    ```

    :param task_func: The task function to be wrapped.

    :return: The wrapped task function.
    """

    def wrap(cfg: DictConfig) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        # execute the task
        try:
            metric_dict, object_dict = task_func(cfg=cfg)

        # things to do if exception occurs
        except Exception as ex:
            # save exception to `.log` file
            log.exception("")

            # some hyperparameter combinations might be invalid or cause out-of-memory errors
            # so when using hparam search plugins like Optuna, you might want to disable
            # raising the below exception to avoid multirun failure
            raise ex

        # things to always do after either success or exception
        finally:
            # display output dir path in terminal
            log.info(f"Output dir: {cfg.paths.output_dir}")

            # always close wandb run (even if exception occurs so multirun won't fail)
            if find_spec("wandb"):  # check if wandb is installed
                import wandb

                if wandb.run:
                    log.info("Closing wandb!")
                    wandb.finish()

        return metric_dict, object_dict

    return wrap


def get_metric_value(metric_dict: Dict[str, Any], metric_name: Optional[str]) -> Optional[float]:
    """Safely retrieves value of the metric logged in LightningModule.

    :param metric_dict: A dict containing metric values.
    :param metric_name: If provided, the name of the metric to retrieve.
    :return: If a metric name was provided, the value of the metric.
    """
    if not metric_name:
        log.info("Metric name is None! Skipping metric value retrieval...")
        return None

    if metric_name not in metric_dict:
        raise Exception(
            f"Metric value not found! <metric_name={metric_name}>\n"
            "Make sure metric name logged in LightningModule is correct!\n"
            "Make sure `optimized_metric` name in `hparams_search` config is correct!"
        )

    metric_value = metric_dict[metric_name].item()
    log.info(f"Retrieved metric value! <{metric_name}={metric_value}>")

    return metric_value


def maybe_close_wandb(fn: Callable) -> Callable:
    @wraps(fn)
    def wrapper(cfg, *args, **kwargs):
        try:
            return fn(cfg, *args, **kwargs)
        finally:
            if wandb.run is not None:
                wandb.finish()

    return wrapper


================================================
FILE: mvtracker/cli/utils/pylogger.py
================================================
import logging
from typing import Mapping, Optional

from lightning_utilities.core.rank_zero import rank_prefixed_message, rank_zero_only


class RankedLogger(logging.LoggerAdapter):
    """A multi-GPU-friendly python command line logger."""

    def __init__(
            self,
            name: str = __name__,
            rank_zero_only: bool = False,
            extra: Optional[Mapping[str, object]] = None,
    ) -> None:
        """Initializes a multi-GPU-friendly python command line logger that logs on all processes
        with their rank prefixed in the log message.

        :param name: The name of the logger. Default is ``__name__``.
        :param rank_zero_only: Whether to force all logs to only occur on the rank zero process. Default is `False`.
        :param extra: (Optional) A dict-like object which provides contextual information. See `logging.LoggerAdapter`.
        """
        logger = logging.getLogger(name)
        super().__init__(logger=logger, extra=extra)
        self.rank_zero_only = rank_zero_only

    def log(self, level: int, msg: str, rank: Optional[int] = None, *args, **kwargs) -> None:
        """Delegate a log call to the underlying logger, after prefixing its message with the rank
        of the process it's being logged from. If `'rank'` is provided, then the log will only
        occur on that rank/process.

        :param level: The level to log at. Look at `logging.__init__.py` for more information.
        :param msg: The message to log.
        :param rank: The rank to log at.
        :param args: Additional args to pass to the underlying logging function.
        :param kwargs: Any additional keyword args to pass to the underlying logging function.
        """
        if self.isEnabledFor(level):
            msg, kwargs = self.process(msg, kwargs)
            current_rank = getattr(rank_zero_only, "rank", None)
            if current_rank is None:
                raise RuntimeError("The `rank_zero_only.rank` needs to be set before use")
            msg = rank_prefixed_message(msg, current_rank)
            if self.rank_zero_only:
                if current_rank == 0:
                    self.logger.log(level, msg, *args, **kwargs)
            else:
                if rank is None:
                    self.logger.log(level, msg, *args, **kwargs)
                elif current_rank == rank:
                    self.logger.log(level, msg, *args, **kwargs)


================================================
FILE: mvtracker/cli/utils/rich_utils.py
================================================
from pathlib import Path
from typing import Sequence, Optional

import rich
import rich.syntax
import rich.tree
from hydra.core.hydra_config import HydraConfig
from lightning_utilities.core.rank_zero import rank_zero_only
from omegaconf import DictConfig, OmegaConf, open_dict
from rich.prompt import Prompt

from mvtracker.cli.utils import pylogger

log = pylogger.RankedLogger(__name__, rank_zero_only=True)


@rank_zero_only
def print_config_tree(
        cfg: DictConfig,
        print_order: Optional[Sequence[str]] = (
                "experiment_paths",
                "model",
                "predictor_settings",
        ),
        resolve: bool = False,
        save_to_file: bool = False,
) -> None:
    """Prints the contents of a DictConfig as a tree structure using the Rich library.

    :param cfg: A DictConfig composed by Hydra.
    :param print_order: Determines in what order config components are printed.
    :param resolve: Whether to resolve reference fields of DictConfig.
    :param save_to_file: Whether to export config to the hydra output folder.
    """
    style = "italic cyan"
    tree = rich.tree.Tree("CONFIG", style=style, guide_style=style)

    queue = []

    # add fields from `print_order` to queue
    if print_order is not None:
        for field in print_order:
            queue.append(field) if field in cfg else log.warning(
                f"Field '{field}' not found in config. Skipping '{field}' config printing..."
            )

    # add all the other fields to queue (not specified in `print_order`)
    for field in cfg:
        if field not in queue:
            queue.append(field)

    # generate config tree from queue
    for field in queue:
        branch = tree.add(field, style=style, guide_style=style)

        config_group = cfg[field]
        if isinstance(config_group, DictConfig):
            branch_content = OmegaConf.to_yaml(config_group, resolve=resolve)
        else:
            branch_content = str(config_group)

        branch.add(rich.syntax.Syntax(branch_content, "yaml"))

    # print config tree
    rich.print(tree)

    # save config tree to file
    if save_to_file:
        with open(Path(HydraConfig.get().runtime.output_dir, "config_tree.log"), "w") as file:
            rich.print(tree, file=file)


@rank_zero_only
def enforce_tags(cfg: DictConfig, save_to_file: bool = False) -> None:
    """Prompts user to input tags from command line if no tags are provided in config.

    :param cfg: A DictConfig composed by Hydra.
    :param save_to_file: Whether to export tags to the hydra output folder. Default is ``False``.
    """
    if not cfg.get("tags"):
        if "id" in HydraConfig().cfg.hydra.job:
            raise ValueError("Specify tags before launching a multirun!")

        log.warning("No tags provided in config. Prompting user to input tags...")
        tags = Prompt.ask("Enter a list of comma separated tags", default="dev")
        tags = [t.strip() for t in tags.split(",") if t != ""]

        with open_dict(cfg):
            cfg.tags = tags

        log.info(f"Tags: {cfg.tags}")

    if save_to_file:
        with open(Path(cfg.paths.output_dir, "tags.log"), "w") as file:
            rich.print(cfg.tags, file=file)


================================================
FILE: mvtracker/datasets/__init__.py
================================================
from .dexycb_multiview_dataset import DexYCBMultiViewDataset
from .kubric_multiview_dataset import KubricMultiViewDataset
from .panoptic_studio_multiview_dataset import PanopticStudioMultiViewDataset
from .tap_vid_datasets import TapVidDataset


================================================
FILE: mvtracker/datasets/dexycb_multiview_dataset.py
================================================
import logging
import os
import pathlib
import re
import time
import warnings

import cv2
import matplotlib
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from scipy.spatial.transform import Rotation as R
from torch.utils.data import Dataset

from mvtracker.datasets.utils import Datapoint, transform_scene


class DexYCBMultiViewDataset(Dataset):

    @staticmethod
    def from_name(dataset_name: str, dataset_root: str):
        """
        Examples of datasets supported by this factory method:
        - "dex-ycb-multiview",
        - "dex-ycb-multiview-single",
        - "dex-ycb-multiview-removehand",
        - "dex-ycb-multiview-duster0123",
        - "dex-ycb-multiview-duster0123cleaned",
        - "dex-ycb-multiview-duster0123cleaned-views0123",
        - "dex-ycb-multiview-duster0123cleaned-views0123-novelviews45",
        - "dex-ycb-multiview-duster0123cleaned-views0123-novelviews45-removehand",
        - "dex-ycb-multiview-duster0123cleaned-views0123-novelviews45-removehand-single",
        - "dex-ycb-multiview-duster0123cleaned-views0123-novelviews45-removehand-2dpt-single",
        - "dex-ycb-multiview-duster0123cleaned-views0123-novelviews45-removehand-2dpt-single-cached",
        """
        # Parse the dataset name, chunk by chunk
        non_parsed = dataset_name.replace("dex-ycb-multiview", "", 1)

        if non_parsed.startswith("-duster"):
            match = re.match(r"-duster(\d+)(cleaned)?", non_parsed)
            assert match is not None
            duster_views = list(map(int, match.group(1)))
            use_duster = True
            use_duster_cleaned = match.group(2) is not None
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            use_duster = False
            use_duster_cleaned = False
            duster_views = None

        if non_parsed.startswith("-views"):
            match = re.match(r"-views(\d+)", non_parsed)
            assert match is not None
            views = list(map(int, match.group(1)))
            if duster_views is not None:
                assert all(v in duster_views for v in views)
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            views = duster_views

        if non_parsed.startswith("-novelviews"):
            match = re.match(r"-novelviews(\d+)", non_parsed)
            assert match is not None
            novel_views = list(map(int, match.group(1)))
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            novel_views = None

        if non_parsed.startswith("-removehand"):
            remove_hand = True
            non_parsed = non_parsed.replace("-removehand", "", 1)
        else:
            remove_hand = False

        if non_parsed.startswith("-single"):
            single_point = True
            non_parsed = non_parsed.replace("-single", "", 1)
        else:
            single_point = False

        if non_parsed.startswith("-2dpt"):
            eval_2dpt = True
            non_parsed = non_parsed.replace("-2dpt", "", 1)
        else:
            eval_2dpt = False

        if non_parsed.startswith("-cached"):
            use_cached_tracks = True
            non_parsed = non_parsed.replace("-cached", "", 1)
        else:
            use_cached_tracks = False

        assert non_parsed == "", f"Unparsed part of the dataset name: {non_parsed}"

        if views is None and duster_views is None:
            views = [0, 1, 2, 3]  # Make the legacy "dex-ycb-multiview" name take the first 4 views (not all 8)

        return DexYCBMultiViewDataset(
            data_root=os.path.join(dataset_root, "dex-ycb-multiview"),
            views_to_return=views,
            novel_views=novel_views,
            remove_hand=remove_hand,
            use_duster_depths=use_duster,
            duster_views=duster_views,
            clean_duster_depths=use_duster_cleaned,
            traj_per_sample=384,
            seed=72,
            max_videos=10,
            perform_sanity_checks=False,
            use_cached_tracks=use_cached_tracks,
        )

    def __init__(
            self,
            data_root,
            remove_hand=False,
            views_to_return=None,
            novel_views=None,
            use_duster_depths=False,
            clean_duster_depths=False,
            duster_views=None,
            traj_per_sample=768,
            seed=None,
            max_videos=None,
            perform_sanity_checks=False,
            use_cached_tracks=False,
    ):
        super().__init__()
        self.data_root = data_root
        self.remove_hand = remove_hand
        self.views_to_return = views_to_return
        self.novel_views = novel_views
        self.use_duster_depths = use_duster_depths
        self.clean_duster_depths = clean_duster_depths
        self.duster_views = duster_views
        self.traj_per_sample = traj_per_sample
        self.seed = seed
        self.perform_sanity_checks = perform_sanity_checks
        self.use_cached_tracks = use_cached_tracks
        self.cache_name = self._cache_key()
        self.seq_names = self._get_sequence_names(max_videos)
        self.getitem_calls = 0

    def _get_sequence_names(self, max_videos):
        """
        Fetch all valid sequence names from the dataset root.

        Args:
            max_videos (int): Limit the number of sequences to load.

        Returns:
            List[str]: Sorted list of valid sequence names.
        """
        seq_names = [
            fname
            for fname in os.listdir(self.data_root)
            if os.path.isdir(os.path.join(self.data_root, fname))
               and not fname.startswith(".")
               and not fname.startswith("_")
        ]
        seq_names = sorted(seq_names)
        valid_seqs = []

        for seq_name in seq_names:
            scene_path = os.path.join(self.data_root, seq_name)
            view_folders = [
                d for d in os.listdir(scene_path)
                if os.path.isdir(os.path.join(scene_path, d)) and d.startswith("view_")
            ]
            if not view_folders:
                warnings.warn(f"Skipping {scene_path} because it has no views.")
                continue

            valid_seqs.append(seq_name)

        if max_videos is not None:
            valid_seqs = valid_seqs[:max_videos]

        print(f"Using {len(valid_seqs)} videos from {self.data_root}")
        return valid_seqs

    def _cache_key(self):
        name = f"cachedtracks--seed{self.seed}"
        if self.views_to_return is not None:
            name += f"-views{'_'.join(map(str, self.views_to_return))}"
        if self.traj_per_sample is not None:
            name += f"-n{self.traj_per_sample}"
        if self.remove_hand:
            name += "-removehand"
        return name + "--v1"  # bump this if you change the selection policy

    def __len__(self):
        return len(self.seq_names)

    def __getitem__(self, index):
        start_time = time.time()
        sample = self._getitem_helper(index)

        self.getitem_calls += 1
        if self.getitem_calls < 10:
            print(f"Loading {index:>06d} took  {time.time() - start_time:.3f} sec. Getitem calls: {self.getitem_calls}")

        return sample, True

    def _getitem_helper(self, index):
        """
        Helper function to load a single sample.

        Args:
            index (int): Index of the sample to load.

        Returns:
            CoTrackerData, bool: Sample data and success flag.
        """
        if self.seed is None:
            seed = torch.randint(0, 2 ** 32 - 1, (1,)).item()
        else:
            seed = self.seed
        rnd_torch = torch.Generator().manual_seed(seed)
        rnd_np = np.random.RandomState(seed=seed)

        datapoint_path = os.path.join(self.data_root, self.seq_names[index])

        views = {}
        view_folders = sorted([f for f in os.listdir(datapoint_path) if f.startswith("view_")])
        if self.views_to_return is not None:
            views_to_return = self.views_to_return
        else:
            views_to_return = sorted(list(range(len(view_folders))))
        views_to_load = views_to_return.copy()
        if self.novel_views is not None:
            views_to_load = list(set(views_to_load + self.novel_views))
        for v in views_to_load:
            view_path = os.path.join(datapoint_path, view_folders[v])

            # Load RGB images
            rgb_folder = os.path.join(view_path, "rgb")
            rgb_files = sorted(os.listdir(rgb_folder))
            rgb_images = [cv2.imread(os.path.join(rgb_folder, f))[:, :, ::-1] for f in rgb_files]

            # Load depth maps
            depth_folder = os.path.join(view_path, "depth")
            depth_files = sorted(os.listdir(depth_folder))
            depth_images = [cv2.imread(os.path.join(depth_folder, f), cv2.IMREAD_ANYDEPTH) for f in depth_files]

            # Load camera parameters
            camera_params_file = os.path.join(view_path, "intrinsics_extrinsics.npz")
            params = np.load(camera_params_file)
            intrinsics = params["intrinsics"][:3, :3]  # Extract K
            extrinsics = params["extrinsics"][:3, :]  # Extract R|t (world to camera)

            views[v] = {
                "rgb": np.stack(rgb_images),
                "depth": np.stack(depth_images),
                "intrinsics": intrinsics,
                "extrinsics": extrinsics,
            }

        rgbs = np.stack([views[v]["rgb"] for v in views_to_return])
        n_views, n_frames, h, w, _ = rgbs.shape
        depths = np.stack([views[v]["depth"] for v in views_to_return])[..., None].astype(np.float32) / 1000
        intrs = np.stack([views[v]["intrinsics"] for v in views_to_return])[:, None, :, :].repeat(n_frames, axis=1)
        extrs = np.stack([views[v]["extrinsics"] for v in views_to_return])[:, None, :, :].repeat(n_frames, axis=1)

        # Load novel views if they exist
        novel_rgbs = None
        novel_intrs = None
        novel_extrs = None
        if self.novel_views is not None:
            novel_rgbs = np.stack([views[v]["rgb"] for v in self.novel_views])
            novel_intrs = np.stack([views[v]["intrinsics"] for v in self.novel_views])[:, None, :, :].repeat(n_frames,
                                                                                                             axis=1)
            novel_extrs = np.stack([views[v]["extrinsics"] for v in self.novel_views])[:, None, :, :].repeat(n_frames,
                                                                                                             axis=1)

        # Load Duster's features and estimated depths if they exist
        duster_views = self.duster_views if self.duster_views is not None else views_to_return
        duster_views_str = ''.join(str(v) for v in duster_views)
        duster_root = pathlib.Path(datapoint_path) / f'duster-views-{duster_views_str}'
        if self.use_duster_depths:
            assert duster_root.exists() and (duster_root / f"3d_model__{n_frames - 1:05d}__scene.npz").exists(), \
                f"Duster root {duster_root} does not exist."

        feats = None
        feat_dim = None
        feat_stride = None
        depth_confs = None
        if duster_root.exists() and (duster_root / f"3d_model__{n_frames - 1:05d}__scene.npz").exists():
            duster_depths = []
            duster_confs = []
            duster_feats = []
            for frame_idx in range(n_frames):
                scene = np.load(duster_root / f"3d_model__{frame_idx:05d}__scene.npz")
                duster_depth = torch.from_numpy(scene["depths"])
                duster_conf = torch.from_numpy(scene["confs"])
                duster_msk = torch.from_numpy(scene["cleaned_mask"])

                if self.clean_duster_depths:
                    duster_depth = duster_depth * duster_msk

                duster_depth = F.interpolate(duster_depth[:, None], (h, w), mode='nearest')
                duster_depths.append(duster_depth[:, 0, :, :, None])

                duster_conf = F.interpolate(duster_conf[:, None], (h, w), mode='nearest')
                duster_confs.append(duster_conf[:, 0, :, :, None])

                if "feats" in scene:
                    duster_feats.append(torch.from_numpy(scene["feats"]))

            duster_depths = torch.stack(duster_depths, dim=1).numpy()
            duster_confs = torch.stack(duster_confs, dim=1).numpy()
            if duster_feats:
                feats = torch.stack(duster_feats, dim=1).numpy()

            # Extract the correct views
            assert duster_depths.shape[0] == len(duster_views)
            duster_depths = duster_depths[[duster_views.index(v) for v in views_to_return]]
            duster_confs = duster_confs[[duster_views.index(v) for v in views_to_return]]
            if feats is not None:
                assert feats.shape[0] == len(duster_views)
                feats = feats[[duster_views.index(v) for v in views_to_return]]

            # Reshape the features
            if feats is not None:
                assert feats.ndim == 4
                assert feats.shape[0] == n_views
                assert feats.shape[1] == n_frames
                feat_stride = np.round(np.sqrt(h * w / feats.shape[2])).astype(int)
                feat_dim = feats.shape[3]
                feats = feats.reshape(n_views, n_frames, h // feat_stride, w // feat_stride, feat_dim)

            # Replace the depths with the Duster depths, if configured so
            if self.use_duster_depths:
                depths = duster_depths
                depth_confs = duster_confs

        tracks_3d_file = os.path.join(datapoint_path, "tracks_3d.npz")
        tracks_3d_data = np.load(tracks_3d_file, allow_pickle=True)
        traj3d_world = tracks_3d_data["tracks_3d"]
        traj2d = tracks_3d_data["tracks_2d"][views_to_return]
        traj2d_w_z = np.concatenate((traj2d, tracks_3d_data["tracks_2d_z"][views_to_return][:, :, :, None]), axis=-1)
        visibility = tracks_3d_data["tracks_2d_visibilities"][views_to_return]

        # Label the trajectories according to: 0: hand, 1: moving ycb object, 2: static ycb objects
        object_id_to_name = tracks_3d_data["object_id_to_name"].item()
        traj_object_id = tracks_3d_data["object_ids"]
        for object_name in object_id_to_name.values():
            assert object_name == "mano-right-hand" or object_name.startswith("ycb")
        avg_movement_per_object_id = {}
        for object_id in np.unique(traj_object_id):
            object_mask = traj_object_id == object_id
            object_traj = traj3d_world[:, object_mask]
            avg_movement_per_object_id[object_id] = np.linalg.norm(object_traj[1:] - object_traj[:-1], axis=-1).mean()
        hand_id = {v: k for k, v in object_id_to_name.items()}["mano-right-hand"]
        dynamic_ycb_object_ids = [k for k, v in avg_movement_per_object_id.items() if v >= 1e-4 and k != hand_id]
        assert len(dynamic_ycb_object_ids) == 1
        dynamic_ycb_object_id = dynamic_ycb_object_ids[0]
        static_ycb_object_ids = [k for k, v in avg_movement_per_object_id.items() if v < 1e-4 and k != hand_id]
        assert 1 + 1 + len(static_ycb_object_ids) == len(object_id_to_name)
        # remap object ids to 0: hand, 1: dynamic ycb object, 2: static ycb objects
        traj_object_id = (
                0 * (traj_object_id == hand_id) +
                1 * (traj_object_id == dynamic_ycb_object_id) +
                2 * np.isin(traj_object_id, static_ycb_object_ids)
        )

        if self.remove_hand:
            traj3d_world = traj3d_world[:, traj_object_id > 0]
            traj2d = traj2d[:, :, traj_object_id > 0]
            traj2d_w_z = traj2d_w_z[:, :, traj_object_id > 0]
            visibility = visibility[:, :, traj_object_id > 0]
            traj_object_id = traj_object_id[traj_object_id > 0]

        n_tracks = traj3d_world.shape[1]
        assert rgbs.shape == (n_views, n_frames, h, w, 3)
        assert depths.shape == (n_views, n_frames, h, w, 1)
        assert depth_confs is None or depth_confs.shape == (n_views, n_frames, h, w, 1)
        assert feats is None or feats.shape == (n_views, n_frames, h // feat_stride, w // feat_stride, feat_dim)
        assert intrs.shape == (n_views, n_frames, 3, 3)
        assert extrs.shape == (n_views, n_frames, 3, 4)
        assert traj2d.shape == (n_views, n_frames, n_tracks, 2)
        assert visibility.shape == (n_views, n_frames, n_tracks)
        assert traj3d_world.shape == (n_frames, n_tracks, 3)
        assert traj_object_id.shape == (n_tracks,)

        if novel_rgbs is not None:
            assert novel_rgbs.shape == (len(self.novel_views), n_frames, h, w, 3)
            assert novel_intrs.shape == (len(self.novel_views), n_frames, 3, 3)
            assert novel_extrs.shape == (len(self.novel_views), n_frames, 3, 4)

        # Make sure our intrinsics and extrinsics work correctly
        point_3d_world = traj3d_world
        point_4d_world_homo = np.concatenate([point_3d_world, np.ones_like(point_3d_world[..., :1])], axis=-1)
        point_3d_camera = np.einsum('ABij,BCj->ABCi', extrs, point_4d_world_homo)
        if self.perform_sanity_checks:
            point_2d_pixel_homo = np.einsum('ABij,ABCj->ABCi', intrs, point_3d_camera)
            point_2d_pixel = point_2d_pixel_homo[..., :2] / point_2d_pixel_homo[..., 2:]
            point_2d_pixel_gt = traj2d

            point_2d_pixel_no_nan = np.nan_to_num(point_2d_pixel, nan=0)
            point_2d_pixel_gt_no_nan = np.nan_to_num(point_2d_pixel_gt, nan=0)

            assert np.allclose(point_2d_pixel_no_nan[0, :, 0, :], point_2d_pixel_no_nan[0, :, 0, :],
                               atol=1), f"Proj. failed"
            assert np.allclose(point_2d_pixel_gt_no_nan, point_2d_pixel_gt_no_nan, atol=1), f"Point projection failed"

            assert np.allclose(point_3d_camera[..., 2:], traj2d_w_z[..., -1:], atol=1)

        # Convert everything to torch tensors
        rgbs = torch.from_numpy(rgbs).permute(0, 1, 4, 2, 3).float()
        depths = torch.from_numpy(depths).permute(0, 1, 4, 2, 3).float()
        depth_confs = torch.from_numpy(depth_confs).permute(0, 1, 4, 2, 3).float() if depth_confs is not None else None
        feats = torch.from_numpy(feats).permute(0, 1, 4, 2, 3).float() if feats is not None else None
        intrs = torch.from_numpy(intrs).float()
        extrs = torch.from_numpy(extrs).float()
        traj2d = torch.from_numpy(traj2d)
        traj2d_w_z = torch.from_numpy(traj2d_w_z)
        traj3d_world = torch.from_numpy(traj3d_world)
        traj_object_id = torch.from_numpy(traj_object_id)
        visibility = torch.from_numpy(visibility)
        if novel_rgbs is not None:
            novel_rgbs = torch.from_numpy(novel_rgbs).permute(0, 1, 4, 2, 3).float()
            novel_intrs = torch.from_numpy(novel_intrs).float()
            novel_extrs = torch.from_numpy(novel_extrs).float()

        # Track selection
        cache_root = os.path.join(self.data_root, self.seq_names[index], "cache")
        os.makedirs(cache_root, exist_ok=True)
        cache_file = os.path.join(cache_root, f"{self.cache_name}.npz")

        # Check if we can use cached tracks
        use_cache = bool(self.use_cached_tracks) and os.path.isfile(cache_file)
        if use_cache:
            cache = np.load(cache_file)
            inds_sampled = torch.from_numpy(cache["track_indices"])
            traj2d_w_z = torch.from_numpy(cache["traj2d_w_z"])
            traj3d_world = torch.from_numpy(cache["traj3d_world"])
            traj_object_id = torch.from_numpy(cache["traj_object_id"])
            visibility = torch.from_numpy(cache["visibility"])
            valids = torch.from_numpy(cache["valids"])
            query_points = torch.from_numpy(cache["query_points"])

        # Otherwise, sample the tracks and create query points
        else:
            # Force query points on hand to appear later
            # This avoids querying when the GT hand reconstruction is severely lacking
            # Identify tracks that are invisible in the first frame across all views (as they are probably on the hand)
            invisible_at_first_frame = visibility[:, 0, :] == 0
            invisible_at_first_frame = invisible_at_first_frame.unsqueeze(1).expand(-1, 5, -1)
            # Set visibility to 0 for the first 5 frames where the first frame was invisible
            visibility[:, 0:5, :] *= ~invisible_at_first_frame  # Keep visible ones, set others to 0

            # Sample the points to track
            visible_for_at_least_two_frames = visibility.any(0).sum(0) >= 2
            hectic_visibility = ((visibility[:, :-1] & ~visibility[:, 1:]).sum(0) >= 3).any(0)
            valid_tracks = visible_for_at_least_two_frames & ~hectic_visibility
            valid_tracks = valid_tracks.nonzero(as_tuple=False)[:, 0]

            point_inds = torch.randperm(len(valid_tracks), generator=rnd_torch)
            traj_per_sample = self.traj_per_sample if self.traj_per_sample is not None else len(point_inds)
            assert len(point_inds) >= traj_per_sample
            point_inds = point_inds[:traj_per_sample]
            inds_sampled = valid_tracks[point_inds]

            n_tracks = len(inds_sampled)
            traj2d = traj2d[:, :, inds_sampled].float()
            traj2d_w_z = traj2d_w_z[:, :, inds_sampled].float()
            traj3d_world = traj3d_world[:, inds_sampled].float()
            traj_object_id = traj_object_id[inds_sampled]
            visibility = visibility[:, :, inds_sampled]

            valids = ~torch.isnan(traj2d).any(dim=-1).any(dim=0)

            # Create the query points
            gt_visibilities_any_view = visibility.any(dim=0)
            assert (gt_visibilities_any_view.sum(dim=0) >= 2).all(), "All points should be visible in least two frames."
            last_visible_index = (torch.arange(n_frames).unsqueeze(-1) * gt_visibilities_any_view).max(0).values
            assert gt_visibilities_any_view[last_visible_index[None, :], torch.arange(n_tracks)].all()
            gt_visibilities_any_view[last_visible_index[None, :], torch.arange(n_tracks)] = False
            assert (gt_visibilities_any_view.sum(dim=0) >= 1).all()

            n_non_first_point_appearance_queries = n_tracks // 4
            n_first_point_appearance_queries = n_tracks - n_non_first_point_appearance_queries

            first_point_appearances = torch.argmax(
                gt_visibilities_any_view[..., -n_first_point_appearance_queries:].float(), dim=0)
            non_first_point_appearances = first_point_appearances.new_zeros((n_non_first_point_appearance_queries,))
            for track_idx in range(n_tracks)[:n_non_first_point_appearance_queries]:
                # Randomly take a timestep where the point is visible
                non_zero_timesteps = torch.nonzero(gt_visibilities_any_view[:, track_idx] == 1)
                random_timestep = non_zero_timesteps[rnd_np.randint(len(non_zero_timesteps))].item()
                non_first_point_appearances[track_idx] = random_timestep

            query_points_t = torch.cat([non_first_point_appearances, first_point_appearances], dim=0)
            query_points_xyz_worldspace = traj3d_world[query_points_t, torch.arange(n_tracks)]
            query_points = torch.cat([query_points_t[:, None], query_points_xyz_worldspace], dim=1)
            assert gt_visibilities_any_view[query_points_t, torch.arange(n_tracks)].all()

            # Replace nans with zeros
            traj2d[torch.isnan(traj2d)] = 0
            traj2d_w_z[torch.isnan(traj2d_w_z)] = 0
            traj3d_world[torch.isnan(traj3d_world)] = 0
            assert torch.isnan(visibility).sum() == 0

            # Cache the selected tracks and query points
            if self.use_cached_tracks:
                logging.warn(f"Caching tracks for {self.seq_names[index]} at {os.path.abspath(cache_file)}")
                np.savez_compressed(
                    cache_file,
                    track_indices=inds_sampled.numpy(),
                    traj2d_w_z=traj2d_w_z.numpy(),
                    traj3d_world=traj3d_world.numpy(),
                    traj_object_id=traj_object_id.numpy(),
                    visibility=visibility.numpy(),
                    valids=valids.numpy(),
                    query_points=query_points.numpy(),
                )

        # Normalize the scene to be similar to Kubric's scene
        scale = 6
        rot_x = R.from_euler('x', 220, degrees=True).as_matrix()
        rot_y = R.from_euler('y', 3, degrees=True).as_matrix()
        rot_z = R.from_euler('z', -30, degrees=True).as_matrix()
        rot = torch.from_numpy(rot_z @ rot_y @ rot_x)
        translation = torch.tensor([0.0, 0.0, 0.5], dtype=torch.float32)
        (
            depths_trans, extrs_trans, query_points_trans, traj3d_world_trans, traj2d_w_z_trans
        ) = transform_scene(scale, rot, translation, depths, extrs, query_points, traj3d_world, traj2d_w_z)
        novel_extrs_trans = transform_scene(scale, rot, translation, None, novel_extrs, None, None, None)[1]

        # rerun_viz_scene("nane/scene__no_transform/", rgbs, depths, intrs, extrs, traj3d_world, 0.1)
        # rerun_viz_scene("nane/scene_transformed/", rgbs, depths_trans, intrs, extrs_trans, traj3d_world_trans, 1)

        # # Use the auto scene normalization of generic scenes
        # from mvtracker.datasets.generic_scene_dataset import compute_auto_scene_normalization
        # scale, rot, translation = compute_auto_scene_normalization(depths, torch.ones_like(depths) * 100, extrs_trans, intrs)
        # scale = scale * T[0, 0].item()
        # print(f"{scale=}")
        # (depths_trans, extrs_trans, query_points_trans, traj3d_world_trans, traj2d_w_z_trans
        # ) = transform_scene(scale, rot, translation, depths_trans, extrs_trans, query_points_trans, traj3d_world_trans, traj2d_w_z_trans)
        # _, novel_extrs_trans, _, _, _ = transform_scene(scale, rot, translation, None, novel_extrs_trans, None, None, None)
        # 82.7 91.1 --> 80.8 89.1

        segs = torch.ones((n_frames, 1, h, w))  # Dummy segmentation masks
        datapoint = Datapoint(
            video=rgbs,
            videodepth=depths_trans,
            videodepthconf=depth_confs.float() if depth_confs is not None else None,
            feats=feats,
            segmentation=segs,
            trajectory=traj2d_w_z_trans,
            trajectory_3d=traj3d_world_trans,
            trajectory_category=traj_object_id,
            visibility=visibility,
            valid=valids,
            seq_name=self.seq_names[index],
            intrs=intrs,
            extrs=extrs_trans,
            query_points=None,
            query_points_3d=query_points_trans,
            track_upscaling_factor=1 / scale,

            novel_video=novel_rgbs,
            novel_intrs=novel_intrs,
            novel_extrs=novel_extrs_trans,
        )
        return datapoint


def rerun_viz_scene(entity_prefix, rgbs, depths, intrs, extrs, tracks, radii_scale,
                    viz_camera=False, viz_point_cloud=True, fps=12):
    import rerun as rr

    # Initialize Rerun
    rr.init(f"3dpt", recording_id="v0.16")
    rr.connect_tcp()

    V, T, _, H, W = rgbs.shape
    _, N, _ = tracks.shape
    assert rgbs.shape == (V, T, 3, H, W)
    assert depths.shape == (V, T, 1, H, W)
    assert intrs.shape == (V, T, 3, 3)
    assert extrs.shape == (V, T, 3, 4)
    assert tracks.shape == (T, N, 3)

    # Compute inverse intrinsics and extrinsics
    intrs_inv = torch.inverse(intrs.float()).type(intrs.dtype)
    extrs_square = torch.eye(4).to(extrs.device).repeat(V, T, 1, 1)
    extrs_square[:, :, :3, :] = extrs
    extrs_inv = torch.inverse(extrs_square.float()).type(extrs.dtype)
    assert intrs_inv.shape == (V, T, 3, 3)
    assert extrs_inv.shape == (V, T, 4, 4)

    for v in range(V):  # Iterate over views
        for t in range(T):  # Iterate over frames
            rr.set_time_seconds("frame", t / fps)

            # Log RGB image
            rgb_image = rgbs[v, t].permute(1, 2, 0).cpu().numpy()
            if viz_camera:
                rr.log(f"{entity_prefix}image/view-{v}/rgb", rr.Image(rgb_image))

            # Log Depth map
            depth_map = depths[v, t, 0].cpu().numpy()
            if viz_camera:
                rr.log(f"{entity_prefix}image/view-{v}/depth", rr.DepthImage(depth_map, point_fill_ratio=0.2))

            # Log Camera
            K = intrs[v, t].cpu().numpy()
            world_T_cam = np.eye(4)
            world_T_cam[:3, :3] = extrs_inv[v, t, :3, :3].cpu().numpy()
            world_T_cam[:3, 3] = extrs_inv[v, t, :3, 3].cpu().numpy()
            if viz_camera:
                rr.log(f"{entity_prefix}image/view-{v}", rr.Pinhole(image_from_camera=K, width=W, height=H))
                rr.log(f"{entity_prefix}image/view-{v}",
                       rr.Transform3D(translation=world_T_cam[:3, 3], mat3x3=world_T_cam[:3, :3]))

            # Generate and log point cloud colored by RGB values
            # Compute 3D points from depth map
            y, x = np.indices((H, W))
            homo_pixel_coords = np.stack([x.ravel(), y.ravel(), np.ones_like(x).ravel()], axis=1).T
            depth_values = depth_map.ravel()
            cam_coords = (intrs_inv[v, t].cpu().numpy() @ homo_pixel_coords) * depth_values
            cam_coords = np.vstack((cam_coords, np.ones((1, cam_coords.shape[1]))))
            world_coords = (world_T_cam @ cam_coords)[:3].T

            # Filter out points with zero depth
            valid_mask = depth_values > 0
            world_coords = world_coords[valid_mask]
            rgb_colors = rgb_image.reshape(-1, 3)[valid_mask].astype(np.uint8)

            # Log the point cloud
            if viz_point_cloud:
                rr.log(f"{entity_prefix}point_cloud/view-{v}",
                       rr.Points3D(world_coords, colors=rgb_colors, radii=0.02 * radii_scale))

    # Log 3D tracks
    x = tracks[0, :, 0]
    c = (x - x.min()) / (x.max() - x.min() + 1e-8)
    colors = (matplotlib.colormaps["gist_rainbow"](c)[:, :3] * 255).astype(np.uint8)
    for t in range(T):
        rr.set_time_seconds("frame", t / fps)
        rr.log(
            f"{entity_prefix}tracks/points",
            rr.Points3D(positions=tracks[t], colors=colors, radii=0.01 * radii_scale),
        )
        if t > 0:
            strips = np.concatenate(
                [np.stack([tracks[:t, n], tracks[1:t + 1, n]], axis=-2) for n in range(N)],
                axis=0,
            )
            strip_colors = np.concatenate(
                [np.repeat(colors[n][None], t, axis=0) for n in range(N)],
                axis=0,
            )
            rr.log(
                f"{entity_prefix}tracks/lines",
                rr.LineStrips3D(strips=strips, colors=strip_colors, radii=0.005 * radii_scale),
            )


================================================
FILE: mvtracker/datasets/generic_scene_dataset.py
================================================
import logging
import os
import pickle
import sys
from contextlib import ExitStack
from typing import Tuple

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torch.nn.functional import interpolate
from torch.utils.data import Dataset
from torchvision import transforms as TF
from tqdm import tqdm

from mvtracker.datasets.utils import Datapoint, transform_scene, align_umeyama, apply_sim3_to_extrinsics


class GenericSceneDataset(Dataset):
    def __init__(
            self,
            dataset_dir,

            use_duster_depths=True,
            use_vggt_depths_with_aligned_cameras=False,
            use_vggt_depths_with_raw_cameras=False,
            use_monofusion_depths=False,
            use_moge2_depths=False,

            skip_depth_computation_if_cached=True,

            drop_first_n_frames=0,

            scene_normalization_mode="auto",  # "auto" | "manual" | "none"
            scene_normalization_auto_conf_thresh=4.8,
            scene_normalization_auto_target_radius=6.3,
            scene_normalization_auto_rescale_by_camera_radius=True,
            scene_normalization_manual_scale=None,  # Optional float
            scene_normalization_manual_rotation=None,  # Optional 3x3 torch.Tensor rotation matrix
            scene_normalization_manual_translation=None,  # Optional 3D torch.Tensor post-scale translation vector
            # E.g., the manual transform that translates up by 1.4 units and scales 2.5 times (was good for EgoExo4D):
            #   scale = 2.5
            #   translate_x = 0
            #   translate_y = 0
            #   translate_z = 1.4 * scale
            #   T = torch.tensor([
            #       [scale, 0.0, 0.0, translate_x],
            #       [0.0, scale, 0.0, translate_y],
            #       [0.0, 0.0, scale, translate_z],
            #       [0.0, 0.0, 0.0, 1.0],
            #   ], dtype=torch.float32)

            stream_viz_to_rerun=False,
    ):
        self.dataset_dir = dataset_dir

        self.use_duster_depths = use_duster_depths
        self.use_vggt_depths_with_aligned_cameras = use_vggt_depths_with_aligned_cameras
        self.use_vggt_depths_with_raw_cameras = use_vggt_depths_with_raw_cameras
        self.use_monofusion_depths = use_monofusion_depths
        self.use_moge2_depths = use_moge2_depths
        # --- Assert exclusive depth-source configuration ---
        # Exactly 0 or 1 of these should be True. (0 => fall back to pkl/dust3r.)
        depth_flags = (int(self.use_duster_depths)
                       + int(self.use_vggt_depths_with_aligned_cameras)
                       + int(self.use_vggt_depths_with_raw_cameras)
                       + int(self.use_monofusion_depths)
                       + int(self.use_moge2_depths))
        assert depth_flags <= 1, (
            "Misconfigured dataset: choose at most one depth source among "
            "`use_monofusion_depths`, `use_moge2_depths`, `use_duster_depths`."
        )

        self.skip_depth_computation_if_cached = skip_depth_computation_if_cached
        self.drop_first_n_frames = drop_first_n_frames

        self.scene_normalization_mode = scene_normalization_mode
        self.scene_normalization_auto_conf_thresh = scene_normalization_auto_conf_thresh
        self.scene_normalization_auto_target_radius = scene_normalization_auto_target_radius
        self.scene_normalization_auto_rescale_by_camera_radius = scene_normalization_auto_rescale_by_camera_radius
        self.scene_normalization_manual_scale = scene_normalization_manual_scale
        self.scene_normalization_manual_rotation = scene_normalization_manual_rotation
        self.scene_normalization_manual_translation = scene_normalization_manual_translation

        self.stream_viz_to_rerun = stream_viz_to_rerun

        self.seq_names = sorted([
            f.replace(".pkl", "")
            for f in os.listdir(dataset_dir)
            if f.endswith(".pkl")
        ])
        assert self.seq_names, f"No sequences found in {dataset_dir}"

    def __len__(self):
        return len(self.seq_names)

    def __getitem__(self, idx):
        seq_name = self.seq_names[idx]
        pkl_path = os.path.join(self.dataset_dir, f"{seq_name}.pkl")
        with open(pkl_path, "rb") as f:
            data = pickle.load(f)

        ego_cam = data.get("ego_cam_name", None)
        rgbs_dict = data["rgbs"]
        intrs_dict = data["intrs"]
        extrs_dict = data["extrs"]
        depths_dict = data.get("depths", None)

        if ego_cam:
            rgbs_dict.pop(ego_cam)
            intrs_dict.pop(ego_cam)
            extrs_dict.pop(ego_cam)
            if depths_dict is not None:
                depths_dict.pop(ego_cam)

        cam_names = sorted(rgbs_dict.keys())
        n_views = len(cam_names)
        n_frames, _, H, W = rgbs_dict[cam_names[0]].shape

        rgbs = torch.stack([torch.from_numpy(rgbs_dict[cam]) for cam in cam_names])  # [V, T, 3, H, W]
        intrs = torch.stack([torch.from_numpy(intrs_dict[cam]) for cam in cam_names])  # [V, 3, 3]
        intrs = intrs[:, None].expand(-1, n_frames, -1, -1)  # [V, T, 3, 3]

        extr_list = []
        for cam in cam_names:
            e = extrs_dict[cam]
            if e.ndim == 2:
                e = np.broadcast_to(e[None, ...], (n_frames, 3, 4))
            extr_list.append(torch.from_numpy(e.copy()))
        extrs = torch.stack(extr_list)  # [V, T, 3, 4]

        # ------- Depth selection & caching -------
        if self.use_duster_depths:
            depth_root = os.path.join(self.dataset_dir, f"duster_depths__{seq_name}")
            if not os.path.exists(os.path.join(depth_root, f"3d_model__{n_frames - 1:05d}__scene.npz")):
                if "../duster" not in sys.path:
                    sys.path.insert(0, "../duster")
                from scripts.egoexo4d_preprocessing import main_estimate_duster_depth
                pkl_path = os.path.join(self.dataset_dir, f"{seq_name}.pkl")

                # Re-enable autograd locally (overrides any surrounding no_grad/inference_mode)
                with ExitStack() as stack:
                    stack.enter_context(torch.inference_mode(False))
                    stack.enter_context(torch.enable_grad())
                    main_estimate_duster_depth(pkl_path, depth_root, self.skip_depth_computation_if_cached)
            duster_depths, duster_confs = [], []
            for t in range(n_frames):
                scene_path = os.path.join(depth_root, f"3d_model__{t:05d}__scene.npz")
                scene = np.load(scene_path)
                d = torch.from_numpy(scene["depths"])  # [V, H', W']
                d = interpolate(d[:, None], size=(H, W), mode="nearest")  # [V, 1, H, W]
                duster_depths.append(d)
                c = torch.from_numpy(scene["confs"])
                c = interpolate(c[:, None], size=(H, W), mode="nearest")
                duster_confs.append(c)
            depths = torch.stack(duster_depths, dim=1)  # [V, T, 1, H, W]
            depth_confs = torch.stack(duster_confs, dim=1)

        elif self.use_vggt_depths_with_aligned_cameras:
            depths, depth_confs, intrs, extrs = _ensure_vggt_aligned_cache_and_load(
                rgbs=rgbs,
                seq_name=seq_name,
                dataset_root=self.dataset_dir,
                extrs_gt=extrs,  # your current GT world->cam
                vggt_cache_subdir="vggt_cache",
                skip_if_cached=self.skip_depth_computation_if_cached,
                model_id="facebook/VGGT-1B",
            )

        elif self.use_vggt_depths_with_raw_cameras:
            # Only use VGGT’s own (raw) cameras and depths
            depths, depth_confs, intrs, extrs = _ensure_vggt_raw_cache_and_load(
                rgbs=rgbs,
                seq_name=seq_name,
                dataset_root=self.dataset_dir,
                vggt_cache_subdir="vggt_cache",
                skip_if_cached=self.skip_depth_computation_if_cached,
                model_id="facebook/VGGT-1B",
            )

        elif self.use_monofusion_depths:
            # MonoFusion (Dust3r + FG/BG-heuristic + MoGE-2) with caching
            final_depths, final_confs = _ensure_monofusion_cache_and_load(
                rgbs=rgbs,
                seq_name=seq_name,
                dataset_root=self.dataset_dir,
                monofusion_cache_subdir="monofusion_cache",
                skip_if_cached=self.skip_depth_computation_if_cached,
            )
            depths = final_depths
            depth_confs = final_confs

        elif self.use_moge2_depths:
            # Raw MoGe-2 (metric) with caching
            depths, depth_confs = _ensure_moge2_cache_and_load(
                rgbs=rgbs,
                seq_name=seq_name,
                dataset_root=self.dataset_dir,
                moge2_cache_subdir="moge2_cache",
                skip_if_cached=self.skip_depth_computation_if_cached,
            )

        elif depths_dict is not None:
            depths = torch.stack([torch.from_numpy(depths_dict[cam]) for cam in cam_names]).unsqueeze(2)
            depth_confs = depths.new_zeros(depths.shape)
            depth_confs[depths > 0] = 1000

        else:
            raise ValueError("No depths available/configured")

        # Sometimes the first frames are noisy, e.g., due to timesync calibration
        if self.drop_first_n_frames:
            assert type(self.drop_first_n_frames) == int
            n_frames -= self.drop_first_n_frames
            rgbs = rgbs[:, self.drop_first_n_frames:]
            depths = depths[:, self.drop_first_n_frames:]
            depth_confs = depth_confs[:, self.drop_first_n_frames:]
            intrs = intrs[:, self.drop_first_n_frames:]
            extrs = extrs[:, self.drop_first_n_frames:]

        if self.scene_normalization_mode == "auto":
            scale, translation = compute_auto_scene_normalization(
                depths, depth_confs, extrs, intrs,
                conf_thresh=self.scene_normalization_auto_conf_thresh,
                target_radius=self.scene_normalization_auto_target_radius,
                rescale_by_camera_radius=self.scene_normalization_auto_rescale_by_camera_radius,
            )
            rot = torch.eye(3, dtype=torch.float32, device=depths.device)
        elif self.scene_normalization_mode == "manual":
            assert self.scene_normalization_manual_scale is not None
            assert self.scene_normalization_manual_rotation is not None
            assert self.scene_normalization_manual_translation is not None
            scale = self.scene_normalization_manual_scale
            rot = self.scene_normalization_manual_rotation.to(depths.device)
            translation = self.scene_normalization_manual_translation.to(depths.device)
        elif self.scene_normalization_mode == "none":
            scale = 1.0
            rot = torch.eye(3, dtype=torch.float32, device=depths.device)
            translation = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32, device=depths.device)
        else:
            raise ValueError(f"Unknown scene_normalization_mode: {self.scene_normalization_mode}")

        depths_trans, extrs_trans, _, _, _ = transform_scene(scale, rot, translation, depths, extrs, None, None, None)

        assert rgbs.shape == (n_views, n_frames, 3, H, W)
        assert depths.shape == (n_views, n_frames, 1, H, W)
        assert depth_confs.shape == (n_views, n_frames, 1, H, W)
        assert intrs.shape == (n_views, n_frames, 3, 3)
        assert extrs.shape == (n_views, n_frames, 3, 4)
        assert extrs_trans.shape == (n_views, n_frames, 3, 4)

        if self.stream_viz_to_rerun:
            import rerun as rr
            from mvtracker.utils.visualizer_rerun import log_pointclouds_to_rerun
            rr.init(f"3dpt", recording_id="v0.16")
            rr.connect_tcp()
            log_pointclouds_to_rerun(f"generic-1-before-norm", idx, rgbs[None], depths[None],
                                     intrs[None], extrs[None], depth_confs[None], [1.0])
            log_pointclouds_to_rerun(f"generic-2-after-norm", idx, rgbs[None], depths[None],
                                     intrs[None], extrs_trans[None], depth_confs[None], [1.0])

        datapoint = Datapoint(
            video=rgbs.float(),
            videodepth=depths_trans.float(),
            videodepthconf=depth_confs.float(),
            feats=None,
            segmentation=torch.ones((n_views, n_frames, 1, H, W), dtype=torch.float32),
            trajectory=None,
            trajectory_3d=None,
            visibility=None,
            valid=None,
            seq_name=seq_name,
            intrs=intrs.float(),
            extrs=extrs_trans.float(),
            query_points=None,
            query_points_3d=None,
            trajectory_category=None,
            track_upscaling_factor=1.0,
            novel_video=None,
            novel_intrs=None,
            novel_extrs=None,
        )

        return datapoint, True


def compute_auto_scene_normalization(
        depths,
        depth_confs,
        extrs,
        intrs,
        conf_thresh=4.8,
        target_radius=6.3,
        rescale_by_camera_radius=True,
):
    V, T, _, H, W = depths.shape
    device = depths.device

    extrs_square = torch.eye(4, device=device)[None, None].repeat(V, T, 1, 1)
    extrs_square[:, :, :3, :] = extrs
    extrs_inv = torch.inverse(extrs_square.float())
    intrs_inv = torch.inverse(intrs.float())

    y, x = torch.meshgrid(torch.arange(H, device=device), torch.arange(W, device=device), indexing="ij")
    homog = torch.stack([x, y, torch.ones_like(x)], dim=-1).reshape(-1, 3).float()
    homog = homog[None].expand(V, -1, -1)

    pts_all = []
    for v in range(V):
        d = depths[v, 0, 0]
        c = depth_confs[v, 0, 0]
        mask = (c > conf_thresh) & (d > 0)
        if mask.sum() < 100:
            continue

        d_flat = d.flatten()
        conf_mask = mask.flatten()
        intr_inv = intrs_inv[v, 0]
        extr_inv = extrs_inv[v, 0]

        cam_pts = (intr_inv @ homog[v].T).T * d_flat[:, None]
        cam_pts = cam_pts[conf_mask]
        cam_pts_h = torch.cat([cam_pts, torch.ones_like(cam_pts[:, :1])], dim=-1)
        world_pts = (extr_inv @ cam_pts_h.T).T[:, :3]

        pts_all.append(world_pts)

    pts_all = torch.cat(pts_all, dim=0)
    if pts_all.shape[0] < 100:
        raise RuntimeError("Too few valid points for normalization.")

    # --- Center scene ---
    centroid = pts_all.mean(dim=0)
    pts_centered = pts_all - centroid

    # --- Lift scene so floor is at z=0 ---
    floor_z = pts_centered[:, 2].quantile(0.12)  # robust floor estimate
    pts_lifted = pts_centered.clone()
    pts_lifted[:, 2] -= floor_z

    # --- Compute scale ---
    if rescale_by_camera_radius:
        cam_centers = extrs[:, 0, :, 3]  # (V, 3)
        cam_centers_centered = cam_centers - centroid  # shift
        cam_centers_centered[:, 2] -= floor_z  # lift
        cam_dists = cam_centers_centered.norm(dim=1)
        median_dist = cam_dists.median()
        scale = target_radius / median_dist
    else:
        scene_radius = pts_lifted.norm(dim=1).quantile(0.95)
        scale = target_radius / scene_radius

    # --- Compute translation (after scaling) ---
    translate = -scale * centroid
    translate[2] -= scale * floor_z  # lift to z=0

    return scale, translate


def _ensure_moge2_cache_and_load(rgbs, seq_name, dataset_root, moge2_cache_subdir, skip_if_cached=True):
    """
    Raw MoGe-2 depth (metric) with per-sequence caching.
    Returns (depths, confs) shaped [V,T,1,H,W] on CPU.
    """
    V, T, _, H, W = rgbs.shape
    cache_root = os.path.join(dataset_root, moge2_cache_subdir, seq_name)
    os.makedirs(cache_root, exist_ok=True)
    depths_path = os.path.join(cache_root, "moge2_depths.npy")
    confs_path = os.path.join(cache_root, "moge2_confs.npy")

    if skip_if_cached and os.path.isfile(depths_path) and os.path.isfile(confs_path):
        d = torch.from_numpy(np.load(depths_path)).float()  # [V,T,H,W]
        c = torch.from_numpy(np.load(confs_path)).float()  # [V,T,H,W]
        return d.unsqueeze(2), c.unsqueeze(2)

    d = _moge_depths(seq_name, rgbs, cache_root)  # [V,T,H,W], CPU float

    # Simple constant confidence for MoGe-2
    c = torch.full_like(d, 100.0)

    np.save(depths_path, d.numpy())
    np.save(confs_path, c.numpy())
    return d.unsqueeze(2), c.unsqueeze(2)


def _ensure_monofusion_cache_and_load(rgbs, seq_name, dataset_root, monofusion_cache_subdir, skip_if_cached=True):
    """
    MONOFUSION:
      - Background mask: patch-change detector over temporal window (static -> BG)
      - DUSt3R depth: load per frame/view; build static background depth by BG-temporal-average.
      - MoGe-2 monocular depth per frame/view; align to background by affine (a,b).
      - Merge BG (DUSt3R static) with FG (aligned MoGe).
      - Cache final depths & confs.
    """
    V, T, _, H, W = rgbs.shape

    cache_root = os.path.join(dataset_root, monofusion_cache_subdir, seq_name)
    os.makedirs(cache_root, exist_ok=True)
    final_depths_path = os.path.join(cache_root, "final_depths.npy")
    final_confs_path = os.path.join(cache_root, "final_confs.npy")

    if skip_if_cached and os.path.isfile(final_depths_path) and os.path.isfile(final_confs_path):
        fd = torch.from_numpy(np.load(final_depths_path))  # [V,T,H,W]
        fc = torch.from_numpy(np.load(final_confs_path))  # [V,T,H,W]
        return fd.unsqueeze(2), fc.unsqueeze(2)

    # ---- DUSt3R depths per frame/view ----
    depth_root = os.path.join(dataset_root, f"duster_depths__{seq_name}")
    if not os.path.exists(os.path.join(depth_root, f"3d_model__{T - 1:05d}__scene.npz")):
        if "../duster" not in sys.path:
            sys.path.insert(0, "../duster")
        from scripts.egoexo4d_preprocessing import main_estimate_duster_depth
        pkl_path = os.path.join(dataset_root, f"{seq_name}.pkl")

        # Re-enable autograd locally (overrides any surrounding no_grad/inference_mode)
        with ExitStack() as stack:
            stack.enter_context(torch.inference_mode(False))
            stack.enter_context(torch.enable_grad())
            main_estimate_duster_depth(pkl_path, depth_root, skip_if_cached)

    duster_depths = []
    for t in range(T):
        scene_path = os.path.join(depth_root, f"3d_model__{t:05d}__scene.npz")
        scene = np.load(scene_path)
        d = torch.from_numpy(scene["depths"])  # [V, H', W']
        d = interpolate(d[:, None], size=(H, W), mode="nearest")[:, 0]  # [V, H, W]
        duster_depths.append(d)
    duster_depths = torch.stack(duster_depths, dim=1)  # [V, T, H, W]

    # ---- Background mask (patch-change) ----
    compute_device = "cuda" if torch.cuda.is_available() else "cpu"
    bg_mask = _static_bg_mask_from_window(rgbs.to(compute_device)).cpu()  # [V,T,H,W] bool

    # ---- Static background depth per camera via temporal average on BG pixels ----
    V, T, _, _ = duster_depths.shape
    D_bg = torch.zeros((V, H, W), dtype=torch.float32)
    for v in range(V):
        valid = bg_mask[v]  # [T,H,W]
        num = (duster_depths[v] * valid).sum(dim=0)
        den = valid.sum(dim=0).clamp_min(1)
        D_bg[v] = num / den

    # ---- MoGe-2 monocular depths per frame/view ----
    moge_depths = _moge_depths(seq_name, rgbs, cache_root)  # [V,T,H,W]

    # ---- Align MoGe to background (solve a,b on BG pixels) ----
    compute_device = "cuda" if torch.cuda.is_available() else "cpu"
    moge_depths = moge_depths.to(compute_device, dtype=torch.float32)  # [V,T,H,W]
    D_bg_exp = D_bg[:, None].expand_as(moge_depths).to(compute_device)  # [V,T,H,W]
    bg_mask = bg_mask.to(compute_device)  # [V,T,H,W]

    # Valid BG pixels
    valid = bg_mask & torch.isfinite(moge_depths) & (moge_depths > 0) \
            & torch.isfinite(D_bg_exp) & (D_bg_exp > 0)

    # Flatten over pixels
    X = moge_depths.view(V, T, -1)  # [V,T,HW]
    Y = D_bg_exp.view(V, T, -1)  # [V,T,HW]
    M = valid.view(V, T, -1).float()  # [V,T,HW]

    # Count valid pixels
    n = M.sum(dim=-1)  # [V,T]
    min_bg = 200
    if (n < min_bg).any():
        bad = torch.nonzero(n < min_bg, as_tuple=False)
        raise RuntimeError(
            f"Too few background pixels in frames: {[(int(v), int(t)) for v, t in bad.tolist()]}"
        )

    # Sufficient statistics
    sx = (X * M).sum(dim=-1)
    sy = (Y * M).sum(dim=-1)
    sxx = (X * X * M).sum(dim=-1)
    sxy = (X * Y * M).sum(dim=-1)

    # Closed-form least squares for a, b
    eps = 1e-8
    mx = sx / n
    my = sy / n
    varx = sxx / n - mx * mx
    cov = sxy / n - mx * my

    a = cov / (varx + eps)  # [V,T]
    b = my - a * mx

    # Apply alignment
    aligned_moge = (a[..., None] * X + b[..., None]).view(V, T, H, W)

    # Optionally save scale/shift
    scale = a.float().cpu()
    shift = b.float().cpu()

    # ---- Merge FG/BG ----
    final_depths = torch.where(bg_mask, D_bg_exp, aligned_moge)  # [V,T,H,W]

    # ---- Confidence map: high for BG, moderate for FG ----
    final_confs = torch.zeros_like(final_depths)
    final_confs[bg_mask] = 1000.0
    final_confs[~bg_mask] = 10.0

    # ---- Cache results ----
    np.save(final_depths_path, final_depths.cpu().numpy())
    np.save(final_confs_path, final_confs.cpu().numpy())
    np.save(os.path.join(cache_root, "scale.npy"), scale.cpu().numpy())
    np.save(os.path.join(cache_root, "shift.npy"), shift.cpu().numpy())

    return final_depths.unsqueeze(2).cpu(), final_confs.unsqueeze(2).cpu()


def _static_bg_mask_from_window(
        rgbs: torch.Tensor,
        win: int = -1,
        r: int = 7,  # spatial patch radius -> (2r+1)x(2r+1)
        diff_thresh: float = 10.0  # uint8 scale threshold
):
    """
    Fast BG detector using 3D max-pooling over frame-to-frame diffs.
    """
    V, T, C, H, W = rgbs.shape
    device = rgbs.device

    if T == 1:
        return torch.ones((V, T, H, W), dtype=torch.bool, device=device)

    if win == -1:
        win = T

    # 1) Frame-to-frame abs diff (channel-mean): boundaries of length T-1
    x = rgbs.float()
    diffs = (x[:, 1:] - x[:, :-1]).abs().mean(dim=2)  # [V, T-1, H, W]
    diffs = diffs.unsqueeze(1)  # [V, 1, T-1, H, W]  (N,C,D,H,W for 3D pool)

    # 2) 3D max pool over time & space:
    #    - temporal kernel spans (2*win-1) boundaries
    #    - spatial kernel spans (2r+1)x(2r+1) patch
    kt = max(1, 2 * win - 1)
    kh = kw = 2 * r + 1
    pt = (kt - 1) // 2
    ph = pw = r
    pooled = F.max_pool3d(diffs, kernel_size=(kt, kh, kw), stride=1, padding=(pt, ph, pw))
    pooled = pooled[:, 0]  # [V, T-1, H, W]

    # 3) Map boundary maxima back to frame centers (symmetric nearest-window approx)
    change = torch.zeros((V, T, H, W), device=device, dtype=pooled.dtype)
    change[:, 0] = pooled[:, 0]
    change[:, 1:-1] = torch.maximum(pooled[:, :-1], pooled[:, 1:])
    change[:, -1] = pooled[:, -1]

    # 4) Threshold -> background
    bg_mask = (change < diff_thresh)
    return bg_mask


def _moge_depths(seq_name, rgbs, cache_root, resize_to=512, batch_size=18):
    """Runs (and caches) MoGe-2; returns [V,T,H,W] float32 at native resolution."""

    # pip install git+https://github.com/microsoft/MoGe.git
    from moge.model.v2 import MoGeModel as MoGe2Model

    depths_path = os.path.join(cache_root, "moge_depths.npy")
    if os.path.isfile(depths_path):
        logging.info(f"Loading cached MoGe-2 depths for {seq_name} from {depths_path}")
        return torch.from_numpy(np.load(depths_path)).float()

    V, T, C, H, W = rgbs.shape
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = MoGe2Model.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device).eval()

    if resize_to is None:
        h1, w1 = H, W
    else:
        if H >= W:
            h1, w1 = int(resize_to), max(1, round(resize_to * W / H))
        else:
            w1, h1 = int(resize_to), max(1, round(resize_to * H / W))

    imgs = rgbs.view(V * T, C, H, W).float()
    if (h1, w1) != (H, W):
        imgs = F.interpolate(imgs, size=(h1, w1), mode="bilinear", align_corners=False)
    imgs = (imgs / 255.0).to(device, non_blocking=True)  # [N,3,h1,w1]

    out_small = torch.empty((V * T, h1, w1), dtype=torch.float32, device=device)

    with torch.inference_mode(), torch.autocast(device_type=device.type, dtype=torch.bfloat16,
                                                enabled=(device.type == "cuda")):
        N = imgs.shape[0]
        for i in range(0, N, batch_size):
            chunk = imgs[i:i + batch_size]  # [b,3,h1,w1]
            pred = model.infer(chunk)  # expects batched input
            assert isinstance(pred, dict) and "depth" in pred, "MoGe-2 infer() must return dict with 'depth'."
            d = torch.as_tensor(pred["depth"], device=device)
            assert d.ndim == 3 and d.shape[0] == chunk.shape[0] and tuple(d.shape[1:]) == (h1, w1), \
                f"Depth shape {tuple(d.shape)} != ({chunk.shape[0]},{h1},{w1})"
            out_small[i:i + chunk.shape[0]] = d

    if (h1, w1) != (H, W):
        out = F.interpolate(out_small[:, None], size=(H, W), mode="bilinear", align_corners=False)[:, 0]
    else:
        out = out_small
    out = out.clamp_min(0).view(V, T, H, W).cpu()

    np.save(depths_path, out.numpy())
    return out


def _ensure_vggt_raw_cache_and_load(
        rgbs: torch.Tensor,  # uint8 [V,T,3,H,W]
        seq_name: str,
        dataset_root: str,
        vggt_cache_subdir: str = "vggt_cache",
        skip_if_cached: bool = True,
        model_id: str = "facebook/VGGT-1B",
):
    """
    Run VGGT and cache RAW predictions (no alignment).
    Returns CPU float32 tensors:
      depths_raw   [V,T,1,H,W]
      confs        [V,T,1,H,W]  (constant 100)
      intrs_raw    [V,T,3,3]
      extrs_raw    [V,T,3,4]    (world->cam as predicted by VGGT)
    """
    from mvtracker.models.core.vggt.models.vggt import VGGT
    from mvtracker.models.core.vggt.utils.pose_enc import pose_encoding_to_extri_intri

    assert rgbs.dtype == torch.uint8 and rgbs.ndim == 5 and rgbs.shape[2] == 3, "rgbs must be uint8 [V,T,3,H,W]"
    V, T, _, H, W = rgbs.shape
    cache_root = os.path.join(dataset_root, vggt_cache_subdir, seq_name)
    os.makedirs(cache_root, exist_ok=True)

    f_depths_raw = os.path.join(cache_root, "vggt_depths_raw.npy")  # [V,T,H,W]
    f_confs = os.path.join(cache_root, "vggt_confs.npy")  # [V,T,H,W]
    f_intr_raw = os.path.join(cache_root, "vggt_intrinsics_raw.npy")
    f_extr_raw = os.path.join(cache_root, "vggt_extrinsics_raw.npy")

    all_cached = all(os.path.isfile(p) for p in [f_depths_raw, f_confs, f_intr_raw, f_extr_raw])
    if skip_if_cached and all_cached:
        depths_raw = torch.from_numpy(np.load(f_depths_raw)).float().unsqueeze(2)
        confs = torch.from_numpy(np.load(f_confs)).float().unsqueeze(2)
        intrs_raw = torch.from_numpy(np.load(f_intr_raw)).float()
        extrs_raw = torch.from_numpy(np.load(f_extr_raw)).float()
        return depths_raw, confs, intrs_raw, extrs_raw

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = VGGT.from_pretrained(model_id).to(device).eval()
    amp_dtype = torch.bfloat16 if (
            device.type == "cuda" and torch.cuda.get_device_capability()[0] >= 8) else torch.float16

    def _compute_pad_to_518(H0: int, W0: int, target: int = 518) -> Tuple[int, int, int, int, int, int]:
        """
        Mirror VGGT's load_and_preprocess_images(mode='pad') padding math so we can undo it.
        Returns: new_h, new_w, pad_top, pad_bottom, pad_left, pad_right
        """
        # Make largest dim target, keep aspect, round smaller dim to /14*14, then pad to (target, target)
        if W0 >= H0:
            new_w = target
            new_h = int(round((H0 * (new_w / W0)) / 14.0) * 14)
            h_pad = max(0, target - new_h)
            w_pad = 0
        else:
            new_h = target
            new_w = int(round((W0 * (new_h / H0)) / 14.0) * 14)
            h_pad = 0
            w_pad = max(0, target - new_w)

        pad_top = h_pad // 2
        pad_bottom = h_pad - pad_top
        pad_left = w_pad // 2
        pad_right = w_pad - pad_left
        return new_h, new_w, pad_top, pad_bottom, pad_left, pad_right

    depths_raw_arr = torch.empty((V, T, H, W), dtype=torch.float32)
    confs_arr = torch.full((V, T, H, W), 100.0, dtype=torch.float32)
    intr_raw_arr = torch.empty((V, T, 3, 3), dtype=torch.float32)
    extr_raw_arr = torch.empty((V, T, 3, 4), dtype=torch.float32)

    with torch.no_grad(), torch.cuda.amp.autocast(enabled=(device.type == "cuda"), dtype=amp_dtype):
        for t in tqdm(range(T), desc=f"VGGT RAW {seq_name}", unit="f"):
            image_items = [rgbs[v, t].cpu() for v in range(V)]  # each: [3,H,W] uint8
            images = _vggt_load_and_preprocess_images(image_items, mode="pad").to(device)[None]  # [1,V,3,518,518]

            tokens, ps_idx = model.aggregator(images)
            pose_enc = model.camera_head(tokens)[-1]
            extr_pred, intr_pred = pose_encoding_to_extri_intri(pose_enc, images.shape[-2:])  # [1,V,3,4],[1,V,3,3]
            depth_maps, _ = model.depth_head(tokens, images, ps_idx)  # [1,V,518,518]

            # per-view: undo pad, resize back to (H0,W0), adjust intrinsics
            d_full_list, K_list = [], []
            for v in range(V):
                H0, W0 = int(rgbs[v, t].shape[-2]), int(rgbs[v, t].shape[-1])
                new_h, new_w, pt, pb, pl, pr = _compute_pad_to_518(H0, W0)

                # crop padding region out of the 518x518 depth
                d_small = depth_maps[0, v:v + 1, pt:518 - pb, pl:518 - pr]  # [1,new_h,new_w]
                d_full_v = F.interpolate(d_small[:, None, :, :, 0], size=(H0, W0), mode="nearest")[:, 0]  # [1,H0,W0]
                d_full_list.append(d_full_v.squeeze(0))

                # adjust intrinsics: subtract removed pad, then scale to (H0,W0)
                K = intr_pred[0, v].detach().cpu().float().clone()
                K[0, 2] -= float(pl)
                K[1, 2] -= float(pt)
                S = torch.tensor([[W0 / float(new_w), 0.0, 0.0],
                                  [0.0, H0 / float(new_h), 0.0],
                                  [0.0, 0.0, 1.0]], dtype=torch.float32)
                K_list.append((S @ K).unsqueeze(0))

            depths_raw_arr[:, t] = torch.stack(d_full_list, dim=0)
            intr_raw_arr[:, t] = torch.cat(K_list, dim=0)
            extr_raw_arr[:, t] = extr_pred[0].detach().cpu().float()  # raw VGGT w2c

    # save raw cache
    np.save(f_depths_raw, depths_raw_arr.numpy())
    np.save(f_confs, confs_arr.numpy())
    np.save(f_intr_raw, intr_raw_arr.numpy())
    np.save(f_extr_raw, extr_raw_arr.numpy())

    return depths_raw_arr.unsqueeze(2), confs_arr.unsqueeze(2), intr_raw_arr, extr_raw_arr


def _vggt_load_and_preprocess_images(image_items, mode="crop"):
    """
    Same as VGGT loader, but accepts in-memory items as well.
    """
    if len(image_items) == 0:
        raise ValueError("At least 1 image is required")

    # Validate mode
    if mode not in ["crop", "pad"]:
        raise ValueError("Mode must be either 'crop' or 'pad'")

    images = []
    shapes = set()
    to_tensor = TF.ToTensor()
    target_size = 518

    def _to_pil(item):
        # path
        if isinstance(item, str):
            img = Image.open(item)
            return img
        # numpy HWC
        if isinstance(item, np.ndarray):
            if item.ndim == 3 and item.shape[2] in (3, 4):
                if item.dtype != np.uint8:
                    item = item.astype(np.uint8)
                return Image.fromarray(item)
        # torch CHW
        if torch.is_tensor(item):
            x = item
            if x.ndim == 3 and x.shape[0] in (3, 4):
                if x.dtype == torch.uint8:
                    arr = x.permute(1, 2, 0).cpu().numpy()
                    return Image.fromarray(arr)
                else:
                    # assume float [0,1]
                    arr = (x.clamp(0, 1) * 255.0).byte().permute(1, 2, 0).cpu().numpy()
                    return Image.fromarray(arr)
        raise ValueError("Unsupported image item type/shape")

    for item in image_items:
        img = _to_pil(item)

        # If there's an alpha channel, blend onto white background:
        if img.mode == "RGBA":
            # Create white background
            background = Image.new("RGBA", img.size, (255, 255, 255, 255))
            # Alpha composite onto the white background
            img = Image.alpha_composite(background, img)

        # Now convert to "RGB" (this step assigns white for transparent areas)
        img = img.convert("RGB")

        width, height = img.size

        if mode == "pad":
            # Make the largest dimension 518px while maintaining aspect ratio
            if width >= height:
                new_width = target_size
                new_height = round(height * (new_width / width) / 14) * 14  # Make divisible by 14
            else:
                new_height = target_size
                new_width = round(width * (new_height / height) / 14) * 14  # Make divisible by 14
        else:  # mode == "crop"
            # Original behavior: set width to 518px
            new_width = target_size
            # Calculate height maintaining aspect ratio, divisible by 14
            new_height = round(height * (new_width / width) / 14) * 14

        # Resize with new dimensions (width, height)
        img = img.resize((new_width, new_height), Image.Resampling.BICUBIC)
        img = to_tensor(img)  # Convert to tensor (0, 1)

        # Center crop height if it's larger than 518 (only in crop mode)
        if mode == "crop" and new_height > target_size:
            start_y = (new_height - target_size) // 2
            img = img[:, start_y: start_y + target_size, :]

        # For pad mode, pad to make a square of target_size x target_size
        if mode == "pad":
            h_padding = target_size - img.shape[1]
            w_padding = target_size - img.shape[2]

            if h_padding > 0 or w_padding > 0:
                pad_top = h_padding // 2
                pad_bottom = h_padding - pad_top
                pad_left = w_padding // 2
                pad_right = w_padding - pad_left

                # Pad with white (value=1.0)
                img = torch.nn.functional.pad(
                    img, (pad_left, pad_right, pad_top, pad_bottom), mode="constant", value=1.0
                )

        shapes.add((img.shape[1], img.shape[2]))
        images.append(img)

    # Check if we have different shapes
    # In theory our model can also work well with different shapes
    if len(shapes) > 1:
        print(f"Warning: Found images with different shapes: {shapes}")
        # Find maximum dimensions
        max_height = max(shape[0] for shape in shapes)
        max_width = max(shape[1] for shape in shapes)

        # Pad images if necessary
        padded_images = []
        for img in images:
            h_padding = max_height - img.shape[1]
            w_padding = max_width - img.shape[2]

            if h_padding > 0 or w_padding > 0:
                pad_top = h_padding // 2
                pad_bottom = h_padding - pad_top
                pad_left = w_padding // 2
                pad_right = w_padding - pad_left

                img = torch.nn.functional.pad(
                    img, (pad_left, pad_right, pad_top, pad_bottom), mode="constant", value=1.0
                )
            padded_images.append(img)
        images = padded_images

    images = torch.stack(images)  # concatenate images

    # Ensure correct shape when single image
    if len(image_items) == 1:
        # Verify shape is (1, C, H, W)
        if images.dim() == 3:
            images = images.unsqueeze(0)

    return images


def _ensure_vggt_aligned_cache_and_load(
        rgbs: torch.Tensor,  # uint8 [V,T,3,H,W]
        seq_name: str,
        dataset_root: str,
        extrs_gt: torch.Tensor,  # [V,T,3,4] GT world->cam
        vggt_cache_subdir: str = "vggt_cache",
        skip_if_cached: bool = True,
        model_id: str = "facebook/VGGT-1B",
):
    """
    Ensure RAW VGGT cache exists (running VGGT if needed), then align VGGT cameras to GT via
    Umeyama (pred→gt) per frame. Returns CPU float32:

      depths_aligned  [V,T,1,H,W]   (RAW depths scaled by s)
      confs           [V,T,1,H,W]   (same constant 100 as RAW)
      intr_aligned    [V,T,3,3]     (equal to RAW intrinsics; alignment is Sim3 in world)
      extr_aligned    [V,T,3,4]     (VGGT w2c aligned to GT)
    """
    # 1) Get RAW results (runs VGGT if needed)
    depths_raw, confs_raw, intr_raw, extr_raw = _ensure_vggt_raw_cache_and_load(
        rgbs=rgbs,
        seq_name=seq_name,
        dataset_root=dataset_root,
        vggt_cache_subdir=vggt_cache_subdir,
        skip_if_cached=skip_if_cached,
        model_id=model_id,
    )

    # 2) Aligned cache file paths
    cache_root = os.path.join(dataset_root, vggt_cache_subdir, seq_name)
    f_depths_aln = os.path.join(cache_root, "vggt_depths_aligned.npy")
    f_intr_aln = os.path.join(cache_root, "vggt_intrinsics_aligned.npy")
    f_extr_aln = os.path.join(cache_root, "vggt_extrinsics_aligned.npy")

    # 3) If aligned already cached, return it
    if skip_if_cached and all(os.path.isfile(p) for p in [f_depths_aln, f_intr_aln, f_extr_aln]):
        depths_aln = torch.from_numpy(np.load(f_depths_aln)).float().unsqueeze(2)
        intr_aln = torch.from_numpy(np.load(f_intr_aln)).float()
        extr_aln = torch.from_numpy(np.load(f_extr_aln)).float()
        return depths_aln, confs_raw, intr_aln, extr_aln

    # 4) Compute alignment
    depths_raw_ = depths_raw.squeeze(2)  # [V,T,H,W]
    V, T, H, W = depths_raw_.shape
    assert extrs_gt.shape[:2] == (V, T), "GT extrinsics must be [V,T,3,4]"

    depths_aln = depths_raw_.clone()
    intr_aln = intr_raw.clone()  # intrinsics unchanged by world Sim3
    extr_aln = extr_raw.clone()

    def _camera_center_from_affine_extr(extr):
        extr_sq = np.eye(4, dtype=np.float32)[None].repeat(extr.shape[0], 0)
        extr_sq[:, :3, :4] = extr
        extr_sq_inv = np.linalg.inv(extr_sq)
        return extr_sq_inv[:, :3, 3]

    for t in range(T):
        gt_w2c = extrs_gt[:, t].cpu().numpy()
        pred_w2c = extr_raw[:, t].cpu().numpy()

        s, R_align, t_align = align_umeyama(
            _camera_center_from_affine_extr(gt_w2c),
            _camera_center_from_affine_extr(pred_w2c),
        )
        pred_w2c_aligned = apply_sim3_to_extrinsics(pred_w2c, s, R_align, t_align)

        extr_aln[:, t] = torch.from_numpy(np.array(pred_w2c_aligned)).float()

    # 5) Save aligned cache
    np.save(f_depths_aln, depths_aln.numpy())
    np.save(f_intr_aln, intr_aln.numpy())
    np.save(f_extr_aln, extr_aln.numpy())

    return depths_aln.unsqueeze(2), confs_raw, intr_aln, extr_aln


================================================
FILE: mvtracker/datasets/kubric_multiview_dataset.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import logging
import os
import pathlib
import re
import time

import cv2
import kornia
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from scipy.spatial.transform import Rotation as R
from torch.utils.data import get_worker_info
from torchvision.transforms import ColorJitter, GaussianBlur
from torchvision.transforms import functional as F_torchvision

from mvtracker.datasets.utils import Datapoint, read_json, read_tiff, read_png, transform_scene, add_camera_noise, \
    aug_depth


class KubricMultiViewDataset(torch.utils.data.Dataset):

    @staticmethod
    def from_name(
            dataset_name: str,
            dataset_root: str,
            training_args=None,
            fabric=None,
            just_return_kwargs: bool = False,
            subset: str = "test",
    ):
        """
        Examples of evaluation datasets supported by this factory method:
        - kubric-multiview-v3
        - kubric-multiview-v3-duster0123
        - kubric-multiview-v3-duster01234567
        - kubric-multiview-v3-duster01234567cleaned
        - kubric-multiview-v3-duster01234567cleaned-views012
        - kubric-multiview-v3-duster01234567cleaned-views012-novelviews7
        - kubric-multiview-v3-duster01234567cleaned-views012-novelviews7-overfit-on-training
        - kubric-multiview-v3-duster01234567cleaned-views012-novelviews7-overfit-on-training-single
        - kubric-multiview-v3-duster01234567cleaned-views012-novelviews7-overfit-on-training-2dpt-single
        - kubric-multiview-v3-duster01234567cleaned-views012-novelviews7-overfit-on-training-2dpt-single-cached
        - kubric-multiview-v3-noise1.23cm

        Example of a training dataset:
        - kubric-multiview-v3-training
        """
        # Parse the dataset name, chunk by chunk
        non_parsed = dataset_name.replace("kubric-multiview-v3", "", 1)

        if non_parsed.startswith("-noise"):
            match = re.match(r"-noise([\d.]+)cm", non_parsed)
            assert match is not None
            depth_noise_std = float(match.group(1))
            depth_noise_std = depth_noise_std / 13  # real-world cm to kubric's metric unit
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            depth_noise_std = 0.0

        if non_parsed.startswith("-duster"):
            match = re.match(r"-duster(\d+)(cleaned)?", non_parsed)
            assert match is not None
            duster_views = list(map(int, match.group(1)))
            use_duster = True
            use_duster_cleaned = match.group(2) is not None
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            use_duster = False
            use_duster_cleaned = False
            duster_views = None

        if non_parsed.startswith("-views"):
            match = re.match(r"-views(\d+)", non_parsed)
            assert match is not None
            views = list(map(int, match.group(1)))
            if duster_views is not None:
                assert all(v in duster_views for v in views)
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            views = duster_views

        if non_parsed.startswith("-novelviews"):
            match = re.match(r"-novelviews(\d+)", non_parsed)
            assert match is not None
            novel_views = list(map(int, match.group(1)))
            non_parsed = non_parsed.replace(match.group(0), "", 1)
        else:
            novel_views = None

        if non_parsed.startswith("-training"):
            training = True
            non_parsed = non_parsed.replace("-training", "", 1)
            assert training_args is not None
            assert fabric is not None
        else:
            training = False

        if non_parsed.startswith("-overfit-on-training"):
            overfit_on_train = True
            non_parsed = non_parsed.replace("-overfit-on-training", "", 1)
            assert not training, "Either ...-training or ...-overfit-on-training[-single][-2dpt]"
            assert training_args is not None
            expected_training_dset_name = (dataset_name.replace("-overfit-on-training", "-training")
                                           .replace("-single", "").replace("2dpt", ""))
            assert training_args.datasets.train.name == expected_training_dset_name, \
                f"{expected_training_dset_name} != {training_args.datasets.train.name}"
        else:
            overfit_on_train = False

        if non_parsed.startswith("-single"):
            assert not training, "The single-point evaluation options is not relevant for a training dataset"
            single_point = True
            non_parsed = non_parsed.replace("-single", "", 1)
        else:
            single_point = False

        if non_parsed.startswith("-2dpt"):
            eval_2dpt = True
            non_parsed = non_parsed.replace("-2dpt", "", 1)
        else:
            eval_2dpt = False

        if non_parsed.startswith("-cached"):
            use_cached_tracks = True
            non_parsed = non_parsed.replace("-cached", "", 1)
        else:
            use_cached_tracks = False

        assert non_parsed == "", f"Unparsed part of the dataset name: {non_parsed}"

        kubric_kwargs = {
            "data_root": os.path.join(dataset_root, "kubric-multiview", subset),
            "seq_len": 24,
            "traj_per_sample": 512,
            "seed": 72,
            "sample_vis_1st_frame": False,
            "tune_per_scene": False,
            "max_videos": 30,
            "use_duster_depths": use_duster,
            "duster_views": duster_views,
            "clean_duster_depths": use_duster_cleaned,
            "views_to_return": views,
            "novel_views": novel_views,
            "num_views": -1 if views is not None else 4,
            "depth_noise_std": depth_noise_std,
            "ratio_dynamic": 0.5,
            "ratio_very_dynamic": 0.25,
            "use_cached_tracks": use_cached_tracks,
        }
        if training:
            kubric_kwargs["virtual_dataset_size"] = fabric.world_size * (training_args.trainer.num_steps + 1000)
        if training or overfit_on_train:
            kubric_kwargs["data_root"] = (
                os.path.join(training_args.datasets.root, "kubric-multiview", "train")
                if not training_args.modes.debug else
                os.path.join(training_args.datasets.root, "kubric-multiview", "validation")
            )
            kubric_kwargs["seq_len"] = training_args.datasets.train.sequence_len
            kubric_kwargs["traj_per_sample"] = training_args.datasets.train.traj_per_sample
            kubric_kwargs["max_depth"] = training_args.datasets.train.kubric_max_depth
            kubric_kwargs["tune_per_scene"] = training_args.modes.tune_per_scene
            if training:
                kubric_kwargs["max_videos"] = training_args.datasets.train.max_videos
            else:
                kubric_kwargs["max_videos"] = 30

            kubric_kwargs["augmentation_probability"] = training_args.augmentations.probability
            kubric_kwargs["enable_rgb_augs"] = training_args.augmentations.rgb
            kubric_kwargs["enable_depth_augs"] = training_args.augmentations.depth
            kubric_kwargs["enable_cropping_augs"] = training_args.augmentations.cropping
            kubric_kwargs["aug_crop_size"] = training_args.augmentations.cropping_size
            kubric_kwargs["enable_variable_trajpersample_augs"] = training_args.augmentations.variable_trajpersample
            kubric_kwargs["enable_scene_transform_augs"] = training_args.augmentations.scene_transform
            kubric_kwargs["enable_camera_params_noise_augs"] = training_args.augmentations.camera_params_noise
            kubric_kwargs["enable_variable_depth_type_augs"] = training_args.augmentations.variable_depth_type
            kubric_kwargs["enable_variable_num_views_augs"] = training_args.augmentations.variable_num_views
            kubric_kwargs["normalize_scene_following_vggt"] = training_args.augmentations.normalize_scene_following_vggt
            kubric_kwargs["enable_variable_vggt_crop_size_augs"] = training_args.augmentations.variable_vggt_crop_size
            kubric_kwargs["keep_principal_point_centered"] = training_args.augmentations.keep_principal_point_centered

            if training_args.modes.pretrain_only:
                kubric_kwargs["ratio_dynamic"] = 0.0
                kubric_kwargs["ratio_very_dynamic"] = 0.0

            if training_args.augmentations.variable_num_views:
                kubric_kwargs["num_views"] = None
                kubric_kwargs["views_to_return"] = None
                kubric_kwargs["duster_views"] = None
                kubric_kwargs["supported_duster_views_sets"] = [
                    [0, 1, 2, 3],
                    [0, 1, 2, 3, 4, 5, 6, 7],
                ]

        if just_return_kwargs:
            return kubric_kwargs

        return KubricMultiViewDataset(**kubric_kwargs)

    def __init__(
            self,
            data_root,
            views_to_return=None,
            novel_views=None,
            use_duster_depths=False,
            clean_duster_depths=False,
            duster_views=None,
            supported_duster_views_sets=None,
            seq_len=24,
            num_views=4,
            traj_per_sample=768,
            max_depth=1000,
            sample_vis_1st_frame=False,
            ratio_dynamic=0.5,
            ratio_very_dynamic=0.25,
            depth_noise_std=0.0,

            augmentation_probability=0.0,
            enable_rgb_augs=False,
            enable_depth_augs=False,
            enable_cropping_augs=False,
            aug_crop_size=(384, 512),
            enable_variable_trajpersample_augs=False,
            enable_scene_transform_augs=False,
            enable_camera_params_noise_augs=False,
            enable_variable_depth_type_augs=False,
            enable_variable_num_views_augs=False,

            normalize_scene_following_vggt=False,
            enable_variable_vggt_crop_size_augs=False,
            keep_principal_point_centered=False,

            static_cropping=False,
            seed=None,
            tune_per_scene=False,
            max_videos=None,
            virtual_dataset_size=None,
            max_tracks_to_preload=18000,
            perform_sanity_checks=False,
            use_cached_tracks=False,
    ):
        super(KubricMultiViewDataset, self).__init__()

        self.data_root = data_root
        self.views_to_return = views_to_return
        self.novel_views = novel_views
        self.use_duster_depths = use_duster_depths
        self.clean_duster_depths = clean_duster_depths
        self.duster_views = duster_views
        self.supported_duster_views_sets = supported_duster_views_sets
        if self.use_duster_depths:
            assert self.duster_views is not None, "When using Duster depths, duster_views must be set."
            if self.supported_duster_views_sets is None:
                self.supported_duster_views_sets = [self.duster_views]

        self.seq_len = seq_len
        self.num_views = num_views
        self.traj_per_sample = traj_per_sample
        self.sample_vis_1st_frame = sample_vis_1st_frame
        self.ratio_dynamic = ratio_dynamic
        self.ratio_very_dynamic = ratio_very_dynamic

        self.seed = seed
        self.add_index_to_seed = not tune_per_scene

        self.perform_sanity_checks = perform_sanity_checks
        self.use_cached_tracks = use_cached_tracks
        self.cache_name = self._cache_key()
        self.max_tracks_to_preload = max_tracks_to_preload
        if self.traj_per_sample is not None and self.max_tracks_to_preload is not None:
            assert self.traj_per_sample <= self.max_tracks_to_preload, "We need to preload more tracks than we sample."

        self.depth_noise_std = depth_noise_std

        # Augmentation settings
        self.augmentation_probability = augmentation_probability
        if any([enable_rgb_augs, enable_depth_augs, enable_variable_trajpersample_augs,
                enable_scene_transform_augs, enable_camera_params_noise_augs, enable_variable_num_views_augs,
                enable_variable_depth_type_augs]):
            assert self.augmentation_probability > 0, "Augmentations are enabled, but augmentation probability is 0%."
        if self.augmentation_probability > 0:
            assert not self.use_cached_tracks, "caching tracks not supported with augs"

        self.enable_rgb_augs = enable_rgb_augs
        self.enable_depth_augs = enable_depth_augs
        self.enable_cropping_augs = enable_cropping_augs
        self.enable_variable_trajpersample_augs = enable_variable_trajpersample_augs
        self.enable_scene_transform_augs = enable_scene_transform_augs
        self.enable_camera_params_noise_augs = enable_camera_params_noise_augs
        self.enable_variable_num_views_augs = enable_variable_num_views_augs
        self.enable_variable_depth_type_augs = enable_variable_depth_type_augs
        self.enable_variable_depth_type_augs__depth_type_probability = {
            "gt": 0.70, "duster": 0.20, "duster_cleaned": 0.10,
        }
        # TODO: self.enable_seqlen_augs = enable_seqlen_augs
        if self.enable_variable_depth_type_augs:
            assert not self.use_duster_depths, "Cannot force depth type when using variable depth type augs."
            assert not self.clean_duster_depths, "Cannot force depth type when using variable depth type augs."
        self.enable_variable_num_views_augs__n_views_probability = {
            # v2
            1: 0.20,
            2: 0.10,
            3: 0.10,
            4: 0.25,
            5: 0.10,
            6: 0.25,

            # # v1
            # 1: 0.20,
            # 2: 0.10,
            # 3: 0.10,
            # 4: 0.25,
            # 5: 0.10,
            # 6: 0.05,
            # 7: 0.05,
            # 8: 0.15,
        }
        self.enable_variable_num_views_augs__trajpersample_adjustment_factor = {
            1: 1.00,
            2: 1.00,
            3: 1.00,
            4: 1.00,
            5: 0.40,
            6: 0.25,
        }
        if self.enable_variable_num_views_augs:
            assert self.num_views is None, "Cannot use enable_variable_num_views_augs with num_views != None."
            assert self.views_to_return is None, "Cannot use enable_variable_num_views_augs with views_to_return."

        # photometric augmentation
        # TODO: "Override" ColorJitter and GaussianBlur to take in a random state
        #       in forward pass so we can assure reproducibility. This affects
        #       only training as augmentation is disabled during evaluation.
        self.photo_aug = ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.25 / 3.14)
        self.blur_aug = GaussianBlur(11, sigma=(0.1, 2.0))
        self.blur_aug_prob = 0.25
        self.color_aug_prob = 0.25

        # occlusion augmentation
        self.eraser_aug_prob = 0.5
        self.eraser_bounds = [2, 100]
        self.eraser_max = 10

        # occlusion augmentation
        self.replace_aug_prob = 0.5
        self.replace_bounds = [2, 100]
        self.replace_max = 10

        # spatial augmentations
        self.crop_size = aug_crop_size
        self.normalize_scene_following_vggt = normalize_scene_following_vggt
        self.enable_variable_vggt_crop_size_augs = enable_variable_vggt_crop_size_augs
        self.keep_principal_point_centered = keep_principal_point_centered

        self.max_depth = max_depth

        self.pad_bounds = [0, 45]
        self.resize_lim = [0.8, 1.2]
        self.resize_delta = 0.15
        self.max_crop_offset = 36
        if static_cropping or tune_per_scene:
            self.pad_bounds = [0, 1]
            self.resize_lim = [1.0, 1.0]
            self.resize_delta = 0.0
            self.max_crop_offset = 0

        if self.keep_principal_point_centered:
            self.pad_bounds = [0, 45]
            self.resize_lim = [1.02, 1.25]
            self.resize_delta = None
            self.max_crop_offset = None
            if static_cropping or tune_per_scene:
                self.pad_bounds = [0, 1]
                self.resize_lim = [1.04, 1.04]

        self.seq_names = [
            fname
            for fname in os.listdir(self.data_root)
            if os.path.isdir(os.path.join(self.data_root, fname))
               and not fname.startswith(".")
               and not fname.startswith("_")
        ]
        self.seq_names = sorted(self.seq_names, key=lambda x: int(x))
        seq_names_clean = []
        for seq_name in self.seq_names:
            scene_path = os.path.join(self.data_root, seq_name)
            view_folders = [
                d for d in os.listdir(scene_path)
                if os.path.isdir(os.path.join(scene_path, d)) and d.startswith('view_')
            ]
            if len(view_folders) == 0:
                logging.warning(f"Skipping {scene_path} because it has no views.")
                continue
            if self.num_views is not None and len(view_folders) < self.num_views:
                logging.warning(f"Skipping {scene_path} because it has {len(view_folders)} views (<{self.num_views}).")
                continue
            seq_names_clean.append(seq_name)
        self.seq_names = seq_names_clean

        if self.supported_duster_views_sets is not None:
            supported_duster_views_sets_cleaned = []
            for s in self.supported_duster_views_sets:
                duster_views_str = ''.join(str(v) for v in s)
                if os.path.isdir(os.path.join(self.data_root, self.seq_names[0], f"duster-views-{duster_views_str}")):
                    supported_duster_views_sets_cleaned.append(s)
                else:
                    logging.warning(f"Skipping duster views set {s} because it does not exist.")
            self.supported_duster_views_sets = supported_duster_views_sets_cleaned

        if tune_per_scene:
            self.seq_names = self.seq_names[3:4]
        if max_videos is not None:
            self.seq_names = self.seq_names[:max_videos]
        logging.info("Using %d videos from %s" % (len(self.seq_names), self.data_root))

        self.real_len = len(self.seq_names)
        if virtual_dataset_size is not None:
            self.virtual_len = virtual_dataset_size
        else:
            self.virtual_len = self.real_len
        logging.info(f"Real dataset size: {self.real_len}. Virtual dataset size: {self.virtual_len}.")

        self.getitem_calls = 0

    def _cache_key(self):
        name = f"cachedtracks--seed{self.seed}-dynamic{self.ratio_dynamic}-verydynamic-{self.ratio_very_dynamic}"
        if self.views_to_return is not None:
            name += f"-views{'_'.join(map(str, self.views_to_return))}"
        if self.traj_per_sample is not None:
            name += f"-n{self.traj_per_sample}"
        if self.num_views is not None:
            name += f"-numviews{self.num_views}"
        if self.seq_len is not None:
            name += f"-t{self.seq_len}"
        if self.sample_vis_1st_frame:
            name += f"-sample_vis_1st_frame"
        return name + "--v1"  # bump this if you change the selection policy

    def __len__(self):
        return self.virtual_len

    def __getitem__(self, index):
        index = index % self.real_len

        sample, gotit = self._getitem_helper(index)

        if not gotit:
            logging.warning("warning: sampling failed")
            # fake sample, so we can still collate
            num_views = self.num_views if self.num_views is not None else 4
            h, w = 384, 512
            traj_per_sample = self.traj_per_sample if self.traj_per_sample is not None else 768
            sample = Datapoint(
                video=torch.zeros((num_views, self.seq_len, 3, h, w)),
                videodepth=torch.zeros((num_views, self.seq_len, 1, h, w)),
                segmentation=torch.zeros((num_views, self.seq_len, 1, h, w)),
                trajectory=torch.zeros((self.seq_len, traj_per_sample, 2)),
                visibility=torch.zeros((self.seq_len, traj_per_sample)),
                valid=torch.zeros((self.seq_len, traj_per_sample)),
            )

        return sample, gotit

    def _getitem_helper(self, index):
        start_time_1 = time.time()

        gotit = True

        # Take a new seed from torch or use self.seed if set
        # The rest of the code will use generators initialized with this seed
        if self.seed is None:
            seed = torch.randint(0, 2 ** 32 - 1, (1,)).item()
        else:
            seed = self.seed
            if self.add_index_to_seed:
                seed += index
        rnd_torch = torch.Generator().manual_seed(seed)
        rnd_np = np.random.RandomState(seed=seed)

        # Load the data
        datapoint = KubricMultiViewDataset.getitem_raw_datapoint(os.path.join(self.data_root, self.seq_names[index]))

        traj3d_world = datapoint["tracks_3d"].numpy()
        tracks_segmentation_ids = datapoint["tracks_segmentation_ids"].numpy()
        tracked_objects = datapoint["tracked_objects"]
        camera_positions = datapoint["camera_positions"].numpy()
        lookat_positions = datapoint["lookat_positions"].numpy()
        views = datapoint["views"]

        # Take a random depth type, if enabled
        if self.enable_variable_depth_type_augs:
            assert self.use_duster_depths is False, "Cannot force depth type when using variable depth type augs."
            assert self.clean_duster_depths is False, "Cannot force depth type when using variable depth type augs."
            depth_type = rnd_np.choice(
                a=list(self.enable_variable_depth_type_augs__depth_type_probability.keys()),
                size=1,
                p=list(self.enable_variable_depth_type_augs__depth_type_probability.values()),
            )[0]
            use_duster_depths, clean_duster_depths = {
                "gt": (False, False),
                "duster": (True, False),
                "duster_cleaned": (True, True),
            }[depth_type]
        else:
            use_duster_depths = self.use_duster_depths
            clean_duster_depths = self.clean_duster_depths

        # Take a random number of views, if enabled
        all_views = sorted(list(range(len(views))))
        if self.enable_variable_num_views_augs:
            assert self.num_views is None, "Cannot use enable_variable_num_views_augs with num_views != None."
            assert self.views_to_return is None, "Cannot use enable_variable_num_views_augs with views_to_return."
            num_views = rnd_np.choice(
                a=list(self.enable_variable_num_views_augs__n_views_probability.keys()),
                size=1,
                p=list(self.enable_variable_num_views_augs__n_views_probability.values()),
            )[0]
            if use_duster_depths:
                num_views = min(num_views, max([len(s) for s in self.supported_duster_views_sets]))
                # Take only those that have the closest number of views that is greater or equal to num_views
                closest_num_views_in_supported_duster_views_set = min([
                    len(vs)
                    for vs in self.supported_duster_views_sets
                    if len(vs) >= num_views
                ])
                supported_duster_views_sets = [
                    vs
                    for vs in self.supported_duster_views_sets
                    if len(vs) == closest_num_views_in_supported_duster_views_set
                ]
                duster_views = supported_duster_views_sets[rnd_np.randint(len(supported_duster_views_sets))]
                views_to_return = rnd_np.choice(duster_views, num_views, replace=False).tolist()
            else:
                views_to_return = rnd_np.choice(all_views, num_views, replace=False).tolist()
                duster_views = views_to_return
        else:
            num_views = self.num_views
            if self.views_to_return is not None:
                assert num_views == -1, "Cannot use views_to_return with num_views != -1."
                views_to_return = self.views_to_return
            elif use_duster_depths:
                if self.duster_views is not None:
                    duster_views = self.duster_views
                else:
                    # Take only those that have the closest number of views that is greater or equal to num_views
                    closest_num_views_in_supported_duster_views_set = min([
                        len(vs)
                        for vs in self.supported_duster_views_sets
                        if len(vs) >= num_views
                    ])
                    supported_duster_views_sets = [
                        vs
                        for vs in self.supported_duster_views_sets
                        if len(vs) == closest_num_views_in_supported_duster_views_set
                    ]
                    duster_views = supported_duster_views_sets[rnd_np.randint(len(supported_duster_views_sets))]
                views_to_return = duster_views
            else:
                if num_views == -1:
                    # Take all views
                    views_to_return = all_views
                elif num_views is None:
                    # Randomly sample a number of views
                    n = rnd_np.randint(min(3, len(views)), len(views) + 1)
                    views_to_return = rnd_np.choice(all_views, n, replace=False).tolist()
                else:
                    # Take a fixed number of views
                    assert num_views > 0, "Fixed number of views must be positive."
                    assert num_views <= len(views), f"Not enough views available (idx={index})."
                    views_to_return = rnd_np.choice(all_views, num_views, replace=False).tolist()
            if self.duster_views is not None:

Download .txt

gitextract_p_4pogs3/

├── .gitignore
├── README.md
├── configs/
│   ├── eval.yaml
│   ├── experiment/
│   │   ├── mvtracker.yaml
│   │   ├── mvtracker_overfit.yaml
│   │   └── mvtracker_overfit_mini.yaml
│   ├── model/
│   │   ├── copycat.yaml
│   │   ├── cotracker1_offline.yaml
│   │   ├── cotracker1_online.yaml
│   │   ├── cotracker2_offline.yaml
│   │   ├── cotracker2_online.yaml
│   │   ├── cotracker3_offline.yaml
│   │   ├── cotracker3_online.yaml
│   │   ├── default.yaml
│   │   ├── delta.yaml
│   │   ├── locotrack.yaml
│   │   ├── mvtracker.yaml
│   │   ├── scenetracker.yaml
│   │   ├── spatialtrackerv2.yaml
│   │   ├── spatracker_monocular.yaml
│   │   ├── spatracker_monocular_pretrained.yaml
│   │   ├── spatracker_multiview.yaml
│   │   └── tapip3d.yaml
│   └── train.yaml
├── demo.py
├── hubconf.py
├── mvtracker/
│   ├── __init__.py
│   ├── cli/
│   │   ├── __init__.py
│   │   ├── eval.py
│   │   ├── train.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── helpers.py
│   │       ├── pylogger.py
│   │       └── rich_utils.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── dexycb_multiview_dataset.py
│   │   ├── generic_scene_dataset.py
│   │   ├── kubric_multiview_dataset.py
│   │   ├── panoptic_studio_multiview_dataset.py
│   │   ├── tap_vid_datasets.py
│   │   └── utils.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── evaluator_3dpt.py
│   │   └── metrics.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── copycat.py
│   │   │   ├── cotracker2/
│   │   │   │   ├── __init__.py
│   │   │   │   └── blocks.py
│   │   │   ├── dpt/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base_model.py
│   │   │   │   ├── blocks.py
│   │   │   │   ├── midas_net.py
│   │   │   │   ├── models.py
│   │   │   │   ├── transforms.py
│   │   │   │   └── vit.py
│   │   │   ├── dynamic3dgs/
│   │   │   │   ├── LICENSE.md
│   │   │   │   ├── colormap.py
│   │   │   │   ├── export_depths_from_pretrained_checkpoint.py
│   │   │   │   ├── external.py
│   │   │   │   ├── helpers.py
│   │   │   │   ├── merge_tapvid3d_per_camera_annotations.py
│   │   │   │   ├── metadata_dexycb.py
│   │   │   │   ├── metadata_kubric.py
│   │   │   │   ├── reorganize_dexycb.py
│   │   │   │   ├── test.py
│   │   │   │   ├── track_2d.py
│   │   │   │   ├── track_3d.py
│   │   │   │   ├── train.py
│   │   │   │   └── visualize.py
│   │   │   ├── embeddings.py
│   │   │   ├── loftr/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── linear_attention.py
│   │   │   │   └── transformer.py
│   │   │   ├── losses.py
│   │   │   ├── model_utils.py
│   │   │   ├── monocular_baselines.py
│   │   │   ├── mvtracker/
│   │   │   │   ├── __init__.py
│   │   │   │   └── mvtracker.py
│   │   │   ├── ptv3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── model.py
│   │   │   │   └── serialization/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── default.py
│   │   │   │       ├── hilbert.py
│   │   │   │       └── z_order.py
│   │   │   ├── shape-of-motion/
│   │   │   │   ├── .gitignore
│   │   │   │   ├── .gitmodules
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── flow3d/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── configs.py
│   │   │   │   │   ├── data/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   ├── base_dataset.py
│   │   │   │   │   │   ├── casual_dataset.py
│   │   │   │   │   │   ├── colmap.py
│   │   │   │   │   │   ├── iphone_dataset.py
│   │   │   │   │   │   ├── panoptic_dataset.py
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   ├── init_utils.py
│   │   │   │   │   ├── loss_utils.py
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   ├── params.py
│   │   │   │   │   ├── renderer.py
│   │   │   │   │   ├── scene_model.py
│   │   │   │   │   ├── tensor_dataclass.py
│   │   │   │   │   ├── trainer.py
│   │   │   │   │   ├── trajectories.py
│   │   │   │   │   ├── transforms.py
│   │   │   │   │   ├── validator.py
│   │   │   │   │   └── vis/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── playback_panel.py
│   │   │   │   │       ├── render_panel.py
│   │   │   │   │       ├── utils.py
│   │   │   │   │       └── viewer.py
│   │   │   │   └── launch_davis.py
│   │   │   ├── spatracker/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── blocks.py
│   │   │   │   ├── softsplat.py
│   │   │   │   ├── spatracker_monocular.py
│   │   │   │   └── spatracker_multiview.py
│   │   │   ├── vggt/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── heads/
│   │   │   │   │   ├── camera_head.py
│   │   │   │   │   ├── dpt_head.py
│   │   │   │   │   ├── head_act.py
│   │   │   │   │   ├── track_head.py
│   │   │   │   │   ├── track_modules/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   ├── base_track_predictor.py
│   │   │   │   │   │   ├── blocks.py
│   │   │   │   │   │   ├── modules.py
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── layers/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── attention.py
│   │   │   │   │   ├── block.py
│   │   │   │   │   ├── drop_path.py
│   │   │   │   │   ├── layer_scale.py
│   │   │   │   │   ├── mlp.py
│   │   │   │   │   ├── patch_embed.py
│   │   │   │   │   ├── rope.py
│   │   │   │   │   ├── swiglu_ffn.py
│   │   │   │   │   └── vision_transformer.py
│   │   │   │   ├── models/
│   │   │   │   │   ├── aggregator.py
│   │   │   │   │   └── vggt.py
│   │   │   │   └── utils/
│   │   │   │       ├── geometry.py
│   │   │   │       ├── load_fn.py
│   │   │   │       ├── pose_enc.py
│   │   │   │       ├── rotation.py
│   │   │   │       └── visual_track.py
│   │   │   └── vit/
│   │   │       ├── __init__.py
│   │   │       ├── common.py
│   │   │       └── encoder.py
│   │   └── evaluation_predictor_3dpt.py
│   └── utils/
│       ├── __init__.py
│       ├── basic.py
│       ├── eval_utils.py
│       ├── geom.py
│       ├── improc.py
│       ├── misc.py
│       ├── visualizer_mp4.py
│       └── visualizer_rerun.py
├── requirements.full.txt
├── requirements.txt
└── scripts/
    ├── 4ddress_preprocessing.py
    ├── __init__.py
    ├── compare_cdist-topk_against_pointops-knn.py
    ├── dex_ycb_to_neus_format.py
    ├── egoexo4d_preprocessing.py
    ├── estimate_depth_with_duster.py
    ├── hi4d_preprocessing.py
    ├── merge_comparison_mp4s.py
    ├── panoptic_studio_preprocessing.py
    ├── plot_aj_for_varying_depth_noise_levels.py
    ├── plot_aj_for_varying_n_of_views.py
    ├── profiling.md
    ├── selfcap_preprocessing.py
    ├── slurm/
    │   ├── eval.sh
    │   ├── mvtracker-nodepthaugs.sh
    │   ├── mvtracker.sh
    │   ├── spatracker.sh
    │   ├── test_reproducibility.sh
    │   ├── triplane-128.sh
    │   └── triplane-256.sh
    └── summarize_eval_results.py

Download .txt

SYMBOL INDEX (1221 symbols across 118 files)

FILE: demo.py
  function main (line 13) | def main():

FILE: hubconf.py
  function _load_ckpt (line 14) | def _load_ckpt(spec: str):
  function _extract_model_state (line 26) | def _extract_model_state(sd):
  function _build_model (line 44) | def _build_model(**overrides):
  function _load_into (line 69) | def _load_into(model, checkpoint_key: str):
  function mvtracker_model (line 78) | def mvtracker_model(*,
  function mvtracker_predictor (line 96) | def mvtracker_predictor(*,
  function mvtracker (line 132) | def mvtracker(pretrained: bool = True, device: str = "cuda"):
  function mvtracker_cleandepth (line 137) | def mvtracker_cleandepth(pretrained: bool = True, device: str = "cuda"):

FILE: mvtracker/cli/eval.py
  function main (line 8) | def main(cfg: DictConfig):

FILE: mvtracker/cli/train.py
  function fetch_optimizer (line 55) | def fetch_optimizer(trainer_cfg, model):
  function forward_batch_multi_view (line 78) | def forward_batch_multi_view(batch, model, cfg, step, train_iters, gamma...
  function run_test_eval (line 232) | def run_test_eval(cfg, evaluator, model, dataloaders, writer, step):
  function augment_train_iters (line 317) | def augment_train_iters(train_iters: int, current_step: int, warmup_step...
  function main (line 344) | def main(cfg: DictConfig):

FILE: mvtracker/cli/utils/helpers.py
  function extras (line 15) | def extras(cfg: DictConfig) -> None:
  function task_wrapper (line 50) | def task_wrapper(task_func: Callable) -> Callable:
  function get_metric_value (line 105) | def get_metric_value(metric_dict: Dict[str, Any], metric_name: Optional[...
  function maybe_close_wandb (line 129) | def maybe_close_wandb(fn: Callable) -> Callable:

FILE: mvtracker/cli/utils/pylogger.py
  class RankedLogger (line 7) | class RankedLogger(logging.LoggerAdapter):
    method __init__ (line 10) | def __init__(
    method log (line 27) | def log(self, level: int, msg: str, rank: Optional[int] = None, *args,...

FILE: mvtracker/cli/utils/rich_utils.py
  function print_config_tree (line 18) | def print_config_tree(
  function enforce_tags (line 74) | def enforce_tags(cfg: DictConfig, save_to_file: bool = False) -> None:

FILE: mvtracker/datasets/dexycb_multiview_dataset.py
  class DexYCBMultiViewDataset (line 20) | class DexYCBMultiViewDataset(Dataset):
    method from_name (line 23) | def from_name(dataset_name: str, dataset_root: str):
    method __init__ (line 115) | def __init__(
    method _get_sequence_names (line 146) | def _get_sequence_names(self, max_videos):
    method _cache_key (line 184) | def _cache_key(self):
    method __len__ (line 194) | def __len__(self):
    method __getitem__ (line 197) | def __getitem__(self, index):
    method _getitem_helper (line 207) | def _getitem_helper(self, index):
  function rerun_viz_scene (line 572) | def rerun_viz_scene(entity_prefix, rgbs, depths, intrs, extrs, tracks, r...

FILE: mvtracker/datasets/generic_scene_dataset.py
  class GenericSceneDataset (line 20) | class GenericSceneDataset(Dataset):
    method __init__ (line 21) | def __init__(
    method __len__ (line 95) | def __len__(self):
    method __getitem__ (line 98) | def __getitem__(self, idx):
  function compute_auto_scene_normalization (line 288) | def compute_auto_scene_normalization(
  function _ensure_moge2_cache_and_load (line 361) | def _ensure_moge2_cache_and_load(rgbs, seq_name, dataset_root, moge2_cac...
  function _ensure_monofusion_cache_and_load (line 387) | def _ensure_monofusion_cache_and_load(rgbs, seq_name, dataset_root, mono...
  function _static_bg_mask_from_window (line 511) | def _static_bg_mask_from_window(
  function _moge_depths (line 555) | def _moge_depths(seq_name, rgbs, cache_root, resize_to=512, batch_size=18):
  function _ensure_vggt_raw_cache_and_load (line 607) | def _ensure_vggt_raw_cache_and_load(
  function _vggt_load_and_preprocess_images (line 720) | def _vggt_load_and_preprocess_images(image_items, mode="crop"):
  function _ensure_vggt_aligned_cache_and_load (line 854) | def _ensure_vggt_aligned_cache_and_load(

FILE: mvtracker/datasets/kubric_multiview_dataset.py
  class KubricMultiViewDataset (line 27) | class KubricMultiViewDataset(torch.utils.data.Dataset):
    method from_name (line 30) | def from_name(
    method __init__ (line 206) | def __init__(
    method _cache_key (line 431) | def _cache_key(self):
    method __len__ (line 445) | def __len__(self):
    method __getitem__ (line 448) | def __getitem__(self, index):
    method _getitem_helper (line 470) | def _getitem_helper(self, index):
    method getitem_raw_datapoint (line 1114) | def getitem_raw_datapoint(scene_path, perform_2d_projection_sanity_che...
    method depth_from_euclidean_to_z (line 1258) | def depth_from_euclidean_to_z(depth, sensor_width, focal_length):
    method _add_photometric_augs (line 1276) | def _add_photometric_augs(
    method _add_cropping_augs (line 1405) | def _add_cropping_augs(self, crop_size, rgbs, depths, intrs, trajs, vi...
    method _add_cropping_augs_with_pp_at_center (line 1570) | def _add_cropping_augs_with_pp_at_center(self, crop_size, rgbs, depths...
    method _rescale_and_erase_depth_patches (line 1656) | def _rescale_and_erase_depth_patches(self, depths, trajs, visibles, rn...
    method _crop (line 1722) | def _crop(self, rgbs, trajs, crop_size):

FILE: mvtracker/datasets/panoptic_studio_multiview_dataset.py
  class PanopticStudioMultiViewDataset (line 19) | class PanopticStudioMultiViewDataset(Dataset):
    method from_name (line 21) | def from_name(dataset_name: str, dataset_root: str):
    method __init__ (line 100) | def __init__(
    method _get_sequence_names (line 127) | def _get_sequence_names(self, max_videos):
    method _cache_key (line 161) | def _cache_key(self):
    method __len__ (line 169) | def __len__(self):
    method __getitem__ (line 172) | def __getitem__(self, index):
    method _getitem_helper (line 182) | def _getitem_helper(self, index):

FILE: mvtracker/datasets/tap_vid_datasets.py
  function resize_video (line 30) | def resize_video(video: np.ndarray, output_size: Tuple[int, int]) -> np....
  function sample_queries_first (line 37) | def sample_queries_first(
  function sample_queries_strided (line 79) | def sample_queries_strided(
  class TapVidDataset (line 143) | class TapVidDataset(torch.utils.data.Dataset):
    method from_name (line 146) | def from_name(dataset_name: str, dataset_root: str):
    method __init__ (line 196) | def __init__(
    method __getitem__ (line 250) | def __getitem__(self, index):
    method __len__ (line 704) | def __len__(self):
  function zoedepth_nk (line 709) | def zoedepth_nk(rgbs, batch_size=2, device="cuda", cached_file=None):
  function rigid_registration (line 734) | def rigid_registration(
  function rigid_registration_ransac (line 762) | def rigid_registration_ransac(
  function to_homogeneous (line 795) | def to_homogeneous(x):
  function from_homogeneous (line 799) | def from_homogeneous(x, assert_homogeneous_part_is_equal_to_1=False, eps...
  function to_homogenous_torch (line 805) | def to_homogenous_torch(x):
  function moge (line 810) | def moge(rgbs, batch_size=10, device="cuda", cached_file=None, intrinsic...
  function megasam (line 920) | def megasam(rgbs: torch.Tensor, batch_size: int = 10, device: str = "cud...

FILE: mvtracker/datasets/utils.py
  class Datapoint (line 23) | class Datapoint:
  function collate_fn (line 55) | def collate_fn(batch):
  function try_to_cuda (line 144) | def try_to_cuda(t: Any) -> Any:
  function dataclass_to_cuda_ (line 161) | def dataclass_to_cuda_(obj):
  function read_json (line 176) | def read_json(filename: str) -> Any:
  function read_tiff (line 181) | def read_tiff(filename: str) -> np.ndarray:
  function read_png (line 189) | def read_png(filename: str, rescale_range=None) -> np.ndarray:
  function transform_scene (line 210) | def transform_scene(
  function add_camera_noise (line 304) | def add_camera_noise(intrs, extrs, noise_std_intr=0.01, noise_std_extr=0...
  function aug_depth (line 332) | def aug_depth(depth, grid=(8, 8), scale=(0.7, 1.3), shift=(-0.1, 0.1),
  function align_umeyama (line 362) | def align_umeyama(model, data, known_scale=False, yaw_only=False):
  function get_camera_center (line 397) | def get_camera_center(extr):
  function apply_sim3_to_extrinsics (line 403) | def apply_sim3_to_extrinsics(vggt_extrinsics, s, R_align, t_align):
  function get_best_yaw (line 418) | def get_best_yaw(C):
  function rot_z (line 431) | def rot_z(theta):

FILE: mvtracker/evaluation/evaluator_3dpt.py
  class NumpyEncoder (line 29) | class NumpyEncoder(json.JSONEncoder):
    method default (line 30) | def default(self, obj):
  function kmeans_sample (line 42) | def kmeans_sample(pts, count):
  function evaluate_3dpt (line 62) | def evaluate_3dpt(
  class Evaluator (line 176) | class Evaluator:
    method __init__ (line 177) | def __init__(
    method evaluate_sequence (line 212) | def evaluate_sequence(

FILE: mvtracker/evaluation/metrics.py
  function compute_metrics (line 10) | def compute_metrics(
  function compute_tapvid_metrics (line 61) | def compute_tapvid_metrics(
  function compute_tapvid_metrics_original (line 174) | def compute_tapvid_metrics_original(
  function evaluate_predictions (line 303) | def evaluate_predictions(

FILE: mvtracker/models/core/copycat.py
  class CopyCat (line 5) | class CopyCat(nn.Module):
    method __init__ (line 10) | def __init__(self):
    method forward (line 14) | def forward(

FILE: mvtracker/models/core/cotracker2/blocks.py
  function _ntuple (line 17) | def _ntuple(n):
  function exists (line 26) | def exists(val):
  function default (line 30) | def default(val, d):
  class Mlp (line 37) | class Mlp(nn.Module):
    method __init__ (line 40) | def __init__(
    method forward (line 61) | def forward(self, x):
  class ResidualBlock (line 70) | class ResidualBlock(nn.Module):
    method __init__ (line 71) | def __init__(self, in_planes, planes, norm_fn="group", stride=1):
    method forward (line 119) | def forward(self, x):
  class BasicEncoder (line 130) | class BasicEncoder(nn.Module):
    method __init__ (line 131) | def __init__(self, input_dim=3, output_dim=128, stride=4):
    method _make_layer (line 172) | def _make_layer(self, dim, stride=1):
    method forward (line 180) | def forward(self, x):
  class Attention (line 212) | class Attention(nn.Module):
    method __init__ (line 213) | def __init__(self, query_dim, context_dim=None, num_heads=8, dim_head=...
    method forward (line 224) | def forward(self, x, context=None, attn_mask=None):
  class FlashAttention (line 246) | class FlashAttention(nn.Module):
    method __init__ (line 247) | def __init__(self, query_dim, context_dim=None, num_heads=8, dim_head=...
    method forward (line 258) | def forward(self, x, context=None, attn_mask=None):
  class AttnBlock (line 274) | class AttnBlock(nn.Module):
    method __init__ (line 275) | def __init__(
    method forward (line 297) | def forward(self, x, attn_mask=None):
  class CrossAttnBlock (line 303) | class CrossAttnBlock(nn.Module):
    method __init__ (line 304) | def __init__(
    method forward (line 334) | def forward(self, x, context, attn_mask=None):
  class EfficientUpdateFormer (line 340) | class EfficientUpdateFormer(nn.Module):
    method __init__ (line 345) | def __init__(
    method initialize_weights (line 436) | def initialize_weights(self):
    method forward (line 455) | def forward(self, input_tensor, mask=None):

FILE: mvtracker/models/core/dpt/base_model.py
  class BaseModel (line 4) | class BaseModel(torch.nn.Module):
    method load (line 5) | def load(self, path):

FILE: mvtracker/models/core/dpt/blocks.py
  function _make_encoder (line 12) | def _make_encoder(
  function _make_scratch (line 77) | def _make_scratch(in_shape, out_shape, groups=1, expand=False):
  function _make_resnet_backbone (line 130) | def _make_resnet_backbone(resnet):
  function _make_pretrained_resnext101_wsl (line 143) | def _make_pretrained_resnext101_wsl(use_pretrained):
  class Interpolate (line 148) | class Interpolate(nn.Module):
    method __init__ (line 151) | def __init__(self, scale_factor, mode, align_corners=False):
    method forward (line 165) | def forward(self, x):
  class ResidualConvUnit (line 185) | class ResidualConvUnit(nn.Module):
    method __init__ (line 188) | def __init__(self, features):
    method forward (line 206) | def forward(self, x):
  class FeatureFusionBlock (line 223) | class FeatureFusionBlock(nn.Module):
    method __init__ (line 226) | def __init__(self, features):
    method forward (line 237) | def forward(self, *xs):
  class ResidualConvUnit_custom (line 257) | class ResidualConvUnit_custom(nn.Module):
    method __init__ (line 260) | def __init__(self, features, activation, bn):
    method forward (line 300) | def forward(self, x):
  class FeatureFusionBlock_custom (line 328) | class FeatureFusionBlock_custom(nn.Module):
    method __init__ (line 331) | def __init__(
    method forward (line 372) | def forward(self, *xs):

FILE: mvtracker/models/core/dpt/midas_net.py
  class MidasNet_large (line 12) | class MidasNet_large(BaseModel):
    method __init__ (line 15) | def __init__(self, path=None, features=256, non_negative=True):
    method forward (line 50) | def forward(self, x):

FILE: mvtracker/models/core/dpt/models.py
  function _make_fusion_block (line 14) | def _make_fusion_block(features, use_bn):
  class DPT (line 25) | class DPT(BaseModel):
    method __init__ (line 26) | def __init__(
    method forward (line 87) | def forward(self, x, only_enc=False):
  class DPTDepthModel (line 142) | class DPTDepthModel(DPT):
    method __init__ (line 143) | def __init__(
    method forward (line 167) | def forward(self, x):
  class DPTEncoder (line 179) | class DPTEncoder(DPT):
    method __init__ (line 180) | def __init__(
    method forward (line 197) | def forward(self, x):
  class DPTSegmentationModel (line 203) | class DPTSegmentationModel(DPT):
    method __init__ (line 204) | def __init__(self, num_classes, path=None, **kwargs):

FILE: mvtracker/models/core/dpt/transforms.py
  function apply_min_size (line 6) | def apply_min_size(sample, size, image_interpolation_method=cv2.INTER_AR...
  class Resize (line 48) | class Resize(object):
    method __init__ (line 51) | def __init__(
    method constrain_to_multiple_of (line 93) | def constrain_to_multiple_of(self, x, min_val=0, max_val=None):
    method get_size (line 104) | def get_size(self, width, height):
    method __call__ (line 161) | def __call__(self, sample):
  class NormalizeImage (line 196) | class NormalizeImage(object):
    method __init__ (line 199) | def __init__(self, mean, std):
    method __call__ (line 203) | def __call__(self, sample):
  class PrepareForNet (line 209) | class PrepareForNet(object):
    method __init__ (line 212) | def __init__(self):
    method __call__ (line 215) | def __call__(self, sample):

FILE: mvtracker/models/core/dpt/vit.py
  function get_activation (line 12) | def get_activation(name):
  function get_attention (line 22) | def get_attention(name):
  function get_mean_attention_map (line 45) | def get_mean_attention_map(attn, token, shape):
  class Slice (line 57) | class Slice(nn.Module):
    method __init__ (line 58) | def __init__(self, start_index=1):
    method forward (line 62) | def forward(self, x):
  class AddReadout (line 66) | class AddReadout(nn.Module):
    method __init__ (line 67) | def __init__(self, start_index=1):
    method forward (line 71) | def forward(self, x):
  class ProjectReadout (line 79) | class ProjectReadout(nn.Module):
    method __init__ (line 80) | def __init__(self, in_features, start_index=1):
    method forward (line 86) | def forward(self, x):
  class Transpose (line 93) | class Transpose(nn.Module):
    method __init__ (line 94) | def __init__(self, dim0, dim1):
    method forward (line 99) | def forward(self, x):
  function forward_vit (line 104) | def forward_vit(pretrained, x):
  function _resize_pos_embed (line 148) | def _resize_pos_embed(self, posemb, gs_h, gs_w):
  function forward_flex (line 165) | def forward_flex(self, x):
  function get_readout_oper (line 203) | def get_readout_oper(vit_features, features, use_readout, start_index=1):
  function _make_vit_b16_backbone (line 220) | def _make_vit_b16_backbone(
  function _make_vit_b_rn50_backbone (line 350) | def _make_vit_b_rn50_backbone(
  function _make_pretrained_vitb_rn50_384 (line 493) | def _make_pretrained_vitb_rn50_384(
  function _make_pretrained_vit_tiny (line 515) | def _make_pretrained_vit_tiny(
  function _make_pretrained_vitl16_384 (line 538) | def _make_pretrained_vitl16_384(
  function _make_pretrained_vitb16_384 (line 554) | def _make_pretrained_vitb16_384(
  function _make_pretrained_deitb16_384 (line 569) | def _make_pretrained_deitb16_384(
  function _make_pretrained_deitb16_distil_384 (line 584) | def _make_pretrained_deitb16_distil_384(

FILE: mvtracker/models/core/dynamic3dgs/export_depths_from_pretrained_checkpoint.py
  function load_scene_data (line 14) | def load_scene_data(params_path, seg_as_col=False):
  function render (line 33) | def render(w, h, k, w2c, timestep_data, near=0.01, far=100.0):
  function export_depth (line 41) | def export_depth(scene_root, output_root, checkpoint_path):

FILE: mvtracker/models/core/dynamic3dgs/external.py
  function build_rotation (line 24) | def build_rotation(q):
  function calc_mse (line 44) | def calc_mse(img1, img2):
  function calc_psnr (line 48) | def calc_psnr(img1, img2):
  function gaussian (line 53) | def gaussian(window_size, sigma):
  function create_window (line 58) | def create_window(window_size, channel):
  function calc_ssim (line 65) | def calc_ssim(img1, img2, window_size=11, size_average=True):
  function _ssim (line 76) | def _ssim(img1, img2, window, window_size, channel, size_average=True):
  function accumulate_mean2d_gradient (line 99) | def accumulate_mean2d_gradient(variables):
  function update_params_and_optimizer (line 106) | def update_params_and_optimizer(new_params, params, optimizer):
  function cat_params_to_optimizer (line 121) | def cat_params_to_optimizer(new_params, params, optimizer):
  function remove_points (line 138) | def remove_points(to_remove, params, variables, optimizer):
  function inverse_sigmoid (line 160) | def inverse_sigmoid(x):
  function densify (line 164) | def densify(params, variables, optimizer, i):

FILE: mvtracker/models/core/dynamic3dgs/helpers.py
  function setup_camera (line 9) | def setup_camera(w, h, k, w2c, near=0.01, far=100):
  function params2rendervar (line 35) | def params2rendervar(params):
  function l1_loss_v1 (line 47) | def l1_loss_v1(x, y):
  function l1_loss_v2 (line 51) | def l1_loss_v2(x, y):
  function weighted_l2_loss_v1 (line 55) | def weighted_l2_loss_v1(x, y, w):
  function weighted_l2_loss_v2 (line 59) | def weighted_l2_loss_v2(x, y, w):
  function quat_mult (line 63) | def quat_mult(q1, q2):
  function o3d_knn (line 73) | def o3d_knn(pts, num_knn):
  function params2cpu (line 86) | def params2cpu(params, is_initial_timestep):
  function save_params (line 95) | def save_params(output_params, seq, exp):

FILE: mvtracker/models/core/dynamic3dgs/merge_tapvid3d_per_camera_annotations.py
  function to_homogeneous (line 18) | def to_homogeneous(x):
  function from_homogeneous (line 22) | def from_homogeneous(x, assert_homogeneous_part_is_equal_to_1=False, eps...
  function load_scene_data (line 28) | def load_scene_data(params_path, seg_as_col=False):
  function render (line 47) | def render(h, w, k, w2c, timestep_data, near=0.01, far=100.0):
  function merge_annotations (line 55) | def merge_annotations(

FILE: mvtracker/models/core/dynamic3dgs/test.py
  function load_saved_params (line 17) | def load_saved_params(seq, exp):
  function prepare_test_dataset (line 25) | def prepare_test_dataset(t, md, seq, exclude_cam_ids):
  function render_image (line 47) | def render_image(cam, rendervar):
  function test (line 54) | def test(seq, exp, exclude_cam_ids=[]):

FILE: mvtracker/models/core/dynamic3dgs/track_2d.py
  function gaussian_influence (line 18) | def gaussian_influence(point, gaussians):
  function render_depth (line 72) | def render_depth(timestep_data, w2c, k):
  function load_scene_data (line 96) | def load_scene_data(seq, exp, seg_as_col=False):
  function unproject_2d_to_3d (line 124) | def unproject_2d_to_3d(query_pt, depth_map, intrinsics):
  function load_camera_params (line 139) | def load_camera_params(dataset_path, seq, cam_id_g):
  function c2w_convert (line 150) | def c2w_convert(point_3d, w2c):
  function w2c_convert (line 157) | def w2c_convert(point_3d_h, w2c):
  function track_query_point (line 163) | def track_query_point(scene_data, query_point, depth_map, w2c, k, t_give...

FILE: mvtracker/models/core/dynamic3dgs/track_3d.py
  function load_scene_data (line 19) | def load_scene_data(seq, exp, seg_as_col=False):
  function load_depth_maps (line 45) | def load_depth_maps(dataset_path, seq, cam_ids):
  function preload_camera_data (line 60) | def preload_camera_data(dataset_path, seq, cam_ids):
  function gaussian_influence (line 79) | def gaussian_influence(point, gaussians):
  function get_visibilities (line 135) | def get_visibilities(
  function track_query_point (line 170) | def track_query_point(

FILE: mvtracker/models/core/dynamic3dgs/train.py
  function get_dataset (line 17) | def get_dataset(t, md, seq):
  function get_batch (line 32) | def get_batch(todo_dataset, dataset):
  function initialize_params (line 39) | def initialize_params(seq, md):
  function initialize_optimizer (line 66) | def initialize_optimizer(params, variables):
  function get_loss (line 81) | def get_loss(params, curr_data, variables, is_initial_timestep):
  function initialize_per_timestep (line 133) | def initialize_per_timestep(params, variables, optimizer):
  function initialize_post_first_timestep (line 156) | def initialize_post_first_timestep(params, variables, optimizer, num_knn...
  function report_progress (line 179) | def report_progress(params, data, i, progress_bar, every_i=100):
  function train (line 189) | def train(seq, exp):

FILE: mvtracker/models/core/dynamic3dgs/visualize.py
  function load_scene_data (line 39) | def load_scene_data(params_path, seg_as_col=False):
  function render (line 62) | def render(w2c, k, timestep_data):
  function log_tracks_to_rerun (line 70) | def log_tracks_to_rerun(
  function visualize (line 165) | def visualize(seq, exp):

FILE: mvtracker/models/core/embeddings.py
  function get_3d_sincos_pos_embed (line 11) | def get_3d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, extra...
  function get_3d_sincos_pos_embed_from_grid (line 35) | def get_3d_sincos_pos_embed_from_grid(embed_dim, grid):
  function get_2d_sincos_pos_embed (line 53) | def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, extra...
  function get_2d_sincos_pos_embed_from_grid (line 77) | def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
  function get_1d_sincos_pos_embed_from_grid (line 88) | def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
  function get_2d_embedding (line 109) | def get_2d_embedding(xy, C, cat_coords=True):
  function get_3d_embedding (line 134) | def get_3d_embedding(xyz, C, cat_coords=True):
  function get_4d_embedding (line 164) | def get_4d_embedding(xyzw, C, cat_coords=True):
  class Embedder_Fourier (line 202) | class Embedder_Fourier(nn.Module):
    method __init__ (line 203) | def __init__(self, input_dim, max_freq_log2, N_freqs,
    method forward (line 234) | def forward(self,

FILE: mvtracker/models/core/loftr/linear_attention.py
  function elu_feature_map (line 10) | def elu_feature_map(x):
  class LinearAttention (line 14) | class LinearAttention(Module):
    method __init__ (line 15) | def __init__(self, eps=1e-6):
    method forward (line 20) | def forward(self, queries, keys, values, q_mask=None, kv_mask=None):
  class FullAttention (line 50) | class FullAttention(Module):
    method __init__ (line 51) | def __init__(self, use_dropout=False, attention_dropout=0.1):
    method forward (line 56) | def forward(self, queries, keys, values, q_mask=None, kv_mask=None):

FILE: mvtracker/models/core/loftr/transformer.py
  function elu_feature_map (line 13) | def elu_feature_map(x):
  class FullAttention (line 17) | class FullAttention(Module):
    method __init__ (line 18) | def __init__(self, use_dropout=False, attention_dropout=0.1):
    method forward (line 23) | def forward(self, queries, keys, values, q_mask=None, kv_mask=None):
  class TransformerEncoderLayer (line 53) | class TransformerEncoderLayer(nn.Module):
    method __init__ (line 54) | def __init__(self,
    method forward (line 80) | def forward(self, x, source, x_mask=None, source_mask=None):
  class LocalFeatureTransformer (line 106) | class LocalFeatureTransformer(nn.Module):
    method __init__ (line 109) | def __init__(self, config):
    method _reset_parameters (line 120) | def _reset_parameters(self):
    method forward (line 125) | def forward(self, feat0, feat1, mask0=None, mask1=None):

FILE: mvtracker/models/core/losses.py
  function balanced_ce_loss (line 22) | def balanced_ce_loss(pred, gt, valid=None):
  function sequence_loss_3d (line 49) | def sequence_loss_3d(flow_preds, flow_gt, vis, valids, gamma=0.8, dmin=0...

FILE: mvtracker/models/core/model_utils.py
  function smart_cat (line 18) | def smart_cat(tensor1, tensor2, dim):
  function normalize_single (line 24) | def normalize_single(d):
  function normalize (line 32) | def normalize(d):
  function meshgrid2d (line 43) | def meshgrid2d(B, Y, X, stack=False, norm=False, device="cuda"):
  function reduce_masked_mean (line 63) | def reduce_masked_mean(x, mask, dim=None, keepdim=False):
  function bilinear_sample2d (line 81) | def bilinear_sample2d(im, x, y, return_inbounds=False):
  function procrustes_analysis (line 168) | def procrustes_analysis(X0, X1, Weight):  # [B,N,3]
  function bilinear_sampler (line 189) | def bilinear_sampler(input, coords, align_corners=True, padding_mode="bo...
  function sample_features4d (line 252) | def sample_features4d(input, coords):
  function sample_features5d (line 287) | def sample_features5d(input, coords):
  function pixel_xy_and_camera_z_to_world_space (line 320) | def pixel_xy_and_camera_z_to_world_space(pixel_xy, camera_z, intrs_inv, ...
  function world_space_to_pixel_xy_and_camera_z (line 344) | def world_space_to_pixel_xy_and_camera_z(world_xyz, intrs, extrs):
  function get_points_on_a_grid (line 361) | def get_points_on_a_grid(
  function init_pointcloud_from_rgbd (line 420) | def init_pointcloud_from_rgbd(
  function save_pointcloud_to_ply (line 485) | def save_pointcloud_to_ply(filename, points, colors, edges=None):

FILE: mvtracker/models/core/monocular_baselines.py
  class CoTrackerOfflineWrapper (line 16) | class CoTrackerOfflineWrapper(nn.Module):
    method __init__ (line 17) | def __init__(self, model_name="cotracker3_offline", grid_size=10):
    method forward (line 22) | def forward(self, rgbs, queries, **kwargs):
  class CoTrackerOnlineWrapper (line 39) | class CoTrackerOnlineWrapper(nn.Module):
    method __init__ (line 40) | def __init__(self, model_name="cotracker3_online", grid_size=10):
    method forward (line 45) | def forward(self, rgbs, queries, **kwargs):
  class SpaTrackerV2Wrapper (line 65) | class SpaTrackerV2Wrapper(nn.Module):
    method __init__ (line 85) | def __init__(
    method forward (line 104) | def forward(self, rgbs, depths, queries, queries_xyz_worldspace, intrs...
  class LocoTrackWrapper (line 160) | class LocoTrackWrapper(nn.Module):
    method __init__ (line 172) | def __init__(self, model_size="base"):
    method forward (line 179) | def forward(self, rgbs, queries, **kwargs):
  class TAPTRWrapper (line 216) | class TAPTRWrapper(nn.Module):
  class TAPIRWrapper (line 220) | class TAPIRWrapper(nn.Module):
  class PIPSWrapper (line 224) | class PIPSWrapper(nn.Module):
  class PIPSPlusPlusWrapper (line 228) | class PIPSPlusPlusWrapper(nn.Module):
  class SceneTrackerWrapper (line 232) | class SceneTrackerWrapper(nn.Module):
    method __init__ (line 244) | def __init__(
    method forward (line 264) | def forward(self, rgbs, depths, queries_with_z, **kwargs):
  class DELTAWrapper (line 293) | class DELTAWrapper(nn.Module):
    method __init__ (line 307) | def __init__(
    method forward (line 340) | def forward(self, rgbs, depths, queries, **kwargs):
  class TAPIP3DWrapper (line 370) | class TAPIP3DWrapper(nn.Module):
    method __init__ (line 383) | def __init__(
    method forward (line 409) | def forward(self, rgbs, depths, intrs, extrs, queries_xyz_worldspace, ...
    method inference (line 463) | def inference(
  class MonocularToMultiViewAdapter (line 540) | class MonocularToMultiViewAdapter(nn.Module):
    method __init__ (line 541) | def __init__(self, model, **kwargs):
    method forward (line 545) | def forward(
  function bilinear_sampler (line 737) | def bilinear_sampler(input, coords, align_corners=True, padding_mode="bo...

FILE: mvtracker/models/core/mvtracker/mvtracker.py
  function _knn_pointops (line 26) | def _knn_pointops(k: int, xyz_ref: torch.Tensor, xyz_query: torch.Tensor):
  function _knn_torch (line 75) | def _knn_torch(k: int, xyz_ref: torch.Tensor, xyz_query: torch.Tensor):
  class MVTracker (line 93) | class MVTracker(nn.Module):
    method __init__ (line 94) | def __init__(
    method fnet_fwd (line 185) | def fnet_fwd(self, rgbs_normalized, image_features=None):
    method init_stats (line 190) | def init_stats(self):
    method consume_stats (line 194) | def consume_stats(self):
    method forward_iteration (line 244) | def forward_iteration(
    method forward (line 412) | def forward(
  function compute_vggt_scene_normalization_transform (line 735) | def compute_vggt_scene_normalization_transform(depths, extrs, intrs):
  class PointcloudCorrBlock (line 769) | class PointcloudCorrBlock:
    method __init__ (line 770) | def __init__(
    method corr_sample (line 800) | def corr_sample(

FILE: mvtracker/models/core/ptv3/model.py
  function offset2bincount (line 30) | def offset2bincount(offset):
  function offset2batch (line 37) | def offset2batch(offset):
  function batch2offset (line 45) | def batch2offset(batch):
  class Point (line 49) | class Point(Dict):
    method __init__ (line 74) | def __init__(self, *args, **kwargs):
    method serialization (line 82) | def serialization(self, order="z", depth=None, shuffle_orders=False):
    method sparsify (line 139) | def sparsify(self, pad=96):
  class PointModule (line 178) | class PointModule(nn.Module):
    method __init__ (line 183) | def __init__(self, *args, **kwargs):
  class PointSequential (line 187) | class PointSequential(PointModule):
    method __init__ (line 193) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 208) | def __getitem__(self, idx):
    method __len__ (line 218) | def __len__(self):
    method add (line 221) | def add(self, module, name=None):
    method forward (line 228) | def forward(self, input):
  class PDNorm (line 256) | class PDNorm(PointModule):
    method __init__ (line 257) | def __init__(
    method forward (line 279) | def forward(self, point):
  class RPE (line 298) | class RPE(torch.nn.Module):
    method __init__ (line 299) | def __init__(self, patch_size, num_heads):
    method forward (line 308) | def forward(self, coord):
  class SerializedAttention (line 320) | class SerializedAttention(PointModule):
    method __init__ (line 321) | def __init__(
    method get_rel_pos (line 374) | def get_rel_pos(self, point, order):
    method get_padding_and_inverse (line 384) | def get_padding_and_inverse(self, point):
    method forward (line 441) | def forward(self, point):
  class MLP (line 494) | class MLP(nn.Module):
    method __init__ (line 495) | def __init__(
    method forward (line 511) | def forward(self, x):
  class Block (line 520) | class Block(PointModule):
    method __init__ (line 521) | def __init__(
    method forward (line 587) | def forward(self, point: Point):
  class SerializedPooling (line 610) | class SerializedPooling(PointModule):
    method __init__ (line 611) | def __init__(
    method forward (line 640) | def forward(self, point: Point):
  class SerializedUnpooling (line 716) | class SerializedUnpooling(PointModule):
    method __init__ (line 717) | def __init__(
    method forward (line 740) | def forward(self, point):
  class Embedding (line 754) | class Embedding(PointModule):
    method __init__ (line 755) | def __init__(
    method forward (line 782) | def forward(self, point: Point):
  class PointTransformerV3 (line 787) | class PointTransformerV3(PointModule):
    method __init__ (line 788) | def __init__(
    method forward (line 967) | def forward(self, data_dict):

FILE: mvtracker/models/core/ptv3/serialization/default.py
  function encode (line 10) | def encode(grid_coord, batch=None, depth=16, order="z"):
  function decode (line 29) | def decode(code, depth=16, order="z"):
  function z_order_encode (line 42) | def z_order_encode(grid_coord: torch.Tensor, depth: int = 16):
  function z_order_decode (line 49) | def z_order_decode(code: torch.Tensor, depth):
  function hilbert_encode (line 55) | def hilbert_encode(grid_coord: torch.Tensor, depth: int = 16):
  function hilbert_decode (line 59) | def hilbert_decode(code: torch.Tensor, depth: int = 16):

FILE: mvtracker/models/core/ptv3/serialization/hilbert.py
  function right_shift (line 12) | def right_shift(binary, k=1, axis=-1):
  function binary2gray (line 46) | def binary2gray(binary, axis=-1):
  function gray2binary (line 69) | def gray2binary(gray, axis=-1):
  function encode (line 91) | def encode(locs, num_dims, num_bits):
  function decode (line 201) | def decode(hilberts, num_dims, num_bits):

FILE: mvtracker/models/core/ptv3/serialization/z_order.py
  class KeyLUT (line 13) | class KeyLUT:
    method __init__ (line 14) | def __init__(self):
    method encode_lut (line 29) | def encode_lut(self, device=torch.device("cpu")):
    method decode_lut (line 35) | def decode_lut(self, device=torch.device("cpu")):
    method xyz2key (line 41) | def xyz2key(self, x, y, z, depth):
    method key2xyz (line 53) | def key2xyz(self, key, depth):
  function xyz2key (line 67) | def xyz2key(
  function key2xyz (line 105) | def key2xyz(key: torch.Tensor, depth: int = 16):

FILE: mvtracker/models/core/shape-of-motion/flow3d/configs.py
  class FGLRConfig (line 5) | class FGLRConfig:
  class BGLRConfig (line 15) | class BGLRConfig:
  class MotionLRConfig (line 24) | class MotionLRConfig:
  class SceneLRConfig (line 30) | class SceneLRConfig:
  class LossesConfig (line 37) | class LossesConfig:
  class OptimizerConfig (line 51) | class OptimizerConfig:

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/__init__.py
  function get_train_val_datasets (line 16) | def get_train_val_datasets(

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/base_dataset.py
  class BaseDataset (line 7) | class BaseDataset(Dataset):
    method num_frames (line 10) | def num_frames(self) -> int: ...
    method keyframe_idcs (line 13) | def keyframe_idcs(self) -> torch.Tensor:
    method get_w2cs (line 17) | def get_w2cs(self) -> torch.Tensor: ...
    method get_Ks (line 20) | def get_Ks(self) -> torch.Tensor: ...
    method get_image (line 23) | def get_image(self, index: int) -> torch.Tensor: ...
    method get_depth (line 26) | def get_depth(self, index: int) -> torch.Tensor: ...
    method get_mask (line 29) | def get_mask(self, index: int) -> torch.Tensor: ...
    method get_img_wh (line 31) | def get_img_wh(self) -> tuple[int, int]: ...
    method get_tracks_3d (line 34) | def get_tracks_3d(
    method get_bkgd_points (line 48) | def get_bkgd_points(
    method train_collate_fn (line 80) | def train_collate_fn(batch):

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/casual_dataset.py
  class DavisDataConfig (line 30) | class DavisDataConfig:
  class CustomDataConfig (line 54) | class CustomDataConfig:
  class CasualDataset (line 77) | class CasualDataset(BaseDataset):
    method __init__ (line 78) | def __init__(
    method num_frames (line 175) | def num_frames(self) -> int:
    method keyframe_idcs (line 179) | def keyframe_idcs(self) -> torch.Tensor:
    method __len__ (line 182) | def __len__(self):
    method get_w2cs (line 185) | def get_w2cs(self) -> torch.Tensor:
    method get_Ks (line 188) | def get_Ks(self) -> torch.Tensor:
    method get_img_wh (line 191) | def get_img_wh(self) -> tuple[int, int]:
    method get_image (line 194) | def get_image(self, index) -> torch.Tensor:
    method get_mask (line 200) | def get_mask(self, index) -> torch.Tensor:
    method get_depth (line 206) | def get_depth(self, index) -> torch.Tensor:
    method load_image (line 211) | def load_image(self, index) -> torch.Tensor:
    method load_mask (line 215) | def load_mask(self, index) -> torch.Tensor:
    method load_depth (line 232) | def load_depth(self, index) -> torch.Tensor:
    method load_target_tracks (line 240) | def load_target_tracks(
    method get_tracks_3d (line 257) | def get_tracks_3d(
    method get_bkgd_points (line 295) | def get_bkgd_points(
    method __getitem__ (line 398) | def __getitem__(self, index: int):
  function load_cameras (line 451) | def load_cameras(
  function compute_scene_norm (line 472) | def compute_scene_norm(

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/colmap.py
  function get_colmap_camera_params (line 10) | def get_colmap_camera_params(colmap_dir, img_files):
  class CameraModel (line 30) | class CameraModel:
  class Camera (line 37) | class Camera:
  class BaseImage (line 46) | class BaseImage:
  class Point3D (line 57) | class Point3D:
  class Image (line 66) | class Image(BaseImage):
    method qvec2rotmat (line 67) | def qvec2rotmat(self):
  function read_next_bytes (line 89) | def read_next_bytes(fid, num_bytes, format_char_sequence, endian_charact...
  function read_cameras_text (line 101) | def read_cameras_text(path: Union[str, Path]) -> Dict[int, Camera]:
  function read_cameras_binary (line 127) | def read_cameras_binary(path_to_model_file: Union[str, Path]) -> Dict[in...
  function read_images_text (line 160) | def read_images_text(path: Union[str, Path]) -> Dict[int, Image]:
  function read_images_binary (line 197) | def read_images_binary(path_to_model_file: Union[str, Path]) -> Dict[int...
  function read_points3D_text (line 243) | def read_points3D_text(path: Union[str, Path]):
  function read_points3d_binary (line 275) | def read_points3d_binary(path_to_model_file: Union[str, Path]) -> Dict[i...
  function qvec2rotmat (line 313) | def qvec2rotmat(qvec):
  function get_intrinsics_extrinsics (line 335) | def get_intrinsics_extrinsics(img, cameras):

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/iphone_dataset.py
  class iPhoneDataConfig (line 32) | class iPhoneDataConfig:
  class iPhoneDataset (line 51) | class iPhoneDataset(BaseDataset):
    method __init__ (line 52) | def __init__(
    method num_frames (line 348) | def num_frames(self) -> int:
    method __len__ (line 351) | def __len__(self):
    method get_w2cs (line 354) | def get_w2cs(self) -> torch.Tensor:
    method get_Ks (line 357) | def get_Ks(self) -> torch.Tensor:
    method get_image (line 360) | def get_image(self, index: int) -> torch.Tensor:
    method get_depth (line 363) | def get_depth(self, index: int) -> torch.Tensor:
    method get_mask (line 366) | def get_mask(self, index: int) -> torch.Tensor:
    method get_img_wh (line 369) | def get_img_wh(self) -> tuple[int, int]:
    method get_tracks_3d (line 377) | def get_tracks_3d(
    method get_bkgd_points (line 544) | def get_bkgd_points(
    method get_video_dataset (line 602) | def get_video_dataset(self) -> Dataset:
    method __getitem__ (line 605) | def __getitem__(self, index: int):
    method preprocess (line 685) | def preprocess(self, data):
  class iPhoneDatasetKeypointView (line 689) | class iPhoneDatasetKeypointView(Dataset):
    method __init__ (line 692) | def __init__(self, dataset: iPhoneDataset):
    method __len__ (line 718) | def __len__(self):
    method __getitem__ (line 721) | def __getitem__(self, index: int):
  class iPhoneDatasetVideoView (line 732) | class iPhoneDatasetVideoView(Dataset):
    method __init__ (line 735) | def __init__(self, dataset: iPhoneDataset):
    method __len__ (line 741) | def __len__(self):
    method __getitem__ (line 744) | def __getitem__(self, index):

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/panoptic_dataset.py
  class PanopticDataConfig (line 37) | class PanopticDataConfig:
  class PanopticStudioDatasetSoM (line 59) | class PanopticStudioDatasetSoM(BaseDataset):
    method __init__ (line 60) | def __init__(
    method num_frames (line 180) | def num_frames(self) -> int:
    method keyframe_idcs (line 184) | def keyframe_idcs(self) -> torch.Tensor:
    method __len__ (line 188) | def __len__(self):
    method get_w2cs (line 191) | def get_w2cs(self, view_index=0) -> torch.Tensor:
    method get_Ks (line 194) | def get_Ks(self, view_index=0) -> torch.Tensor:
    method get_img_wh (line 197) | def get_img_wh(self) -> tuple[int, int]:
    method get_image (line 200) | def get_image(self, index, view_index=0) -> torch.Tensor:
    method get_mask (line 203) | def get_mask(self, index, view_index=0) -> torch.Tensor:
    method get_depth (line 209) | def get_depth(self, index, view=0) -> torch.Tensor:
    method load_mask (line 213) | def load_mask(self, index, view=0) -> torch.Tensor:
    method load_depth (line 232) | def load_depth(self, index, view=0) -> torch.Tensor:
    method get_foreground_points (line 240) | def get_foreground_points(
    method get_bkgd_points (line 341) | def get_bkgd_points(
    method load_target_tracks (line 422) | def load_target_tracks(
    method get_tracks_3d (line 442) | def get_tracks_3d(
    method train_collate_fn (line 502) | def train_collate_fn(self, batch):
    method get_batches (line 590) | def get_batches(self, batch_size):
    method __getitem_as_batch__ (line 599) | def __getitem_as_batch__(self, batch_size):
    method __getitem_single_view__ (line 615) | def __getitem_single_view__(self, index: int, view: int):
    method __getitem__ (line 670) | def __getitem__(self, index: int):
  function compute_scene_norm (line 728) | def compute_scene_norm(

FILE: mvtracker/models/core/shape-of-motion/flow3d/data/utils.py
  class SceneNormDict (line 12) | class SceneNormDict(TypedDict):
  function to_device (line 17) | def to_device(batch, device):
  function normalize_coords (line 27) | def normalize_coords(coords, h, w):
  function postprocess_occlusions (line 32) | def postprocess_occlusions(occlusions, expected_dist):
  function parse_tapir_track_info (line 53) | def parse_tapir_track_info(occlusions, expected_dist):
  function get_tracks_3d_for_query_frame (line 69) | def get_tracks_3d_for_query_frame(
  function _get_padding (line 171) | def _get_padding(x, k, stride, padding, same: bool):
  function median_filter_2d (line 192) | def median_filter_2d(x, kernel_size=3, stride=1, padding=1, same: bool =...
  function masked_median_blur (line 207) | def masked_median_blur(image, mask, kernel_size=11):
  function _compute_zero_padding (line 253) | def _compute_zero_padding(kernel_size: Tuple[int, int]) -> Tuple[int, int]:
  function get_binary_kernel2d (line 259) | def get_binary_kernel2d(
  function _unpack_2d_ks (line 280) | def _unpack_2d_ks(kernel_size: tuple[int, int] | int) -> tuple[int, int]:
  function ndc_2_cam (line 294) | def ndc_2_cam(ndc_xyz, intrinsic, W, H):
  function depth2point_cam (line 303) | def depth2point_cam(sampled_depth, ref_intrinsic):
  function depth2point_world (line 323) | def depth2point_world(depth_image, intrinsic_matrix, extrinsic_matrix):
  function depth_pcd2normal (line 337) | def depth_pcd2normal(xyz):
  function normal_from_depth_image (line 353) | def normal_from_depth_image(depth, intrinsic_matrix, extrinsic_matrix):

FILE: mvtracker/models/core/shape-of-motion/flow3d/init_utils.py
  function init_fg_from_tracks_3d (line 32) | def init_fg_from_tracks_3d(
  function init_bg (line 61) | def init_bg(
  function init_motion_params_with_procrustes (line 115) | def init_motion_params_with_procrustes(
  function run_initial_optim (line 271) | def run_initial_optim(
  function random_quats (line 403) | def random_quats(N: int) -> torch.Tensor:
  function compute_means (line 419) | def compute_means(ts, fg: GaussianParams, bases: MotionBases):
  function vis_init_params (line 429) | def vis_init_params(
  function vis_se3_init_3d (line 446) | def vis_se3_init_3d(server, init_rots, init_ts, basis_centers):
  function vis_tracks_2d_video (line 473) | def vis_tracks_2d_video(
  function vis_tracks_3d (line 495) | def vis_tracks_3d(
  function sample_initial_bases_centers (line 532) | def sample_initial_bases_centers(
  function interp_masked (line 592) | def interp_masked(vals: cp.ndarray, mask: cp.ndarray, pad: int = 1) -> c...
  function batched_interp_masked (line 639) | def batched_interp_masked(

FILE: mvtracker/models/core/shape-of-motion/flow3d/loss_utils.py
  function masked_mse_loss (line 7) | def masked_mse_loss(pred, gt, mask=None, normalize=True, quantile: float...
  function masked_l1_loss (line 26) | def masked_l1_loss(pred, gt, mask=None, normalize=True, quantile: float ...
  function masked_huber_loss (line 45) | def masked_huber_loss(pred, gt, delta, mask=None, normalize=True):
  function trimmed_mse_loss (line 57) | def trimmed_mse_loss(pred, gt, quantile=0.9):
  function trimmed_l1_loss (line 64) | def trimmed_l1_loss(pred, gt, quantile=0.9):
  function compute_gradient_loss (line 71) | def compute_gradient_loss(pred, gt, mask, quantile=0.98):
  function knn (line 93) | def knn(x: torch.Tensor, k: int) -> tuple[np.ndarray, np.ndarray]:
  function get_weights_for_procrustes (line 102) | def get_weights_for_procrustes(clusters, visibilities=None):
  function compute_z_acc_loss (line 118) | def compute_z_acc_loss(means_ts_nb: torch.Tensor, w2cs: torch.Tensor):
  function compute_se3_smoothness_loss (line 138) | def compute_se3_smoothness_loss(
  function compute_accel_loss (line 154) | def compute_accel_loss(transls):

FILE: mvtracker/models/core/shape-of-motion/flow3d/metrics.py
  function compute_psnr (line 13) | def compute_psnr(
  function compute_pose_errors (line 45) | def compute_pose_errors(
  class mPSNR (line 81) | class mPSNR(PeakSignalNoiseRatio):
    method __init__ (line 85) | def __init__(self, **kwargs) -> None:
    method __len__ (line 96) | def __len__(self) -> int:
    method update (line 99) | def update(
    method compute (line 120) | def compute(self) -> torch.Tensor:
  class mSSIM (line 127) | class mSSIM(StructuralSimilarityIndexMeasure):
    method __init__ (line 130) | def __init__(self, **kwargs) -> None:
    method __len__ (line 139) | def __len__(self) -> int:
    method update (line 142) | def update(
    method compute (line 215) | def compute(self) -> torch.Tensor:
  class mLPIPS (line 220) | class mLPIPS(Metric):
    method __init__ (line 224) | def __init__(
    method __len__ (line 247) | def __len__(self) -> int:
    method update (line 250) | def update(
    method compute (line 274) | def compute(self) -> torch.Tensor:
  class PCK (line 282) | class PCK(Metric):
    method __init__ (line 286) | def __init__(self, **kwargs):
    method __len__ (line 291) | def __len__(self) -> int:
    method update (line 294) | def update(self, preds: torch.Tensor, targets: torch.Tensor, threshold...
    method compute (line 308) | def compute(self) -> torch.Tensor:

FILE: mvtracker/models/core/shape-of-motion/flow3d/params.py
  class GaussianParams (line 10) | class GaussianParams(nn.Module):
    method __init__ (line 11) | def __init__(
    method init_from_state_dict (line 51) | def init_from_state_dict(state_dict, prefix="params."):
    method num_gaussians (line 65) | def num_gaussians(self) -> int:
    method get_colors (line 68) | def get_colors(self) -> torch.Tensor:
    method get_scales (line 71) | def get_scales(self) -> torch.Tensor:
    method get_opacities (line 74) | def get_opacities(self) -> torch.Tensor:
    method get_quats (line 77) | def get_quats(self) -> torch.Tensor:
    method get_coefs (line 80) | def get_coefs(self) -> torch.Tensor:
    method densify_params (line 84) | def densify_params(self, should_split, should_dup):
    method cull_params (line 99) | def cull_params(self, should_cull):
    method reset_opacities (line 110) | def reset_opacities(self, new_val):
  class MotionBases (line 119) | class MotionBases(nn.Module):
    method __init__ (line 120) | def __init__(self, rots, transls):
    method init_from_state_dict (line 133) | def init_from_state_dict(state_dict, prefix="params."):
    method compute_transforms (line 139) | def compute_transforms(self, ts: torch.Tensor, coefs: torch.Tensor) ->...
  function check_gaussian_sizes (line 153) | def check_gaussian_sizes(
  function check_bases_sizes (line 179) | def check_bases_sizes(motion_rots: torch.Tensor, motion_transls: torch.T...

FILE: mvtracker/models/core/shape-of-motion/flow3d/renderer.py
  class Renderer (line 12) | class Renderer:
    method __init__ (line 13) | def __init__(
    method init_from_checkpoint (line 44) | def init_from_checkpoint(
    method render_fn (line 58) | def render_fn(self, camera_state: CameraState, img_wh: tuple[int, int]):

FILE: mvtracker/models/core/shape-of-motion/flow3d/scene_model.py
  class SceneModel (line 11) | class SceneModel(nn.Module):
    method __init__ (line 12) | def __init__(
    method num_gaussians (line 35) | def num_gaussians(self) -> int:
    method num_bg_gaussians (line 39) | def num_bg_gaussians(self) -> int:
    method num_fg_gaussians (line 43) | def num_fg_gaussians(self) -> int:
    method num_motion_bases (line 47) | def num_motion_bases(self) -> int:
    method has_bg (line 51) | def has_bg(self) -> bool:
    method compute_poses_bg (line 54) | def compute_poses_bg(self) -> tuple[torch.Tensor, torch.Tensor]:
    method compute_transforms (line 63) | def compute_transforms(
    method compute_poses_fg (line 72) | def compute_poses_fg(
    method compute_poses_all (line 104) | def compute_poses_all(
    method get_colors_all (line 118) | def get_colors_all(self) -> torch.Tensor:
    method get_scales_all (line 124) | def get_scales_all(self) -> torch.Tensor:
    method get_opacities_all (line 130) | def get_opacities_all(self) -> torch.Tensor:
    method init_from_state_dict (line 142) | def init_from_state_dict(state_dict, prefix=""):
    method render (line 158) | def render(

FILE: mvtracker/models/core/shape-of-motion/flow3d/tensor_dataclass.py
  class TensorDataclass (line 10) | class TensorDataclass:
    method __getitem__ (line 15) | def __getitem__(self, key) -> Self:
    method to (line 18) | def to(self, device: torch.device | str) -> Self:
    method map (line 29) | def map(self, fn: Callable[[torch.Tensor], torch.Tensor]) -> Self:
  class TrackObservations (line 63) | class TrackObservations(TensorDataclass):
    method check_sizes (line 70) | def check_sizes(self) -> bool:
    method filter_valid (line 81) | def filter_valid(self, valid_mask: torch.Tensor) -> Self:
  class StaticObservations (line 86) | class StaticObservations(TensorDataclass):
    method check_sizes (line 91) | def check_sizes(self) -> bool:
    method filter_valid (line 95) | def filter_valid(self, valid_mask: torch.Tensor) -> Self:

FILE: mvtracker/models/core/shape-of-motion/flow3d/trainer.py
  class Trainer (line 27) | class Trainer:
    method __init__ (line 28) | def __init__(
    method set_epoch (line 97) | def set_epoch(self, epoch: int):
    method save_checkpoint (line 100) | def save_checkpoint(self, path: str):
    method init_from_checkpoint (line 115) | def init_from_checkpoint(
    method load_checkpoint_optimizers (line 133) | def load_checkpoint_optimizers(self, opt_ckpt):
    method load_checkpoint_schedulers (line 137) | def load_checkpoint_schedulers(self, sched_ckpt):
    method render_fn (line 142) | def render_fn(self, camera_state: CameraState, img_wh: tuple[int, int]):
    method train_step (line 164) | def train_step(self, batch):
    method compute_losses (line 211) | def compute_losses(self, batch):
    method log_dict (line 555) | def log_dict(self, stats: dict):
    method run_control_steps (line 559) | def run_control_steps(self):
    method _prepare_control_step (line 586) | def _prepare_control_step(self) -> bool:
    method _densify_control_step (line 621) | def _densify_control_step(self, global_step):
    method _cull_control_step (line 698) | def _cull_control_step(self, global_step):
    method _reset_opacity_control_step (line 745) | def _reset_opacity_control_step(self):
    method configure_optimizers (line 757) | def configure_optimizers(self):
  function dup_in_optim (line 786) | def dup_in_optim(optimizer, new_params: list, should_dup: torch.Tensor, ...
  function remove_from_optim (line 808) | def remove_from_optim(optimizer, new_params: list, _should_cull: torch.T...
  function reset_in_optim (line 826) | def reset_in_optim(optimizer, new_params: list):

FILE: mvtracker/models/core/shape-of-motion/flow3d/trajectories.py
  function get_avg_w2c (line 9) | def get_avg_w2c(w2cs: torch.Tensor):
  function get_lookat (line 26) | def get_lookat(origins: torch.Tensor, viewdirs: torch.Tensor) -> torch.T...
  function get_lookat_w2cs (line 50) | def get_lookat_w2cs(positions: torch.Tensor, lookat: torch.Tensor, up: t...
  function get_arc_w2cs (line 70) | def get_arc_w2cs(
  function get_lemniscate_w2cs (line 97) | def get_lemniscate_w2cs(
  function get_spiral_w2cs (line 127) | def get_spiral_w2cs(
  function get_wander_w2cs (line 162) | def get_wander_w2cs(ref_w2c, focal_length, num_frames, **_):

FILE: mvtracker/models/core/shape-of-motion/flow3d/transforms.py
  function rt_to_mat4 (line 8) | def rt_to_mat4(
  function rmat_to_cont_6d (line 33) | def rmat_to_cont_6d(matrix):
  function cont_6d_to_rmat (line 41) | def cont_6d_to_rmat(cont_6d):
  function solve_procrustes (line 56) | def solve_procrustes(

FILE: mvtracker/models/core/shape-of-motion/flow3d/validator.py
  class Validator (line 30) | class Validator:
    method __init__ (line 31) | def __init__(
    method reset_metrics (line 61) | def reset_metrics(self):
    method validate (line 74) | def validate(self):
    method validate_imgs (line 81) | def validate_imgs(self):
    method validate_keypoints (line 151) | def validate_keypoints(self):
    method save_train_videos (line 241) | def save_train_videos(self, epoch: int):

FILE: mvtracker/models/core/shape-of-motion/flow3d/vis/playback_panel.py
  function add_gui_playback_group (line 7) | def add_gui_playback_group(

FILE: mvtracker/models/core/shape-of-motion/flow3d/vis/render_panel.py
  class Keyframe (line 37) | class Keyframe:
    method from_camera (line 48) | def from_camera(time: float, camera: viser.CameraHandle, aspect: float...
  class CameraPath (line 61) | class CameraPath:
    method __init__ (line 62) | def __init__(
    method set_keyframes_visible (line 88) | def set_keyframes_visible(self, visible: bool) -> None:
    method add_camera (line 93) | def add_camera(
    method update_aspect (line 231) | def update_aspect(self, aspect: float) -> None:
    method get_aspect (line 236) | def get_aspect(self) -> float:
    method reset (line 241) | def reset(self) -> None:
    method spline_t_from_t_sec (line 249) | def spline_t_from_t_sec(self, time: np.ndarray) -> np.ndarray:
    method interpolate_pose_and_fov_rad (line 283) | def interpolate_pose_and_fov_rad(
    method update_spline (line 328) | def update_spline(self) -> None:
    method compute_duration (line 472) | def compute_duration(self) -> float:
    method compute_transition_times_cumsum (line 487) | def compute_transition_times_cumsum(self) -> np.ndarray:
  class RenderTabState (line 517) | class RenderTabState:
  function populate_render_tab (line 526) | def populate_render_tab(

FILE: mvtracker/models/core/shape-of-motion/flow3d/vis/utils.py
  class Singleton (line 14) | class Singleton(type):
    method __call__ (line 17) | def __call__(cls, *args, **kwargs):
  class VisManager (line 23) | class VisManager(metaclass=Singleton):
  function get_server (line 27) | def get_server(port: int | None = None) -> ViserServer:
  function project_2d_tracks (line 37) | def project_2d_tracks(tracks_3d_w, Ks, T_cw, return_depth=False):
  function draw_keypoints_video (line 56) | def draw_keypoints_video(
  function draw_keypoints_cv2 (line 77) | def draw_keypoints_cv2(img, kps, colors=None, occs=None, radius=3):
  function draw_tracks_2d (line 96) | def draw_tracks_2d(
  function generate_line_verts_faces (line 150) | def generate_line_verts_faces(starts, ends, line_width):
  function generate_point_verts_faces (line 188) | def generate_point_verts_faces(points, point_size, num_segments=10):
  function pixel_to_verts_clip (line 227) | def pixel_to_verts_clip(pixels, img_wh, z: float | torch.Tensor = 0.0, w...
  function draw_tracks_2d_th (line 234) | def draw_tracks_2d_th(
  function make_video_divisble (line 358) | def make_video_divisble(
  function apply_float_colormap (line 367) | def apply_float_colormap(img: torch.Tensor, colormap: str = "turbo") -> ...
  function apply_depth_colormap (line 391) | def apply_depth_colormap(
  function float2uint8 (line 418) | def float2uint8(x):
  function uint82float (line 422) | def uint82float(img):
  function drawMatches (line 426) | def drawMatches(
  function plot_correspondences (line 512) | def plot_correspondences(

FILE: mvtracker/models/core/shape-of-motion/flow3d/vis/viewer.py
  class DynamicViewer (line 13) | class DynamicViewer(Viewer):
    method __init__ (line 14) | def __init__(
    method _define_guis (line 32) | def _define_guis(self):

FILE: mvtracker/models/core/shape-of-motion/launch_davis.py
  function main (line 7) | def main(

FILE: mvtracker/models/core/spatracker/blocks.py
  function _ntuple (line 17) | def _ntuple(n):
  function exists (line 26) | def exists(val):
  function default (line 30) | def default(val, d):
  class Mlp (line 37) | class Mlp(nn.Module):
    method __init__ (line 40) | def __init__(
    method forward (line 61) | def forward(self, x):
  class ResidualBlock (line 70) | class ResidualBlock(nn.Module):
    method __init__ (line 71) | def __init__(self, in_planes, planes, norm_fn="group", stride=1):
    method forward (line 119) | def forward(self, x):
  class BasicEncoder (line 130) | class BasicEncoder(nn.Module):
    method __init__ (line 131) | def __init__(
    method _make_layer (line 206) | def _make_layer(self, dim, stride=1):
    method forward (line 214) | def forward(self, x, feat_PE=None):
  class DeeperBasicEncoder (line 287) | class DeeperBasicEncoder(nn.Module):
    method __init__ (line 288) | def __init__(
    method _make_layer (line 352) | def _make_layer(self, dim, stride=1):
    method forward (line 360) | def forward(self, x, feat_PE=None):
  class CorrBlock (line 423) | class CorrBlock:
    method __init__ (line 424) | def __init__(self, fmaps, num_levels=4, radius=4, depths_dnG=None):
    method sample (line 450) | def sample(self, coords):
    method corr (line 476) | def corr(self, targets):
    method corr_sample (line 492) | def corr_sample(self, targets, coords, coords_dp=None):
  class Attention (line 536) | class Attention(nn.Module):
    method __init__ (line 537) | def __init__(self, query_dim, num_heads=8, dim_head=48, qkv_bias=False...
    method forward (line 547) | def forward(self, x, attn_bias=None):
  class AttnBlock (line 572) | class AttnBlock(nn.Module):
    method __init__ (line 577) | def __init__(self, hidden_size, num_heads, mlp_ratio=4.0,
    method forward (line 598) | def forward(self, x):
  function bilinear_sampler (line 604) | def bilinear_sampler(img, coords, mode="bilinear", mask=False):
  class EUpdateFormer (line 622) | class EUpdateFormer(nn.Module):
    method __init__ (line 627) | def __init__(
    method initialize_weights (line 693) | def initialize_weights(self):
    method forward (line 702) | def forward(self, input_tensor, se3_feature):
  function pix2cam (line 734) | def pix2cam(coords,
  function cam2pix (line 756) | def cam2pix(coords,

FILE: mvtracker/models/core/spatracker/softsplat.py
  function cuda_int32 (line 19) | def cuda_int32(intIn: int):
  function cuda_float32 (line 23) | def cuda_float32(fltIn: float):
  function cuda_kernel (line 27) | def cuda_kernel(strFunction: str, strKernel: str, objVariables: typing.D...
  function cuda_launch (line 206) | def cuda_launch(strKey: str):
  function softsplat (line 218) | def softsplat(
  class softsplat_func (line 278) | class softsplat_func(torch.autograd.Function):
    method forward (line 281) | def forward(self, tenIn, tenFlow, H=None, W=None):
    method backward (line 360) | def backward(self, tenOutgrad):
  function cuda_int64 (line 535) | def cuda_int64(intIn: int):
  function cuda_kernel_longlong (line 539) | def cuda_kernel_longlong(strFunction: str, strKernel: str, objVariables:...
  class softsplat_pointcloud_func (line 729) | class softsplat_pointcloud_func(torch.autograd.Function):
    method forward (line 732) | def forward(self, tenIn, tenFlow, H=None, W=None):
    method backward (line 811) | def backward(self, tenOutgrad):

FILE: mvtracker/models/core/spatracker/spatracker_monocular.py
  function sample_pos_embed (line 36) | def sample_pos_embed(grid_size, embed_dim, coords):
  class SpaTracker (line 59) | class SpaTracker(nn.Module):
    method __init__ (line 60) | def __init__(
    method prepare_track (line 147) | def prepare_track(self, rgbds, queries):
    method sample_trifeat (line 221) | def sample_trifeat(self, t,
    method neural_arap (line 260) | def neural_arap(self, coords, Traj_arap, intrs_S, T_mark):
    method gradient_arap (line 300) | def gradient_arap(self, coords, aff_avg=None, aff_std=None, aff_f_sg=N...
    method forward_iteration (line 317) | def forward_iteration(
    method forward (line 478) | def forward(self, rgbds, queries, iters=4, feat_init=None, is_train=Fa...
  class SpaTrackerMultiViewAdapter (line 672) | class SpaTrackerMultiViewAdapter(nn.Module):
    method __init__ (line 673) | def __init__(self, **kwargs):
    method forward (line 677) | def forward(

FILE: mvtracker/models/core/spatracker/spatracker_multiview.py
  class MultiViewSpaTracker (line 21) | class MultiViewSpaTracker(nn.Module):
    method __init__ (line 31) | def __init__(
    method sample_trifeat (line 151) | def sample_trifeat(self, t, coords, featMapxy, featMapyz, featMapxz):
    method forward_iteration (line 186) | def forward_iteration(
    method forward (line 388) | def forward(
    method _plot_pointcloud (line 861) | def _plot_pointcloud(logs_path, filename, xyz, c, q_xyz=None, q_c=None,
    method _plot_featuremaps (line 883) | def _plot_featuremaps(

FILE: mvtracker/models/core/vggt/heads/camera_head.py
  class CameraHead (line 19) | class CameraHead(nn.Module):
    method __init__ (line 26) | def __init__(
    method forward (line 83) | def forward(self, aggregated_tokens_list: list, num_iterations: int = ...
    method trunk_fn (line 105) | def trunk_fn(self, pose_tokens: torch.Tensor, num_iterations: int) -> ...
  function modulate (line 157) | def modulate(x: torch.Tensor, shift: torch.Tensor, scale: torch.Tensor) ...

FILE: mvtracker/models/core/vggt/heads/dpt_head.py
  class DPTHead (line 21) | class DPTHead(nn.Module):
    method __init__ (line 43) | def __init__(
    method forward (line 128) | def forward(
    method _forward_impl (line 191) | def _forward_impl(
    method _apply_pos_embed (line 273) | def _apply_pos_embed(self, x: torch.Tensor, W: int, H: int, ratio: flo...
    method scratch_forward (line 285) | def scratch_forward(self, features: List[torch.Tensor]) -> torch.Tensor:
  function _make_fusion_block (line 323) | def _make_fusion_block(features: int, size: int = None, has_residual: bo...
  function _make_scratch (line 337) | def _make_scratch(in_shape: List[int], out_shape: int, groups: int = 1, ...
  class ResidualConvUnit (line 368) | class ResidualConvUnit(nn.Module):
    method __init__ (line 371) | def __init__(self, features, activation, bn, groups=1):
    method forward (line 390) | def forward(self, x):
  class FeatureFusionBlock (line 413) | class FeatureFusionBlock(nn.Module):
    method __init__ (line 416) | def __init__(
    method forward (line 456) | def forward(self, *xs, size=None):
  function custom_interpolate (line 483) | def custom_interpolate(

FILE: mvtracker/models/core/vggt/heads/head_act.py
  function activate_pose (line 12) | def activate_pose(pred_pose_enc, trans_act="linear", quat_act="linear", ...
  function base_pose_act (line 38) | def base_pose_act(pose_enc, act_type="linear"):
  function activate_head (line 61) | def activate_head(out, activation="norm_exp", conf_activation="expp1"):
  function inverse_log_transform (line 115) | def inverse_log_transform(y):

FILE: mvtracker/models/core/vggt/heads/track_head.py
  class TrackHead (line 12) | class TrackHead(nn.Module):
    method __init__ (line 18) | def __init__(
    method forward (line 72) | def forward(self, aggregated_tokens_list, images, patch_start_idx, que...

FILE: mvtracker/models/core/vggt/heads/track_modules/base_track_predictor.py
  class BaseTrackerPredictor (line 17) | class BaseTrackerPredictor(nn.Module):
    method __init__ (line 18) | def __init__(
    method forward (line 82) | def forward(self, query_points, fmaps=None, iters=6, return_feat=False...

FILE: mvtracker/models/core/vggt/heads/track_modules/blocks.py
  class EfficientUpdateFormer (line 19) | class EfficientUpdateFormer(nn.Module):
    method __init__ (line 24) | def __init__(
    method initialize_weights (line 90) | def initialize_weights(self):
    method forward (line 100) | def forward(self, input_tensor, mask=None):
  class CorrBlock (line 147) | class CorrBlock:
    method __init__ (line 148) | def __init__(self, fmaps, num_levels=4, radius=4, multiple_track_feats...
    method corr_sample (line 186) | def corr_sample(self, targets, coords):
  function compute_corr_level (line 241) | def compute_corr_level(fmap1, fmap2s, C):

FILE: mvtracker/models/core/vggt/heads/track_modules/modules.py
  function _ntuple (line 19) | def _ntuple(n):
  function exists (line 28) | def exists(val):
  function default (line 32) | def default(val, d):
  class ResidualBlock (line 39) | class ResidualBlock(nn.Module):
    method __init__ (line 44) | def __init__(self, in_planes, planes, norm_fn="group", stride=1, kerne...
    method forward (line 100) | def forward(self, x):
  class Mlp (line 111) | class Mlp(nn.Module):
    method __init__ (line 114) | def __init__(
    method forward (line 138) | def forward(self, x):
  class AttnBlock (line 147) | class AttnBlock(nn.Module):
    method __init__ (line 148) | def __init__(
    method forward (line 170) | def forward(self, x, mask=None):
  class CrossAttnBlock (line 187) | class CrossAttnBlock(nn.Module):
    method __init__ (line 188) | def __init__(self, hidden_size, context_dim, num_heads=1, mlp_ratio=4....
    method forward (line 206) | def forward(self, x, context, mask=None):

FILE: mvtracker/models/core/vggt/heads/track_modules/utils.py
  function get_2d_sincos_pos_embed (line 18) | def get_2d_sincos_pos_embed(embed_dim: int, grid_size: Union[int, Tuple[...
  function get_2d_sincos_pos_embed_from_grid (line 46) | def get_2d_sincos_pos_embed_from_grid(embed_dim: int, grid: torch.Tensor...
  function get_1d_sincos_pos_embed_from_grid (line 67) | def get_1d_sincos_pos_embed_from_grid(embed_dim: int, pos: torch.Tensor)...
  function get_2d_embedding (line 93) | def get_2d_embedding(xy: torch.Tensor, C: int, cat_coords: bool = True) ...
  function bilinear_sampler (line 127) | def bilinear_sampler(input, coords, align_corners=True, padding_mode="bo...
  function sample_features4d (line 196) | def sample_features4d(input, coords):

FILE: mvtracker/models/core/vggt/heads/utils.py
  function position_grid_to_embed (line 11) | def position_grid_to_embed(pos_grid: torch.Tensor, embed_dim: int, omega...
  function make_sincos_pos_embed (line 36) | def make_sincos_pos_embed(embed_dim: int, pos: torch.Tensor, omega_0: fl...
  function create_uv_grid (line 65) | def create_uv_grid(

FILE: mvtracker/models/core/vggt/layers/attention.py
  class Attention (line 21) | class Attention(nn.Module):
    method __init__ (line 22) | def __init__(
    method forward (line 50) | def forward(self, x: Tensor, pos=None) -> Tensor:
  class MemEffAttention (line 80) | class MemEffAttention(Attention):
    method forward (line 81) | def forward(self, x: Tensor, attn_bias=None, pos=None) -> Tensor:

FILE: mvtracker/models/core/vggt/layers/block.py
  class Block (line 27) | class Block(nn.Module):
    method __init__ (line 28) | def __init__(
    method forward (line 81) | def forward(self, x: Tensor, pos=None) -> Tensor:
  function drop_add_residual_stochastic_depth (line 110) | def drop_add_residual_stochastic_depth(
  function get_branges_scales (line 140) | def get_branges_scales(x, sample_drop_ratio=0.0):
  function add_residual (line 148) | def add_residual(x, brange, residual, residual_scale_factor, scaling_vec...
  function get_attn_bias_and_cat (line 163) | def get_attn_bias_and_cat(x_list, branges=None):
  function drop_add_residual_stochastic_depth_list (line 187) | def drop_add_residual_stochastic_depth_list(
  class NestedTensorBlock (line 210) | class NestedTensorBlock(Block):
    method forward_nested (line 211) | def forward_nested(self, x_list: List[Tensor]) -> List[Tensor]:
    method forward (line 251) | def forward(self, x_or_x_list):

FILE: mvtracker/models/core/vggt/layers/drop_path.py
  function drop_path (line 14) | def drop_path(x, drop_prob: float = 0.0, training: bool = False):
  class DropPath (line 26) | class DropPath(nn.Module):
    method __init__ (line 29) | def __init__(self, drop_prob=None):
    method forward (line 33) | def forward(self, x):

FILE: mvtracker/models/core/vggt/layers/layer_scale.py
  class LayerScale (line 15) | class LayerScale(nn.Module):
    method __init__ (line 16) | def __init__(
    method forward (line 26) | def forward(self, x: Tensor) -> Tensor:

FILE: mvtracker/models/core/vggt/layers/mlp.py
  class Mlp (line 16) | class Mlp(nn.Module):
    method __init__ (line 17) | def __init__(
    method forward (line 34) | def forward(self, x: Tensor) -> Tensor:

FILE: mvtracker/models/core/vggt/layers/patch_embed.py
  function make_2tuple (line 16) | def make_2tuple(x):
  class PatchEmbed (line 25) | class PatchEmbed(nn.Module):
    method __init__ (line 37) | def __init__(
    method forward (line 68) | def forward(self, x: Tensor) -> Tensor:
    method flops (line 83) | def flops(self) -> float:

FILE: mvtracker/models/core/vggt/layers/rope.py
  class PositionGetter (line 24) | class PositionGetter:
    method __init__ (line 35) | def __init__(self):
    method __call__ (line 39) | def __call__(self, batch_size: int, height: int, width: int, device: t...
  class RotaryPositionEmbedding2D (line 62) | class RotaryPositionEmbedding2D(nn.Module):
    method __init__ (line 79) | def __init__(self, frequency: float = 100.0, scaling_factor: float = 1...
    method _compute_frequency_components (line 86) | def _compute_frequency_components(
    method _rotate_features (line 120) | def _rotate_features(x: torch.Tensor) -> torch.Tensor:
    method _apply_1d_rope (line 133) | def _apply_1d_rope(
    method forward (line 154) | def forward(self, tokens: torch.Tensor, positions: torch.Tensor) -> to...

FILE: mvtracker/models/core/vggt/layers/swiglu_ffn.py
  class SwiGLUFFN (line 14) | class SwiGLUFFN(nn.Module):
    method __init__ (line 15) | def __init__(
    method forward (line 30) | def forward(self, x: Tensor) -> Tensor:
  class SwiGLUFFNFused (line 54) | class SwiGLUFFNFused(SwiGLU):
    method __init__ (line 55) | def __init__(

FILE: mvtracker/models/core/vggt/layers/vision_transformer.py
  function named_apply (line 24) | def named_apply(fn: Callable, module: nn.Module, name="", depth_first=Tr...
  class BlockChunk (line 35) | class BlockChunk(nn.ModuleList):
    method forward (line 36) | def forward(self, x):
  class DinoVisionTransformer (line 42) | class DinoVisionTransformer(nn.Module):
    method __init__ (line 43) | def __init__(
    method init_weights (line 176) | def init_weights(self):
    method interpolate_pos_encoding (line 183) | def interpolate_pos_encoding(self, x, w, h):
    method prepare_tokens_with_masks (line 217) | def prepare_tokens_with_masks(self, x, masks=None):
    method forward_features_list (line 238) | def forward_features_list(self, x_list, masks_list):
    method forward_features (line 262) | def forward_features(self, x, masks=None):
    method _get_intermediate_layers_not_chunked (line 283) | def _get_intermediate_layers_not_chunked(self, x, n=1):
    method _get_intermediate_layers_chunked (line 295) | def _get_intermediate_layers_chunked(self, x, n=1):
    method get_intermediate_layers (line 309) | def get_intermediate_layers(
    method forward (line 335) | def forward(self, *args, is_training=True, **kwargs):
  function init_weights_vit_timm (line 343) | def init_weights_vit_timm(module: nn.Module, name: str = ""):
  function vit_small (line 351) | def vit_small(patch_size=16, num_register_tokens=0, **kwargs):
  function vit_base (line 365) | def vit_base(patch_size=16, num_register_tokens=0, **kwargs):
  function vit_large (line 379) | def vit_large(patch_size=16, num_register_tokens=0, **kwargs):
  function vit_giant2 (line 393) | def vit_giant2(patch_size=16, num_register_tokens=0, **kwargs):

FILE: mvtracker/models/core/vggt/models/aggregator.py
  class Aggregator (line 24) | class Aggregator(nn.Module):
    method __init__ (line 50) | def __init__(
    method __build_patch_embed__ (line 146) | def __build_patch_embed__(
    method forward (line 187) | def forward(
    method _process_frame_attention (line 266) | def _process_frame_attention(self, tokens, B, S, P, C, frame_idx, pos=...
    method _process_global_attention (line 287) | def _process_global_attention(self, tokens, B, S, P, C, global_idx, po...
  function slice_expand_and_flatten (line 308) | def slice_expand_and_flatten(token_tensor, B, S):

FILE: mvtracker/models/core/vggt/models/vggt.py
  class VGGT (line 17) | class VGGT(nn.Module, PyTorchModelHubMixin):
    method __init__ (line 18) | def __init__(self, img_size=518, patch_size=14, embed_dim=1024):
    method forward (line 27) | def forward(

FILE: mvtracker/models/core/vggt/utils/geometry.py
  function unproject_depth_map_to_point_map (line 12) | def unproject_depth_map_to_point_map(
  function depth_to_world_coords_points (line 44) | def depth_to_world_coords_points(
  function depth_to_cam_coords_points (line 84) | def depth_to_cam_coords_points(depth_map: np.ndarray, intrinsic: np.ndar...
  function closed_form_inverse_se3 (line 117) | def closed_form_inverse_se3(se3, R=None, T=None):

FILE: mvtracker/models/core/vggt/utils/load_fn.py
  function load_and_preprocess_images (line 12) | def load_and_preprocess_images(image_path_list, mode="crop"):

FILE: mvtracker/models/core/vggt/utils/pose_enc.py
  function extri_intri_to_pose_encoding (line 11) | def extri_intri_to_pose_encoding(
  function pose_encoding_to_extri_intri (line 65) | def pose_encoding_to_extri_intri(

FILE: mvtracker/models/core/vggt/utils/rotation.py
  function quat_to_mat (line 14) | def quat_to_mat(quaternions: torch.Tensor) -> torch.Tensor:
  function mat_to_quat (line 47) | def mat_to_quat(matrix: torch.Tensor) -> torch.Tensor:
  function _sqrt_positive_part (line 112) | def _sqrt_positive_part(x: torch.Tensor) -> torch.Tensor:
  function standardize_quaternion (line 126) | def standardize_quaternion(quaternions: torch.Tensor) -> torch.Tensor:

FILE: mvtracker/models/core/vggt/utils/visual_track.py
  function color_from_xy (line 13) | def color_from_xy(x, y, W, H, cmap_name="hsv"):
  function get_track_colors_by_position (line 37) | def get_track_colors_by_position(tracks_b, vis_mask_b=None, image_width=...
  function visualize_tracks_on_images (line 80) | def visualize_tracks_on_images(

FILE: mvtracker/models/core/vit/common.py
  class MLPBlock (line 13) | class MLPBlock(nn.Module):
    method __init__ (line 14) | def __init__(
    method forward (line 25) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LayerNorm2d (line 31) | class LayerNorm2d(nn.Module):
    method __init__ (line 32) | def __init__(self, num_channels: int, eps: float = 1e-6) -> None:
    method forward (line 38) | def forward(self, x: torch.Tensor) -> torch.Tensor:

FILE: mvtracker/models/core/vit/encoder.py
  class ImageEncoderViT (line 19) | class ImageEncoderViT(nn.Module):
    method __init__ (line 20) | def __init__(
    method forward (line 108) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class Block (line 122) | class Block(nn.Module):
    method __init__ (line 125) | def __init__(
    method forward (line 169) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class Attention (line 188) | class Attention(nn.Module):
    method __init__ (line 191) | def __init__(
    method forward (line 227) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  function window_partition (line 246) | def window_partition(x: torch.Tensor, window_size: int) -> Tuple[torch.T...
  function window_unpartition (line 270) | def window_unpartition(
  function get_rel_pos (line 295) | def get_rel_pos(q_size: int, k_size: int, rel_pos: torch.Tensor) -> torc...
  function add_decomposed_rel_pos (line 328) | def add_decomposed_rel_pos(
  class PatchEmbed (line 367) | class PatchEmbed(nn.Module):
    method __init__ (line 372) | def __init__(
    method forward (line 394) | def forward(self, x: torch.Tensor) -> torch.Tensor:

FILE: mvtracker/models/evaluation_predictor_3dpt.py
  class EvaluationPredictor (line 17) | class EvaluationPredictor(torch.nn.Module):
    method __init__ (line 18) | def __init__(
    method forward (line 47) | def forward(
  function get_uniformly_sampled_pts (line 417) | def get_uniformly_sampled_pts(
  function get_superpoint_sampled_pts (line 431) | def get_superpoint_sampled_pts(
  function get_sift_sampled_pts (line 450) | def get_sift_sampled_pts(

FILE: mvtracker/utils/basic.py
  function sub2ind (line 11) | def sub2ind(height, width, y, x):
  function ind2sub (line 15) | def ind2sub(height, width, ind):
  function get_lr_str (line 21) | def get_lr_str(lr):
  function strnum (line 27) | def strnum(x):
  function assert_same_shape (line 36) | def assert_same_shape(t1, t2):
  function print_stats (line 41) | def print_stats(name, tensor):
  function print_stats_py (line 48) | def print_stats_py(name, tensor):
  function print_ (line 54) | def print_(name, tensor):
  function mkdir (line 59) | def mkdir(path):
  function normalize_single (line 64) | def normalize_single(d):
  function normalize (line 72) | def normalize(d):
  function hard_argmax2d (line 83) | def hard_argmax2d(tensor):
  function argmax2d (line 101) | def argmax2d(heat, hard=True):
  function reduce_masked_mean (line 126) | def reduce_masked_mean(x, mask, dim=None, keepdim=False):
  function reduce_masked_median (line 146) | def reduce_masked_median(x, mask, keep_batch=False):
  function pack_seqdim (line 182) | def pack_seqdim(tensor, B):
  function unpack_seqdim (line 191) | def unpack_seqdim(tensor, B):
  function meshgrid2d (line 201) | def meshgrid2d(B, Y, X, stack=False, norm=False, device='cuda', on_chans...
  function meshgrid3d (line 228) | def meshgrid3d(B, Z, Y, X, stack=False, norm=False, device='cuda'):
  function normalize_grid2d (line 261) | def normalize_grid2d(grid_y, grid_x, Y, X, clamp_extreme=True):
  function normalize_grid3d (line 273) | def normalize_grid3d(grid_z, grid_y, grid_x, Z, Y, X, clamp_extreme=True):
  function gridcloud2d (line 287) | def gridcloud2d(B, Y, X, norm=False, device='cuda'):
  function gridcloud3d (line 298) | def gridcloud3d(B, Z, Y, X, norm=False, device='cuda'):
  function readPFM (line 313) | def readPFM(file):
  function normalize_boxlist2d (line 351) | def normalize_boxlist2d(boxlist2d, H, W):
  function unnormalize_boxlist2d (line 362) | def unnormalize_boxlist2d(boxlist2d, H, W):
  function unnormalize_box2d (line 373) | def unnormalize_box2d(box2d, H, W):
  function normalize_box2d (line 377) | def normalize_box2d(box2d, H, W):
  function get_gaussian_kernel_2d (line 381) | def get_gaussian_kernel_2d(channels, kernel_size=3, sigma=2.0, mid_one=F...
  function gaussian_blur_2d (line 403) | def gaussian_blur_2d(input, kernel_size=3, sigma=2.0, reflect_pad=False,...
  function gradient2d (line 415) | def gradient2d(x, absolute=False, square=False, return_sum=False):
  function to_homogeneous (line 437) | def to_homogeneous(x):
  function from_homogeneous (line 441) | def from_homogeneous(x, assert_homogeneous_part_is_equal_to_1=False, eps...
  function time_now (line 447) | def time_now():

FILE: mvtracker/utils/eval_utils.py
  function medianTrajError (line 12) | def medianTrajError(output, target):
  function averageTrajError (line 20) | def averageTrajError(output, target):
  function pointTrack (line 27) | def pointTrack(queryPoint, anchorPos, anchorRot):
  function qToRot (line 37) | def qToRot(q):
  function get3DCov (line 52) | def get3DCov(scale, rotation, scale_mod = 1):
  function getAll3DCov (line 66) | def getAll3DCov(scales, rotations, scale_mod = 1):
  function getContributions (line 74) | def getContributions(mean3Ds, cov3Ds, query):

FILE: mvtracker/utils/geom.py
  function matmul2 (line 6) | def matmul2(mat1, mat2):
  function matmul3 (line 10) | def matmul3(mat1, mat2, mat3):
  function eye_3x3 (line 14) | def eye_3x3(B, device='cuda'):
  function eye_4x4 (line 19) | def eye_4x4(B, device='cuda'):
  function safe_inverse (line 24) | def safe_inverse(a):  # parallel version
  function safe_inverse_single (line 35) | def safe_inverse_single(a):
  function split_intrinsics (line 46) | def split_intrinsics(K):
  function apply_pix_T_cam (line 55) | def apply_pix_T_cam(pix_T_cam, xyz):
  function apply_pix_T_cam_py (line 79) | def apply_pix_T_cam_py(pix_T_cam, xyz):
  function get_camM_T_camXs (line 103) | def get_camM_T_camXs(origin_T_camXs, ind=0):
  function apply_4x4 (line 113) | def apply_4x4(RT, xyz):
  function apply_4x4_py (line 125) | def apply_4x4_py(RT, xyz):
  function apply_3x3 (line 142) | def apply_3x3(RT, xy):
  function generate_polygon (line 154) | def generate_polygon(ctr_x, ctr_y, avg_r, irregularity, spikiness, num_v...
  function get_random_affine_2d (line 199) | def get_random_affine_2d(B, rot_min=-5.0, rot_max=5.0, tx_min=-0.1, tx_m...
  function get_centroid_from_box2d (line 291) | def get_centroid_from_box2d(box2d):
  function normalize_boxlist2d (line 301) | def normalize_boxlist2d(boxlist2d, H, W):
  function unnormalize_boxlist2d (line 312) | def unnormalize_boxlist2d(boxlist2d, H, W):
  function unnormalize_box2d (line 323) | def unnormalize_box2d(box2d, H, W):
  function normalize_box2d (line 327) | def normalize_box2d(box2d, H, W):
  function get_size_from_box2d (line 331) | def get_size_from_box2d(box2d):
  function crop_and_resize (line 341) | def crop_and_resize(im, boxlist, PH, PW, boxlist_is_normalized=False):
  function get_boxlist_from_centroid_and_size (line 403) | def get_boxlist_from_centroid_and_size(cy, cx, h, w):  # , clip=False):
  function get_box2d_from_mask (line 420) | def get_box2d_from_mask(mask, normalize=False):
  function convert_box2d_to_intrinsics (line 444) | def convert_box2d_to_intrinsics(box2d, pix_T_cam, H, W, use_image_aspect...
  function pixels2camera (line 512) | def pixels2camera(x, y, z, fx, fy, x0, y0):
  function camera2pixels (line 538) | def camera2pixels(xyz, pix_T_cam):
  function depth2pointcloud (line 562) | def depth2pointcloud(z, pix_T_cam):

FILE: mvtracker/utils/improc.py
  function _convert (line 16) | def _convert(input_, type_):
  function _generic_transform_sk_3d (line 23) | def _generic_transform_sk_3d(transform, in_type='', out_type=''):
  function preprocess_color_tf (line 47) | def preprocess_color_tf(x):
  function preprocess_color (line 52) | def preprocess_color(x):
  function pca_embed (line 59) | def pca_embed(emb, keep, valid=None):
  function pca_embed_together (line 105) | def pca_embed_together(emb, keep):
  function reduce_emb (line 129) | def reduce_emb(emb, valid=None, inbound=None, together=False):
  function get_feat_pca (line 150) | def get_feat_pca(feat, valid=None):
  function gif_and_tile (line 159) | def gif_and_tile(ims, just_gif=False):
  function back2color (line 174) | def back2color(i, blacken_zeros=False):
  function convert_occ_to_height (line 183) | def convert_occ_to_height(occ, reduce_axis=3):
  function xy2heatmap (line 207) | def xy2heatmap(xy, sigma, grid_xs, grid_ys, norm=False):
  function xy2heatmaps (line 244) | def xy2heatmaps(xy, Y, X, sigma=30.0, norm=True):
  function draw_circles_at_xy (line 260) | def draw_circles_at_xy(xy, Y, X, sigma=12.5, round=False):
  function seq2color (line 270) | def seq2color(im, norm=True, colormap='coolwarm'):
  function colorize (line 331) | def colorize(d):
  function oned2inferno (line 372) | def oned2inferno(d, norm=True, do_colorize=False):
  function oned2gray (line 400) | def oned2gray(d, norm=True):
  function draw_frame_id_on_vis (line 418) | def draw_frame_id_on_vis(vis, frame_id, scale=0.5, left=5, top=20):
  class ColorMap2d (line 449) | class ColorMap2d:
    method __init__ (line 450) | def __init__(self, filename=None):
    method __call__ (line 457) | def __call__(self, X):
  function get_n_colors (line 470) | def get_n_colors(N, sequential=False):
  class Summ_writer (line 484) | class Summ_writer(object):
    method __init__ (line 485) | def __init__(self, writer, global_step, log_freq=10, fps=8, scalar_fre...
    method summ_gif (line 495) | def summ_gif(self, name, tensor, blacken_zeros=False):
    method draw_boxlist2d_on_image (line 515) | def draw_boxlist2d_on_image(self, rgb, boxlist, scores=None, tids=None...
    method draw_boxlist2d_on_image_py (line 540) | def draw_boxlist2d_on_image_py(self, rgb, boxlist, scores, tids, linew...
    method summ_boxlist2d (line 609) | def summ_boxlist2d(self, name, rgb, boxlist, scores=None, tids=None, f...
    method summ_rgbs (line 614) | def summ_rgbs(self, name, ims, frame_ids=None, blacken_zeros=False, on...
    method summ_rgb (line 640) | def summ_rgb(self, name, ims, blacken_zeros=False, frame_id=None, only...
    method flow2color (line 665) | def flow2color(self, flow, clip=50.0):
    method summ_flow (line 704) | def summ_flow(self, name, im, clip=0.0, only_return=False, frame_id=No...
    method summ_oneds (line 711) | def summ_oneds(self, name, ims, frame_ids=None, bev=False, fro=False, ...
    method summ_oned (line 767) | def summ_oned(self, name, im, bev=False, fro=False, logvis=False, max_...
    method summ_feats (line 804) | def summ_feats(self, name, feats, valids=None, pca=True, fro=False, on...
    method summ_feat (line 852) | def summ_feat(self, name, feat, valid=None, pca=True, only_return=Fals...
    method summ_scalar (line 880) | def summ_scalar(self, name, value):
    method summ_seg (line 890) | def summ_seg(self, name, seg, only_return=False, frame_id=None, colorm...
    method summ_pts_on_rgb (line 941) | def summ_pts_on_rgb(self, name, trajs, rgb, valids=None, frame_id=None...
    method summ_pts_on_rgbs (line 984) | def summ_pts_on_rgbs(self, name, trajs, rgbs, valids=None, frame_ids=N...
    method summ_traj2ds_on_rgbs (line 1028) | def summ_traj2ds_on_rgbs(self, name, trajs, rgbs, valids=None, frame_i...
    method summ_traj2ds_on_rgbs2 (line 1113) | def summ_traj2ds_on_rgbs2(self, name, trajs, visibles, rgbs, valids=No...
    method summ_traj2ds_on_rgb (line 1175) | def summ_traj2ds_on_rgb(self, name, trajs, rgb, valids=None, show_dots...
    method draw_traj_on_image_py (line 1219) | def draw_traj_on_image_py(self, rgb, traj, S=50, linewidth=1, show_dot...
    method draw_traj_on_images_py (line 1271) | def draw_traj_on_images_py(self, rgbs, traj, S=50, linewidth=1, show_d...
    method draw_circs_on_image_py (line 1298) | def draw_circs_on_image_py(self, rgb, xy, colors=None, linewidth=10, r...
    method draw_circ_on_images_py (line 1336) | def draw_circ_on_images_py(self, rgbs, traj, vis, S=50, linewidth=1, s...
    method summ_traj_as_crops (line 1376) | def summ_traj_as_crops(self, name, trajs_e, rgbs, frame_id=None, only_...
    method summ_occ (line 1445) | def summ_occ(self, name, occ, reduce_axes=[3], bev=False, fro=False, p...
  function erode2d (line 1464) | def erode2d(im, times=1, device='cuda'):
  function dilate2d (line 1471) | def dilate2d(im, times=1, device='cuda', mode='square'):

FILE: mvtracker/utils/misc.py
  function count_parameters (line 6) | def count_parameters(model):
  function posemb_sincos_2d_xy (line 21) | def posemb_sincos_2d_xy(xy, C, temperature=10000, dtype=torch.float32, c...
  class SimplePool (line 41) | class SimplePool():
    method __init__ (line 42) | def __init__(self, pool_size, version='pt'):
    method __len__ (line 51) | def __len__(self):
    method mean (line 54) | def mean(self, min_size=1):
    method sample (line 71) | def sample(self, with_replacement=True):
    method fetch (line 78) | def fetch(self, num=None):
    method is_full (line 96) | def is_full(self):
    method empty (line 100) | def empty(self):
    method update (line 103) | def update(self, items):
  function farthest_point_sample (line 117) | def farthest_point_sample(xyz, npoint, include_ends=False, deterministic...
  function farthest_point_sample_py (line 154) | def farthest_point_sample_py(xyz, npoint):

FILE: mvtracker/utils/visualizer_mp4.py
  function read_video_from_path (line 25) | def read_video_from_path(path):
  class Visualizer (line 42) | class Visualizer:
    method __init__ (line 43) | def __init__(
    method visualize (line 71) | def visualize(
    method save_video (line 172) | def save_video(video, save_dir, filename, writer=None, fps=12, step=0):
    method draw_tracks_on_video (line 190) | def draw_tracks_on_video(
    method _draw_pred_tracks (line 374) | def _draw_pred_tracks(
    method _draw_gt_tracks (line 418) | def _draw_gt_tracks(
  function put_debug_text_onto_image (line 455) | def put_debug_text_onto_image(img: np.ndarray, text: str, font_scale: fl...
  class MultiViewVisualizer (line 500) | class MultiViewVisualizer(Visualizer):
    method __init__ (line 501) | def __init__(self, **kwargs):
    method visualize (line 504) | def visualize(
  function log_mp4_track_viz (line 582) | def log_mp4_track_viz(

FILE: mvtracker/utils/visualizer_rerun.py
  function setup_libs (line 13) | def setup_libs(latex=False):
  function log_pointclouds_to_rerun (line 38) | def log_pointclouds_to_rerun(
  function _log_tracks_to_rerun (line 169) | def _log_tracks_to_rerun(
  function _log_tracks_to_rerun_lightweight (line 262) | def _log_tracks_to_rerun_lightweight(
  function log_tracks_to_rerun (line 368) | def log_tracks_to_rerun(

FILE: scripts/4ddress_preprocessing.py
  function load_pickle (line 247) | def load_pickle(p):
  function save_pickle (line 252) | def save_pickle(p, data):
  function load_image (line 257) | def load_image(path):
  function extract_4d_dress_data (line 261) | def extract_4d_dress_data(
  function crete_overview_pngs (line 499) | def crete_overview_pngs(dataset_root, subject_names, overview_dir):
  function crete_overview_mp4s (line 547) | def crete_overview_mp4s(dataset_root, subject_names, overview_dir, fps=30):

FILE: scripts/compare_cdist-topk_against_pointops-knn.py
  function knn_torch (line 9) | def knn_torch(k: int, xyz_ref: torch.Tensor, xyz_query: torch.Tensor):
  function knn_pointops (line 15) | def knn_pointops(k: int, xyz_ref: torch.Tensor, xyz_query: torch.Tensor):
  function benchmark (line 36) | def benchmark(fn, name, HALF_PRECISION=False, iters=100):

FILE: scripts/dex_ycb_to_neus_format.py
  function sample_surface (line 188) | def sample_surface(mesh: trimesh.Trimesh, count, face_weight=None, seed=...
  function pick_points_from_mesh (line 271) | def pick_points_from_mesh(mesh, picked_faces, picked_weights, reference_...
  class SequenceLoader (line 296) | class SequenceLoader():
    method __init__ (line 299) | def __init__(
    method _load_frame_rgbd (line 1376) | def _load_frame_rgbd(self, c, i):
    method _deproject_depth_and_filter_points (line 1396) | def _deproject_depth_and_filter_points(self, d, c):
    method transform_ycb (line 1434) | def transform_ycb(self,
    method serials (line 1505) | def serials(self):
    method num_cameras (line 1509) | def num_cameras(self):
    method num_frames (line 1513) | def num_frames(self):
    method dimensions (line 1517) | def dimensions(self):
    method ycb_ids (line 1521) | def ycb_ids(self):
    method K (line 1525) | def K(self):
    method master_intrinsics (line 1529) | def master_intrinsics(self):
    method step (line 1532) | def step(self):
    method _update_pcd (line 1538) | def _update_pcd(self):
    method pcd_rgb (line 1548) | def pcd_rgb(self):
    method pcd_vert (line 1555) | def pcd_vert(self):
    method pcd_tex_coord (line 1562) | def pcd_tex_coord(self):
    method pcd_mask (line 1566) | def pcd_mask(self):
    method ycb_group_layer (line 1573) | def ycb_group_layer(self):
    method num_ycb (line 1577) | def num_ycb(self):
    method ycb_model_dir (line 1581) | def ycb_model_dir(self):
    method ycb_count (line 1585) | def ycb_count(self):
    method ycb_material (line 1589) | def ycb_material(self):
    method ycb_pose (line 1593) | def ycb_pose(self):
    method ycb_vert (line 1600) | def ycb_vert(self):
    method ycb_norm (line 1607) | def ycb_norm(self):
    method ycb_tex_coords (line 1614) | def ycb_tex_coords(self):
    method mano_group_layer (line 1618) | def mano_group_layer(self):
    method num_mano (line 1622) | def num_mano(self):
    method mano_vert (line 1626) | def mano_vert(self):
    method mano_norm (line 1633) | def mano_norm(self):
    method mano_line (line 1640) | def mano_line(self):
    method mano_joint_3d (line 1647) | def mano_joint_3d(self):
  function visualize_3dpt_tracks (line 1659) | def visualize_3dpt_tracks(tracks_path, output_video_path):
  function main (line 1759) | def main():

FILE: scripts/egoexo4d_preprocessing.py
  function main_preprocess_egoexo4d (line 90) | def main_preprocess_egoexo4d(
  function main_estimate_duster_depth (line 389) | def main_estimate_duster_depth(

FILE: scripts/estimate_depth_with_duster.py
  function seed_all (line 82) | def seed_all(seed):
  function get_view_visibility (line 104) | def get_view_visibility(scene, pts):
  function get_3D_model_from_scene (line 154) | def get_3D_model_from_scene(
  function get_2D_matches (line 296) | def get_2D_matches(output_file_prefix, scene, input_views, min_conf_thr,...
  function load_images (line 358) | def load_images(folder_or_list, size, square_ok=False, verbose=True):
  function tensor_to_pil (line 415) | def tensor_to_pil(img_tensor):
  function load_tensor_images (line 420) | def load_tensor_images(tensor_list, size, square_ok=False, verbose=True):
  function global_aligner (line 452) | def global_aligner(dust3r_output, device, **optim_kw):
  function load_known_camera_parameters_from_neus_dataset (line 458) | def load_known_camera_parameters_from_neus_dataset(dataset_path, input_v...
  function run_duster (line 495) | def run_duster(
  function main_on_neus_scene (line 748) | def main_on_neus_scene(scene_root, views_selection, **duster_kwargs):
  function main_on_kubric_scene (line 787) | def main_on_kubric_scene(scene_root, views_selection, **duster_kwargs):
  function main_on_d3dgs_panoptic_scene (line 829) | def main_on_d3dgs_panoptic_scene(

FILE: scripts/hi4d_preprocessing.py
  function load_pickle (line 111) | def load_pickle(p):
  function save_pickle (line 137) | def save_pickle(p, data):
  function load_image (line 143) | def load_image(path):
  function _safe_load_rgb_cameras (line 147) | def _safe_load_rgb_cameras(npz_path: str) -> Dict[str, np.ndarray]:
  function _find_all_frames_for_action (line 162) | def _find_all_frames_for_action(images_root: str, cam_ids: List[int]) ->...
  function _mesh_path_for_frame (line 179) | def _mesh_path_for_frame(frames_dir: str, frame_idx: int) -> str:
  function extract_hi4d_action_to_pkl (line 186) | def extract_hi4d_action_to_pkl(

FILE: scripts/merge_comparison_mp4s.py
  function create_title_image (line 21) | def create_title_image(text, width, height=50, bg_color=(255, 255, 255)):
  function merge_mp4s (line 44) | def merge_mp4s(mp4s_title_to_path_dict, merged_mp4_output_path, num_colu...

FILE: scripts/panoptic_studio_preprocessing.py
  function parse_args (line 62) | def parse_args():

FILE: scripts/plot_aj_for_varying_depth_noise_levels.py
  function set_size (line 9) | def set_size(width, fraction=1, golden_ratio=(5 ** .5 - 1) / 2):
  function setup_plot (line 44) | def setup_plot():
  function plot_aj (line 51) | def plot_aj(

FILE: scripts/plot_aj_for_varying_n_of_views.py
  function set_size (line 10) | def set_size(width, fraction=1):
  function setup_plot (line 45) | def setup_plot():
  function plot_aj (line 52) | def plot_aj(

FILE: scripts/selfcap_preprocessing.py
  function main_preprocess_selfcap (line 52) | def main_preprocess_selfcap(

FILE: scripts/summarize_eval_results.py
  function find_file_with_max_steps (line 206) | def find_file_with_max_steps(folder):
  function create_table (line 222) | def create_table(
  function kubric_single_point (line 278) | def kubric_single_point():
  function kubric_before_gt0123 (line 291) | def kubric_before_gt0123():
  function kubric (line 310) | def kubric():
  function kubric_duster (line 330) | def kubric_duster():
  function mv3_kubric_duster_transformed (line 363) | def mv3_kubric_duster_transformed():
  function mv3_kubric_nviews (line 383) | def mv3_kubric_nviews():
  function mv3_kubric_duster_nviews (line 401) | def mv3_kubric_duster_nviews():
  function kubric_nviews (line 417) | def kubric_nviews():
  function tavid2d_davis (line 678) | def tavid2d_davis():
  function dexycb (line 708) | def dexycb():
  function kubric_refactored (line 743) | def kubric_refactored():
  function panoptic (line 785) | def panoptic():
  function kubric_single (line 809) | def kubric_single():
  function dexycb_single (line 836) | def dexycb_single():
  function panoptic_single (line 863) | def panoptic_single():
  function ablation_2dpt (line 899) | def ablation_2dpt():
  function one_to_rule_them_all (line 921) | def one_to_rule_them_all(models, datasets, separate_datasets=True, **cre...
  function ablation_model_params (line 943) | def ablation_model_params():
  function ablation_camera_setups (line 957) | def ablation_camera_setups():
  function ablation_num_views (line 969) | def ablation_num_views(separate_datasets):

Download .json

Condensed preview — 183 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (1,973K chars).

[
  {
    "path": ".gitignore",
    "chars": 165,
    "preview": ".idea\n__pycache__/\n*.DS_Store\n*.pth\n*.pt\n*.mp4\n*.npy\nvis_results/\ncheckpoints/\nlogs/\nslurm_logs/\nsubmit*\nlogs*\n\n/running"
  },
  {
    "path": "README.md",
    "chars": 16264,
    "preview": "<div align=\"center\" style=\"line-height:1.2; margin:0; padding:0;\">\n<h1 style=\"margin-bottom:0em;\">Multi-View 3D Point Tr"
  },
  {
    "path": "configs/eval.yaml",
    "chars": 1173,
    "preview": "defaults:\n  - train\n  - _self_\n\nmodes:\n  eval_only: true\n\ntrainer:\n  precision: 32-true\n\n# Optional overrides specific t"
  },
  {
    "path": "configs/experiment/mvtracker.yaml",
    "chars": 96,
    "preview": "# @package _global_\ndefaults:\n  - override /model: mvtracker\n\nexperiment_path: ./logs/mvtracker\n"
  },
  {
    "path": "configs/experiment/mvtracker_overfit.yaml",
    "chars": 883,
    "preview": "# @package _global_\ndefaults:\n  - override /model: mvtracker\n\nexperiment_path: ./logs/debug/mvtracker-overfit\n\ndatasets:"
  },
  {
    "path": "configs/experiment/mvtracker_overfit_mini.yaml",
    "chars": 173,
    "preview": "# @package _global_\ndefaults:\n  - mvtracker_overfit\n\nexperiment_path: ./logs/debug/mvtracker-overfit-mini\n\ndatasets:\n  t"
  },
  {
    "path": "configs/model/copycat.yaml",
    "chars": 100,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.copycat.CopyCat\n"
  },
  {
    "path": "configs/model/cotracker1_offline.yaml",
    "chars": 268,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/cotracker1_online.yaml",
    "chars": 274,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/cotracker2_offline.yaml",
    "chars": 266,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/cotracker2_online.yaml",
    "chars": 272,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/cotracker3_offline.yaml",
    "chars": 274,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/cotracker3_online.yaml",
    "chars": 272,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/default.yaml",
    "chars": 1110,
    "preview": "# @package _global_\nmodel:\n  _target_: ???\n\ntrainer:\n  train_iters: 4\n\nevaluation:\n  eval_iters: 4\n  interp_shape: null\n"
  },
  {
    "path": "configs/model/delta.yaml",
    "chars": 317,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/locotrack.yaml",
    "chars": 277,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/mvtracker.yaml",
    "chars": 1253,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.mvtracker.mvtracker.MVTracker\n  slid"
  },
  {
    "path": "configs/model/scenetracker.yaml",
    "chars": 338,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/spatialtrackerv2.yaml",
    "chars": 489,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/model/spatracker_monocular.yaml",
    "chars": 864,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.spatracker.spatracker_monocular.SpaT"
  },
  {
    "path": "configs/model/spatracker_monocular_pretrained.yaml",
    "chars": 706,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.spatracker.spatracker_monocular.SpaT"
  },
  {
    "path": "configs/model/spatracker_multiview.yaml",
    "chars": 1152,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.spatracker.spatracker_multiview.Mult"
  },
  {
    "path": "configs/model/tapip3d.yaml",
    "chars": 663,
    "preview": "# @package _global_\ndefaults:\n  - default\n\nmodel:\n  _target_: mvtracker.models.core.monocular_baselines.MonocularToMulti"
  },
  {
    "path": "configs/train.yaml",
    "chars": 3526,
    "preview": "defaults:\n  - _self_\n  - model: mvtracker\n\nexperiment_path: ???                # where to store checkpoints, visualizati"
  },
  {
    "path": "demo.py",
    "chars": 6466,
    "preview": "import argparse\nimport os\nimport warnings\n\nimport numpy as np\nimport rerun as rr  # pip install rerun-sdk==0.21.0\nimport"
  },
  {
    "path": "hubconf.py",
    "chars": 4839,
    "preview": "# Copyright (c) ETH VLG.\n# Licensed under the terms in the LICENSE file at the root of this repo.\n\nfrom pathlib import P"
  },
  {
    "path": "mvtracker/__init__.py",
    "chars": 197,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/cli/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/cli/eval.py",
    "chars": 268,
    "preview": "import hydra\nfrom omegaconf import DictConfig\n\nfrom mvtracker.cli.train import main as train_main\n\n\n@hydra.main(version_"
  },
  {
    "path": "mvtracker/cli/train.py",
    "chars": 45018,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n# This source code is licensed under the lic"
  },
  {
    "path": "mvtracker/cli/utils/__init__.py",
    "chars": 151,
    "preview": "from .pylogger import RankedLogger\nfrom .rich_utils import enforce_tags, print_config_tree\nfrom .helpers import extras, "
  },
  {
    "path": "mvtracker/cli/utils/helpers.py",
    "chars": 4835,
    "preview": "import faulthandler\nimport warnings\nfrom functools import wraps\nfrom importlib.util import find_spec\nfrom typing import "
  },
  {
    "path": "mvtracker/cli/utils/pylogger.py",
    "chars": 2432,
    "preview": "import logging\nfrom typing import Mapping, Optional\n\nfrom lightning_utilities.core.rank_zero import rank_prefixed_messag"
  },
  {
    "path": "mvtracker/cli/utils/rich_utils.py",
    "chars": 3234,
    "preview": "from pathlib import Path\nfrom typing import Sequence, Optional\n\nimport rich\nimport rich.syntax\nimport rich.tree\nfrom hyd"
  },
  {
    "path": "mvtracker/datasets/__init__.py",
    "chars": 244,
    "preview": "from .dexycb_multiview_dataset import DexYCBMultiViewDataset\nfrom .kubric_multiview_dataset import KubricMultiViewDatase"
  },
  {
    "path": "mvtracker/datasets/dexycb_multiview_dataset.py",
    "chars": 30909,
    "preview": "import logging\nimport os\nimport pathlib\nimport re\nimport time\nimport warnings\n\nimport cv2\nimport matplotlib\nimport numpy"
  },
  {
    "path": "mvtracker/datasets/generic_scene_dataset.py",
    "chars": 38887,
    "preview": "import logging\nimport os\nimport pickle\nimport sys\nfrom contextlib import ExitStack\nfrom typing import Tuple\n\nimport nump"
  },
  {
    "path": "mvtracker/datasets/kubric_multiview_dataset.py",
    "chars": 86013,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n# This source code is licensed under the lic"
  },
  {
    "path": "mvtracker/datasets/panoptic_studio_multiview_dataset.py",
    "chars": 21421,
    "preview": "import logging\nimport os\nimport pathlib\nimport re\nimport time\nimport warnings\n\nimport cv2\nimport numpy as np\nimport pand"
  },
  {
    "path": "mvtracker/datasets/tap_vid_datasets.py",
    "chars": 41548,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/datasets/utils.py",
    "chars": 14587,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/evaluation/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/evaluation/evaluator_3dpt.py",
    "chars": 44072,
    "preview": "import json\nimport logging\nimport os\nimport re\nimport time\nimport warnings\nfrom collections import namedtuple\nfrom typin"
  },
  {
    "path": "mvtracker/evaluation/metrics.py",
    "chars": 18675,
    "preview": "import logging\nimport warnings\nfrom typing import Mapping\n\nimport numpy as np\nimport pandas as pd\nimport torch\n\n\ndef com"
  },
  {
    "path": "mvtracker/models/__init__.py",
    "chars": 197,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/__init__.py",
    "chars": 197,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/copycat.py",
    "chars": 1253,
    "preview": "import torch\nfrom torch import nn as nn\n\n\nclass CopyCat(nn.Module):\n    \"\"\"\n    Dummy, no-movement baseline that always "
  },
  {
    "path": "mvtracker/models/core/cotracker2/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/cotracker2/blocks.py",
    "chars": 16701,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/dpt/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/dpt/base_model.py",
    "chars": 367,
    "preview": "import torch\n\n\nclass BaseModel(torch.nn.Module):\n    def load(self, path):\n        \"\"\"Load model from file.\n\n        Arg"
  },
  {
    "path": "mvtracker/models/core/dpt/blocks.py",
    "chars": 9564,
    "preview": "import torch\nimport torch.nn as nn\n\nfrom mvtracker.models.core.dpt.vit import (\n    _make_pretrained_vitb_rn50_384,\n    "
  },
  {
    "path": "mvtracker/models/core/dpt/midas_net.py",
    "chars": 2788,
    "preview": "\"\"\"MidashNet: Network for monocular depth estimation trained by mixing several datasets.\nThis file contains code that is"
  },
  {
    "path": "mvtracker/models/core/dpt/models.py",
    "chars": 7096,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom mvtracker.models.core.dpt.base_model import Bas"
  },
  {
    "path": "mvtracker/models/core/dpt/transforms.py",
    "chars": 7886,
    "preview": "import cv2\nimport math\nimport numpy as np\n\n\ndef apply_min_size(sample, size, image_interpolation_method=cv2.INTER_AREA):"
  },
  {
    "path": "mvtracker/models/core/dpt/vit.py",
    "chars": 18127,
    "preview": "import types\n\nimport math\nimport timm\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nactivations = "
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/LICENSE.md",
    "chars": 1958,
    "preview": "## Notes on license:\n\nThe code in this repository (except in external.py) is licensed under the MIT licence.\n\nHowever, f"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/colormap.py",
    "chars": 6668,
    "preview": "import numpy as np\n\ncolormap = np.array([\n    # 0     ,         0,         0,\n    0.5020, 0, 0,\n    0, 0.5020, 0,\n    0."
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/export_depths_from_pretrained_checkpoint.py",
    "chars": 5356,
    "preview": "import json\nimport os\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\nfrom PIL import Image\nfrom diff_gaussian"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/external.py",
    "chars": 9485,
    "preview": "\"\"\"\n# Copyright (C) 2023, Inria\n# GRAPHDECO research group, https://team.inria.fr/graphdeco\n# All rights reserved.\n#\n# T"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/helpers.py",
    "chars": 3364,
    "preview": "import os\n\nimport numpy as np\nimport open3d as o3d\nimport torch\nfrom diff_gaussian_rasterization import GaussianRasteriz"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/merge_tapvid3d_per_camera_annotations.py",
    "chars": 28981,
    "preview": "import json\nimport os\nimport warnings\nfrom pathlib import Path\n\nimport matplotlib\nimport numpy as np\nimport rerun as rr\n"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/metadata_dexycb.py",
    "chars": 2499,
    "preview": "import json\nimport os\nfrom collections import defaultdict\n\nimport numpy as np\n\n# Configurable parameters\nBASE_PATH = \".\""
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/metadata_kubric.py",
    "chars": 3472,
    "preview": "import json\nimport os\nfrom collections import defaultdict\n\nimport kornia\nimport numpy as np\nimport torch\n\nBASE_PATH = \"."
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/reorganize_dexycb.py",
    "chars": 1300,
    "preview": "import os\n\nsource_roots = [f for f in os.listdir(\".\") if f.startswith(\"2020\")]\nimport os\nimport shutil\n\nsource_roots = ["
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/test.py",
    "chars": 4275,
    "preview": "import json\nimport os\n\nimport numpy as np\nimport torch\nimport torchvision\nfrom PIL import Image\nfrom diff_gaussian_raste"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/track_2d.py",
    "chars": 8585,
    "preview": "import json\nimport os\n\nimport numpy as np\nimport torch\nfrom diff_gaussian_rasterization import GaussianRasterizer as Ren"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/track_3d.py",
    "chars": 12042,
    "preview": "import os\n\nimport cv2\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\nfrom external import build_rotation\n\nREMOVE"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/train.py",
    "chars": 10420,
    "preview": "import copy\nimport json\nimport os\nfrom random import randint\n\nimport numpy as np\nimport torch\nfrom PIL import Image\nfrom"
  },
  {
    "path": "mvtracker/models/core/dynamic3dgs/visualize.py",
    "chars": 13765,
    "preview": "import json\nimport os\nfrom pathlib import Path\n\nimport matplotlib\nimport numpy as np\nimport rerun as rr\nimport torch\nfro"
  },
  {
    "path": "mvtracker/models/core/embeddings.py",
    "chars": 8750,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/loftr/__init__.py",
    "chars": 49,
    "preview": "from .transformer import LocalFeatureTransformer\n"
  },
  {
    "path": "mvtracker/models/core/loftr/linear_attention.py",
    "chars": 2796,
    "preview": "\"\"\"\nLinear Transformer proposed in \"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention\"\nModif"
  },
  {
    "path": "mvtracker/models/core/loftr/transformer.py",
    "chars": 5239,
    "preview": "'''\nmodified from\nhttps://github.com/zju3dv/LoFTR/blob/master/src/loftr/loftr_module/transformer.py\n'''\nimport copy\n\nimp"
  },
  {
    "path": "mvtracker/models/core/losses.py",
    "chars": 2639,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/model_utils.py",
    "chars": 18040,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n# This source code is licensed under the lic"
  },
  {
    "path": "mvtracker/models/core/monocular_baselines.py",
    "chars": 33014,
    "preview": "import logging\nimport sys\nimport warnings\nfrom typing import Tuple\n\nimport numpy as np\nimport torch\nimport torch.nn.func"
  },
  {
    "path": "mvtracker/models/core/mvtracker/__init__.py",
    "chars": 197,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/mvtracker/mvtracker.py",
    "chars": 46779,
    "preview": "import logging\nimport os\nfrom collections import defaultdict\nfrom typing import Optional, Callable\n\nimport numpy as np\ni"
  },
  {
    "path": "mvtracker/models/core/ptv3/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/ptv3/model.py",
    "chars": 36928,
    "preview": "\"\"\"\r\nPoint Transformer - V3 Mode1\r\nPointcept detached version\r\n\r\nAuthor: Xiaoyang Wu (xiaoyang.wu.cs@gmail.com)\r\nPlease "
  },
  {
    "path": "mvtracker/models/core/ptv3/serialization/__init__.py",
    "chars": 137,
    "preview": "from .default import (\r\n    encode,\r\n    decode,\r\n    z_order_encode,\r\n    z_order_decode,\r\n    hilbert_encode,\r\n    hil"
  },
  {
    "path": "mvtracker/models/core/ptv3/serialization/default.py",
    "chars": 2067,
    "preview": "import torch\r\n\r\nfrom .hilbert import decode as hilbert_decode_\r\nfrom .hilbert import encode as hilbert_encode_\r\nfrom .z_"
  },
  {
    "path": "mvtracker/models/core/ptv3/serialization/hilbert.py",
    "chars": 10250,
    "preview": "\"\"\"\r\nHilbert Order\r\nModified from https://github.com/PrincetonLIPS/numpy-hilbert-curve\r\n\r\nAuthor: Xiaoyang Wu (xiaoyang."
  },
  {
    "path": "mvtracker/models/core/ptv3/serialization/z_order.py",
    "chars": 4299,
    "preview": "# --------------------------------------------------------\r\n# Octree-based Sparse Convolutional Neural Networks\r\n# Copyr"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/.gitignore",
    "chars": 126,
    "preview": "*.pth\n*.npy\n*.mp4\noutputs/\nwork_dirs/\n*__pycache__*\n.vscode/\n.envrc\n.bak/\ndatasets/\n\npreproc/checkpoints\npreproc/checkpo"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/.gitmodules",
    "chars": 330,
    "preview": "[submodule \"preproc/tapnet\"]\n\tpath = preproc/tapnet\n\turl = https://github.com/google-deepmind/tapnet.git\n[submodule \"pre"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/LICENSE",
    "chars": 1066,
    "preview": "MIT License\n\nCopyright (c) 2024 Vickie Ye\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/README.md",
    "chars": 2317,
    "preview": "# Shape of Motion: 4D Reconstruction from a Single Video\n**[Project Page](https://shape-of-motion.github.io/) | [Arxiv]("
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/configs.py",
    "chars": 1485,
    "preview": "from dataclasses import dataclass\n\n\n@dataclass\nclass FGLRConfig:\n    means: float = 1.6e-4\n    opacities: float = 1e-2\n "
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/__init__.py",
    "chars": 1658,
    "preview": "from dataclasses import asdict, replace\n\nfrom torch.utils.data import Dataset\n\nfrom .base_dataset import BaseDataset\nfro"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/base_dataset.py",
    "chars": 3649,
    "preview": "from abc import abstractmethod\n\nimport torch\nfrom torch.utils.data import Dataset, default_collate\n\n\nclass BaseDataset(D"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/casual_dataset.py",
    "chars": 18466,
    "preview": "import os\nfrom dataclasses import dataclass\nfrom functools import partial\nfrom typing import Literal, cast\n\nimport cv2\ni"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/colmap.py",
    "chars": 12684,
    "preview": "import os\nimport struct\nfrom dataclasses import dataclass\nfrom pathlib import Path\nfrom typing import Dict, Union\n\nimpor"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/iphone_dataset.py",
    "chars": 33241,
    "preview": "import json\nimport os\nimport os.path as osp\nfrom dataclasses import dataclass\nfrom glob import glob\nfrom itertools impor"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/panoptic_dataset.py",
    "chars": 32382,
    "preview": "import os\nfrom dataclasses import dataclass\nfrom functools import partial\nfrom typing import Literal, cast\n\nimport cv2\ni"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/data/utils.py",
    "chars": 11553,
    "preview": "from typing import List, Optional, Tuple, TypedDict\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch."
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/init_utils.py",
    "chars": 22437,
    "preview": "import time\nfrom typing import Literal\n\nimport cupy as cp\nimport imageio.v3 as iio\nimport numpy as np\n\n# from pytorch3d."
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/loss_utils.py",
    "chars": 5563,
    "preview": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom sklearn.neighbors import NearestNeighbors\n\n\ndef mas"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/metrics.py",
    "chars": 10820,
    "preview": "from typing import Literal\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom torchmetrics.functional"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/params.py",
    "chars": 6227,
    "preview": "import math\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom flow3d.transforms import cont_6d_t"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/renderer.py",
    "chars": 3023,
    "preview": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom loguru import logger as guru\nfrom nerfview import C"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/scene_model.py",
    "chars": 10355,
    "preview": "import roma\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom gsplat.rendering import rasterizatio"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/tensor_dataclass.py",
    "chars": 2832,
    "preview": "from dataclasses import dataclass\nfrom typing import Callable, TypeVar\n\nimport torch\nfrom typing_extensions import Self\n"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/trainer.py",
    "chars": 31472,
    "preview": "import functools\nimport time\nfrom dataclasses import asdict\nfrom typing import cast\n\nimport numpy as np\nimport torch\nimp"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/trajectories.py",
    "chars": 6223,
    "preview": "import numpy as np\nimport roma\nimport torch\nimport torch.nn.functional as F\n\nfrom .transforms import rt_to_mat4\n\n\ndef ge"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/transforms.py",
    "chars": 4114,
    "preview": "from typing import Literal\n\nimport roma\nimport torch\nimport torch.nn.functional as F\n\n\ndef rt_to_mat4(\n    R: torch.Tens"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/validator.py",
    "chars": 15704,
    "preview": "import functools\nimport os\nimport os.path as osp\nimport time\nfrom dataclasses import asdict\nfrom typing import cast\n\nimp"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/vis/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/vis/playback_panel.py",
    "chars": 2082,
    "preview": "import threading\nimport time\n\nimport viser\n\n\ndef add_gui_playback_group(\n    server: viser.ViserServer,\n    num_frames: "
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/vis/render_panel.py",
    "chars": 43798,
    "preview": "# Copyright 2022 the Regents of the University of California, Nerfstudio Team and contributors. All rights reserved.\n#\n#"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/vis/utils.py",
    "chars": 16290,
    "preview": "import colorsys\nfrom typing import cast\n\nimport cv2\nimport numpy as np\n\nimport nvdiffrast.torch as dr\nimport torch\nimpor"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/flow3d/vis/viewer.py",
    "chars": 2553,
    "preview": "from pathlib import Path\nfrom typing import Callable, Literal, Optional, Tuple, Union\n\nimport numpy as np\nfrom jaxtyping"
  },
  {
    "path": "mvtracker/models/core/shape-of-motion/launch_davis.py",
    "chars": 1023,
    "preview": "import os\nimport subprocess\nfrom concurrent.futures import ProcessPoolExecutor\nimport tyro\n\n\ndef main(\n    devices: list"
  },
  {
    "path": "mvtracker/models/core/spatracker/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/spatracker/blocks.py",
    "chars": 25844,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/spatracker/softsplat.py",
    "chars": 45628,
    "preview": "#!/usr/bin/env python\n\n\"\"\"The code of softsplat function is modified from:\nhttps://github.com/sniklaus/softmax-splatting"
  },
  {
    "path": "mvtracker/models/core/spatracker/spatracker_monocular.py",
    "chars": 48722,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\nimport logging\nimport warnings\n\nimport numpy"
  },
  {
    "path": "mvtracker/models/core/spatracker/spatracker_multiview.py",
    "chars": 46518,
    "preview": "import logging\nimport os\nimport warnings\n\nimport cv2\nimport numpy as np\nimport torch\nfrom einops import rearrange\nfrom m"
  },
  {
    "path": "mvtracker/models/core/vggt/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/vggt/heads/camera_head.py",
    "chars": 6041,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/dpt_head.py",
    "chars": 18430,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/head_act.py",
    "chars": 3741,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_head.py",
    "chars": 4254,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_modules/__init__.py",
    "chars": 198,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_modules/base_track_predictor.py",
    "chars": 8027,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_modules/blocks.py",
    "chars": 10024,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_modules/modules.py",
    "chars": 6310,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/track_modules/utils.py",
    "chars": 8171,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/heads/utils.py",
    "chars": 3878,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/layers/__init__.py",
    "chars": 382,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/layers/attention.py",
    "chars": 3146,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/block.py",
    "chars": 9519,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/drop_path.py",
    "chars": 1157,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/layer_scale.py",
    "chars": 820,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/mlp.py",
    "chars": 1269,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/patch_embed.py",
    "chars": 2829,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/rope.py",
    "chars": 7676,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/swiglu_ffn.py",
    "chars": 2190,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/layers/vision_transformer.py",
    "chars": 15239,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# This source code is licensed under the Apache License, Version "
  },
  {
    "path": "mvtracker/models/core/vggt/models/aggregator.py",
    "chars": 12527,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/models/vggt.py",
    "chars": 4494,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/utils/geometry.py",
    "chars": 5880,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/utils/load_fn.py",
    "chars": 5885,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/utils/pose_enc.py",
    "chars": 5543,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/utils/rotation.py",
    "chars": 4676,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vggt/utils/visual_track.py",
    "chars": 8354,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "mvtracker/models/core/vit/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "mvtracker/models/core/vit/common.py",
    "chars": 1495,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/core/vit/encoder.py",
    "chars": 14656,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/models/evaluation_predictor_3dpt.py",
    "chars": 23331,
    "preview": "import os\nimport random\nfrom typing import Optional, Tuple\n\nimport numpy as np\nimport torch\nimport torch.nn.functional a"
  },
  {
    "path": "mvtracker/utils/__init__.py",
    "chars": 197,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/utils/basic.py",
    "chars": 12572,
    "preview": "import os\nfrom datetime import datetime\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n\nEPS = 1e-6\n\n\nd"
  },
  {
    "path": "mvtracker/utils/eval_utils.py",
    "chars": 2100,
    "preview": "import os\n\nimport matplotlib\nimport numpy as np\nimport rerun as rr\nimport json\nfrom tqdm import tqdm\nfrom scipy.stats im"
  },
  {
    "path": "mvtracker/utils/geom.py",
    "chars": 16502,
    "preview": "import numpy as np\nimport torch\nimport torchvision.ops as ops\n\n\ndef matmul2(mat1, mat2):\n    return torch.matmul(mat1, m"
  },
  {
    "path": "mvtracker/utils/improc.py",
    "chars": 52617,
    "preview": "import cv2\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nimport torchv"
  },
  {
    "path": "mvtracker/utils/misc.py",
    "chars": 5479,
    "preview": "import numpy as np\nimport torch\nfrom prettytable import PrettyTable\n\n\ndef count_parameters(model):\n    table = PrettyTab"
  },
  {
    "path": "mvtracker/utils/visualizer_mp4.py",
    "chars": 28115,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the li"
  },
  {
    "path": "mvtracker/utils/visualizer_rerun.py",
    "chars": 23450,
    "preview": "from typing import Union, Optional, List, Dict, Any\n\nimport matplotlib\nimport numpy as np\nimport pandas as pd\nimport rer"
  },
  {
    "path": "requirements.full.txt",
    "chars": 568,
    "preview": "# Minimal runtime\nnumpy==1.24.3\nhuggingface-hub==0.30.2\neasydict==1.13\npandas==2.2.2\neinops==0.7.0\nopencv-python==4.11.0"
  },
  {
    "path": "requirements.txt",
    "chars": 306,
    "preview": "# Minimal dependencies\nnumpy==1.24.3\nhuggingface-hub==0.30.2\neasydict==1.13\npandas==2.2.2\neinops==0.7.0\nopencv-python==4"
  },
  {
    "path": "scripts/4ddress_preprocessing.py",
    "chars": 51224,
    "preview": "\"\"\"\nFirst download the dataset. You'll have to fill in an online ETH form\nand then wait for a few days to get a temporar"
  },
  {
    "path": "scripts/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "scripts/compare_cdist-topk_against_pointops-knn.py",
    "chars": 2543,
    "preview": "import time\n\nimport torch\nfrom pointops import knn_query\n\nB, N, M, D, K = 12, 49152, 928, 3, 16\n\n\ndef knn_torch(k: int, "
  },
  {
    "path": "scripts/dex_ycb_to_neus_format.py",
    "chars": 81755,
    "preview": "\"\"\"\nBefore running the script, you need to install the toolkit and other\ndependencies, as well as download the data and "
  },
  {
    "path": "scripts/egoexo4d_preprocessing.py",
    "chars": 20131,
    "preview": "\"\"\"\nEnvironment setup:\n```bash\ncd ..\n\n# Clone the projectaria_tools repository\ngit clone -b 1.5.0 https://github.com/fac"
  },
  {
    "path": "scripts/estimate_depth_with_duster.py",
    "chars": 45325,
    "preview": "\"\"\"\nSet up the environment:\n```sh\ncd /local/home/frrajic/xode\n\ngit clone --recursive git@github.com:ethz-vlg/duster.git\n"
  },
  {
    "path": "scripts/hi4d_preprocessing.py",
    "chars": 28390,
    "preview": "\"\"\"\nFirst download the dataset. You'll have to fill in an online ETH form\nand then wait for a few days to get a temporar"
  },
  {
    "path": "scripts/merge_comparison_mp4s.py",
    "chars": 10957,
    "preview": "\"\"\"\nMerge MP4 files of different methods into a single side-by-side comparison,\nadding a small text bar for each method "
  },
  {
    "path": "scripts/panoptic_studio_preprocessing.py",
    "chars": 4048,
    "preview": "\"\"\"\nThis script will convert the Panoptic Studio subset of TAPVid-3D to multi-view 3D point tracking dataset.\n\nFirst, fo"
  },
  {
    "path": "scripts/plot_aj_for_varying_depth_noise_levels.py",
    "chars": 3833,
    "preview": "import os\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport seaborn as sns\n\n\n# set_size from https://jwalton.in"
  },
  {
    "path": "scripts/plot_aj_for_varying_n_of_views.py",
    "chars": 3856,
    "preview": "import os\n\nimport matplotlib.pyplot as plt\nimport matplotlib.ticker as ticker\nimport numpy as np\nimport seaborn as sns\n\n"
  },
  {
    "path": "scripts/profiling.md",
    "chars": 2077,
    "preview": "# Profiling Notes\n\nThis document summarizes how to run performance profiling using PyTorch’s built-in tools, and how to "
  },
  {
    "path": "scripts/selfcap_preprocessing.py",
    "chars": 12376,
    "preview": "\"\"\"\nSelfCap dataset (https://zju3dv.github.io/longvolcap/)\n\nDownload the dataset (but first fill in the form at https://"
  },
  {
    "path": "scripts/slurm/eval.sh",
    "chars": 8141,
    "preview": "#!/bin/bash\n#SBATCH --job-name=eval-058\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task=32\n#SBATCH"
  },
  {
    "path": "scripts/slurm/mvtracker-nodepthaugs.sh",
    "chars": 1262,
    "preview": "#!/bin/bash\n#SBATCH --job-name=mvtracker_200000_june2025_cleandepths\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=4\n#SBAT"
  },
  {
    "path": "scripts/slurm/mvtracker.sh",
    "chars": 1206,
    "preview": "#!/bin/bash\n#SBATCH --job-name=mvtracker_200000_june2025\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=4\n#SBATCH --cpus-pe"
  },
  {
    "path": "scripts/slurm/spatracker.sh",
    "chars": 1104,
    "preview": "#!/bin/bash\n#SBATCH --job-name=spatracker_monocular\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=4\n#SBATCH --cpus-per-tas"
  },
  {
    "path": "scripts/slurm/test_reproducibility.sh",
    "chars": 1387,
    "preview": "#!/bin/bash\n#SBATCH --job-name=repro-test-mvtracker\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=4\n#SBATCH --cpus-per-tas"
  },
  {
    "path": "scripts/slurm/triplane-128.sh",
    "chars": 1180,
    "preview": "#!/bin/bash\n#SBATCH --job-name=spatracker_multiview_128\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=4\n#SBATCH --cpus-per"
  },
  {
    "path": "scripts/slurm/triplane-256.sh",
    "chars": 1180,
    "preview": "#!/bin/bash\n#SBATCH --job-name=spatracker_multiview_256\n#SBATCH --nodes=8\n#SBATCH --ntasks-per-node=4\n#SBATCH --cpus-per"
  },
  {
    "path": "scripts/summarize_eval_results.py",
    "chars": 60154,
    "preview": "import os\nimport re\nimport warnings\n\nimport pandas as pd\n\nREMAP_KUBRIC = {\n    \"Method\": (\"\", \"Method\"),\n    \"average_ja"
  }
]

About this extraction

This page contains the full source code of the ethz-vlg/mvtracker GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 183 files (1.8 MB), approximately 515.4k tokens, and a symbol index with 1221 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo