Repository: danier97/LDMVFI
Branch: main
Commit: eee2dc3566f2
Files: 47
Total size: 425.2 KB
Directory structure:
gitextract_grs9wzig/
├── .gitignore
├── LICENSE
├── README.md
├── configs/
│   ├── autoencoder/
│   │   └── vqflow-f32.yaml
│   └── ldm/
│       └── ldmvfi-vqflow-f32-c256-concat_max.yaml
├── cupy_module/
│   ├── __init__.py
│   └── dsepconv.py
├── environment.yaml
├── evaluate.py
├── evaluate_vqm.py
├── interpolate_yuv.py
├── ldm/
│   ├── data/
│   │   ├── __init__.py
│   │   ├── bvi_vimeo.py
│   │   ├── testsets.py
│   │   ├── testsets_vqm.py
│   │   └── vfitransforms.py
│   ├── lr_scheduler.py
│   ├── models/
│   │   ├── autoencoder.py
│   │   └── diffusion/
│   │       ├── __init__.py
│   │       ├── ddim.py
│   │       └── ddpm.py
│   ├── modules/
│   │   ├── attention.py
│   │   ├── diffusionmodules/
│   │   │   ├── __init__.py
│   │   │   ├── model.py
│   │   │   ├── openaimodel.py
│   │   │   └── util.py
│   │   ├── ema.py
│   │   ├── losses/
│   │   │   ├── __init__.py
│   │   │   └── vqperceptual.py
│   │   └── maxvit.py
│   └── util.py
├── main.py
├── metrics/
│   ├── flolpips/
│   │   ├── .gitignore
│   │   ├── LICENSE
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── correlation/
│   │   │   └── correlation.py
│   │   ├── flolpips.py
│   │   ├── pretrained_networks.py
│   │   ├── pwcnet.py
│   │   └── utils.py
│   ├── lpips/
│   │   ├── __init__.py
│   │   ├── lpips.py
│   │   └── pretrained_networks.py
│   └── pytorch_ssim/
│       └── __init__.py
├── setup.py
└── utility.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.pth
*.ckpt
*__pycache__*
*.pyc
*egg*
*src/*
*.ipynb
logs/*
*delete*
eval_results*
*.idea*
*.pytorch
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 danielism97
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# LDMVFI: Video Frame Interpolation with Latent Diffusion Models
[**Duolikun Danier**](https://danier97.github.io/), [**Fan Zhang**](https://fan-aaron-zhang.github.io/), [**David Bull**](https://david-bull.github.io/)
[Project](TODO) | [arXiv](https://arxiv.org/abs/2303.09508) | [Video](https://drive.google.com/file/d/1oL6j_l3b2QEqsL0iO7qSZrGUXJaTpRWN/view?usp=share_link)

## Overview
We observe that most existing learning-based VFI models are trained to minimise the L1/L2/VGG loss between their outputs and the ground-truth frames. However, previous work has shown that these metrics do not correlate well with the **perceptual quality** of VFI. On the other hand, generative models, especially diffusion models, are showing remarkable results in generating visual content with high perceptual quality. In this work, we leverage the high-fidelity image/video generation capabilities of **latent diffusion models** to perform generative VFI.
## Dependencies and Installation
See [environment.yaml](./environment.yaml) for the required packages. To install them with conda:
```
conda env create -f environment.yaml
```
## Pre-trained Model
The pre-trained model can be downloaded from [here](https://drive.google.com/file/d/1_Xx2fBYQT9O-6O3zjzX76O9XduGnCh_7/view?usp=share_link), and its corresponding config file is [this yaml](./configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml).
## Preparing datasets
### Training sets:
[[Vimeo-90K]](http://toflow.csail.mit.edu/) | [[BVI-DVC quintuplets]](https://drive.google.com/file/d/1i_CoqiNrZ2AU8DKjU8aHM1jIaDGW0fE5/view?usp=sharing)
### Test sets:
[[Middlebury]](https://vision.middlebury.edu/flow/data/) | [[UCF101]](https://sites.google.com/view/xiangyuxu/qvi_nips19) | [[DAVIS]](https://sites.google.com/view/xiangyuxu/qvi_nips19) | [[SNU-FILM]](https://myungsub.github.io/CAIN/)
To use [evaluate.py](evaluate.py) and the files in [ldm/data/](./ldm/data/), the dataset folder names should be lower-case and structured as follows.
```
└──── <data_dir>/
    ├──── middlebury_others/
    |   ├──── input/
    |   |   ├──── Beanbags/
    |   |   ├──── ...
    |   |   └──── Walking/
    |   └──── gt/
    |       ├──── Beanbags/
    |       ├──── ...
    |       └──── Walking/
    ├──── ucf101/
    |   ├──── 0/
    |   ├──── ...
    |   └──── 99/
    ├──── davis90/
    |   ├──── bear/
    |   ├──── ...
    |   └──── walking/
    ├──── snufilm/
    |   ├──── test-easy.txt
    |   ├──── ...
    |   └──── data/SNU-FILM/test/...
    ├──── bvidvc/quintuplets
    |   ├──── 00000/
    |   ├──── ...
    |   └──── 17599/
    └──── vimeo_septuplet/
        ├──── sequences/
        ├──── sep_testlist.txt
        └──── sep_trainlist.txt
```
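Before launching an evaluation it can help to confirm that the data root matches this layout. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the folder names are simply copied from the tree above.

```python
from pathlib import Path

# Folder names copied from the layout in this README.
# The helper itself is a hypothetical convenience, not repository code.
EXPECTED_DIRS = [
    "middlebury_others/input",
    "middlebury_others/gt",
    "ucf101",
    "davis90",
    "snufilm",
    "bvidvc/quintuplets",
    "vimeo_septuplet/sequences",
]


def missing_dataset_dirs(data_dir, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dataset_dirs("/path/to/data")` returns the relative paths that are missing. Only the datasets you intend to use need to be present, so a non-empty result is not necessarily an error.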
## Evaluation
For example, to evaluate LDMVFI with the DDIM sampler on the Middlebury dataset using PSNR/SSIM/LPIPS, run the following command, passing the checkpoint path to `--ckpt` and the dataset root directory to `--data_dir`.
```
python evaluate.py \
--config configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml \
--ckpt <path/to/ckpt> \
--dataset Middlebury_others \
--metrics PSNR SSIM LPIPS \
--data_dir <path/to/data_dir> \
--out_dir eval_results/ldmvfi-vqflow-f32-c256-concat_max/ \
--use_ddim
```
This will create the directory `eval_results/ldmvfi-vqflow-f32-c256-concat_max/Middlebury_others/` and store the interpolated frames, together with a `results.txt` file, in that directory. For other test sets, replace `Middlebury_others` with the corresponding class name defined in [ldm/data/testsets.py](ldm/data/testsets.py) (e.g. `Ucf101_triplet`).
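For orientation, PSNR, the simplest of the three image metrics named in the command, is defined as 10·log10(MAX² / MSE). The sketch below is dependency-free and purely illustrative; it is not the implementation used by `evaluate.py`, and the function name `psnr` and its parameters are hypothetical.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the interpolated frame is closer to the ground truth in a pixel-wise sense, which, as the Overview notes, does not necessarily track perceptual quality.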
\
To evaluate the model on the perceptual video metric FloLPIPS, first evaluate the image metrics using the command above (so that the interpolated frames are saved in `eval_results/ldmvfi-vqflow-f32-c256-concat_max`), then run the following command.
```
python evaluate_vqm.py \
--exp ldmvfi-vqflow-f32-c256-concat_max \
--dataset Middlebury_others \
--metrics FloLPIPS \
--data_dir <path/to/data_dir> \
--out_dir eval_results/ldmvfi-vqflow-f32-c256-concat_max/
```
This will read the interpolated frames previously stored in `eval_results/ldmvfi-vqflow-f32-c256-concat_max/Middlebury_others/` and write the evaluation results to `results_vqm.txt` in the same folder.
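Because the FloLPIPS run depends on frames written by the image-metric run, the two commands must share the same dataset and output directory. The sketch below assembles both argument lists from one set of inputs. It is a hypothetical helper (`build_eval_commands` is not in the repository), it uses only the flags shown in the commands above, and it assumes the config file name matches the experiment name, as it does in this README's example.

```python
def build_eval_commands(ckpt, data_dir, dataset="Middlebury_others",
                        exp="ldmvfi-vqflow-f32-c256-concat_max"):
    """Return argument lists for the image-metric run and the FloLPIPS run."""
    # Assumption: the config file is named after the experiment.
    out_dir = f"eval_results/{exp}/"
    image_cmd = [
        "python", "evaluate.py",
        "--config", f"configs/ldm/{exp}.yaml",
        "--ckpt", ckpt,
        "--dataset", dataset,
        "--metrics", "PSNR", "SSIM", "LPIPS",
        "--data_dir", data_dir,
        "--out_dir", out_dir,
        "--use_ddim",
    ]
    video_cmd = [
        "python", "evaluate_vqm.py",
        "--exp", exp,
        "--dataset", dataset,
        "--metrics", "FloLPIPS",
        "--data_dir", data_dir,
        "--out_dir", out_dir,
    ]
    return image_cmd, video_cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)`, image metrics first, so that the second run finds the saved frames.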
\
To interpolate a video (in .yuv format), use the following command.
```
python interpolate_yuv.py \
--net LDMVFI \
--config configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml \
--ckpt <path/to/ckpt> \
--input_yuv <path/to/input.yuv> \
--size <frame size> \
--out_fps