Repository: sovit-123/fasterrcnn-pytorch-training-pipeline
Branch: main
Commit: e7f2a3bac4e7
Files: 91
Total size: 4.2 MB
Directory structure:
gitextract_6qmktazb/
├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── __init__.py
├── _config.yml
├── data/
│ └── README.md
├── data_configs/
│ ├── aquarium.yaml
│ ├── aquarium_yolo.yaml
│ ├── buggy_data.yaml
│ ├── coco.yaml
│ ├── coco128.yaml
│ ├── gtsdb.yaml
│ ├── smoke.yaml
│ ├── test_image_config.yaml
│ ├── test_video_config.yaml
│ ├── trash_icra.yaml
│ └── voc.yaml
├── datasets.py
├── docs/
│ ├── upcoming_updates.md
│ └── updates.md
├── eval.py
├── example_test_data/
│ └── README.md
├── export.py
├── inference.py
├── inference_video.py
├── models/
│ ├── __init__.py
│ ├── create_fasterrcnn_model.py
│ ├── fasterrcnn_convnext_small.py
│ ├── fasterrcnn_convnext_tiny.py
│ ├── fasterrcnn_custom_resnet.py
│ ├── fasterrcnn_darknet.py
│ ├── fasterrcnn_dinov3_convnext_base.py
│ ├── fasterrcnn_dinov3_convnext_large.py
│ ├── fasterrcnn_dinov3_convnext_small.py
│ ├── fasterrcnn_dinov3_convnext_tiny.py
│ ├── fasterrcnn_dinov3_convnext_tiny_multifeat.py
│ ├── fasterrcnn_dinov3_vitb16.py
│ ├── fasterrcnn_dinov3_vith16plus.py
│ ├── fasterrcnn_dinov3_vitl16.py
│ ├── fasterrcnn_dinov3_vits16.py
│ ├── fasterrcnn_dinov3_vits16plus.py
│ ├── fasterrcnn_efficientnet_b0.py
│ ├── fasterrcnn_efficientnet_b4.py
│ ├── fasterrcnn_mbv3_large.py
│ ├── fasterrcnn_mbv3_small_nano_head.py
│ ├── fasterrcnn_mini_darknet.py
│ ├── fasterrcnn_mini_darknet_nano_head.py
│ ├── fasterrcnn_mini_squeezenet1_1_small_head.py
│ ├── fasterrcnn_mini_squeezenet1_1_tiny_head.py
│ ├── fasterrcnn_mobilenetv3_large_320_fpn.py
│ ├── fasterrcnn_mobilenetv3_large_fpn.py
│ ├── fasterrcnn_mobilevit_xxs.py
│ ├── fasterrcnn_nano.py
│ ├── fasterrcnn_regnet_y_400mf.py
│ ├── fasterrcnn_resnet101.py
│ ├── fasterrcnn_resnet152.py
│ ├── fasterrcnn_resnet18.py
│ ├── fasterrcnn_resnet50_fpn.py
│ ├── fasterrcnn_resnet50_fpn_v2.py
│ ├── fasterrcnn_squeezenet1_0.py
│ ├── fasterrcnn_squeezenet1_1.py
│ ├── fasterrcnn_squeezenet1_1_small_head.py
│ ├── fasterrcnn_vgg16.py
│ ├── fasterrcnn_vitdet.py
│ ├── fasterrcnn_vitdet_tiny.py
│ ├── layers.py
│ ├── model_summary.py
│ └── utils.py
├── notebook_examples/
│ ├── custom_faster_rcnn_training_colab.ipynb
│ ├── custom_faster_rcnn_training_kaggle.ipynb
│ └── visualizations.ipynb
├── onnx_inference_image.py
├── onnx_inference_video.py
├── requirements.txt
├── requirements_blackwell.txt
├── sahi_inference.py
├── torch_utils/
│ ├── README.md
│ ├── __init__.py
│ ├── coco_eval.py
│ ├── coco_utils.py
│ ├── engine.py
│ └── utils.py
├── train.py
├── utils/
│ ├── __init__.py
│ ├── annotations.py
│ ├── general.py
│ ├── logging.py
│ ├── transforms.py
│ └── validate.py
└── weights/
└── readme.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
data/*
!data/README.md
outputs/
inference_outputs
test.sh
*__pycache__/
# Custom weights folder (mainly for testing).
weights/*
!weights/readme.txt
# Weights and Biases
wandb/*
# IPython checkpoints
.ipynb_checkpoints
================================================
FILE: .gitmodules
================================================
[submodule "dinov3"]
path = dinov3
url = https://github.com/facebookresearch/dinov3.git
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Sovit Ranjan Rath
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# A Simple Pipeline to Train PyTorch FasterRCNN Model

Train PyTorch FasterRCNN models easily on any custom dataset. Choose from official PyTorch models trained on the COCO dataset, pick any backbone from Torchvision classification models, or write your own custom backbone.
***You can run a Faster RCNN model with Mini Darknet backbone and Mini Detection Head at more than 150 FPS on an RTX 3080***.

## Get Started
[Open in Colab](https://colab.research.google.com/drive/1oFxPpBeE8SzSQq7BTUv28IIqQeiHHLdj?usp=sharing) | [Open in Kaggle](https://www.kaggle.com/code/sovitrath/custom-faster-rcnn-training-kaggle/notebook)
* [Find blog posts/tutorials on DebuggerCafe](#Tutorials)
## Updates
* **June 6 2025:** Support for both Pascal VOC XML and YOLO text file annotation types during training. Check the [custom training section](#Train-on-Custom-Dataset).
* **August 28 2024:** SAHI image inference for all pretrained Torchvision Faster RCNN models integrated. [Find the script here](https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline/blob/main/sahi_inference.py).
* Filter classes to visualize during inference using the `--classes` command line argument with space-separated class indices from the dataset YAML file.
For example, to visualize only persons in the COCO dataset, use `python inference.py --classes 1 <rest of the command>`.
To visualize persons and cars, use `python inference.py --classes 1 3 <rest of the command>`.
* Added Deep SORT real-time tracking to `inference_video.py` and `onnx_inference_video.py`. Use the `--track` flag along with the usual inference command. Supports **MobileNet** Re-ID for now.
## Custom Model Naming Conventions
***For this repository:***
* **Small head refers to 512 representation size in the Faster RCNN head and predictor.**
* **Tiny head refers to 256 representation size in the Faster RCNN head and predictor.**
* **Nano head refers to 128 representation size in the Faster RCNN head and predictor** (see the sketch below).
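Concretely, the representation size is the width of the two fully connected layers in the box head. A minimal Torchvision sketch of how a *nano* head could be built (illustrative only; the 256-channel input and 7x7 pooling size are assumptions, not this repository's exact code):

```python
from torchvision.models.detection.faster_rcnn import TwoMLPHead, FastRCNNPredictor

representation_size = 128  # nano head; 256 would be tiny, 512 small
num_classes = 2            # 1 object class + '__background__'

# Box head: flattened RoI features -> two FC layers of `representation_size`.
box_head = TwoMLPHead(
    in_channels=256 * 7 * 7,  # assumes 256-channel features and 7x7 RoI pooling
    representation_size=representation_size
)
# Predictor: classification and box regression on the head's output features.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
```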
## [Check All Available Model Flags](#A-List-of-All-Model-Flags-to-Use-With-the-Training-Script)
## Go To
* [Setup on Ubuntu](#Setup-for-Ubuntu)
* [Setup on Windows](#Setup-on-Windows)
* [Train on Custom Dataset](#Train-on-Custom-Dataset)
* [Inference](#Inference)
* [Evaluation](#Evaluation)
* [Available Models](#A-List-of-All-Model-Flags-to-Use-With-the-Training-Script)
* [Tutorials](#Tutorials)
## Setup on Ubuntu
1. Clone the repository.
```bash
git clone https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline.git
```
Optional: Initialize DINOv3 submodule for training DINOv3 Faster RCNN models.
```bash
git submodule update --init
```
2. Install the requirements according to your GPU.
Install requirements on **RTX 30/40** (**Ampere and Ada Lovelace**) series and **T4/P100 GPUs**.
```bash
pip install -r requirements.txt
```
**OR**
Install requirements for **RTX 50** series and **Blackwell GPUs**. First install PyTorch >= 2.8 with CUDA >= 12.9:
```bash
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129
```
Then install the rest of the requirements:
```bash
pip install -r requirements_blackwell.txt
```
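Either way, you can quickly verify that PyTorch sees the GPU:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```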
## Setup on Windows
1. **First you need to install Microsoft Visual Studio from [here](https://my.visualstudio.com/Downloads?q=Visual%20Studio%202017)**. Sign in/Sign up by clicking on **[this link](https://my.visualstudio.com/Downloads?q=Visual%20Studio%202017)** and download the **Visual Studio Community 2017** edition.

Install with all the default settings. The download is around 6 GB. Mainly, we need the C++ Build Tools.
2. Then install the proper **`pycocotools`** for Windows.
```bash
pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
```
3. Clone the repository.
```bash
git clone https://github.com/sovit-123/fastercnn-pytorch-training-pipeline.git
```
4. Then install the remaining **[requirements](https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline/blob/main/requirements.txt)** except for `pycocotools`.
Install requirements on **RTX 30/40** (**Ampere and Ada Lovelace**) series and **T4/P100 GPUs**.
```bash
pip install -r requirements.txt
```
**OR**
Install requirements for **RTX 50** series and **Blackwell GPUs**. First install PyTorch >= 2.8 with CUDA >= 12.9:
```bash
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129
```
Then install the rest of the requirements (apart from `pycocotools`):
```bash
pip install -r requirements_blackwell.txt
```
## Using Custom Weights
Some models like **DINOv3 based Faster RCNN** models require the pretrained weights to be present locally. Put all the DINOv3 backbone weights in the `weights` directory. The respective model files will load them from the directory.
You can download the weights by [filling the form here](https://github.com/facebookresearch/dinov3/tree/main?tab=readme-ov-file#pretrained-models).
For example, for the FasterRCNN DINOv3 ConvNext Tiny model (`models/fasterrcnn_dinov3_convnext_tiny.py`), the weights are loaded using the following syntax with a relative path.
```python
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth'
self.backbone = torch.hub.load(
    REPO_DIR,
    "dinov3_convnext_tiny",
    source='local',
    weights=WEIGHTS_URL
)
```
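Since `torch.hub.load` with `source='local'` only reads from disk, a missing weights file fails with a traceback. A small pre-flight check (a sketch using the same path as above) can make the error clearer:
```python
import os

WEIGHTS_URL = 'weights/dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth'
# Fail early with an actionable message instead of a torch.hub traceback.
assert os.path.exists(WEIGHTS_URL), (
    f"DINOv3 backbone weights not found at '{WEIGHTS_URL}'. "
    "Download them using the form linked above and place them in the weights directory."
)
```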
## Train on Custom Dataset
Let's take the example of the [smoke dataset](https://www.kaggle.com/didiruh/smoke-pascal-voc) from Kaggle. Say the dataset is stored in the `data/smoke_pascal_voc` directory in the following format, and `smoke.yaml` is in the `data_configs` directory.
```bash
├── data
│ ├── smoke_pascal_voc
│ │ ├── archive
│ │ │ ├── train
│ │ │ └── valid
│ └── README.md
├── data_configs
│ └── smoke.yaml
├── models
│ ├── create_fasterrcnn_model.py
│ ...
│ └── __init__.py
├── outputs
│ ├── inference
│ └── training
│ ...
├── readme_images
│ ...
├── torch_utils
│ ├── coco_eval.py
│ ...
├── utils
│ ├── annotations.py
│ ...
├── datasets.py
├── inference.py
├── inference_video.py
├── __init__.py
├── README.md
├── requirements.txt
└── train.py
```
The content of `smoke.yaml` should be the following. The annotation directories can point either to **Pascal VOC XML** files or to a **YOLO text labels** folder. The images and labels (for both formats) can live in the same folder or in different folders, because image and annotation files are matched by file name during dataset preparation.
**If the data config file (shown below) points to Pascal VOC XML annotations, the `CLASSES` field can contain the class names in any order after the `__background__` class. If the data config file points to a YOLO text file annotation folder, `CLASSES` must list the class names in the same order as the YOLO dataset's `data.yaml` file. This is necessary to maintain the indexing order during training.**

```yaml
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/train/images
TRAIN_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/train/annotations # This can contain .xml or .txt files
# VALID_DIR should be relative to train.py
VALID_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/valid/images
VALID_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/valid/annotations # This can contain .xml or .txt files
# Class names.
CLASSES: [
    '__background__',
    'smoke'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 2
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
```
***Note that*** *the data and annotations can be in the same directory as well. In that case, TRAIN_DIR_IMAGES and TRAIN_DIR_LABELS will have the same path. Similarly for the VALID images and labels. `datasets.py` will take care of that*.
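Because `NC` must equal the length of `CLASSES` (object classes plus `__background__`), it is worth validating a config before a long training run. A minimal sketch (assuming PyYAML, which the pipeline already depends on):
```python
import yaml

with open('data_configs/smoke.yaml') as f:
    cfg = yaml.safe_load(f)

# NC counts the object classes plus the '__background__' entry.
assert cfg['NC'] == len(cfg['CLASSES']), (
    f"NC ({cfg['NC']}) does not match len(CLASSES) ({len(cfg['CLASSES'])})"
)
print('Config OK. Object classes:', cfg['CLASSES'][1:])
```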
Next, to start the training, you can use the following command.
**Command format:**
During training, we need to provide a `--label-type` argument, which should be either `yolo` or `pascal_voc` depending on the annotation folder path in the data configuration file above. The default is `pascal_voc`.
```bash
python train.py --data <path to the data config YAML file> --epochs 100 --model <model name (defaults to fasterrcnn_resnet50_fpn)> --name <folder name inside outputs/training/> --batch 16 --label-type <pascal_voc or yolo>
```
**In this case, the exact command would be:**
```bash
python train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16 --label-type pascal_voc
```
**The terminal output should be similar to the following:**
```
Number of training samples: 665
Number of validation samples: 72
3,191,405 total parameters.
3,191,405 training parameters.
Epoch 0: adjusting learning rate of group 0 to 1.0000e-03.
Epoch: [0] [ 0/84] eta: 0:02:17 lr: 0.000013 loss: 1.6518 (1.6518) time: 1.6422 data: 0.2176 max mem: 1525
Epoch: [0] [83/84] eta: 0:00:00 lr: 0.001000 loss: 1.6540 (1.8020) time: 0.0769 data: 0.0077 max mem: 1548
Epoch: [0] Total time: 0:00:08 (0.0984 s / it)
creating index...
index created!
Test: [0/9] eta: 0:00:02 model_time: 0.0928 (0.0928) evaluator_time: 0.0245 (0.0245) time: 0.2972 data: 0.1534 max mem: 1548
Test: [8/9] eta: 0:00:00 model_time: 0.0318 (0.0933) evaluator_time: 0.0237 (0.0238) time: 0.1652 data: 0.0239 max mem: 1548
Test: Total time: 0:00:01 (0.1691 s / it)
Averaged stats: model_time: 0.0318 (0.0933) evaluator_time: 0.0237 (0.0238)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.007
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.029
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.074
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.088
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.167
SAVING PLOTS COMPLETE...
...
Epoch: [4] [ 0/84] eta: 0:00:20 lr: 0.001000 loss: 0.9575 (0.9575) time: 0.2461 data: 0.1662 max mem: 1548
Epoch: [4] [83/84] eta: 0:00:00 lr: 0.001000 loss: 1.1325 (1.1624) time: 0.0762 data: 0.0078 max mem: 1548
Epoch: [4] Total time: 0:00:06 (0.0801 s / it)
creating index...
index created!
Test: [0/9] eta: 0:00:02 model_time: 0.0369 (0.0369) evaluator_time: 0.0237 (0.0237) time: 0.2494 data: 0.1581 max mem: 1548
Test: [8/9] eta: 0:00:00 model_time: 0.0323 (0.0330) evaluator_time: 0.0226 (0.0227) time: 0.1076 data: 0.0271 max mem: 1548
Test: Total time: 0:00:01 (0.1116 s / it)
Averaged stats: model_time: 0.0323 (0.0330) evaluator_time: 0.0226 (0.0227)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.137
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.118
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.428
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.204
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.306
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.347
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.140
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.424
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
SAVING PLOTS COMPLETE...
```
## Distributed Training
**Training on 2 GPUs**:
```bash
export CUDA_VISIBLE_DEVICES=0,1
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16
```
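Note that `torch.distributed.launch` is deprecated in recent PyTorch releases. On newer versions, the equivalent `torchrun` invocation should be (untested here; `torchrun` sets the required environment variables itself, so `--use_env` is not needed):
```bash
export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc_per_node=2 train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16
```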
## Inference
### Image Inference on COCO Pretrained Model
By default, the **Faster RCNN ResNet50 FPN V2** model is used.
```bash
python inference.py
```
Use a model of your choice with an image input.
```bash
python inference.py --model fasterrcnn_mobilenetv3_large_fpn --input example_test_data/image_1.jpg
```
### Image Inference with a Custom Trained Model
In this case you only need to provide the weights file path and the input file path. The config file and the model name are optional; if not provided, they will be automatically inferred from the weights file.
```bash
python inference.py --input data/inference_data/image_1.jpg --weights outputs/training/smoke_training/last_model_state.pth
```
### Video Inference on COCO Pretrained Model
```bash
python inference_video.py
```
### Video Inference with a Custom Trained Model
```bash
python inference_video.py --input data/inference_data/video_1.mp4 --weights outputs/training/smoke_training/last_model_state.pth
```
### Tracking using COCO Pretrained Models
```bash
# Track all COCO classes (Faster RCNN ResNet50 FPN V2).
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show
# Track all COCO classes (Faster RCNN ResNet50 FPN V2) using own video.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_1.mp4
# Tracking only person class (index 1 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1
# Tracking only person and car classes (indices 1 and 3 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1 3
# Tracking using custom trained weights. Just provide the path to the weights instead of model name.
python inference_video.py --track --weights outputs/training/fish_det/best_model.pth --show --input ../inference_data/video_6.mp4
```
## Evaluation
Replace the arguments according to your setup.
```bash
python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4
```
You can use the following command to show a table of **class-wise Average Precision** (the additional `--verbose` flag is needed).
```bash
python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4 --verbose
```
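Class-wise AP of this kind can also be computed directly with `torchmetrics`, which `eval.py` already builds on. A minimal sketch with dummy boxes (the values below are made up for illustration):
```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# class_metrics=True adds per-class AP/AR entries to the results dictionary.
metric = MeanAveragePrecision(class_metrics=True)
preds = [{
    'boxes': torch.tensor([[10., 10., 100., 100.]]),
    'scores': torch.tensor([0.9]),
    'labels': torch.tensor([1]),
}]
targets = [{
    'boxes': torch.tensor([[12., 8., 98., 102.]]),
    'labels': torch.tensor([1]),
}]
metric.update(preds, targets)
print(metric.compute()['map_per_class'])
```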
## A List of All Model Flags to Use With the Training Script
The following command expects the `coco` dataset to be present one directory up, inside the `input` folder, in XML format. You can find the dataset [here on Kaggle](https://www.kaggle.com/datasets/sovitrath/coco-xml-format). Check `data_configs/coco.yaml` for more details. You can change the relative dataset path in the YAML file according to your structure.
```bash
# Usage
python train.py --model fasterrcnn_resnet50_fpn_v2 --data data_configs/coco.yaml
```
**OR USE ANY ONE OF THE FOLLOWING**
```
[
'fasterrcnn_convnext_small',
'fasterrcnn_convnext_tiny',
'fasterrcnn_custom_resnet',
'fasterrcnn_darknet',
'fasterrcnn_efficientnet_b0',
'fasterrcnn_efficientnet_b4',
'fasterrcnn_mbv3_small_nano_head',
'fasterrcnn_mbv3_large',
'fasterrcnn_mini_darknet_nano_head',
'fasterrcnn_mini_darknet',
'fasterrcnn_mini_squeezenet1_1_small_head',
'fasterrcnn_mini_squeezenet1_1_tiny_head',
'fasterrcnn_mobilenetv3_large_320_fpn', # Torchvision COCO pretrained
'fasterrcnn_mobilenetv3_large_fpn', # Torchvision COCO pretrained
'fasterrcnn_nano',
'fasterrcnn_resnet18',
'fasterrcnn_resnet50_fpn_v2', # Torchvision COCO pretrained
'fasterrcnn_resnet50_fpn', # Torchvision COCO pretrained
'fasterrcnn_resnet101',
'fasterrcnn_resnet152',
'fasterrcnn_squeezenet1_0',
'fasterrcnn_squeezenet1_1_small_head',
'fasterrcnn_squeezenet1_1',
'fasterrcnn_vitdet',
'fasterrcnn_vitdet_tiny',
'fasterrcnn_mobilevit_xxs',
'fasterrcnn_regnet_y_400mf'
]
```
## Tutorials
* [Wheat Detection using Faster RCNN and PyTorch](https://debuggercafe.com/wheat-detection-using-faster-rcnn-and-pytorch/)
* [Plant Disease Detection using the PlantDoc Dataset and PyTorch Faster RCNN](https://debuggercafe.com/plant-disease-detection-using-plantdoc/)
* [Small Scale Traffic Light Detection using PyTorch](https://debuggercafe.com/small-scale-traffic-light-detection/)
================================================
FILE: __init__.py
================================================
================================================
FILE: _config.yml
================================================
theme: jekyll-theme-dinky
================================================
FILE: data/README.md
================================================
# README
**A list of training and inference datasets to try out the custom training on.**
**You can also download the videos/images from [Inference Data](#Inference-Data) section to test out your trained models.**
## Dataset Links
* Uno Cards.v2-raw.voc: https://public.roboflow.com/object-detection/uno-cards
* Aquarium Combined.v2-raw-1024.voc: https://public.roboflow.com/object-detection/aquarium/2/download/voc
* Pothole.v1-raw.voc: https://public.roboflow.com/object-detection/pothole/1
* `Chess Pieces.v23-raw.voc`: https://public.roboflow.com/object-detection/chess-full/23
* `GTSDB`: https://benchmark.ini.rub.de/gtsdb_news.html
* `PlantDoc.v1-resize-416x416.voc`: https://public.roboflow.com/object-detection/plantdoc.
* `smoke_pascal_voc`: https://www.kaggle.com/didiruh/smoke-pascal-voc.
* `Exclusively Dark (ExDARK) dataset`: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset.
* Pascal VOC 2007 and 2012 dataset: https://pjreddie.com/projects/pascal-voc-dataset-mirror/
## Inference Data
* `inference_data/`
* `video_1.mp4`: https://www.youtube.com/watch?v=9xj1MX4NJ_k
* `video_2.mp4`: https://www.youtube.com/watch?v=BQo87tGRM74
* `video_3.mp4`: https://pixabay.com/videos/chess-chess-pieces-chess-board-19598/
* `video_4.mp4`: https://www.youtube.com/watch?v=vnqkraiSiTI
* `video_5.mp4`: https://www.youtube.com/watch?v=yOlYNps3Lq8
* `video_6.mp4`: Video by Magda Ehlers: https://www.pexels.com/video/marine-life-of-fishes-and-corals-underwater-3765078/
* `video_7.mp4`: Video by Taryn Elliott: https://www.pexels.com/video/tropical-fish-tank-5548128/
* `video_8.mp4`: Video by Magda Ehlers: https://www.pexels.com/video/a-school-of-fish-swimming-in-an-aquarium-2556513/
* `image_3.jpg`: Photo by Hung Tran: https://www.pexels.com/photo/school-of-fish-in-water-3699434/
================================================
FILE: data_configs/aquarium.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/Aquarium Combined.v2-raw-1024.voc/train'
TRAIN_DIR_LABELS: '../input/Aquarium Combined.v2-raw-1024.voc/train'
VALID_DIR_IMAGES: '../input/Aquarium Combined.v2-raw-1024.voc/valid'
VALID_DIR_LABELS: '../input/Aquarium Combined.v2-raw-1024.voc/valid'
# Optional test data path. If given, test paths (data) will be used in
# `eval.py`.
TEST_DIR_IMAGES: '../input/Aquarium Combined.v2-raw-1024.voc/test'
TEST_DIR_LABELS: '../input/Aquarium Combined.v2-raw-1024.voc/test'
# Class names.
CLASSES: [
    '__background__',
    'fish', 'jellyfish', 'penguin',
    'shark', 'puffin', 'stingray',
    'starfish'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 8
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/aquarium_yolo.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/Fish Detection.v1i.yolov8/train/images'
TRAIN_DIR_LABELS: '../input/Fish Detection.v1i.yolov8/train/labels'
VALID_DIR_IMAGES: '../input/Fish Detection.v1i.yolov8/valid/images'
VALID_DIR_LABELS: '../input/Fish Detection.v1i.yolov8/valid/labels'
# Optional test data path. If given, test paths (data) will be used in
# `eval.py`.
TEST_DIR_IMAGES: '../input/Fish Detection.v1i.yolov8/test/images'
TEST_DIR_LABELS: '../input/Fish Detection.v1i.yolov8/test/labels'
# Class names.
# For YOLO annotations, give the names in order of the data.yaml file here.
CLASSES: [
    '__background__',
    'fish', 'jellyfish', 'penguin',
    'puffin', 'shark', 'starfish', 'stingray'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 8
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/buggy_data.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/coco2017/train2017'
TRAIN_DIR_LABELS: '../input/coco2017/train_xml'
VALID_DIR_IMAGES: '../input/coco2017/val2017'
VALID_DIR_LABELS: '../input/coco2017/val_xml'
# Class names.
CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
    'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
    'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]
COCO_91_CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 81
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/coco.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/coco2017/train2017'
TRAIN_DIR_LABELS: '../input/coco2017/train_xml'
VALID_DIR_IMAGES: '../input/coco2017/val2017'
VALID_DIR_LABELS: '../input/coco2017/val_xml'
# Class names.
CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
    'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
    'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]
COCO_91_CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 81
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/coco128.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/coco_128/train'
TRAIN_DIR_LABELS: '../input/coco_128/train'
VALID_DIR_IMAGES: '../input/coco_128/valid'
VALID_DIR_LABELS: '../input/coco_128/valid'
TEST_DIR_IMAGES: '../input/coco_128/test' # Optional.
TEST_DIR_LABELS: '../input/coco_128/test' # Optional
# Class names.
CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
    'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
    'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]
COCO_91_CLASSES: [
    '__background__',
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 81
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/gtsdb.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/GTSDB/train_images'
TRAIN_DIR_LABELS: '../input/GTSDB/train_xmls'
VALID_DIR_IMAGES: '../input/GTSDB/valid_images'
VALID_DIR_LABELS: '../input/GTSDB/valid_xmls'
# Optional test data path. If given, test paths (data) will be used in
# `eval.py`.
# TEST_DIR_IMAGES:
# TEST_DIR_LABELS:
# Class names.
CLASSES: [
    '__background__',
    'Speed limit (20km/h)', 'Speed limit (30km/h)', 'Speed limit (50km/h)',
    'Speed limit (60km/h)', 'Speed limit (70km/h)', 'Speed limit (80km/h)',
    'End of speed limit (80km/h)', 'Speed limit (100km/h)',
    'Speed limit (120km/h)', 'No passing',
    'No passing for vehicles over 3.5 metric tons',
    'Right-of-way at the next intersection', 'Priority road', 'Yield',
    'Stop', 'No vehicles', 'Vehicles over 3.5 metric tons prohibited',
    'No entry', 'General caution', 'Dangerous curve to the left',
    'Dangerous curve to the right', 'Double curve', 'Bumpy road',
    'Slippery road', 'Road narrows on the right', 'Road work',
    'Traffic signals', 'Pedestrians', 'Children crossing',
    'Bicycles crossing', 'Beware of ice/snow', 'Wild animals crossing',
    'End of all speed and passing limits', 'Turn right ahead',
    'Turn left ahead', 'Ahead only', 'Go straight or right',
    'Go straight or left', 'Keep right', 'Keep left', 'Roundabout mandatory',
    'End of no passing', 'End of no passing by vehicles over 3.5 metric tons'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 44
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/smoke.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: ../input/smoke_pascal_voc/archive/train/images
TRAIN_DIR_LABELS: ../input/smoke_pascal_voc/archive/train/annotations
VALID_DIR_IMAGES: ../input/smoke_pascal_voc/archive/valid/images
VALID_DIR_LABELS: ../input/smoke_pascal_voc/archive/valid/annotations
# Optional test data path. If given, test paths (data) will be used in
# `eval.py`.
# TEST_DIR_IMAGES:
# TEST_DIR_LABELS:
# Class names.
CLASSES: [
    '__background__',
    'smoke'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 2
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/test_image_config.yaml
================================================
image_path: example_test_data/image_1.jpg
NC: 91
CLASSES: [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
================================================
FILE: data_configs/test_video_config.yaml
================================================
video_path: example_test_data/video_1.mp4
NC: 91
CLASSES: [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
================================================
FILE: data_configs/trash_icra.yaml
================================================
# Link => https://www.kaggle.com/datasets/sovitrath/underwater-trash-detection-icra
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/trash_icra_xml/images/train'
TRAIN_DIR_LABELS: '../input/trash_icra_xml/labels/train'
VALID_DIR_IMAGES: '../input/trash_icra_xml/images/valid'
VALID_DIR_LABELS: '../input/trash_icra_xml/labels/valid'
# Optional test data path. If given, test paths (data) will be used in
# `eval.py`.
TEST_DIR_IMAGES: '../input/trash_icra_xml/images/test'
TEST_DIR_LABELS: '../input/trash_icra_xml/labels/test'
# Class names.
CLASSES: [
    '__background__',
    'plastic', 'bio', 'rov'
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 4
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: data_configs/voc.yaml
================================================
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: '../input/voc_07_12/voc_xml_dataset/train/images'
TRAIN_DIR_LABELS: '../input/voc_07_12/voc_xml_dataset/train/labels'
VALID_DIR_IMAGES: '../input/voc_07_12/voc_xml_dataset/valid/images'
VALID_DIR_LABELS: '../input/voc_07_12/voc_xml_dataset/valid/labels'
# Class names.
CLASSES: [
    '__background__',
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"
]
# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 21
# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
================================================
FILE: datasets.py
================================================
import torch
import cv2
import numpy as np
import os
import glob as glob
import random
from xml.etree import ElementTree as et
from torch.utils.data import Dataset, DataLoader
from utils.transforms import (
    get_train_transform,
    get_valid_transform,
    get_train_aug,
    transform_mosaic
)
from tqdm.auto import tqdm
# the dataset class
class CustomDataset(Dataset):
    def __init__(
        self,
        images_path,
        labels_path,
        img_size,
        classes,
        transforms=None,
        use_train_aug=False,
        train=False,
        mosaic=1.0,
        square_training=False,
        label_type='pascal_voc'
    ):
        self.transforms = transforms
        self.use_train_aug = use_train_aug
        self.images_path = images_path
        self.labels_path = labels_path
        self.img_size = img_size
        self.classes = classes
        self.train = train
        self.square_training = square_training
        self.mosaic_border = [-img_size // 2, -img_size // 2]
        self.image_file_types = ['*.jpg', '*.jpeg', '*.png', '*.ppm', '*.JPG']
        self.all_image_paths = []
        self.log_annot_issue_x = True
        self.mosaic = mosaic
        self.log_annot_issue_y = True
        self.label_type = label_type
        # get all the image paths in sorted order
        for file_type in self.image_file_types:
            self.all_image_paths.extend(glob.glob(os.path.join(self.images_path, file_type)))
        self.all_annot_paths = glob.glob(os.path.join(self.labels_path, '*.xml'))
        self.all_images = [image_path.split(os.path.sep)[-1] for image_path in self.all_image_paths]
        self.all_images = sorted(self.all_images)
        # Remove all annotations and images when no object is present.
        if self.label_type == 'pascal_voc':
            self.read_and_clean()

    def read_and_clean(self):
        print('Checking Labels and images...')
        images_to_remove = []
        problematic_images = []
        for image_name in tqdm(self.all_images, total=len(self.all_images)):
            possible_annot_name = os.path.join(self.labels_path, os.path.splitext(image_name)[0] + '.xml')
            if possible_annot_name not in self.all_annot_paths:
                print(f"⚠️ {possible_annot_name} not found... Removing {image_name}")
                images_to_remove.append(image_name)
                continue
            # Check for invalid bounding boxes.
            tree = et.parse(possible_annot_name)
            root = tree.getroot()
            invalid_bbox = False
            for member in root.findall('object'):
                xmin = float(member.find('bndbox').find('xmin').text)
                xmax = float(member.find('bndbox').find('xmax').text)
                ymin = float(member.find('bndbox').find('ymin').text)
                ymax = float(member.find('bndbox').find('ymax').text)
                if xmin >= xmax or ymin >= ymax:
                    invalid_bbox = True
                    break
            if invalid_bbox:
                problematic_images.append(image_name)
                images_to_remove.append(image_name)
        # Remove problematic images and their annotations.
        self.all_images = [img for img in self.all_images if img not in images_to_remove]
        # `image_file_types` entries carry a leading '*' glob character, so strip
        # it before rebuilding file names to match against `images_to_remove`.
        self.all_annot_paths = [
            path for path in self.all_annot_paths
            if not any(os.path.splitext(os.path.basename(path))[0] + ext.lstrip('*') in images_to_remove
                       for ext in self.image_file_types)
        ]
        # Print warnings for problematic images.
        if problematic_images:
            print("\n⚠️ The following images have invalid bounding boxes and will be removed:")
            for img in problematic_images:
                print(f"⚠️ {img}")
            print(f"Removed {len(images_to_remove)} problematic images and annotations.")

    def resize(self, im, square=False):
        if square:
            im = cv2.resize(im, (self.img_size, self.img_size))
        else:
            h0, w0 = im.shape[:2]  # orig hw
            r = self.img_size / max(h0, w0)  # ratio
            if r != 1:  # if sizes are not equal
                im = cv2.resize(im, (int(w0 * r), int(h0 * r)))
        return im

    def load_image_and_labels(self, index):
        image_name = self.all_images[index]
        image_path = os.path.join(self.images_path, image_name)
        # Read the image.
        image = cv2.imread(image_path)
        # Convert BGR to RGB color format.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image_resized = self.resize(image, square=self.square_training)
        image_resized /= 255.0
        if self.label_type == 'pascal_voc':
            image, image_resized, orig_boxes, \
                boxes, labels, area, iscrowd, (image_width, image_height) \
                = self.load_pascal_voc(image, image_name, image_resized)
        if self.label_type == 'yolo':
            image, image_resized, orig_boxes, \
                boxes, labels, area, iscrowd, (image_width, image_height) \
                = self.load_yolo(image, image_name, image_resized)
        return image, image_resized, orig_boxes, \
            boxes, labels, area, iscrowd, (image_width, image_height)

    def load_pascal_voc(self, image, image_name, image_resized):
        # Capture the corresponding XML file for getting the annotations.
        annot_filename = os.path.splitext(image_name)[0] + '.xml'
        annot_file_path = os.path.join(self.labels_path, annot_filename)
        boxes = []
        orig_boxes = []
        labels = []
        # Get the height and width of the image.
        image_width = image.shape[1]
        image_height = image.shape[0]
        # Box coordinates for xml files are extracted and corrected for image size given.
        tree = et.parse(annot_file_path)
        root = tree.getroot()
        for member in root.findall('object'):
            # Map the current object name to `classes` list to get
            # the label index and append to `labels` list.
            labels.append(self.classes.index(member.find('name').text))
            # xmin = left corner x-coordinates
            xmin = float(member.find('bndbox').find('xmin').text)
            # xmax = right corner x-coordinates
            xmax = float(member.find('bndbox').find('xmax').text)
            # ymin = left corner y-coordinates
            ymin = float(member.find('bndbox').find('ymin').text)
            # ymax = right corner y-coordinates
            ymax = float(member.find('bndbox').find('ymax').text)
            xmin, ymin, xmax, ymax = self.check_image_and_annotation(
                xmin,
                ymin,
                xmax,
                ymax,
                image_width,
                image_height,
                orig_data=True
            )
            orig_boxes.append([xmin, ymin, xmax, ymax])
            # Resize the bounding boxes according to the
            # desired `width`, `height`.
            xmin_final = (xmin/image_width)*image_resized.shape[1]
            xmax_final = (xmax/image_width)*image_resized.shape[1]
            ymin_final = (ymin/image_height)*image_resized.shape[0]
            ymax_final = (ymax/image_height)*image_resized.shape[0]
            xmin_final, ymin_final, xmax_final, ymax_final = self.check_image_and_annotation(
                xmin_final,
                ymin_final,
                xmax_final,
                ymax_final,
                image_resized.shape[1],
                image_resized.shape[0],
                orig_data=False
            )
            boxes.append([xmin_final, ymin_final, xmax_final, ymax_final])
        # Bounding box to tensor.
        boxes_length = len(boxes)
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # Area of the bounding boxes.
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) if boxes_length > 0 else torch.as_tensor(boxes, dtype=torch.float32)
        # No crowd instances.
        iscrowd = torch.zeros((boxes.shape[0],), dtype=torch.int64) if boxes_length > 0 else torch.as_tensor(boxes, dtype=torch.float32)
        # Labels to tensor.
        labels = torch.as_tensor(labels, dtype=torch.int64)
        return image, image_resized, orig_boxes, \
            boxes, labels, area, iscrowd, (image_width, image_height)

    def load_yolo(self, image, image_name, image_resized):
        # Capture the corresponding text file for getting the annotations.
        annot_filename = os.path.splitext(image_name)[0] + '.txt'
        annot_file_path = os.path.join(self.labels_path, annot_filename)
        boxes = []
        orig_boxes = []
        labels = []
        # Get the height and width of the image.
        image_width = image.shape[1]
        image_height = image.shape[0]
        with open(annot_file_path, 'r') as f:
            annot_file_content = f.readlines()
        for line in annot_file_content:
            label, norm_xc, norm_yc, norm_w, norm_h = line.split()
            label, norm_xc, norm_yc, norm_w, norm_h = \
                int(label), float(norm_xc), float(norm_yc), float(norm_w), float(norm_h)
            # Shift labels by 1 as index 0 is reserved for `__background__`.
            labels.append(label + 1)
            # Denormalize YOLO cx, cy, w, h to pixel xmin, ymin, xmax, ymax.
            xc, w = norm_xc * image_width, norm_w * image_width
            yc, h = norm_yc * image_height, norm_h * image_height
            xmin = xc - (w / 2)
            ymin = yc - (h / 2)
            xmax = xmin + w
            ymax = ymin + h
            xmin, ymin, xmax, ymax = self.check_image_and_annotation(
                xmin,
                ymin,
                xmax,
                ymax,
                image_width,
                image_height,
                orig_data=True
            )
            orig_boxes.append([xmin, ymin, xmax, ymax])
            # Resize the bounding boxes according to the
            # desired `width`, `height`.
            xmin_final = (xmin/image_width)*image_resized.shape[1]
            xmax_final = (xmax/image_width)*image_resized.shape[1]
            ymin_final = (ymin/image_height)*image_resized.shape[0]
            ymax_final = (ymax/image_height)*image_resized.shape[0]
            xmin_final, ymin_final, xmax_final, ymax_final = self.check_image_and_annotation(
                xmin_final,
                ymin_final,
                xmax_final,
                ymax_final,
                image_resized.shape[1],
                image_resized.shape[0],
                orig_data=False
            )
            boxes.append([xmin_final, ymin_final, xmax_final, ymax_final])
        # Bounding box to tensor.
        boxes_length = len(boxes)
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # Area of the bounding boxes.
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) if boxes_length > 0 else torch.as_tensor(boxes, dtype=torch.float32)
        # No crowd instances.
        iscrowd = torch.zeros((boxes.shape[0],), dtype=torch.int64) if boxes_length > 0 else torch.as_tensor(boxes, dtype=torch.float32)
        # Labels to tensor.
        labels = torch.as_tensor(labels, dtype=torch.int64)
        return image, image_resized, orig_boxes, \
            boxes, labels, area, iscrowd, (image_width, image_height)

    def check_image_and_annotation(
        self,
        xmin,
        ymin,
        xmax,
        ymax,
        width,
        height,
        orig_data=False
    ):
        """
        Check that all x_max and y_max are not more than the image
        width or height.
        """
        if ymax > height:
            ymax = height
        if xmax > width:
            xmax = width
        if xmax - xmin <= 1.0:
            if orig_data:
                # Degenerate box in the annotations (xmax equals xmin);
                # widen it by 1 pixel so training can continue.
                self.log_annot_issue_x = False
                xmin = xmin - 1
        if ymax - ymin <= 1.0:
            if orig_data:
                # Degenerate box in the annotations (ymax equals ymin);
                # widen it by 1 pixel so training can continue.
                self.log_annot_issue_y = False
                ymin = ymin - 1
        return xmin, ymin, xmax, ymax

    def load_cutmix_image_and_boxes(self, index, resize_factor=512):
        """
        Adapted from: https://www.kaggle.com/shonenkov/oof-evaluation-mixup-efficientdet
        """
        s = self.img_size
        yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
        indices = [index] + [random.randint(0, len(self.all_images) - 1) for _ in range(3)]
        result_boxes = []
        result_classes = []
        for i, index in enumerate(indices):
            _, image_resized, orig_boxes, boxes, \
                labels, area, iscrowd, dims = self.load_image_and_labels(
                    index=index
                )
            h, w = image_resized.shape[:2]
            if i == 0:
                # Create empty image with the above resized image.
                result_image = np.full((s * 2, s * 2, image_resized.shape[2]), 114/255, dtype=np.float32)  # base image with 4 tiles
                x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
                x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, max(xc, w), min(y2a - y1a, h)
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
            result_image[y1a:y2a, x1a:x2a] = image_resized[y1b:y2b, x1b:x2b]
            padw = x1a - x1b
            padh = y1a - y1b
            if len(orig_boxes) > 0:
                boxes[:, 0] += padw
                boxes[:, 1] += padh
                boxes[:, 2] += padw
                boxes[:, 3] += padh
                result_boxes.append(boxes)
                result_classes += labels
        final_classes = []
        if len(result_boxes) > 0:
            result_boxes = np.concatenate(result_boxes, 0)
            np.clip(result_boxes[:, 0:], 0, 2 * s, out=result_boxes[:, 0:])
            result_boxes = result_boxes.astype(np.int32)
            for idx in range(len(result_boxes)):
                if ((result_boxes[idx, 2] - result_boxes[idx, 0]) * (result_boxes[idx, 3] - result_boxes[idx, 1])) > 0:
                    final_classes.append(result_classes[idx])
            result_boxes = result_boxes[
                np.where((result_boxes[:, 2] - result_boxes[:, 0]) * (result_boxes[:, 3] - result_boxes[:, 1]) > 0)
            ]
        # Resize the mosaic image to the desired shape and transform boxes.
        result_image, result_boxes = transform_mosaic(
            result_image, result_boxes, self.img_size
        )
        return result_image, torch.tensor(result_boxes), \
            torch.tensor(np.array(final_classes)), area, iscrowd, dims

    def __getitem__(self, idx):
        if not self.train:  # No mosaic during validation.
            image, image_resized, orig_boxes, boxes, \
                labels, area, iscrowd, dims = self.load_image_and_labels(
                    index=idx
                )
        if self.train:
            mosaic_prob = random.uniform(0.0, 1.0)
            if self.mosaic >= mosaic_prob:
                image_resized, boxes, labels, \
                    area, iscrowd, dims = self.load_cutmix_image_and_boxes(
                        idx, resize_factor=(self.img_size, self.img_size)
                    )
            else:
                image, image_resized, orig_boxes, boxes, \
                    labels, area, iscrowd, dims = self.load_image_and_labels(
                        index=idx
                    )
        # Prepare the final `target` dictionary.
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["area"] = area
        target["iscrowd"] = iscrowd
        image_id = torch.tensor([idx])
        target["image_id"] = image_id
        # Before transformation.
        labels = labels.cpu().numpy().tolist()  # Convert tensor to list.
        bboxes = target['boxes'].cpu().numpy().tolist()
        if self.use_train_aug:  # Use train augmentation if argument is passed.
            train_aug = get_train_aug()
            sample = train_aug(image=image_resized,
                               bboxes=target['boxes'],
                               labels=labels)
            image_resized = sample['image']
            target['boxes'] = torch.Tensor(sample['bboxes']).to(torch.int64)
        else:
            sample = self.transforms(image=image_resized,
                                     bboxes=target['boxes'],
                                     labels=labels)
            image_resized = sample['image']
            target['boxes'] = torch.Tensor(sample['bboxes']).to(torch.int64)
        # Fix to enable training without target bounding boxes,
        # see https://discuss.pytorch.org/t/fasterrcnn-images-with-no-objects-present-cause-an-error/117974/4
        if np.isnan((target['boxes']).numpy()).any() or target['boxes'].shape == torch.Size([0]):
            target['boxes'] = torch.zeros((0, 4), dtype=torch.int64)
        return image_resized, target

    def __len__(self):
        return len(self.all_images)

def collate_fn(batch):
    """
    To handle the data loading as different images may have different number
    of objects and to handle varying size tensors as well.
    """
    return tuple(zip(*batch))

# Prepare the final datasets and data loaders.
def create_train_dataset(
    train_dir_images,
    train_dir_labels,
    img_size,
    classes,
    use_train_aug=False,
    mosaic=1.0,
    square_training=False,
    label_type='pascal_voc'
):
    train_dataset = CustomDataset(
        train_dir_images,
        train_dir_labels,
        img_size,
        classes,
        get_train_transform(),
        use_train_aug=use_train_aug,
        train=True,
        mosaic=mosaic,
        square_training=square_training,
        label_type=label_type
    )
    return train_dataset

def create_valid_dataset(
    valid_dir_images,
    valid_dir_labels,
    img_size,
    classes,
    square_training=False,
    label_type='pascal_voc'
):
    valid_dataset = CustomDataset(
        valid_dir_images,
        valid_dir_labels,
        img_size,
        classes,
        get_valid_transform(),
        train=False,
        square_training=square_training,
        label_type=label_type
    )
    return valid_dataset

def create_train_loader(
    train_dataset, batch_size, num_workers=0, batch_sampler=None
):
    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        # shuffle=True,
        num_workers=num_workers,
        collate_fn=collate_fn,
        sampler=batch_sampler
    )
    return train_loader

def create_valid_loader(
    valid_dataset, batch_size, num_workers=0, batch_sampler=None
):
    valid_loader = DataLoader(
        valid_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        collate_fn=collate_fn,
        sampler=batch_sampler
    )
    return valid_loader
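
# --- Illustrative usage sketch (added for this write-up; not part of the
# original file). The paths and class list below are placeholders.
if __name__ == '__main__':
    CLASSES = ['__background__', 'smoke']
    dataset = create_train_dataset(
        '../input/smoke_pascal_voc/archive/train/images',
        '../input/smoke_pascal_voc/archive/train/annotations',
        img_size=640,
        classes=CLASSES,
        label_type='pascal_voc'
    )
    loader = create_train_loader(dataset, batch_size=4)
    images, targets = next(iter(loader))
    print(f"Batch of {len(images)} images; first target boxes: {targets[0]['boxes'].shape}")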
================================================
FILE: docs/upcoming_updates.md
================================================
# Upcoming Updates
## Code Updates
- [x] Proper resuming of training: plots and epochs continue, and the optimizer state dict is loaded from the provided weights path.
- [x] Saving logs to Weights&Biases.
- [x] Saving model to Weights&Biases.
- [ ] Adding plots to show class distribution.
- [ ] Conversion to TensorFlow => TFLite, ...
- [ ] Conversion to ONNX.
- [ ] Example notebooks for writing custom backbones.
- [x] Default training size of 640x640.
## Model / Weights Related
- [ ] Releases for pretrained models with mosaic augmentations for
- [ ] FasterRCNN ResNet18.
- [ ] Mini Darknet Mini Head, Squeezenet1_1 Mini Head.
- [ ] Mini Squeezenet1_1 Mini Head
- [ ] Mini Squeezenet1_1 Tiny Head
- [ ] Adding pretrained models for industrial/real-world datasets (do we need to pretrain on COCO first?)
- [ ] NuScenes/NuImages
- [ ] IDD
- [ ] Manga109
- [ ] Saving FP16 and INT8 weights.
================================================
FILE: docs/updates.md
================================================
# Updates
## 2023-08-18
* Filter classes to visualize during inference using the `--classes` command line argument with space-separated class indices from the dataset YAML file.
For example, to visualize only persons in the COCO dataset, use `python inference.py --classes 1 <rest of the command>`.
To visualize persons and cars, use `python inference.py --classes 1 3 <rest of the command>`.
* Added Deep SORT Real-Time tracking to `inference_video.py` and `onnx_inference_video.py`. Use the `--track` flag along with the usual inference command. Support for **MobileNet** Re-ID for now.
## 2023-02-02
* New DenseNet backbones.
* Mosaic augmentation updated to be Ultralytics/YOLOv5/YOLOv8 like.
* Updated the augmentation regime for better training results.
## 2022-10-02
* Released a Mini Darknet Nano Head model pretrained on the Pascal VOC dataset for 600 epochs. [Find the release details here](https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline/releases/tag/Latest).
## 2022-09-09
* Can load COCO/Pascal VOC pretrained weights for transfer learning/fine-tuning using the `--weights` flag and providing the path to the weights file.
* Resume training by providing the path to previously trained weights using the `--weights` flag along with the `--resume-training` flag to continue from previous plots and load the optimizer state dictionary as well. Here, `--weights` should point to `last_model.pth`, not `best_model.pth` or `last_model_state.pth`; the latter two store only the model weights dictionary.
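For example (the output path is a placeholder): `python train.py --weights outputs/training/custom_training/last_model.pth --resume-training`, keeping the rest of the original training command unchanged.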
* Weights and Biases logging possible now.
* Default training resolution now 640x640. Provides slightly better results.
================================================
FILE: eval.py
================================================
"""
Run evaluation on a trained model to get mAP and class wise AP.
USAGE:
python eval.py --data data_configs/voc.yaml --weights outputs/training/fasterrcnn_convnext_small_voc_15e_noaug/best_model.pth --model fasterrcnn_convnext_small
"""
from datasets import (
create_valid_dataset, create_valid_loader
)
from models.create_fasterrcnn_model import create_model
from torch_utils import utils
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from pprint import pprint
from tqdm import tqdm
import torch
import argparse
import yaml
import torchvision
import time
import numpy as np
torch.multiprocessing.set_sharing_strategy('file_system')
if __name__ == '__main__':
# Construct the argument parser.
parser = argparse.ArgumentParser()
parser.add_argument(
'--data',
default='data_configs/test_image_config.yaml',
help='(optional) path to the data config file'
)
parser.add_argument(
'-m', '--model',
default='fasterrcnn_resnet50_fpn',
help='name of the model'
)
parser.add_argument(
'-mw', '--weights',
default=None,
help='path to trained checkpoint weights if providing custom YAML file'
)
parser.add_argument(
'-ims', '--imgsz',
default=640,
type=int,
help='image size to feed to the network'
)
parser.add_argument(
'-w', '--workers', default=4, type=int,
help='number of workers for data processing/transforms/augmentations'
)
parser.add_argument(
'-b', '--batch',
default=8,
type=int,
help='batch size to load the data'
)
parser.add_argument(
'-d', '--device',
default=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'),
help='computation/training device, default is GPU if GPU present'
)
parser.add_argument(
'-v', '--verbose',
action='store_true',
help='show class-wise mAP'
)
parser.add_argument(
'-st', '--square-training',
dest='square_training',
action='store_true',
help='Resize images to square shape instead of aspect ratio resizing \
for single image training. For mosaic training, this resizes \
single images to square shape first then puts them on a \
square canvas.'
)
args = vars(parser.parse_args())
# Load the data configurations
with open(args['data']) as file:
data_configs = yaml.safe_load(file)
# Validation settings and constants.
try: # Use test images if present.
VALID_DIR_IMAGES = data_configs['TEST_DIR_IMAGES']
VALID_DIR_LABELS = data_configs['TEST_DIR_LABELS']
except KeyError: # Else fall back to the validation images.
VALID_DIR_IMAGES = data_configs['VALID_DIR_IMAGES']
VALID_DIR_LABELS = data_configs['VALID_DIR_LABELS']
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
NUM_WORKERS = args['workers']
DEVICE = args['device']
BATCH_SIZE = args['batch']
# Model configurations
IMAGE_SIZE = args['imgsz']
# Load the pretrained model.
build_model = create_model[args['model']]
if args['weights'] is None:
try:
model, coco_model = build_model(num_classes=NUM_CLASSES, coco_model=True)
except (TypeError, ValueError): # Builder returns only the model.
model = build_model(num_classes=NUM_CLASSES, coco_model=True)
coco_model = True # COCO pretrained weights are assumed in this path.
if coco_model:
COCO_91_CLASSES = data_configs['COCO_91_CLASSES']
valid_dataset = create_valid_dataset(
VALID_DIR_IMAGES,
VALID_DIR_LABELS,
IMAGE_SIZE,
COCO_91_CLASSES,
square_training=args['square_training']
)
# Load weights.
if args['weights'] is not None:
model = build_model(num_classes=NUM_CLASSES, coco_model=False)
checkpoint = torch.load(args['weights'], map_location=DEVICE)
model.load_state_dict(checkpoint['model_state_dict'])
valid_dataset = create_valid_dataset(
VALID_DIR_IMAGES,
VALID_DIR_LABELS,
IMAGE_SIZE,
CLASSES,
square_training=args['square_training']
)
model.to(DEVICE).eval()
valid_loader = create_valid_loader(valid_dataset, BATCH_SIZE, NUM_WORKERS)
@torch.inference_mode()
def evaluate(
model,
data_loader,
device,
out_dir=None,
classes=None,
colors=None
):
metric = MeanAveragePrecision(class_metrics=args['verbose'])
n_threads = torch.get_num_threads()
# FIXME remove this and make paste_masks_in_image run on the GPU
torch.set_num_threads(1)
cpu_device = torch.device("cpu")
model.eval()
metric_logger = utils.MetricLogger(delimiter=" ")
header = "Test:"
target = []
preds = []
counter = 0
for images, targets in tqdm(metric_logger.log_every(data_loader, 100, header), total=len(data_loader)):
counter += 1
images = list(img.to(device) for img in images)
if torch.cuda.is_available():
torch.cuda.synchronize()
with torch.no_grad():
outputs = model(images)
#####################################
for i in range(len(images)):
true_dict = dict()
preds_dict = dict()
true_dict['boxes'] = targets[i]['boxes'].detach().cpu()
true_dict['labels'] = targets[i]['labels'].detach().cpu()
preds_dict['boxes'] = outputs[i]['boxes'].detach().cpu()
preds_dict['scores'] = outputs[i]['scores'].detach().cpu()
preds_dict['labels'] = outputs[i]['labels'].detach().cpu()
preds.append(preds_dict)
target.append(true_dict)
#####################################
outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
# gather the stats from all processes
metric_logger.synchronize_between_processes()
torch.set_num_threads(n_threads)
metric.update(preds, target)
metric_summary = metric.compute()
return metric_summary
stats = evaluate(
model,
valid_loader,
device=DEVICE,
classes=CLASSES,
)
print('\n')
pprint(stats)
if args['verbose']:
print('\n')
pprint(f"Classes: {CLASSES}")
print('\n')
print('AP / AR per class')
empty_string = ''
if len(CLASSES) > 2:
num_hyphens = 73
print('-'*num_hyphens)
print(f"| | Class{empty_string:<16}| AP{empty_string:<18}| AR{empty_string:<18}|")
print('-'*num_hyphens)
class_counter = 0
for i in range(0, len(CLASSES)-1, 1):
class_counter += 1
print(f"|{class_counter:<3} | {CLASSES[i+1]:<20} | {np.array(stats['map_per_class'][i]):.3f}{empty_string:<15}| {np.array(stats['mar_100_per_class'][i]):.3f}{empty_string:<15}|")
print('-'*num_hyphens)
print(f"|Avg{empty_string:<23} | {np.array(stats['map']):.3f}{empty_string:<15}| {np.array(stats['mar_100']):.3f}{empty_string:<15}|")
else:
num_hyphens = 62
print('-'*num_hyphens)
print(f"|Class{empty_string:<10} | AP{empty_string:<18}| AR{empty_string:<18}|")
print('-'*num_hyphens)
print(f"|{CLASSES[1]:<15} | {np.array(stats['map']):.3f}{empty_string:<15}| {np.array(stats['mar_100']):.3f}{empty_string:<15}|")
print('-'*num_hyphens)
print(f"|Avg{empty_string:<12} | {np.array(stats['map']):.3f}{empty_string:<15}| {np.array(stats['mar_100']):.3f}{empty_string:<15}|")
================================================
FILE: example_test_data/README.md
================================================
# README
## Image / Video Credits and Attributions
* `image_1.jpg`: Image by <a href="https://pixabay.com/users/xdigitalphotos-24390146/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=6810885">Luu Do</a> from <a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=6810885">Pixabay</a>.
* https://pixabay.com/photos/car-traffic-city-city-life-road-6810885/
* `image_2.jpg`: Image by <a href="https://pixabay.com/users/publicdomainpictures-14/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=84665">PublicDomainPictures</a> from <a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=84665">Pixabay</a>.
* https://pixabay.com/photos/birds-flying-sky-orange-sky-dusk-84665/
* `video_1.mp4`: Video by <a href="https://pixabay.com/users/coverr-free-footage-1281706/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=5638">Coverr-Free-Footage</a> from <a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=5638">Pixabay</a>.
* https://pixabay.com/videos/scooters-traffic-street-motorcycle-5638/
================================================
FILE: export.py
================================================
"""
Export to ONNX.
Requirements:
pip install onnx onnxruntime
USAGE:
python export.py --weights outputs/training/fasterrcnn_resnet18_train/best_model.pth --data data_configs/coco.yaml --out model.onnx
"""
import torch
import argparse
import yaml
import os
from models.create_fasterrcnn_model import create_model
def parse_opt():
parser = argparse.ArgumentParser()
parser.add_argument(
'-w', '--weights',
default=None,
help='path to trained checkpoint weights if providing custom YAML file'
)
parser.add_argument(
'-d', '--device',
default=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'),
help='computation/training device, default is GPU if GPU present'
)
parser.add_argument(
'--data',
default=None,
help='(optional) path to the data config file'
)
parser.add_argument(
'-m', '--model',
default=None,
help='name of the model (falls back to the name stored in the checkpoint)'
)
parser.add_argument(
'--out',
help='output model name, e.g. model.onnx',
required=True,
type=str
)
parser.add_argument(
'--width',
default=640,
type=int,
help='onnx model input width'
)
parser.add_argument(
'--height',
default=640,
type=int,
help='onnx model input height'
)
args = vars(parser.parse_args())
return args
def main(args):
OUT_DIR = 'weights'
if not os.path.exists(OUT_DIR):
os.makedirs(OUT_DIR)
# Load the data configurations.
data_configs = None
if args['data'] is not None:
with open(args['data']) as file:
data_configs = yaml.safe_load(file)
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
DEVICE = args['device']
# Load weights if path provided.
checkpoint = torch.load(args['weights'], map_location=DEVICE)
# If config file is not given, load from model dictionary.
if data_configs is None:
data_configs = True
NUM_CLASSES = checkpoint['data']['NC']
try:
print('Building from model name arguments...')
build_model = create_model[str(args['model'])]
except KeyError:
build_model = create_model[checkpoint['model_name']]
model = build_model(num_classes=NUM_CLASSES, coco_model=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Input to the model
x = torch.randn(1, 3, args['height'], args['width'])
# Export the model
torch.onnx.export(
model,
x,
os.path.join(OUT_DIR, args['out']),
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=['input'],
output_names = ['output'],
dynamic_axes={
'input': {0: 'batch_size', 2: 'height', 3: 'width'},
'output' : {0 : 'batch_size'}
}
)
print(f"Model saved to {os.path.join(OUT_DIR, args['out'])}")
if __name__ == '__main__':
args = parse_opt()
main(args)
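# --- Illustrative sanity check for the exported model ---
# A minimal sketch, assuming `onnxruntime` is installed and the model was
# exported as weights/model.onnx; the input name 'input' matches the
# export call above.
#
# import numpy as np
# import onnxruntime as ort
# sess = ort.InferenceSession('weights/model.onnx')
# dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
# outputs = sess.run(None, {'input': dummy})
# print([o.shape for o in outputs])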
================================================
FILE: inference.py
================================================
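"""
Run object detection inference on images.
USAGE (illustrative; all flags are defined in parse_opt below):
python inference.py --input example_test_data/image_1.jpg --weights outputs/training/run_1/best_model.pth
"""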
import numpy as np
import cv2
import torch
import glob as glob
import os
import time
import argparse
import yaml
import matplotlib.pyplot as plt
import pandas
from models.create_fasterrcnn_model import create_model
from utils.annotations import (
inference_annotations, convert_detections
)
from utils.general import set_infer_dir
from utils.transforms import infer_transforms, resize
from utils.logging import LogJSON
def collect_all_images(dir_test):
"""
Function to return a list of image paths.
:param dir_test: Directory containing images or single image path.
Returns:
test_images: List containing all image paths.
"""
test_images = []
if os.path.isdir(dir_test):
image_file_types = ['*.jpg', '*.jpeg', '*.png', '*.ppm']
for file_type in image_file_types:
test_images.extend(glob.glob(f"{dir_test}/{file_type}"))
else:
test_images.append(dir_test)
return test_images
def parse_opt():
# Construct the argument parser.
parser = argparse.ArgumentParser()
parser.add_argument(
'-i', '--input',
help='path to the input image or a folder of images',
)
parser.add_argument(
'-o', '--output',
default=None,
help='folder path to output data',
)
parser.add_argument(
'--data',
default=None,
help='(optional) path to the data config file'
)
parser.add_argument(
'-m', '--model',
default=None,
help='name of the model'
)
parser.add_argument(
'-w', '--weights',
default=None,
help='path to trained checkpoint weights if providing custom YAML file'
)
parser.add_argument(
'-th', '--threshold',
default=0.3,
type=float,
help='detection threshold'
)
parser.add_argument(
'-si', '--show',
action='store_true',
help='visualize output only if this argument is passed'
)
parser.add_argument(
'-mpl', '--mpl-show',
dest='mpl_show',
action='store_true',
help='visualize using matplotlib, helpful in notebooks'
)
parser.add_argument(
'-d', '--device',
default=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'),
help='computation/training device, default is GPU if GPU present'
)
parser.add_argument(
'-ims', '--imgsz',
default=None,
type=int,
help='resize image to, by default use the original frame/image size'
)
parser.add_argument(
'-nlb', '--no-labels',
dest='no_labels',
action='store_true',
help='do not show labels on top of bounding boxes'
)
parser.add_argument(
'--square-img',
dest='square_img',
action='store_true',
help='whether to use square image resize, else use aspect ratio resize'
)
parser.add_argument(
'--classes',
nargs='+',
type=int,
default=None,
help='filter classes to visualize, e.g. --classes 1 2 3'
)
parser.add_argument(
'--track',
action='store_true'
)
parser.add_argument(
'--log-json',
dest='log_json',
action='store_true',
help='store a json log file in COCO format in the output directory'
)
parser.add_argument(
'-t', '--table',
dest='table',
action='store_true',
help='outputs a csv file with a table summarizing the predicted boxes'
)
args = vars(parser.parse_args())
return args
def main(args):
# For same annotation colors each time.
np.random.seed(42)
# Load the data configurations.
data_configs = None
if args['data'] is not None:
with open(args['data']) as file:
data_configs = yaml.safe_load(file)
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
DEVICE = args['device']
if args['output'] is not None:
OUT_DIR = args['output']
if not os.path.exists(OUT_DIR):
os.makedirs(OUT_DIR)
else:
OUT_DIR=set_infer_dir()
# Load the pretrained model
if args['weights'] is None:
# If the config file is still None,
# then load the default one for COCO.
if data_configs is None:
with open(os.path.join('data_configs', 'test_image_config.yaml')) as file:
data_configs = yaml.safe_load(file)
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
try:
build_model = create_model[args['model']]
model, coco_model = build_model(num_classes=NUM_CLASSES, coco_model=True)
except KeyError:
build_model = create_model['fasterrcnn_resnet50_fpn_v2']
model, coco_model = build_model(num_classes=NUM_CLASSES, coco_model=True)
# Load weights if path provided.
if args['weights'] is not None:
checkpoint = torch.load(args['weights'], map_location=DEVICE)
# If config file is not given, load from model dictionary.
if data_configs is None:
data_configs = True
NUM_CLASSES = checkpoint['data']['NC']
CLASSES = checkpoint['data']['CLASSES']
try:
print('Building from model name arguments...')
build_model = create_model[str(args['model'])]
except KeyError:
build_model = create_model[checkpoint['model_name']]
model = build_model(num_classes=NUM_CLASSES, coco_model=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(DEVICE).eval()
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
if args['input'] is None:
DIR_TEST = data_configs['image_path']
else:
DIR_TEST = args['input']
test_images = collect_all_images(DIR_TEST)
print(f"Test instances: {len(test_images)}")
# Define the detection threshold; any detection with a score
# below this will be discarded.
detection_threshold = args['threshold']
# Define dictionary to collect boxes detected in each file
pred_boxes = {}
box_id = 1
if args['log_json']:
log_json = LogJSON(os.path.join(OUT_DIR, 'log.json'))
# To count the total number of frames iterated through.
frame_count = 0
# To keep adding the frames' FPS.
total_fps = 0
for i in range(len(test_images)):
# Get the image file name for saving output later on.
image_name = test_images[i].split(os.path.sep)[-1].split('.')[0]
orig_image = cv2.imread(test_images[i])
frame_height, frame_width, _ = orig_image.shape
if args['imgsz'] is not None:
RESIZE_TO = args['imgsz']
else:
RESIZE_TO = frame_width
# orig_image = image.copy()
image_resized = resize(
orig_image, RESIZE_TO, square=args['square_img']
)
image = image_resized.copy()
# BGR to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = infer_transforms(image)
# Add batch dimension.
image = torch.unsqueeze(image, 0)
start_time = time.time()
with torch.no_grad():
outputs = model(image.to(DEVICE))
end_time = time.time()
# Get the current fps.
fps = 1 / (end_time - start_time)
# Add `fps` to `total_fps`.
total_fps += fps
# Increment frame count.
frame_count += 1
# Load all detection to CPU for further operations.
outputs = [{k: v.to('cpu') for k, v in t.items()} for t in outputs]
# Carry further only if there are detected boxes.
if len(outputs[0]['boxes']) != 0:
draw_boxes, pred_classes, scores, labels = convert_detections(
outputs, detection_threshold, CLASSES, args
)
orig_image = inference_annotations(
draw_boxes,
pred_classes,
scores,
CLASSES,
COLORS,
orig_image,
image_resized,
args
)
if args['show']:
cv2.imshow('Prediction', orig_image)
cv2.waitKey(1)
if args['mpl_show']:
plt.imshow(orig_image[:, :, ::-1])
plt.axis('off')
plt.show()
if args['table']:
for box, label in zip(draw_boxes, pred_classes):
xmin, ymin, xmax, ymax = box
width = xmax - xmin
height = ymax - ymin
pred_boxes[box_id] = {
"image": image_name,
"label": str(label),
"xmin": xmin,
"xmax": xmax,
"ymin": ymin,
"ymax": ymax,
"width": width,
"height": height,
"area": width * height
}
box_id = box_id + 1
df = pandas.DataFrame.from_dict(pred_boxes, orient='index')
df = df.fillna(0)
df.to_csv(f"{OUT_DIR}/boxes.csv", index=False)
if args['log_json']:
log_json.update(orig_image, image_name, draw_boxes, labels, CLASSES)
cv2.imwrite(f"{OUT_DIR}/{image_name}.jpg", orig_image)
print(f"Image {i+1} done...")
print('-'*50)
print('TEST PREDICTIONS COMPLETE')
cv2.destroyAllWindows()
# Save JSON log file.
if args['log_json']:
log_json.save(os.path.join(OUT_DIR, 'log.json'))
# Calculate and print the average FPS.
avg_fps = total_fps / frame_count
print(f"Average FPS: {avg_fps:.3f}")
print('Path to output files: '+OUT_DIR)
if __name__ == '__main__':
args = parse_opt()
main(args)
================================================
FILE: inference_video.py
================================================
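"""
Run object detection inference on videos, with optional Deep SORT tracking.
USAGE (illustrative; all flags are defined in parse_opt below):
python inference_video.py --input example_test_data/video_1.mp4 --track
"""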
import numpy as np
import cv2
import torch
import glob as glob
import os
import time
import argparse
import yaml
import matplotlib.pyplot as plt
import pandas
from models.create_fasterrcnn_model import create_model
from utils.general import set_infer_dir
from utils.annotations import (
inference_annotations,
annotate_fps,
convert_detections,
convert_pre_track,
convert_post_track
)
from utils.transforms import infer_transforms, resize
from torchvision import transforms as transforms
from deep_sort_realtime.deepsort_tracker import DeepSort
from utils.logging import LogJSON
def read_return_video_data(video_path):
cap = cv2.VideoCapture(video_path)
# Get the video's frame width and height.
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
assert (frame_width != 0 and frame_height != 0), 'Please check video path...'
return cap, frame_width, frame_height
def parse_opt():
# Construct the argument parser.
parser = argparse.ArgumentParser()
parser.add_argument(
'-i', '--input',
help='path to input video',
)
parser.add_argument(
'-o', '--output',
default=None,
help='folder path to output data',
)
parser.add_argument(
'--data',
default=None,
help='(optional) path to the data config file'
)
parser.add_argument(
'-m', '--model',
default=None,
help='name of the model'
)
parser.add_argument(
'-w', '--weights',
default=None,
help='path to trained checkpoint weights if providing custom YAML file'
)
parser.add_argument(
'-th', '--threshold',
default=0.3,
type=float,
help='detection threshold'
)
parser.add_argument(
'-si', '--show',
action='store_true',
help='visualize output only if this argument is passed'
)
parser.add_argument(
'-mpl', '--mpl-show',
dest='mpl_show',
action='store_true',
help='visualize using matplotlib, helpful in notebooks'
)
parser.add_argument(
'-d', '--device',
default=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'),
help='computation/training device, default is GPU if GPU present'
)
parser.add_argument(
'-ims', '--imgsz',
default=None,
type=int,
help='resize image to, by default use the original frame/image size'
)
parser.add_argument(
'-nlb', '--no-labels',
dest='no_labels',
action='store_true',
help='do not show labels on top of bounding boxes'
)
parser.add_argument(
'--square-img',
dest='square_img',
action='store_true',
help='whether to use square image resize, else use aspect ratio resize'
)
parser.add_argument(
'--classes',
nargs='+',
type=int,
default=None,
help='filter classes to visualize, e.g. --classes 1 2 3'
)
parser.add_argument(
'--track',
action='store_true'
)
parser.add_argument(
'--log-json',
dest='log_json',
action='store_true',
help='store a json log file in COCO format in the output directory'
)
args = vars(parser.parse_args())
return args
def main(args):
# For same annotation colors each time.
np.random.seed(42)
if args['track']: # Initialize Deep SORT tracker if tracker is selected.
tracker = DeepSort(max_age=30)
# Load the data configurations.
data_configs = None
if args['data'] is not None:
with open(args['data']) as file:
data_configs = yaml.safe_load(file)
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
DEVICE = args['device']
if args['output'] is not None:
OUT_DIR = args['output']
if not os.path.exists(OUT_DIR):
os.makedirs(OUT_DIR)
else:
OUT_DIR=set_infer_dir()
VIDEO_PATH = None
# Load the pretrained model
if args['weights'] is None:
# If the config file is still None,
# then load the default one for COCO.
if data_configs is None:
with open(os.path.join('data_configs', 'test_video_config.yaml')) as file:
data_configs = yaml.safe_load(file)
NUM_CLASSES = data_configs['NC']
CLASSES = data_configs['CLASSES']
try:
build_model = create_model[args['model']]
model, coco_model = build_model(num_classes=NUM_CLASSES, coco_model=True)
except KeyError:
build_model = create_model['fasterrcnn_resnet50_fpn_v2']
model, coco_model = build_model(num_classes=NUM_CLASSES, coco_model=True)
# Load weights if path provided.
if args['weights'] is not None:
checkpoint = torch.load(args['weights'], map_location=DEVICE)
# If config file is not given, load from model dictionary.
if data_configs is None:
data_configs = True
NUM_CLASSES = checkpoint['data']['NC']
CLASSES = checkpoint['data']['CLASSES']
try:
print('Building from model name arguments...')
build_model = create_model[str(args['model'])]
except KeyError:
build_model = create_model[checkpoint['model_name']]
model = build_model(num_classes=NUM_CLASSES, coco_model=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(DEVICE).eval()
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
if args['input'] is None:
VIDEO_PATH = data_configs['video_path']
else:
VIDEO_PATH = args['input']
assert VIDEO_PATH is not None, 'Please provide path to an input video...'
# Define the detection threshold; any detection with a score
# below this will be discarded.
detection_threshold = args['threshold']
cap, frame_width, frame_height = read_return_video_data(VIDEO_PATH)
save_name = VIDEO_PATH.split(os.path.sep)[-1].split('.')[0]
# Define codec and create VideoWriter object.
out = cv2.VideoWriter(f"{OUT_DIR}/{save_name}.mp4",
cv2.VideoWriter_fourcc(*'mp4v'), 30,
(frame_width, frame_height))
if args['imgsz'] is not None:
RESIZE_TO = args['imgsz']
else:
RESIZE_TO = frame_width
if args['log_json']:
log_json = LogJSON(os.path.join(OUT_DIR, 'log.json'))
frame_count = 0 # To count total frames.
total_fps = 0 # To get the final frames per second.
# read until end of video
while(cap.isOpened()):
# capture each frame of the video
ret, frame = cap.read()
if ret:
orig_frame = frame.copy()
frame = resize(frame, RESIZE_TO, square=args['square_img'])
image = frame.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = infer_transforms(image)
# Add batch dimension.
image = torch.unsqueeze(image, 0)
# Get the start time.
start_time = time.time()
with torch.no_grad():
# Get predictions for the current frame.
outputs = model(image.to(DEVICE))
forward_end_time = time.time()
forward_pass_time = forward_end_time - start_time
# Get the current fps.
fps = 1 / (forward_pass_time)
# Add `fps` to `total_fps`.
total_fps += fps
# Increment frame count.
frame_count += 1
# Load all detection to CPU for further operations.
outputs = [{k: v.to('cpu') for k, v in t.items()} for t in outputs]
# Carry further only if there are detected boxes.
if len(outputs[0]['boxes']) != 0:
draw_boxes, pred_classes, scores, labels = convert_detections(
outputs, detection_threshold, CLASSES, args
)
if args['track']:
tracker_inputs = convert_pre_track(
draw_boxes, pred_classes, scores
)
# Update tracker with detections.
tracks = tracker.update_tracks(tracker_inputs, frame=frame)
draw_boxes, pred_classes, scores = convert_post_track(tracks)
frame = inference_annotations(
draw_boxes,
pred_classes,
scores,
CLASSES,
COLORS,
orig_frame,
frame,
args
)
else:
frame = orig_frame
if args['log_json']:
log_json.update(frame, save_name, draw_boxes, labels, CLASSES)
frame = annotate_fps(frame, fps)
final_end_time = time.time()
forward_and_annot_time = final_end_time - start_time
print_string = f"Frame: {frame_count}, Forward pass FPS: {fps:.3f}, "
print_string += f"Forward pass time: {forward_pass_time:.3f} seconds, "
print_string += f"Forward pass + annotation time: {forward_and_annot_time:.3f} seconds"
print(print_string)
out.write(frame)
if args['show']:
cv2.imshow('Prediction', frame)
# Press `q` to exit
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
# Release VideoCapture().
cap.release()
# Close all frames and video windows.
cv2.destroyAllWindows()
# Save JSON log file.
if args['log_json']:
log_json.save(os.path.join(OUT_DIR, 'log.json'))
# Calculate and print the average FPS.
avg_fps = total_fps / frame_count
print(f"Average FPS: {avg_fps:.3f}")
print('Path to output files: '+OUT_DIR)
if __name__ == '__main__':
args = parse_opt()
main(args)
================================================
FILE: models/__init__.py
================================================
__all__ = [
'fasterrcnn_convnext_small',
'fasterrcnn_convnext_tiny',
'fasterrcnn_custom_resnet',
'fasterrcnn_darknet',
'fasterrcnn_efficientnet_b0',
'fasterrcnn_efficientnet_b4',
'fasterrcnn_mbv3_small_nano_head',
'fasterrcnn_mbv3_large',
'fasterrcnn_mini_darknet_nano_head',
'fasterrcnn_mini_darknet',
'fasterrcnn_mini_squeezenet1_1_small_head',
'fasterrcnn_mini_squeezenet1_1_tiny_head',
'fasterrcnn_mobilenetv3_large_320_fpn',
'fasterrcnn_mobilenetv3_large_fpn',
'fasterrcnn_nano',
'fasterrcnn_resnet18',
'fasterrcnn_resnet50_fpn_v2',
'fasterrcnn_resnet50_fpn',
'fasterrcnn_resnet101',
'fasterrcnn_resnet152',
'fasterrcnn_squeezenet1_0',
'fasterrcnn_squeezenet1_1_small_head',
'fasterrcnn_squeezenet1_1',
'fasterrcnn_vitdet',
'fasterrcnn_vitdet_tiny',
'fasterrcnn_mobilevit_xxs',
'fasterrcnn_regnet_y_400mf',
'fasterrcnn_vgg16',
'fasterrcnn_dinov3_convnext_tiny',
'fasterrcnn_dinov3_vits16',
'fasterrcnn_dinov3_convnext_tiny_multifeat',
'fasterrcnn_dinov3_vits16plus',
'fasterrcnn_dinov3_vitb16',
'fasterrcnn_dinov3_vitl16',
'fasterrcnn_dinov3_vith16plus',
'fasterrcnn_dinov3_convnext_small',
'fasterrcnn_dinov3_convnext_base',
'fasterrcnn_dinov3_convnext_large'
]
================================================
FILE: models/create_fasterrcnn_model.py
================================================
from models import *
def return_fasterrcnn_resnet50_fpn(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_resnet50_fpn.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_mobilenetv3_large_fpn(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mobilenetv3_large_fpn.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_mobilenetv3_large_320_fpn(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mobilenetv3_large_320_fpn.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_resnet18(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_resnet18.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_custom_resnet(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_custom_resnet.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_darknet(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_darknet.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_squeezenet1_0(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_squeezenet1_0.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_squeezenet1_1(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_squeezenet1_1.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_mini_darknet(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mini_darknet.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_squeezenet1_1_small_head(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_squeezenet1_1_small_head.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_mini_squeezenet1_1_small_head(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mini_squeezenet1_1_small_head.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_mini_squeezenet1_1_tiny_head(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mini_squeezenet1_1_tiny_head.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_mbv3_small_nano_head(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mbv3_small_nano_head.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_mini_darknet_nano_head(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mini_darknet_nano_head.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_efficientnet_b0(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_efficientnet_b0.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_nano(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_nano.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_resnet152(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_resnet152.create_model(
num_classes, pretrained, coco_model
)
return model
def return_fasterrcnn_resnet50_fpn_v2(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_resnet50_fpn_v2.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_convnext_small(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_convnext_small.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_convnext_tiny(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_convnext_tiny.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_resnet101(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_resnet101.create_model(
num_classes, pretrained=pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_vitdet(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_vitdet.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_vitdet_tiny(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_vitdet_tiny.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_mobilevit_xxs(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_mobilevit_xxs.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_regnet_y_400mf(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_regnet_y_400mf.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_vgg16(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_vgg16.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_convnext_tiny(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_convnext_tiny.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_vits16(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_vits16.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_convnext_tiny_multifeat(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_convnext_tiny_multifeat.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_vits16plus(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_vits16plus.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_vitb16(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_vitb16.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_vitl16(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_vitl16.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_vith16plus(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_vith16plus.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_convnext_small(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_convnext_small.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_convnext_base(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_convnext_base.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
def return_fasterrcnn_dinov3_convnext_large(
num_classes, pretrained=True, coco_model=False
):
model = fasterrcnn_dinov3_convnext_large.create_model(
num_classes, pretrained, coco_model=coco_model
)
return model
create_model = {
'fasterrcnn_resnet50_fpn': return_fasterrcnn_resnet50_fpn,
'fasterrcnn_mobilenetv3_large_fpn': return_fasterrcnn_mobilenetv3_large_fpn,
'fasterrcnn_mobilenetv3_large_320_fpn': return_fasterrcnn_mobilenetv3_large_320_fpn,
'fasterrcnn_resnet18': return_fasterrcnn_resnet18,
'fasterrcnn_custom_resnet': return_fasterrcnn_custom_resnet,
'fasterrcnn_darknet': return_fasterrcnn_darknet,
'fasterrcnn_squeezenet1_0': return_fasterrcnn_squeezenet1_0,
'fasterrcnn_squeezenet1_1': return_fasterrcnn_squeezenet1_1,
'fasterrcnn_mini_darknet': return_fasterrcnn_mini_darknet,
'fasterrcnn_squeezenet1_1_small_head': return_fasterrcnn_squeezenet1_1_small_head,
'fasterrcnn_mini_squeezenet1_1_small_head': return_fasterrcnn_mini_squeezenet1_1_small_head,
'fasterrcnn_mini_squeezenet1_1_tiny_head': return_fasterrcnn_mini_squeezenet1_1_tiny_head,
'fasterrcnn_mbv3_small_nano_head': return_fasterrcnn_mbv3_small_nano_head,
'fasterrcnn_mini_darknet_nano_head': return_fasterrcnn_mini_darknet_nano_head,
'fasterrcnn_efficientnet_b0': return_fasterrcnn_efficientnet_b0,
'fasterrcnn_nano': return_fasterrcnn_nano,
'fasterrcnn_resnet152': return_fasterrcnn_resnet152,
'fasterrcnn_resnet50_fpn_v2': return_fasterrcnn_resnet50_fpn_v2,
'fasterrcnn_convnext_small': return_fasterrcnn_convnext_small,
'fasterrcnn_convnext_tiny': return_fasterrcnn_convnext_tiny,
'fasterrcnn_resnet101': return_fasterrcnn_resnet101,
'fasterrcnn_vitdet': return_fasterrcnn_vitdet,
'fasterrcnn_vitdet_tiny': return_fasterrcnn_vitdet_tiny,
'fasterrcnn_mobilevit_xxs': return_fasterrcnn_mobilevit_xxs,
'fasterrcnn_regnet_y_400mf': return_fasterrcnn_regnet_y_400mf,
'fasterrcnn_vgg16': return_fasterrcnn_vgg16,
'fasterrcnn_dinov3_convnext_tiny': return_fasterrcnn_dinov3_convnext_tiny,
'fasterrcnn_dinov3_vits16': return_fasterrcnn_dinov3_vits16,
'fasterrcnn_dinov3_convnext_tiny_multifeat': return_fasterrcnn_dinov3_convnext_tiny_multifeat,
'fasterrcnn_dinov3_vits16plus': return_fasterrcnn_dinov3_vits16plus,
'fasterrcnn_dinov3_vitb16': return_fasterrcnn_dinov3_vitb16,
'fasterrcnn_dinov3_vitl16': return_fasterrcnn_dinov3_vitl16,
'fasterrcnn_dinov3_vith16plus': return_fasterrcnn_dinov3_vith16plus,
'fasterrcnn_dinov3_convnext_small': return_fasterrcnn_dinov3_convnext_small,
'fasterrcnn_dinov3_convnext_base': return_fasterrcnn_dinov3_convnext_base,
'fasterrcnn_dinov3_convnext_large': return_fasterrcnn_dinov3_convnext_large
}
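# --- Illustrative usage ---
# A minimal sketch of how this registry is consumed by the training and
# inference scripts; the model name is from the dictionary above, the
# class count is a hypothetical placeholder.
if __name__ == '__main__':
    build_model = create_model['fasterrcnn_resnet50_fpn']
    # All builders share the signature (num_classes, pretrained=True, coco_model=False).
    model = build_model(num_classes=2, pretrained=False)
    print(type(model).__name__)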
================================================
FILE: models/fasterrcnn_convnext_small.py
================================================
"""
Faster RCNN model with the Convnext Small backbone from
Torchvision classification models.
Reference: https://pytorch.org/vision/stable/models/generated/torchvision.models.convnext_small.html#torchvision.models.ConvNeXt_Small_Weights
"""
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained features.
if pretrained:
backbone = torchvision.models.convnext_small(weights='DEFAULT').features
else:
backbone = torchvision.models.convnext_small().features
# Faster RCNN needs to know the number of output channels of the
# backbone's final feature map. It is 768 for ConvNeXt Small.
backbone.out_channels = 768
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_convnext_tiny.py
================================================
"""
Faster RCNN model with the Convnext Tiny backbone from
Torchvision classification models.
Reference: https://pytorch.org/vision/stable/models/generated/torchvision.models.convnext_tiny.html#torchvision.models.convnext_tiny
"""
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained features.
if pretrained:
backbone = torchvision.models.convnext_tiny(weights='DEFAULT').features
else:
backbone = torchvision.models.convnext_tiny().features
# Faster RCNN needs to know the number of output channels of the
# backbone's final feature map. It is 768 for ConvNeXt Tiny.
backbone.out_channels = 768
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_custom_resnet.py
================================================
import torchvision
from torch import nn
from torch.nn import functional as F
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class ResidualBlock(nn.Module):
"""
Creates the Residual block of ResNet.
"""
def __init__(
self, in_channels, out_channels, use_1x1conv=True, strides=1
):
super().__init__()
self.conv1 = nn.Conv2d(in_channels, out_channels,
kernel_size=3, padding=1, stride=strides)
self.conv2 = nn.Conv2d(out_channels, out_channels,
kernel_size=3, padding=1)
if use_1x1conv:
self.conv3 = nn.Conv2d(in_channels, out_channels,
kernel_size=1, stride=strides)
else:
self.conv3 = None
self.bn1 = nn.BatchNorm2d(out_channels)
self.bn2 = nn.BatchNorm2d(out_channels)
def forward(self, x):
inputs = x
x = F.relu(self.bn1(self.conv1(x)))
x = self.bn2(self.conv2(x))
if self.conv3:
inputs = self.conv3(inputs)
x += inputs
return F.relu(x)
def create_resnet_block(
input_channels,
output_channels,
num_residuals,
):
resnet_block = []
for i in range(num_residuals):
if i == 0:
resnet_block.append(ResidualBlock(input_channels, output_channels,
use_1x1conv=True, strides=2))
else:
resnet_block.append(ResidualBlock(output_channels, output_channels))
return resnet_block
class CustomResNet(nn.Module):
def __init__(self, num_classes=10):
super().__init__()
self.block1 = nn.Sequential(nn.Conv2d(3, 16, kernel_size=7, stride=2, padding=3),
nn.BatchNorm2d(16), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
self.block2 = nn.Sequential(*create_resnet_block(16, 32, 2))
self.block3 = nn.Sequential(*create_resnet_block(32, 64, 2))
self.block4 = nn.Sequential(*create_resnet_block(64, 128, 2))
self.block5 = nn.Sequential(*create_resnet_block(128, 256, 2))
self.linear = nn.Linear(256, num_classes)
def forward(self, x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
x = self.block4(x)
x = self.block5(x)
bs, _, _, _ = x.shape
x = F.adaptive_avg_pool2d(x, 1).reshape(bs, -1)
x = self.linear(x)
return x
def create_model(num_classes, pretrained=True, coco_model=False):
# Load the custom ResNet backbone (pretrained weight loading is disabled below).
# if pretrained:
# print('Loading Tiny ImageNet weights...')
# custom_resnet = CustomResNet(num_classes=200)
# checkpoint = torch.load('outputs/custom_resnet_weights/model_best.pth.tar')
# custom_resnet.load_state_dict(checkpoint['state_dict'])
# else:
print('Loading Custom ResNet with random weights')
custom_resnet = CustomResNet(num_classes=10)
block1 = custom_resnet.block1
block2 = custom_resnet.block2
block3 = custom_resnet.block3
block4 = custom_resnet.block4
block5 = custom_resnet.block5
backbone = nn.Sequential(
block1, block2, block3, block4, block5
)
# Faster RCNN needs to know the number of output channels of the
# backbone's final feature map.
# It is 256 for this custom ResNet.
backbone.out_channels = 256
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_darknet.py
================================================
import torchvision
import torch
from torch import nn
from torch.nn import functional as F
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class DarkNet(nn.Module):
def __init__(self, initialize_weights=True, num_classes=1000):
super(DarkNet, self).__init__()
self.num_classes = num_classes
self.features = self._create_conv_layers()
self.pool = self._pool()
self.fcs = self._create_fc_layers()
if initialize_weights:
# random initialization of the weights...
# ... just like the original paper
self._initialize_weights()
def _create_conv_layers(self):
conv_layers = nn.Sequential(
nn.Conv2d(3, 64, 7, stride=2, padding=3),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(64, 192, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(192, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 256, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 512, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(512, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 512, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 512, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 512, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 512, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 512, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 1024, 3, padding=1),
nn.MaxPool2d(2),
nn.Conv2d(1024, 512, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 1024, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(1024, 512, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(512, 1024, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
)
return conv_layers
# def _create_fc_layers(self):
# fc_layers = nn.Sequential(
# nn.AvgPool2d(7),
# nn.Linear(1024, self.num_classes)
# )
# return fc_layers
def _pool(self):
pool = nn.Sequential(
nn.AvgPool2d(7),
)
return pool
def _create_fc_layers(self):
fc_layers = nn.Sequential(
nn.Linear(1024, self.num_classes)
)
return fc_layers
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_in',
nonlinearity='leaky_relu'
)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
x = self.pool(x)
x = torch.flatten(x, 1) # Flatten while keeping the batch dimension; squeeze() would also drop a batch size of 1.
x = self.fcs(x)
return x
def create_model(num_classes, pretrained=True, coco_model=False):
# Use the DarkNet convolutional features as the backbone.
backbone = DarkNet(num_classes=10).features
# Faster RCNN needs to know the number of output channels of the
# backbone's final feature map.
# It is 1024 for this DarkNet.
backbone.out_channels = 1024
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_convnext_base.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_convnext_base',
source='local',
weights=WEIGHTS_URL
)
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=1, reshape=True, return_class_token=False, norm=True
)
return out[0]
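# `get_intermediate_layers` with reshape=True returns spatial feature
# maps of shape (B, C, H', W'); taking out[0] keeps only the final
# level, so the detector operates on a single-scale feature map.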
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 1024
for name, params in backbone.named_parameters():
params.requires_grad_(False)
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_convnext_large.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_large_pretrain_lvd1689m-61fa432d.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_convnext_large',
source='local',
weights=WEIGHTS_URL
)
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=1, reshape=True, return_class_token=False, norm=True
)
return out[0]
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 1536
for name, params in backbone.named_parameters():
params.requires_grad_(False)
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_convnext_small.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_small_pretrain_lvd1689m-296db49d.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_convnext_small',
source='local',
weights=WEIGHTS_URL
)
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=1, reshape=True, return_class_token=False, norm=True
)
return out[0]
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 768
for name, params in backbone.named_parameters():
params.requires_grad_(False)
# Generate anchors for the RPN. Here, we use 5x3 anchors:
# 5 sizes and 3 aspect ratios per spatial location.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If the backbone returns a single Tensor, `featmap_names` is
# expected to be ['0']. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_convnext_tiny.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_convnext_tiny',
source='local',
weights=WEIGHTS_URL
)
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=1, reshape=True, return_class_token=False, norm=True
)
return out[0]
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 768
for name, params in backbone.named_parameters():
params.requires_grad_(False)
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_convnext_tiny_multifeat.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
"dinov3_convnext_tiny",
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [1, 2, 3]
self.out_channels = 256
# project each feature map to the same number of channels
self.lateral_convs = nn.ModuleList([
nn.Conv2d(192, self.out_channels, 1),
nn.Conv2d(384, self.out_channels, 1),
nn.Conv2d(768, self.out_channels, 1),
])
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
out = [conv(fmap) for conv, fmap in zip(self.lateral_convs, out)]
return {
'1': out[0], # [N, self.out_channels, 100, 100]
'2': out[1], # [N, self.out_channels, 50, 50]
'3': out[2], # [N, self.out_channels, 25, 25]
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 256
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 3 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 3,  # one size tuple per feature map
        aspect_ratios=((0.5, 1.0, 2.0),) * 3  # repeat for each feature map
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # lists the dict keys to pool from.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
with torch.no_grad():
outputs = model(random_tensor)
summary(model)
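    # A minimal sketch of what the backbone feeds the detector:
    # inspect the multi-scale dict directly. Assuming ConvNeXt-Tiny
    # stages at strides 8, 16, and 32, a 640x640 input should give
    # 80x80, 40x40, and 20x20 maps, each projected to 256 channels
    # by the lateral 1x1 convs.
    with torch.no_grad():
        fmaps = model.backbone(random_tensor)
    for key, fmap in fmaps.items():
        print(key, tuple(fmap.shape))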
================================================
FILE: models/fasterrcnn_dinov3_vitb16.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_vitb16',
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [3, 6, 9, 11]
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
return {
'0': out[0],
'1': out[1],
'2': out[2],
'3': out[3],
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 768
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 4 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 4,
        aspect_ratios=((0.5, 1.0, 2.0),) * 4
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # lists the dict keys to pool from.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0', '1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
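    # A quick check of the ViT feature maps: unlike a CNN pyramid,
    # every ViT block runs at the same patch resolution, so all four
    # maps here share one scale. Assuming 16x16 patches, a 640x640
    # input should give four [1, 768, 40, 40] maps.
    with torch.no_grad():
        fmaps = model.backbone(random_tensor)
    for key, fmap in fmaps.items():
        print(key, tuple(fmap.shape))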
================================================
FILE: models/fasterrcnn_dinov3_vith16plus.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_vith16plus',
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [9, 13, 17, 31]
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
return {
'0': out[0],
'1': out[1],
'2': out[2],
'3': out[3],
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 1280
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 4 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 4,
        aspect_ratios=((0.5, 1.0, 2.0),) * 4
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # lists the dict keys to pool from.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0', '1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_vitl16.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_vitl16',
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [7, 11, 15, 23]
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
return {
'0': out[0],
'1': out[1],
'2': out[2],
'3': out[3],
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 1024
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 4 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 4,
        aspect_ratios=((0.5, 1.0, 2.0),) * 4
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # lists the dict keys to pool from.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0', '1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_vits16.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_vits16_pretrain_lvd1689m-08c60483.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_vits16',
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [3, 6, 9, 11]
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
return {
'0': out[0],
'1': out[1],
'2': out[2],
'3': out[3],
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 384
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 4 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 4,
        aspect_ratios=((0.5, 1.0, 2.0),) * 4
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # must match the keys returned by `Dinov3Backbone.forward`,
    # i.e. '0' through '3'.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0', '1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_dinov3_vits16plus.py
================================================
import sys
import os
import torch
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from models.model_summary import summary
# Get current file's absolute path
current_file = os.path.abspath(__file__)
# Get the directory of the current file (models folder)
current_dir = os.path.dirname(current_file)
# Get the parent directory (previous directory)
parent_dir = os.path.dirname(current_dir)
sys.path.append(os.path.join(parent_dir, 'dinov3'))
# Relative to parent fasterrcnn directory.
REPO_DIR = 'dinov3'
# Relative to parent fasterrcnn directory or the absolute path.
WEIGHTS_URL = 'weights/dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth'
class Dinov3Backbone(nn.Module):
def __init__(self):
super().__init__()
self.backbone = torch.hub.load(
REPO_DIR,
'dinov3_vits16plus',
source='local',
weights=WEIGHTS_URL
)
self.out_indices = [3, 6, 9, 11]
def forward(self, x):
out = self.backbone.get_intermediate_layers(
x, n=self.out_indices, reshape=True, return_class_token=False, norm=True
)
return {
'0': out[0],
'1': out[1],
'2': out[2],
'3': out[3],
}
def create_model(num_classes=81, pretrained=True, coco_model=False):
backbone = Dinov3Backbone()
backbone.out_channels = 384
for name, params in backbone.named_parameters():
params.requires_grad_(False)
    # Generate anchors using the RPN. Here, we are using 5x3 anchors
    # per feature map: 5 different sizes and 3 different aspect ratios,
    # with one (sizes, aspect_ratios) pair for each of the 4 feature maps.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),) * 4,
        aspect_ratios=((0.5, 1.0, 2.0),) * 4
    )
    # Feature maps to perform RoI cropping.
    # The backbone returns a dict of feature maps, so `featmap_names`
    # lists the dict keys to pool from.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0', '1', '2', '3'],
        output_size=7,
        sampling_ratio=2
    )
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from models.model_summary import summary
model = create_model(81, pretrained=True)
print(model)
random_tensor = torch.randn(1, 3, 640, 640)
_ = model.eval()
summary(model)
================================================
FILE: models/fasterrcnn_efficientnet_b0.py
================================================
"""
Faster RCNN model with the EfficientNetB0 backbone.
Reference: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
"""
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
    # Load the pretrained EfficientNetB0 features.
backbone = torchvision.models.efficientnet_b0(weights='DEFAULT').features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
backbone.out_channels = 1280 # 1280 for EfficientNetB0.
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
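    # A minimal inference sketch: in eval mode the detector takes a
    # list of 3-channel tensors and returns one dict per image with
    # 'boxes', 'labels', and 'scores'.
    import torch
    model.eval()
    with torch.no_grad():
        outputs = model([torch.randn(3, 640, 640)])
    print(outputs[0].keys())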
================================================
FILE: models/fasterrcnn_efficientnet_b4.py
================================================
"""
Faster RCNN model with the EfficientNetB4 backbone.
"""
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes, pretrained=True, coco_model=False):
    # Load the pretrained EfficientNetB4 features.
backbone = torchvision.models.efficientnet_b4(weights='DEFAULT').features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
backbone.out_channels = 1792
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mbv3_large.py
================================================
"""
Faster RCNN model with the MobileNetV3 backbone from
Torchvision classification models.
Reference: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
"""
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained MobileNetV3 large features.
backbone = torchvision.models.mobilenet_v3_large(weights='DEFAULT').features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 960 for MobileNetV3.
backbone.out_channels = 960
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mbv3_small_nano_head.py
================================================
"""
Faster RCNN model with the MobileNetV3 Small backbone from
Torchvision classification models.
Reference: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
The final output features of the MobileNetV3 Small model have been
reduced to 128.
The representation size of the Faster RCNN head has also been
reduced to 128.
"""
import torchvision
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class TwoMLPHead(nn.Module):
"""
Standard heads for FPN-based models
Args:
in_channels (int): number of input channels
representation_size (int): size of the intermediate representation
"""
def __init__(self, in_channels, representation_size):
super().__init__()
self.fc6 = nn.Linear(in_channels, representation_size)
self.fc7 = nn.Linear(representation_size, representation_size)
def forward(self, x):
x = x.flatten(start_dim=1)
x = F.relu(self.fc6(x))
x = F.relu(self.fc7(x))
return x
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Args:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super().__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
torch._assert(
list(x.shape[2:]) == [1, 1],
f"x has the wrong shape, expecting the last two dimensions to be [1,1] instead of {list(x.shape[2:])}",
)
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained MobileNetV3 Small features.
    backbone = torchvision.models.mobilenet_v3_small(
        weights='DEFAULT' if pretrained else None
    ).features
    # Change the final conv block (features[12]) of the
    # backbone to reduce the output channels.
    backbone[12][0] = nn.Conv2d(
in_channels=96,
out_channels=128,
kernel_size=(1, 1),
stride=(1, 1),
bias=False
)
backbone[12][1] = nn.BatchNorm2d(num_features=128)
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 128 for this custom MobileNetV3 Small.
backbone.out_channels = 128
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
representation_size = 128
# Box head.
box_head = TwoMLPHead(
in_channels=backbone.out_channels * roi_pooler.output_size[0] ** 2,
representation_size=representation_size
)
# Box predictor.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=None, # Num classes should be None when `box_predictor` is provided.
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler,
box_head=box_head,
box_predictor=box_predictor
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
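    # A small worked check of the head sizing above: the box head
    # input is the flattened RoI feature, out_channels * output_size^2.
    # With 128 channels and a 7x7 RoI that is 128 * 7 * 7 = 6272
    # inputs feeding the 128-wide TwoMLPHead.
    print('box head in_features:', 128 * 7 * 7)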
================================================
FILE: models/fasterrcnn_mini_darknet.py
================================================
import torchvision
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# A DarkNet model with reduced output channels for each layer.
class DarkNet(nn.Module):
def __init__(self, initialize_weights=True, num_classes=1000):
super(DarkNet, self).__init__()
self.num_classes = num_classes
self.features = self._create_conv_layers()
self.pool = self._pool()
self.fcs = self._create_fc_layers()
if initialize_weights:
# Random initialization of the weights
# just like the original paper.
self._initialize_weights()
def _create_conv_layers(self):
conv_layers = nn.Sequential(
nn.Conv2d(3, 4, 7, stride=2, padding=3),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(4, 8, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(8, 16, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(16, 32, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(32, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 256, 3, padding=1),
nn.MaxPool2d(2),
nn.Conv2d(256, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 256, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
)
return conv_layers
def _pool(self):
pool = nn.Sequential(
nn.AvgPool2d(7),
)
return pool
def _create_fc_layers(self):
fc_layers = nn.Sequential(
nn.Linear(128, self.num_classes)
)
return fc_layers
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_in',
                    nonlinearity='leaky_relu'
                )
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
x = self.pool(x)
x = x.squeeze()
x = self.fcs(x)
return x
def create_model(num_classes, pretrained=True, coco_model=False):
# Load the Mini DarkNet model features.
backbone = DarkNet(num_classes=10).features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 128 for this custom Mini DarkNet model.
backbone.out_channels = 128
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mini_darknet_nano_head.py
================================================
"""
Custom Faster RCNN model with a smaller DarkNet backbone and a very small detection
head as well.
Detection head representation size is 128.
"""
import torchvision
import torch.nn.functional as F
import torch
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class TwoMLPHead(nn.Module):
"""
Standard heads for FPN-based models
Args:
in_channels (int): number of input channels
representation_size (int): size of the intermediate representation
"""
def __init__(self, in_channels, representation_size):
super().__init__()
self.fc6 = nn.Linear(in_channels, representation_size)
self.fc7 = nn.Linear(representation_size, representation_size)
def forward(self, x):
x = x.flatten(start_dim=1)
x = F.relu(self.fc6(x))
x = F.relu(self.fc7(x))
return x
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Args:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super().__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
torch._assert(
list(x.shape[2:]) == [1, 1],
f"x has the wrong shape, expecting the last two dimensions to be [1,1] instead of {list(x.shape[2:])}",
)
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
# A DarkNet model with reduced output channels for each layer.
class DarkNet(nn.Module):
def __init__(self, initialize_weights=True, num_classes=1000):
super(DarkNet, self).__init__()
self.num_classes = num_classes
self.features = self._create_conv_layers()
self.pool = self._pool()
self.fcs = self._create_fc_layers()
if initialize_weights:
# Random initialization of the weights
# just like the original paper.
self._initialize_weights()
def _create_conv_layers(self):
conv_layers = nn.Sequential(
nn.Conv2d(3, 4, 7, stride=2, padding=3),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(4, 8, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(8, 16, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(16, 32, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(32, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 64, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 256, 3, padding=1),
nn.MaxPool2d(2),
nn.Conv2d(256, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 256, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 128, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(128, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
)
return conv_layers
def _pool(self):
pool = nn.Sequential(
nn.AvgPool2d(7),
)
return pool
def _create_fc_layers(self):
fc_layers = nn.Sequential(
nn.Linear(128, self.num_classes)
)
return fc_layers
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_in',
                    nonlinearity='leaky_relu'
                )
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
x = self.pool(x)
x = x.squeeze()
x = self.fcs(x)
return x
def create_model(num_classes, pretrained=True, coco_model=False):
# Load the Mini DarkNet model features.
backbone = DarkNet(num_classes=10).features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 128 for this custom Mini DarkNet model.
backbone.out_channels = 128
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
representation_size = 128
# Box head.
box_head = TwoMLPHead(
in_channels=backbone.out_channels * roi_pooler.output_size[0] ** 2,
representation_size=representation_size
)
# Box predictor.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
        num_classes=None, # Num classes should be None when `box_predictor` is provided.
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler,
box_head=box_head,
box_predictor=box_predictor
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mini_squeezenet1_1_small_head.py
================================================
"""
Backbone: SqueezeNet1_1 with changed backbone features. Had to tweak a few
input and output features in the backbone for this.
Torchvision link: https://pytorch.org/vision/stable/models.html#id15
SqueezeNet repo: https://github.com/forresti/SqueezeNet/tree/master/SqueezeNet_v1.1
Detection Head: Custom Mini Faster RCNN Head.
"""
import torchvision
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class TwoMLPHead(nn.Module):
"""
Standard heads for FPN-based models
Args:
in_channels (int): number of input channels
representation_size (int): size of the intermediate representation
"""
def __init__(self, in_channels, representation_size):
super().__init__()
self.fc6 = nn.Linear(in_channels, representation_size)
self.fc7 = nn.Linear(representation_size, representation_size)
def forward(self, x):
x = x.flatten(start_dim=1)
x = F.relu(self.fc6(x))
x = F.relu(self.fc7(x))
return x
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Args:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super().__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
torch._assert(
list(x.shape[2:]) == [1, 1],
f"x has the wrong shape, expecting the last two dimensions to be [1,1] instead of {list(x.shape[2:])}",
)
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained SqueezeNet1_1 backbone.
    backbone = torchvision.models.squeezenet1_1(weights='DEFAULT' if pretrained else None).features
# Change the number of features in backbone[12] block to reduce model size.
    # The weights for this block are randomly initialized, but the
    # earlier layers keep their ImageNet weights, so the backbone
    # still transfers reasonably well.
backbone[12].squeeze = nn.Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
backbone[12].expand1x1 = nn.Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
backbone[12].expand3x3 = nn.Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 128 for this custom SqueezeNet1_1.
backbone.out_channels = 128
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
representation_size = 512
# Box head.
box_head = TwoMLPHead(
in_channels=backbone.out_channels * roi_pooler.output_size[0] ** 2,
representation_size=representation_size
)
# Box predictor.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
        num_classes=None, # Num classes should be None when `box_predictor` is provided.
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler,
box_head=box_head,
box_predictor=box_predictor
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mini_squeezenet1_1_tiny_head.py
================================================
"""
Backbone: SqueezeNet1_1 with changed backbone features. Had to tweak a few
input and output features in the backbone for this.
Torchvision link: https://pytorch.org/vision/stable/models.html#id15
SqueezeNet repo: https://github.com/forresti/SqueezeNet/tree/master/SqueezeNet_v1.1
Detection Head: Custom Tiny Faster RCNN Head with only 256 representation size.
"""
import torchvision
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class TwoMLPHead(nn.Module):
"""
Standard heads for FPN-based models
Args:
in_channels (int): number of input channels
representation_size (int): size of the intermediate representation
"""
def __init__(self, in_channels, representation_size):
super().__init__()
self.fc6 = nn.Linear(in_channels, representation_size)
self.fc7 = nn.Linear(representation_size, representation_size)
def forward(self, x):
x = x.flatten(start_dim=1)
x = F.relu(self.fc6(x))
x = F.relu(self.fc7(x))
return x
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Args:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super().__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
torch._assert(
list(x.shape[2:]) == [1, 1],
f"x has the wrong shape, expecting the last two dimensions to be [1,1] instead of {list(x.shape[2:])}",
)
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
def create_model(num_classes=81, pretrained=True, coco_model=False):
# Load the pretrained SqueezeNet1_1 backbone.
    backbone = torchvision.models.squeezenet1_1(weights='DEFAULT' if pretrained else None).features
# Change the number of features in backbone[12] block to reduce model size.
    # The weights for this block are randomly initialized, but the
    # earlier layers keep their ImageNet weights, so the backbone
    # still transfers reasonably well.
backbone[12].squeeze = nn.Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
backbone[12].expand1x1 = nn.Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
backbone[12].expand3x3 = nn.Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
# It is 128 for this custom SqueezeNet1_1.
backbone.out_channels = 128
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
representation_size = 256
# Box head.
box_head = TwoMLPHead(
in_channels=backbone.out_channels * roi_pooler.output_size[0] ** 2,
representation_size=representation_size
)
# Box predictor.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
        num_classes=None, # Num classes should be None when `box_predictor` is provided.
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler,
box_head=box_head,
box_predictor=box_predictor
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mobilenetv3_large_320_fpn.py
================================================
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
def create_model(num_classes, pretrained=True, coco_model=False):
# load Faster RCNN pre-trained model
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(
weights='DEFAULT'
)
if coco_model: # Return the COCO pretrained model for COCO classes.
return model, coco_model
# get the number of input features
in_features = model.roi_heads.box_predictor.cls_score.in_features
# define a new head for the detector with required number of classes
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
return model
if __name__ == '__main__':
from model_summary import summary
    model, _ = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
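    # A minimal training-step sketch: in train mode the detector
    # takes images plus targets and returns a dict of losses rather
    # than detections. The single random image and box here are
    # placeholders, not real data.
    import torch
    model.train()
    images = [torch.randn(3, 320, 320)]
    targets = [{
        'boxes': torch.tensor([[10., 20., 100., 150.]]),
        'labels': torch.tensor([1]),
    }]
    loss_dict = model(images, targets)
    print({k: float(v) for k, v in loss_dict.items()})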
================================================
FILE: models/fasterrcnn_mobilenetv3_large_fpn.py
================================================
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
def create_model(num_classes, pretrained=True, coco_model=False):
# Load Faster RCNN pre-trained model
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(
weights='DEFAULT'
)
if coco_model: # Return the COCO pretrained model for COCO classes.
return model, coco_model
# get the number of input features
in_features = model.roi_heads.box_predictor.cls_score.in_features
# define a new head for the detector with required number of classes
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
return model
if __name__ == '__main__':
from model_summary import summary
    model, _ = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_mobilevit_xxs.py
================================================
"""
Faster RCNN with the MobileViT XXS (Extra Extra Small) backbone.
You need to install the vision_transformers library for this.
Find the GitHub project here:
https://github.com/sovit-123/vision_transformers
"""
import torchvision
import torch.nn as nn
import sys
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
try:
from vision_transformers.models.mobile_vit import mobilevit_xxs
except ImportError:
    print('Please install Vision Transformers to use MobileViT backbones')
print('You can do pip install vision_transformers')
print('Or visit the following link for the latest updates')
print('https://github.com/sovit-123/vision_transformers')
assert ('vision_transformers' in sys.modules), 'vision_transformers not found'
def create_model(num_classes, pretrained=True, coco_model=False):
# Load the backbone.
model_backbone = mobilevit_xxs(pretrained=pretrained)
backbone = nn.Sequential(*list(model_backbone.children())[:-1])
# Output channels from the final convolutional layer.
backbone.out_channels = 320
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
try:
summary(model)
    except Exception:
print(model)
# Total parameters and trainable parameters.
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} total parameters.")
total_trainable_params = sum(
p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_trainable_params:,} training parameters.")
================================================
FILE: models/fasterrcnn_nano.py
================================================
"""
Custom Faster RCNN model with a very small backbone and a representation
size of 128.
"""
import torchvision
import torch.nn.functional as F
import torch
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
class TwoMLPHead(nn.Module):
"""
Standard heads for FPN-based models
Args:
in_channels (int): number of input channels
representation_size (int): size of the intermediate representation
"""
def __init__(self, in_channels, representation_size):
super().__init__()
self.fc6 = nn.Linear(in_channels, representation_size)
self.fc7 = nn.Linear(representation_size, representation_size)
def forward(self, x):
x = x.flatten(start_dim=1)
x = F.relu(self.fc6(x))
x = F.relu(self.fc7(x))
return x
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Args:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super().__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
torch._assert(
list(x.shape[2:]) == [1, 1],
f"x has the wrong shape, expecting the last two dimensions to be [1,1] instead of {list(x.shape[2:])}",
)
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
# A Nano backbone.
class NanoBackbone(nn.Module):
def __init__(self, initialize_weights=True, num_classes=1000):
super(NanoBackbone, self).__init__()
self.num_classes = num_classes
self.features = self._create_conv_layers()
if initialize_weights:
# Random initialization of the weights
# just like the original paper.
self._initialize_weights()
def _create_conv_layers(self):
conv_layers = nn.Sequential(
nn.Conv2d(3, 64, 7, stride=2, padding=3),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(64, 128, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.MaxPool2d(2),
nn.Conv2d(128, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 256, 3, padding=1),
nn.LeakyReLU(0.1, inplace=True),
nn.Conv2d(256, 256, 1),
nn.LeakyReLU(0.1, inplace=True),
)
return conv_layers
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_in',
                    nonlinearity='leaky_relu'
                )
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
def create_model(num_classes, pretrained=True, coco_model=False):
# Load the backbone features.
backbone = NanoBackbone(num_classes=10).features
# We need the output channels of the last convolutional layers from
# the features for the Faster RCNN model.
backbone.out_channels = 256
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
representation_size = 128
# Box head.
box_head = TwoMLPHead(
in_channels=backbone.out_channels * roi_pooler.output_size[0] ** 2,
representation_size=representation_size
)
# Box predictor.
box_predictor = FastRCNNPredictor(representation_size, num_classes)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
        num_classes=None, # Num classes should be None when `box_predictor` is provided.
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler,
box_head=box_head,
box_predictor=box_predictor
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_regnet_y_400mf.py
================================================
"""
Faster RCNN model with the RegNet_Y 400 MF backbone from
Torchvision classification models.
Reference: https://pytorch.org/vision/stable/models/generated/torchvision.models.regnet_y_400mf.html
"""
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
model_backbone = torchvision.models.regnet_y_400mf(weights='DEFAULT')
backbone = nn.Sequential(*list(model_backbone.children())[:-2])
backbone.out_channels = 440
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_resnet101.py
================================================
"""
Faster RCNN model with the ResNet101 backbone from
Torchvision classification models.
Reference: https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet101.html
"""
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
model_backbone = torchvision.models.resnet101(weights='DEFAULT')
conv1 = model_backbone.conv1
bn1 = model_backbone.bn1
relu = model_backbone.relu
max_pool = model_backbone.maxpool
layer1 = model_backbone.layer1
layer2 = model_backbone.layer2
layer3 = model_backbone.layer3
layer4 = model_backbone.layer4
backbone = nn.Sequential(
conv1,
bn1,
relu,
max_pool,
layer1,
layer2,
layer3,
layer4
)
backbone.out_channels = 2048
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
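    # A quick shape check: ResNet101's layer4 output has 2048 channels
    # at stride 32, which is why backbone.out_channels is set to 2048
    # above. For a 224x224 input the map should be 7x7.
    import torch
    with torch.no_grad():
        feat = model.backbone(torch.randn(1, 3, 224, 224))
    print(feat.shape)  # expected: torch.Size([1, 2048, 7, 7])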
================================================
FILE: models/fasterrcnn_resnet152.py
================================================
"""
Faster RCNN model with the ResNet152 backbone from
Torchvision classification models.
Reference: https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet152.html
"""
import torchvision
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
def create_model(num_classes=81, pretrained=True, coco_model=False):
model_backbone = torchvision.models.resnet152(weights='DEFAULT')
conv1 = model_backbone.conv1
bn1 = model_backbone.bn1
relu = model_backbone.relu
max_pool = model_backbone.maxpool
layer1 = model_backbone.layer1
layer2 = model_backbone.layer2
layer3 = model_backbone.layer3
layer4 = model_backbone.layer4
backbone = nn.Sequential(
conv1,
bn1,
relu,
max_pool,
layer1,
layer2,
layer3,
layer4
)
backbone.out_channels = 2048
# Generate anchors using the RPN. Here, we are using 5x3 anchors.
# Meaning, anchors with 5 different sizes and 3 different aspect
# ratios.
anchor_generator = AnchorGenerator(
sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),)
)
# Feature maps to perform RoI cropping.
# If backbone returns a Tensor, `featmap_names` is expected to
# be [0]. We can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
featmap_names=['0'],
output_size=7,
sampling_ratio=2
)
# Final Faster RCNN model.
model = FasterRCNN(
backbone=backbone,
num_classes=num_classes,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
return model
if __name__ == '__main__':
from model_summary import summary
model = create_model(num_classes=81, pretrained=True, coco_model=True)
summary(model)
================================================
FILE: models/fasterrcnn_resnet18.py
================================================
"""
Faster RCNN model with the ResNet18 backbone from Torchvision.
Torchvision link: https://pytorch.org/vision/stable/models.html#id10
ResNet paper: https://arxiv.org/pdf/1512.03385.pdf
"""
import torchvision
import torch.nn as nn

from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

def create_model(num_classes, pretrained=True, coco_model=False):
    # `coco_model` is unused here; it is kept for signature parity with
    # the other model builders in this directory.
    # Load the ImageNet-pretrained ResNet18 backbone only when requested.
    model_backbone = torchvision.models.resnet18(
        weights='DEFAULT' if pretrained else None
    )

    conv1 = model_backbone.conv1
    bn1 = model_backbone.bn1
    relu = model_backbone.relu
    max_pool = model_backbone.maxpool
    layer1 = model_backbone.layer1
    layer2 = model_backbone.layer2
    layer3 = model_backbone.layer3
    layer4 = model_backbone.layer4

    backbone = nn.Sequential(
        conv1, bn1, relu, max_pool,
        layer1, layer2, layer3, layer4
    )
    # The Faster RCNN head needs the output channel count of the last
    # convolutional layer of the backbone. It is 512 for ResNet18.
    backbone.out_channels = 512

    # Anchors for the RPN: 5 sizes x 3 aspect ratios, i.e. 15 anchors
    # per feature map location.
    anchor_generator = AnchorGenerator(
        sizes=((32, 64, 128, 256, 512),),
        aspect_ratios=((0.5, 1.0, 2.0),)
    )

    # Feature maps to perform RoI cropping on.
    # If the backbone returns a single Tensor, `featmap_names` is
    # expected to be ['0'].
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0'],
        output_size=7,
        sampling_ratio=2
    )

    # Final Faster RCNN model.
    model = FasterRCNN(
        backbone=backbone,
        num_classes=num_classes,
        rpn_anchor_generator=anchor_generator,
        box_roi_pool=roi_pooler
    )
    return model

if __name__ == '__main__':
    from model_summary import summary
    model = create_model(num_classes=81, pretrained=True, coco_model=True)
    summary(model)
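The `backbone.out_channels` values hard-coded in these files (512 for ResNet18, 2048 for ResNet101/152) are simply the channel widths of each network's final residual stage, and can be confirmed directly. A sketch assuming only torchvision; the 224x224 input is an arbitrary probe size:

# Sketch: confirm the channel counts assigned to backbone.out_channels.
import torch
import torch.nn as nn
import torchvision

for name, builder in [('resnet18', torchvision.models.resnet18),
                      ('resnet101', torchvision.models.resnet101)]:
    net = builder(weights=None)
    # Drop the avgpool and fc layers, keeping only the convolutional trunk.
    trunk = nn.Sequential(*list(net.children())[:-2])
    with torch.no_grad():
        feat = trunk(torch.rand(1, 3, 224, 224))
    print(name, feat.shape)  # (1, 512, 7, 7) and (1, 2048, 7, 7)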
================================================
FILE: models/fasterrcnn_resnet50_fpn.py
================================================
import torchvision

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def create_model(num_classes, pretrained=True, coco_model=False):
    # Load the Faster RCNN ResNet50 FPN model, with COCO-pretrained
    # weights only when requested.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights='DEFAULT' if pretrained else None
    )
    if coco_model:  # Return the COCO pretrained model for COCO classes.
        return model, coco_model

    # Get the number of input features of the existing box predictor.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Define a new head for the detector with the required number of classes.
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

if __name__ == '__main__':
    from model_summary import summary
    model = create_model(num_classes=81, pretrained=True, coco_model=True)
    summary(model)
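Replacing only `box_predictor` is the standard torchvision fine-tuning recipe: the COCO-pretrained trunk, FPN, and RPN are kept, and only the classification/regression head is reinitialized for the new label set. In training mode the resulting model returns a loss dictionary rather than detections; a hedged sketch with made-up data:

# Sketch of one training-mode call on the fine-tuned FPN model (dummy data).
import torch

from models.fasterrcnn_resnet50_fpn import create_model

model = create_model(num_classes=3)  # e.g. 2 object classes + background
model.train()
images = [torch.rand(3, 480, 640)]
targets = [{
    'boxes': torch.tensor([[50., 60., 200., 220.]]),  # xyxy, absolute pixels
    'labels': torch.tensor([1]),
}]
loss_dict = model(images, targets)  # 'loss_classifier', 'loss_box_reg', ...
total_loss = sum(loss_dict.values())
total_loss.backward()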
================================================
FILE: models/fasterrcnn_resnet50_fpn_v2.py
================================================
import torchvision

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def create_model(num_classes, pretrained=True, coco_model=False):
    # Load the Faster RCNN ResNet50 FPN v2 model, with COCO-pretrained
    # weights only when requested.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn_v2(
        weights='DEFAULT' if pretrained else None
    )
    if coco_model:  # Return the COCO pretrained model for COCO classes.
        return model, coco_model

    # Get the number of input features of the existing box predictor.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Define a new head for the detector with the required number of classes.
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

if __name__ == '__main__':
    from model_summary import summary
    model = create_model(num_classes=81, pretrained=True, coco_model=True)
    summary(model)
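Note that with `coco_model=True` both FPN builders return a `(model, True)` tuple instead of a bare model, which is why `models/model_summary.py` unwraps tuples before summarizing. A small defensive sketch for callers:

# The coco_model=True path returns a tuple; unwrap before use.
from models.fasterrcnn_resnet50_fpn_v2 import create_model

result = create_model(num_classes=81, coco_model=True)
model = result[0] if isinstance(result, tuple) else result
model.eval()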
================================================
FILE: models/fasterrcnn_squeezenet1_0.py
================================================
"""
Faster RCNN model with the SqueezeNet1_0 model from Torchvision.
================================================
SYMBOL INDEX (399 symbols across 59 files)
================================================
FILE: datasets.py
class CustomDataset (line 20) | class CustomDataset(Dataset):
method __init__ (line 21) | def __init__(
method read_and_clean (line 60) | def read_and_clean(self):
method resize (line 107) | def resize(self, im, square=False):
method load_image_and_labels (line 117) | def load_image_and_labels(self, index):
method load_pascal_voc (line 141) | def load_pascal_voc(self, image, image_name, image_resized):
method load_yolo (line 217) | def load_yolo(self, image, image_name, image_resized):
method check_image_and_annotation (line 292) | def check_image_and_annotation(
method load_cutmix_image_and_boxes (line 341) | def load_cutmix_image_and_boxes(self, index, resize_factor=512):
method __getitem__ (line 407) | def __getitem__(self, idx):
method __len__ (line 460) | def __len__(self):
function collate_fn (line 463) | def collate_fn(batch):
function create_train_dataset (line 471) | def create_train_dataset(
function create_valid_dataset (line 494) | def create_valid_dataset(
function create_train_loader (line 514) | def create_train_loader(
function create_valid_loader (line 527) | def create_valid_loader(
FILE: eval.py
function evaluate (line 134) | def evaluate(
FILE: export.py
function parse_opt (line 18) | def parse_opt():
function main (line 56) | def main(args):
FILE: inference.py
function collect_all_images (line 21) | def collect_all_images(dir_test):
function parse_opt (line 39) | def parse_opt():
function main (line 132) | def main(args):
FILE: inference_video.py
function read_return_video_data (line 26) | def read_return_video_data(video_path):
function parse_opt (line 34) | def parse_opt():
function main (line 121) | def main(args):
FILE: models/create_fasterrcnn_model.py
function return_fasterrcnn_resnet50_fpn (line 3) | def return_fasterrcnn_resnet50_fpn(
function return_fasterrcnn_mobilenetv3_large_fpn (line 11) | def return_fasterrcnn_mobilenetv3_large_fpn(
function return_fasterrcnn_mobilenetv3_large_320_fpn (line 19) | def return_fasterrcnn_mobilenetv3_large_320_fpn(
function return_fasterrcnn_resnet18 (line 27) | def return_fasterrcnn_resnet18(
function return_fasterrcnn_custom_resnet (line 35) | def return_fasterrcnn_custom_resnet(
function return_fasterrcnn_darknet (line 43) | def return_fasterrcnn_darknet(
function return_fasterrcnn_squeezenet1_0 (line 51) | def return_fasterrcnn_squeezenet1_0(
function return_fasterrcnn_squeezenet1_1 (line 59) | def return_fasterrcnn_squeezenet1_1(
function return_fasterrcnn_mini_darknet (line 67) | def return_fasterrcnn_mini_darknet(
function return_fasterrcnn_squeezenet1_1_small_head (line 75) | def return_fasterrcnn_squeezenet1_1_small_head(
function return_fasterrcnn_mini_squeezenet1_1_small_head (line 83) | def return_fasterrcnn_mini_squeezenet1_1_small_head(
function return_fasterrcnn_mini_squeezenet1_1_tiny_head (line 91) | def return_fasterrcnn_mini_squeezenet1_1_tiny_head(
function return_fasterrcnn_mbv3_small_nano_head (line 99) | def return_fasterrcnn_mbv3_small_nano_head(
function return_fasterrcnn_mini_darknet_nano_head (line 107) | def return_fasterrcnn_mini_darknet_nano_head(
function return_fasterrcnn_efficientnet_b0 (line 115) | def return_fasterrcnn_efficientnet_b0(
function return_fasterrcnn_nano (line 123) | def return_fasterrcnn_nano(
function return_fasterrcnn_resnet152 (line 131) | def return_fasterrcnn_resnet152(
function return_fasterrcnn_resnet50_fpn_v2 (line 139) | def return_fasterrcnn_resnet50_fpn_v2(
function return_fasterrcnn_convnext_small (line 147) | def return_fasterrcnn_convnext_small(
function return_fasterrcnn_convnext_tiny (line 155) | def return_fasterrcnn_convnext_tiny(
function return_fasterrcnn_resnet101 (line 163) | def return_fasterrcnn_resnet101(
function return_fasterrcnn_vitdet (line 171) | def return_fasterrcnn_vitdet(
function return_fasterrcnn_vitdet_tiny (line 179) | def return_fasterrcnn_vitdet_tiny(
function return_fasterrcnn_mobilevit_xxs (line 187) | def return_fasterrcnn_mobilevit_xxs(
function return_fasterrcnn_regnet_y_400mf (line 195) | def return_fasterrcnn_regnet_y_400mf(
function return_fasterrcnn_vgg16 (line 203) | def return_fasterrcnn_vgg16(
function return_fasterrcnn_dinov3_convnext_tiny (line 211) | def return_fasterrcnn_dinov3_convnext_tiny(
function return_fasterrcnn_dinov3_vits16 (line 219) | def return_fasterrcnn_dinov3_vits16(
function return_fasterrcnn_dinov3_convnext_tiny_multifeat (line 227) | def return_fasterrcnn_dinov3_convnext_tiny_multifeat(
function return_fasterrcnn_dinov3_vits16plus (line 235) | def return_fasterrcnn_dinov3_vits16plus(
function return_fasterrcnn_dinov3_vitb16 (line 243) | def return_fasterrcnn_dinov3_vitb16(
function return_fasterrcnn_dinov3_vitl16 (line 251) | def return_fasterrcnn_dinov3_vitl16(
function return_fasterrcnn_dinov3_vith16plus (line 259) | def return_fasterrcnn_dinov3_vith16plus(
function return_fasterrcnn_dinov3_convnext_small (line 267) | def return_fasterrcnn_dinov3_convnext_small(
function return_fasterrcnn_dinov3_convnext_base (line 275) | def return_fasterrcnn_dinov3_convnext_base(
function return_fasterrcnn_dinov3_convnext_large (line 283) | def return_fasterrcnn_dinov3_convnext_large(
FILE: models/fasterrcnn_convnext_small.py
function create_model (line 13) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_convnext_tiny.py
function create_model (line 13) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_custom_resnet.py
class ResidualBlock (line 8) | class ResidualBlock(nn.Module):
method __init__ (line 12) | def __init__(
method forward (line 28) | def forward(self, x):
function create_resnet_block (line 37) | def create_resnet_block(
class CustomResNet (line 51) | class CustomResNet(nn.Module):
method __init__ (line 52) | def __init__(self, num_classes=10):
method forward (line 64) | def forward(self, x):
function create_model (line 75) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_darknet.py
class DarkNet (line 9) | class DarkNet(nn.Module):
method __init__ (line 10) | def __init__(self, initialize_weights=True, num_classes=1000):
method _create_conv_layers (line 23) | def _create_conv_layers(self):
method _pool (line 82) | def _pool(self):
method _create_fc_layers (line 88) | def _create_fc_layers(self):
method _initialize_weights (line 94) | def _initialize_weights(self):
method forward (line 106) | def forward(self, x):
function create_model (line 113) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_convnext_base.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 37) | def forward(self, x):
function create_model (line 44) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_convnext_large.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 37) | def forward(self, x):
function create_model (line 44) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_convnext_small.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 37) | def forward(self, x):
function create_model (line 44) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_convnext_tiny.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 37) | def forward(self, x):
function create_model (line 44) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_convnext_tiny_multifeat.py
class Dinov3Backbone (line 28) | class Dinov3Backbone(nn.Module):
method __init__ (line 29) | def __init__(self):
method forward (line 48) | def forward(self, x):
function create_model (line 62) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_vitb16.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 39) | def forward(self, x):
function create_model (line 52) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_vith16plus.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 39) | def forward(self, x):
function create_model (line 52) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_vitl16.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 39) | def forward(self, x):
function create_model (line 52) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_vits16.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 39) | def forward(self, x):
function create_model (line 52) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_dinov3_vits16plus.py
class Dinov3Backbone (line 27) | class Dinov3Backbone(nn.Module):
method __init__ (line 28) | def __init__(self):
method forward (line 39) | def forward(self, x):
function create_model (line 52) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_efficientnet_b0.py
function create_model (line 12) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_efficientnet_b4.py
function create_model (line 10) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mbv3_large.py
function create_model (line 13) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mbv3_small_nano_head.py
class TwoMLPHead (line 21) | class TwoMLPHead(nn.Module):
method __init__ (line 30) | def __init__(self, in_channels, representation_size):
method forward (line 36) | def forward(self, x):
class FastRCNNPredictor (line 44) | class FastRCNNPredictor(nn.Module):
method __init__ (line 54) | def __init__(self, in_channels, num_classes):
method forward (line 59) | def forward(self, x):
function create_model (line 71) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mini_darknet.py
class DarkNet (line 8) | class DarkNet(nn.Module):
method __init__ (line 9) | def __init__(self, initialize_weights=True, num_classes=1000):
method _create_conv_layers (line 22) | def _create_conv_layers(self):
method _pool (line 74) | def _pool(self):
method _create_fc_layers (line 80) | def _create_fc_layers(self):
method _initialize_weights (line 86) | def _initialize_weights(self):
method forward (line 98) | def forward(self, x):
function create_model (line 105) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mini_darknet_nano_head.py
class TwoMLPHead (line 15) | class TwoMLPHead(nn.Module):
method __init__ (line 24) | def __init__(self, in_channels, representation_size):
method forward (line 30) | def forward(self, x):
class FastRCNNPredictor (line 38) | class FastRCNNPredictor(nn.Module):
method __init__ (line 48) | def __init__(self, in_channels, num_classes):
method forward (line 53) | def forward(self, x):
class DarkNet (line 66) | class DarkNet(nn.Module):
method __init__ (line 67) | def __init__(self, initialize_weights=True, num_classes=1000):
method _create_conv_layers (line 80) | def _create_conv_layers(self):
method _pool (line 132) | def _pool(self):
method _create_fc_layers (line 138) | def _create_fc_layers(self):
method _initialize_weights (line 144) | def _initialize_weights(self):
method forward (line 156) | def forward(self, x):
function create_model (line 163) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mini_squeezenet1_1_small_head.py
class TwoMLPHead (line 19) | class TwoMLPHead(nn.Module):
method __init__ (line 28) | def __init__(self, in_channels, representation_size):
method forward (line 34) | def forward(self, x):
class FastRCNNPredictor (line 42) | class FastRCNNPredictor(nn.Module):
method __init__ (line 52) | def __init__(self, in_channels, num_classes):
method forward (line 57) | def forward(self, x):
function create_model (line 69) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mini_squeezenet1_1_tiny_head.py
class TwoMLPHead (line 19) | class TwoMLPHead(nn.Module):
method __init__ (line 28) | def __init__(self, in_channels, representation_size):
method forward (line 34) | def forward(self, x):
class FastRCNNPredictor (line 42) | class FastRCNNPredictor(nn.Module):
method __init__ (line 52) | def __init__(self, in_channels, num_classes):
method forward (line 57) | def forward(self, x):
function create_model (line 69) | def create_model(num_classes=81, pretrained=True, coco_model=True):
FILE: models/fasterrcnn_mobilenetv3_large_320_fpn.py
function create_model (line 5) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mobilenetv3_large_fpn.py
function create_model (line 5) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_mobilevit_xxs.py
function create_model (line 24) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_nano.py
class TwoMLPHead (line 14) | class TwoMLPHead(nn.Module):
method __init__ (line 23) | def __init__(self, in_channels, representation_size):
method forward (line 29) | def forward(self, x):
class FastRCNNPredictor (line 37) | class FastRCNNPredictor(nn.Module):
method __init__ (line 47) | def __init__(self, in_channels, num_classes):
method forward (line 52) | def forward(self, x):
class NanoBackbone (line 65) | class NanoBackbone(nn.Module):
method __init__ (line 66) | def __init__(self, initialize_weights=True, num_classes=1000):
method _create_conv_layers (line 77) | def _create_conv_layers(self):
method _initialize_weights (line 96) | def _initialize_weights(self):
function create_model (line 108) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_regnet_y_400mf.py
function create_model (line 15) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_resnet101.py
function create_model (line 14) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_resnet152.py
function create_model (line 14) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_resnet18.py
function create_model (line 13) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_resnet50_fpn.py
function create_model (line 5) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_resnet50_fpn_v2.py
function create_model (line 5) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_squeezenet1_0.py
function create_model (line 12) | def create_model(num_classes=81, pretrained=False, coco_model=False):
FILE: models/fasterrcnn_squeezenet1_1.py
function create_model (line 12) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_squeezenet1_1_small_head.py
class TwoMLPHead (line 17) | class TwoMLPHead(nn.Module):
method __init__ (line 26) | def __init__(self, in_channels, representation_size):
method forward (line 32) | def forward(self, x):
class FastRCNNPredictor (line 40) | class FastRCNNPredictor(nn.Module):
method __init__ (line 50) | def __init__(self, in_channels, num_classes):
method forward (line 55) | def forward(self, x):
function create_model (line 67) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_vgg16.py
function create_model (line 13) | def create_model(num_classes, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_vitdet.py
class ViT (line 25) | class ViT(Backbone):
method __init__ (line 32) | def __init__(
method _init_weights (line 133) | def _init_weights(self, m):
method forward (line 142) | def forward(self, x):
class SimpleFeaturePyramid (line 155) | class SimpleFeaturePyramid(Backbone):
method __init__ (line 161) | def __init__(
method padding_constraints (line 262) | def padding_constraints(self):
method forward (line 268) | def forward(self, x):
function create_model (line 294) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/fasterrcnn_vitdet_tiny.py
class ViT (line 25) | class ViT(Backbone):
method __init__ (line 32) | def __init__(
method _init_weights (line 133) | def _init_weights(self, m):
method forward (line 142) | def forward(self, x):
class SimpleFeaturePyramid (line 155) | class SimpleFeaturePyramid(Backbone):
method __init__ (line 161) | def __init__(
method padding_constraints (line 262) | def padding_constraints(self):
method forward (line 268) | def forward(self, x):
function create_model (line 294) | def create_model(num_classes=81, pretrained=True, coco_model=False):
FILE: models/layers.py
class Mlp (line 18) | class Mlp(nn.Module):
method __init__ (line 19) | def __init__(self, in_features, hidden_features=None, out_features=Non...
method forward (line 28) | def forward(self, x):
function drop_path (line 36) | def drop_path(x, drop_prob: float = 0., training: bool = False, scale_by...
class ShapeSpec (line 55) | class ShapeSpec:
class DropPath (line 68) | class DropPath(nn.Module):
method __init__ (line 71) | def __init__(self, drop_prob: float = 0., scale_by_keep: bool = True):
method forward (line 76) | def forward(self, x):
method extra_repr (line 79) | def extra_repr(self):
class Backbone (line 82) | class Backbone(nn.Module, metaclass=ABCMeta):
method __init__ (line 86) | def __init__(self):
method forward (line 93) | def forward(self):
method size_divisibility (line 102) | def size_divisibility(self) -> int:
method padding_constraints (line 113) | def padding_constraints(self) -> Dict[str, int]:
method output_shape (line 131) | def output_shape(self):
function get_rel_pos (line 144) | def get_rel_pos(q_size, k_size, rel_pos):
function add_decomposed_rel_pos (line 175) | def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):
class Conv2d (line 205) | class Conv2d(torch.nn.Conv2d):
method __init__ (line 210) | def __init__(self, *args, **kwargs):
method forward (line 225) | def forward(self, x):
class Attention (line 249) | class Attention(nn.Module):
method __init__ (line 252) | def __init__(
method forward (line 288) | def forward(self, x):
class FrozenBatchNorm2d (line 306) | class FrozenBatchNorm2d(nn.Module):
method __init__ (line 324) | def __init__(self, num_features, eps=1e-5):
method forward (line 333) | def forward(self, x):
method _load_from_state_dict (line 356) | def _load_from_state_dict(
method __repr__ (line 373) | def __repr__(self):
method convert_frozen_batchnorm (line 377) | def convert_frozen_batchnorm(cls, module):
class CNNBlockBase (line 407) | class CNNBlockBase(nn.Module):
method __init__ (line 419) | def __init__(self, in_channels, out_channels, stride):
method freeze (line 432) | def freeze(self):
class LayerNorm (line 445) | class LayerNorm(nn.Module):
method __init__ (line 453) | def __init__(self, normalized_shape, eps=1e-6):
method forward (line 460) | def forward(self, x):
function get_norm (line 467) | def get_norm(norm, out_channels):
class NaiveSyncBatchNorm (line 495) | class NaiveSyncBatchNorm(BatchNorm2d):
method __init__ (line 518) | def __init__(self, *args, stats_mode="", **kwargs):
method forward (line 523) | def forward(self, input):
function c2_msra_fill (line 570) | def c2_msra_fill(module: nn.Module) -> None:
class ResBottleneckBlock (line 584) | class ResBottleneckBlock(CNNBlockBase):
method __init__ (line 590) | def __init__(
method forward (line 635) | def forward(self, x):
function window_partition (line 643) | def window_partition(x, window_size):
function window_unpartition (line 665) | def window_unpartition(windows, window_size, pad_hw, hw):
function get_abs_pos (line 686) | def get_abs_pos(abs_pos, has_cls_token, hw):
class Block (line 716) | class Block(nn.Module):
method __init__ (line 719) | def __init__(
method forward (line 778) | def forward(self, x):
class PatchEmbed (line 799) | class PatchEmbed(nn.Module):
method __init__ (line 804) | def __init__(
method forward (line 820) | def forward(self, x):
class LastLevelMaxPool (line 826) | class LastLevelMaxPool(nn.Module):
method __init__ (line 832) | def __init__(self):
method forward (line 837) | def forward(self, x):
FILE: models/model_summary.py
function summary (line 4) | def summary(model):
FILE: models/utils.py
function get_world_size (line 9) | def get_world_size() -> int:
function _assert_strides_are_log2_contiguous (line 16) | def _assert_strides_are_log2_contiguous(strides):
function differentiable_all_reduce (line 25) | def differentiable_all_reduce(input: torch.Tensor) -> torch.Tensor:
class _AllReduce (line 37) | class _AllReduce(Function):
method forward (line 39) | def forward(ctx, input: torch.Tensor) -> torch.Tensor:
method backward (line 47) | def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
FILE: onnx_inference_image.py
function collect_all_images (line 27) | def collect_all_images(dir_test):
function to_numpy (line 45) | def to_numpy(tensor):
function parse_opt (line 48) | def parse_opt():
function main (line 114) | def main(args):
FILE: onnx_inference_video.py
function read_return_video_data (line 31) | def read_return_video_data(video_path):
function to_numpy (line 39) | def to_numpy(tensor):
function parse_opt (line 42) | def parse_opt():
function main (line 113) | def main(args):
FILE: sahi_inference.py
function collect_all_images (line 32) | def collect_all_images(dir_test):
function parse_opt (line 50) | def parse_opt():
function main (line 177) | def main(args):
FILE: torch_utils/coco_eval.py
class CocoEvaluator (line 13) | class CocoEvaluator:
method __init__ (line 14) | def __init__(self, coco_gt, iou_types):
method update (line 27) | def update(self, predictions):
method synchronize_between_processes (line 43) | def synchronize_between_processes(self):
method accumulate (line 48) | def accumulate(self):
method summarize (line 52) | def summarize(self):
method prepare (line 58) | def prepare(self, predictions, iou_type):
method prepare_for_coco_detection (line 67) | def prepare_for_coco_detection(self, predictions):
method prepare_for_coco_segmentation (line 91) | def prepare_for_coco_segmentation(self, predictions):
method prepare_for_coco_keypoint (line 125) | def prepare_for_coco_keypoint(self, predictions):
function convert_to_xywh (line 152) | def convert_to_xywh(boxes):
function merge (line 157) | def merge(img_ids, eval_imgs):
function create_common_coco_eval (line 179) | def create_common_coco_eval(coco_eval, img_ids, eval_imgs):
function evaluate (line 189) | def evaluate(imgs):
FILE: torch_utils/coco_utils.py
class FilterAndRemapCocoCategories (line 12) | class FilterAndRemapCocoCategories:
method __init__ (line 13) | def __init__(self, categories, remap=True):
method __call__ (line 17) | def __call__(self, image, target):
function convert_coco_poly_to_mask (line 30) | def convert_coco_poly_to_mask(segmentations, height, width):
class ConvertCocoPolysToMask (line 47) | class ConvertCocoPolysToMask:
method __call__ (line 48) | def __call__(self, image, target):
function _coco_remove_images_without_annotations (line 103) | def _coco_remove_images_without_annotations(dataset, cat_list=None):
function convert_to_coco_api (line 143) | def convert_to_coco_api(ds):
function get_coco_api_from_dataset (line 196) | def get_coco_api_from_dataset(dataset):
class CocoDetection (line 206) | class CocoDetection(torchvision.datasets.CocoDetection):
method __init__ (line 207) | def __init__(self, img_folder, ann_file, transforms):
method __getitem__ (line 211) | def __getitem__(self, idx):
function get_coco (line 220) | def get_coco(root, image_set, transforms, mode="instances"):
function get_coco_kp (line 248) | def get_coco_kp(root, image_set, transforms):
FILE: torch_utils/engine.py
function train_one_epoch (line 12) | def train_one_epoch(
function _get_iou_types (line 103) | def _get_iou_types(model):
function evaluate (line 116) | def evaluate(
FILE: torch_utils/utils.py
class SmoothedValue (line 13) | class SmoothedValue:
method __init__ (line 18) | def __init__(self, window_size=20, fmt=None):
method update (line 26) | def update(self, value, n=1):
method synchronize_between_processes (line 31) | def synchronize_between_processes(self):
method median (line 45) | def median(self):
method avg (line 50) | def avg(self):
method global_avg (line 55) | def global_avg(self):
method max (line 59) | def max(self):
method value (line 63) | def value(self):
method __str__ (line 66) | def __str__(self):
function all_gather (line 72) | def all_gather(data):
function reduce_dict (line 88) | def reduce_dict(input_dict, average=True):
class MetricLogger (line 115) | class MetricLogger:
method __init__ (line 116) | def __init__(self, delimiter="\t"):
method update (line 120) | def update(self, **kwargs):
method __getattr__ (line 127) | def __getattr__(self, attr):
method __str__ (line 134) | def __str__(self):
method synchronize_between_processes (line 140) | def synchronize_between_processes(self):
method add_meter (line 144) | def add_meter(self, name, meter):
method log_every (line 147) | def log_every(self, iterable, print_freq, header=None):
function collate_fn (line 205) | def collate_fn(batch):
function mkdir (line 209) | def mkdir(path):
function setup_for_distributed (line 217) | def setup_for_distributed(is_master):
function is_dist_avail_and_initialized (line 233) | def is_dist_avail_and_initialized():
function get_world_size (line 241) | def get_world_size():
function get_rank (line 247) | def get_rank():
function is_main_process (line 253) | def is_main_process():
function save_on_master (line 257) | def save_on_master(*args, **kwargs):
function init_distributed_mode (line 262) | def init_distributed_mode(args):
FILE: train.py
function parse_opt (line 60) | def parse_opt():
function main (line 230) | def main(args):
FILE: utils/annotations.py
function convert_detections (line 4) | def convert_detections(
function convert_pre_track (line 36) | def convert_pre_track(
function convert_post_track (line 51) | def convert_post_track(
function inference_annotations (line 68) | def inference_annotations(
function draw_text (line 130) | def draw_text(
function annotate_fps (line 159) | def annotate_fps(orig_image, fps_text):
FILE: utils/general.py
function init_seeds (line 13) | def init_seeds(seed=0, deterministic=False):
class Averager (line 29) | class Averager:
method __init__ (line 30) | def __init__(self):
method send (line 34) | def send(self, value):
method value (line 39) | def value(self):
method reset (line 45) | def reset(self):
class SaveBestModel (line 49) | class SaveBestModel:
method __init__ (line 55) | def __init__(
method __call__ (line 60) | def __call__(
function show_tranformed_image (line 80) | def show_tranformed_image(train_loader, device, classes, colors):
function save_loss_plot (line 135) | def save_loss_plot(
function save_mAP (line 156) | def save_mAP(OUT_DIR, map_05, map):
function visualize_mosaic_images (line 179) | def visualize_mosaic_images(boxes, labels, image_resized, classes):
function save_model (line 197) | def save_model(
function save_model_state (line 235) | def save_model_state(model, OUT_DIR, config, model_name):
function denormalize (line 250) | def denormalize(x, mean=None, std=None):
function save_validation_results (line 257) | def save_validation_results(images, detections, counter, out_dir, classe...
function set_infer_dir (line 298) | def set_infer_dir():
function set_training_dir (line 312) | def set_training_dir(dir_name=None, project_dir=None):
function yaml_save (line 334) | def yaml_save(file_path=None, data={}):
class EarlyStopping (line 342) | class EarlyStopping():
method __init__ (line 347) | def __init__(self, patience=10, min_delta=0):
method __call__ (line 360) | def __call__(self, map):
FILE: utils/logging.py
function wandb_init (line 12) | def wandb_init(name):
function set_log (line 18) | def set_log(log_dir):
function log (line 31) | def log(content, *args):
function coco_log (line 36) | def coco_log(log_dir, stats):
function set_summary_writer (line 63) | def set_summary_writer(log_dir):
function tensorboard_loss_log (line 67) | def tensorboard_loss_log(name, loss_np_arr, writer, epoch):
function tensorboard_map_log (line 74) | def tensorboard_map_log(name, val_map_05, val_map, writer, epoch):
function create_log_csv (line 84) | def create_log_csv(log_dir):
function csv_log (line 98) | def csv_log(
function overlay_on_canvas (line 130) | def overlay_on_canvas(bg, image):
function wandb_log (line 139) | def wandb_log(
function wandb_save_model (line 219) | def wandb_save_model(model_dir):
class LogJSON (line 227) | class LogJSON():
method __init__ (line 228) | def __init__(self, output_filename):
method update (line 246) | def update(self, image, file_name, boxes, labels, classes):
method save (line 290) | def save(self, output_filename):
FILE: utils/transforms.py
function resize (line 8) | def resize(im, img_size=640, square=False):
function get_train_aug (line 20) | def get_train_aug():
function get_train_transform (line 37) | def get_train_transform():
function transform_mosaic (line 45) | def transform_mosaic(mosaic, boxes, img_size=640):
function get_valid_transform (line 77) | def get_valid_transform():
function infer_transforms (line 85) | def infer_transforms(image):
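Every architecture in the index above is also exposed through a thin `return_fasterrcnn_*` wrapper in `models/create_fasterrcnn_model.py`, so callers can select a model without importing its module directly. A hedged sketch — the wrapper signature is inferred from the symbol index and may differ in detail:

# Hedged sketch: build a detector through the wrapper layer.
# The exact wrapper signature is inferred from the symbol index above.
from models.create_fasterrcnn_model import return_fasterrcnn_resnet18

model = return_fasterrcnn_resnet18(num_classes=5, pretrained=True, coco_model=False)
print(type(model).__name__)  # FasterRCNN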