Full Code of YiyangZhou/LURE for AI

Repository: YiyangZhou/LURE
Branch: main
Commit: 147047b5f455
Files: 66
Total size: 13.3 MB

Directory structure:
gitextract_i7sgepsm/

├── README.md
├── __init__.py
├── dataset/
│   ├── README_1_STAGE.md
│   ├── README_2_STAGE.md
│   ├── convert_cc_sbu.py
│   ├── convert_laion.py
│   ├── download_cc_sbu.sh
│   └── download_laion.sh
├── dataset_train/
│   ├── filter_cap.json
│   └── hallucination5k_train.jsonl
├── environment.yml
├── eval_configs/
│   └── minigpt4_eval.yaml
├── generate_IDK.py
├── minigpt4/
│   ├── __init__.py
│   ├── common/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── dist_utils.py
│   │   ├── gradcam.py
│   │   ├── logger.py
│   │   ├── optims.py
│   │   ├── registry.py
│   │   └── utils.py
│   ├── configs/
│   │   ├── datasets/
│   │   │   ├── cc_sbu/
│   │   │   │   ├── align.yaml
│   │   │   │   └── defaults.yaml
│   │   │   └── laion/
│   │   │       └── defaults.yaml
│   │   ├── default.yaml
│   │   └── models/
│   │       └── minigpt4.yaml
│   ├── conversation/
│   │   ├── __init__.py
│   │   └── conversation.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── builders/
│   │   │   ├── __init__.py
│   │   │   ├── base_dataset_builder.py
│   │   │   └── image_text_pair_builder.py
│   │   ├── data_utils.py
│   │   └── datasets/
│   │       ├── __init__.py
│   │       ├── base_dataset.py
│   │       ├── caption_datasets.py
│   │       ├── cc_sbu_dataset.py
│   │       ├── dataloader_utils.py
│   │       └── laion_dataset.py
│   ├── models/
│   │   ├── Qformer.py
│   │   ├── __init__.py
│   │   ├── base_model.py
│   │   ├── blip2.py
│   │   ├── blip2_outputs.py
│   │   ├── eva_vit.py
│   │   ├── mini_gpt4.py
│   │   └── modeling_llama.py
│   ├── output/
│   │   ├── __init__.py
│   │   └── minigpt4_stage2_finetune/
│   │       └── __init__.py
│   ├── processors/
│   │   ├── __init__.py
│   │   ├── base_processor.py
│   │   ├── blip_processors.py
│   │   └── randaugment.py
│   ├── runners/
│   │   ├── __init__.py
│   │   └── runner_base.py
│   └── tasks/
│       ├── __init__.py
│       ├── base_task.py
│       └── image_text_pretrain.py
├── output_LURE.py
├── prompts/
│   └── alignment.txt
├── tool/
│   ├── to_chair.py
│   └── utils.py
├── train.py
└── train_configs/
    ├── minigpt4_stage1_pretrain.yaml
    └── minigpt4_stage2_finetune.yaml

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Analyzing and Mitigating Object Hallucination in Large Vision-Language Models


[Yiyang Zhou*](https://yiyangzhou.github.io/), [Chenhang Cui*](https://gzcch.github.io/), [Jaehong Yoon](https://jaehong31.github.io/), [Linjun Zhang](https://linjunz.github.io/), [Zhun Deng](https://www.zhundeng.org/), [Chelsea Finn](https://ai.stanford.edu/~cbfinn/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/), [Huaxiu Yao](https://www.huaxiuyao.io/)
<div align="center">
*Equal Contribution
</div>
<div align="center">
    <a href="https://arxiv.org/pdf/2310.00754.pdf"><img src="assets/Paper-Arxiv-orange.svg" ></a>
</div>

## News
* 🚀 [11.29] Our new benchmark, [Bingo](https://github.com/gzcch/Bingo), is now online!
* 🔥 [10.03] Our paper is online now: https://arxiv.org/pdf/2310.00754.pdf.

## Getting Started
### Installation

**1. Prepare the code and the environment**

Clone our repository, create a Python environment, and activate it with the following commands:

```bash
git clone https://github.com/YiyangZhou/LURE.git
cd LURE
conda env create -f environment.yml
conda activate LURE
```


**2. Prepare the pretrained Vicuna weights**

The current version of MiniGPT-4 is built on the v0 version of Vicuna-13B.
Download the corresponding LLM weights from the following Hugging Face repository by cloning it with git-lfs.

| Vicuna V0 13B |
|:---:|
| [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) |

The final weights would be in a single folder in a structure similar to the following:

```
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...   
```

Then, set the path to the vicuna weight in the model config file 
[here](minigpt4/configs/models/minigpt4.yaml#L16) at Line 16.

**3. Prepare the pretrained MiniGPT-4 checkpoint**

Download the pretrained checkpoint that matches your Vicuna model from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4). In our paper, the initial parameters come from MiniGPT-4's stage 1 checkpoint.

| Checkpoint Aligned with Vicuna 13B (stage 1) | Checkpoint Aligned with Vicuna 13B (stage 2) |
|:---:|:---:|
| [Download](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view) | [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view) |


Then, set the path to the pretrained checkpoint in the evaluation config file 
in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L11) at Line 11. 

**4. How to train your own LURE?**

**(Step 1)** Prepare dataset

Set your dataset path [here](minigpt4/configs/datasets/cc_sbu/align.yaml#L5) at Line 5.
The dataset should be organized in a single folder with a structure similar to the following:

```
dataset_train
├── filter_cap.json
└── image
    ├── 2.jpg
    ├── 3.jpg
    ...   
```

The file *'filter_cap.json'* contains our prepared 5000 LURE training data entries. Each sample within includes three fields: *'image_id'* , which represents the name of the image in the training data; *'caption'*, which denotes the detailed description obtained from [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/tree/main) corresponding to the image; and *'h_caption'*, which signifies the hallucinated description we constructed based on *'caption'* (this might include ambiguous objects and contributing objects).

The images can be downloaded directly from [coco2014 train](https://cocodataset.org/#download). As for *'filter_cap.json'*, we have already prepared a version in which uncertain objects are masked; it can be found [here](dataset_train/). We have also uploaded an unmasked dataset (*'hallucination5k_train.jsonl'*) with several fields: *'value'* corresponds to *'caption'* in *'filter_cap.json'*, and *'h_value'* is the unmasked version of *'h_caption'* in *'filter_cap.json'*. Additionally, *'co_objects'* lists the co-occurring objects extracted by GPT, and *'uncertain_objects'* lists the uncertain objects extracted by LVLMs during the image description process.
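
As a quick sanity check on the training file, the sketch below reads the `annotations` list and counts how many samples carry a mask in *'h_caption'*. It assumes the top-level `{"annotations": [...]}` layout and the `[IDN]` mask token seen in the shipped *'filter_cap.json'*; the function name `summarize_annotations` and the inline sample are hypothetical and only for illustration.

```python
import json


def summarize_annotations(data, mask_token="[IDN]"):
    """Return (total samples, samples whose h_caption contains the mask token)."""
    annotations = data["annotations"]
    masked = sum(1 for a in annotations if mask_token in a["h_caption"])
    return len(annotations), masked


# Tiny inline stand-in for dataset_train/filter_cap.json (illustrative only).
sample = json.loads("""
{"annotations": [
  {"image_id": "COCO_train2014_000000000001",
   "caption": "A dog on a lawn.",
   "h_caption": "A dog on a [IDN] next to a frisbee."},
  {"image_id": "COCO_train2014_000000000002",
   "caption": "A red bus.",
   "h_caption": "A red bus with a driver."}
]}
""")

total, masked = summarize_annotations(sample)
print(f"{total} samples, {masked} with at least one mask")
```

To run it on the real file, replace `sample` with the result of `json.load(open("dataset_train/filter_cap.json"))`.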

**(Step 2)** Training

To launch the second stage alignment, first specify the path to the initial checkpoint file in [train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
You can also specify the output path there. 
Then, run the following command. In our experiments, we use 1 A100 80G.

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```



### Model Inference
Prepare model captions by running the following command:
```
python output_LURE.py --mode 'inference' --cfg-path /path/to/config.yaml --gpu-id gpu-id --input_caption /path/to/idk_caption_file  --input_image /path/to/image_file --output_file /path/to/output.jsonl 
```

The output format is similar to the following:

```
{"id": "image_path", "answer": "caption of LVLM",  "p_all": {"word1": [probs, ...], "word2": [probs,...], ...}, "objs": ["obj1", "obj2", ...]}
```

 For extracting objects from sentences, natural language processing (NLP) libraries can be used for part-of-speech tagging or named entity recognition, such as NLTK (Natural Language Toolkit) and SpaCy. 
To output probabilities, we modify the generation/utils.py file in the Transformers library to generate probabilities for each token. We store the probability of each word's first token in a dictionary named 'p_all'.
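
To make the record format concrete, here is a minimal sketch that parses one output line and lists the objects whose stored probabilities fall below a cutoff. The threshold value, the "minimum probability" rule, and the function name `low_confidence_objects` are assumptions for illustration; this is not necessarily the rule that *'generate_IDK.py'* applies.

```python
import json


def low_confidence_objects(record, threshold=0.5):
    """Return objects in record['objs'] whose lowest stored probability is below threshold."""
    flagged = []
    for obj in record["objs"]:
        probs = record["p_all"].get(obj)
        # Skip objects with no recorded probabilities.
        if probs and min(probs) < threshold:
            flagged.append(obj)
    return flagged


# One illustrative JSONL line in the documented format (values are made up).
line = (
    '{"id": "img.jpg", "answer": "A dog near a frisbee.", '
    '"p_all": {"dog": [0.93], "frisbee": [0.21]}, "objs": ["dog", "frisbee"]}'
)
record = json.loads(line)
flagged = low_confidence_objects(record)
print(flagged)
```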


To get the masked versions of the prepared captions, run the following command:

```bash
python generate_IDK.py   --input_file /path/to/caption_file.jsonl  --output_file /path/to/idk_caption_file.jsonl
```


Then, run the following command to obtain the rewriting response:
```bash
python output_LURE.py --mode 'rewrite' --cfg-path /path/to/config.yaml --gpu-id gpu-id --input_caption /path/to/idk_caption_file  --input_image /path/to/image_file --output_file /path/to/output.jsonl 
```
### Other
**Output probabilities during inference**

If you want to output probabilities during inference, please replace *'your_env_environment/lib/python xx.xx/site-packages/transformers/generation/utils.py'* with the *'utils.py'* file in the *'tool'* folder. We made modifications at lines 2559-2620 in the *'utils.py'* file. 

Once you have completed the steps above, you can save the probabilities during inference by using the sample inference script *'model_vqa_p.py'* provided in the *'tool'* folder.

**How to calculate CHAIR from the description**

We calculated the CHAIR metrics based on this [repository](https://github.com/LisaAnne/Hallucination). For convenience, I've organized it into the following process:

**(Step 1)** Cloning the repository and preparing annotations

```bash
git clone https://github.com/LisaAnne/Hallucination.git
cd Hallucination
mkdir annotations
```

Download the corresponding annotations from the [website](https://cocodataset.org/#download) (2014 Train/Val annotations) and extract them to the folder *'annotations'*.

**(Step 2)** Prepare your inference results and convert them to a standardized format

Save your inference results as a jsonl file in the following format (the id and answer fields are required):

```
{"id": "COCO_train2014_000000157393.jpg", "question": xxx, "answer": xxx, "model": xxx}
```

Convert the result file to the standard format needed for evaluation using *'to_chair.py'* provided in the *'tool'* folder. Adjust Line 15 [here](tool/to_chair.py#L15) according to the id field of your jsonl so that each sample's id in the output json looks as follows:

```
"image_id": 157393
```
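
The id conversion above can be sketched as follows. This is not the code in *'to_chair.py'*; the helper names are hypothetical, and the `"caption"` key in the output entry is an assumption about what the CHAIR script expects.

```python
import os


def image_id_from_filename(name):
    """Turn 'COCO_train2014_000000157393.jpg' into the integer 157393."""
    stem = os.path.splitext(os.path.basename(name))[0]
    # The numeric id is the last underscore-separated part; int() drops leading zeros.
    return int(stem.split("_")[-1])


def to_chair_entry(record):
    """Map one jsonl record (with 'id' and 'answer') to an entry keyed by integer image_id."""
    return {
        "image_id": image_id_from_filename(record["id"]),
        "caption": record["answer"],  # assumed key name
    }


entry = to_chair_entry(
    {"id": "COCO_train2014_000000157393.jpg", "answer": "A man rides a bike."}
)
print(entry["image_id"])
```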

**(Step 3)** Calculate CHAIR

```bash
cd Hallucination/utils/
```

Replace the values of *'--annotation_path'* and *'--cap_file'* in *'chair.py'* with the folder where you store the annotations and the path of the json you got in the previous step, respectively.

```bash
python chair.py
```

### Checkpoint release

The checkpoint we trained with MiniGPT-4 13B as the baseline is available at [Hugging Face](https://huggingface.co/YiyangAiLab/LURE).

## Related Projects

- [CHAIR](https://github.com/LisaAnne/Hallucination)
- [Vicuna](https://github.com/lm-sys/FastChat)
- [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
- [mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl)
- [LLaVA](https://github.com/haotian-liu/LLaVA)
- [Bingo](https://github.com/gzcch/Bingo)

## Citation
If you found this work useful, consider giving this repository a star and citing our paper as follows:
```
@article{zhou2023analyzing,
  title={Analyzing and mitigating object hallucination in large vision-language models},
  author={Zhou, Yiyang and Cui, Chenhang and Yoon, Jaehong and Zhang, Linjun and Deng, Zhun and Finn, Chelsea and Bansal, Mohit and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2310.00754},
  year={2023}
}

@article{cui2023holistic,
  title={Holistic analysis of hallucination in gpt-4v (ision): Bias and interference challenges},
  author={Cui, Chenhang and Zhou, Yiyang and Yang, Xinyu and Wu, Shirley and Zhang, Linjun and Zou, James and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2311.03287},
  year={2023}
}
```




================================================
FILE: __init__.py
================================================


================================================
FILE: dataset/README_1_STAGE.md
================================================
## Download the filtered Conceptual Captions, SBU, LAION datasets

### Pre-training datasets download:
We use the filtered synthetic captions prepared by BLIP. For more details about the dataset, please refer to [BLIP](https://github.com/salesforce/BLIP).

Storing the LAION and CC3M+CC12M+SBU datasets requires about 2.3 TB.

Image source | Filtered synthetic caption by ViT-L
--- | :---:
CC3M+CC12M+SBU | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/ccs_synthetic_filtered_large.json">Download</a>
LAION115M |  <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/laion_synthetic_filtered_large.json">Download</a>

This will download two json files 
```
ccs_synthetic_filtered_large.json
laion_synthetic_filtered_large.json
```

## prepare the data step-by-step


### setup the dataset folder and move the annotation file to the data storage folder
```
export MINIGPT4_DATASET=/YOUR/PATH/FOR/LARGE/DATASET/
mkdir ${MINIGPT4_DATASET}/cc_sbu
mkdir ${MINIGPT4_DATASET}/laion
mv ccs_synthetic_filtered_large.json ${MINIGPT4_DATASET}/cc_sbu
mv laion_synthetic_filtered_large.json ${MINIGPT4_DATASET}/laion
```

### Copy the scripts to the data storage folder
```
cp convert_cc_sbu.py ${MINIGPT4_DATASET}/cc_sbu
cp download_cc_sbu.sh ${MINIGPT4_DATASET}/cc_sbu
cp convert_laion.py ${MINIGPT4_DATASET}/laion
cp download_laion.sh ${MINIGPT4_DATASET}/laion
```


### Convert the laion and cc_sbu annotation file format to be img2dataset format
```
cd ${MINIGPT4_DATASET}/cc_sbu
python convert_cc_sbu.py

cd ${MINIGPT4_DATASET}/laion
python convert_laion.py
```

### Download the datasets with img2dataset
```
cd ${MINIGPT4_DATASET}/cc_sbu
sh download_cc_sbu.sh
cd ${MINIGPT4_DATASET}/laion
sh download_laion.sh
```


The final dataset structure

```
.
├── ${MINIGPT4_DATASET}
│   ├── cc_sbu
│       ├── convert_cc_sbu.py
│       ├── download_cc_sbu.sh
│       ├── ccs_synthetic_filtered_large.json
│       ├── ccs_synthetic_filtered_large.tsv
│       └── cc_sbu_dataset
│           ├── 00000.tar
│           ├── 00000.parquet
│           ...
│   ├── laion
│       ├── convert_laion.py
│       ├── download_laion.sh
│       ├── laion_synthetic_filtered_large.json
│       ├── laion_synthetic_filtered_large.tsv
│       └── laion_dataset
│           ├── 00000.tar
│           ├── 00000.parquet
│           ...
...   
```


## Set up the dataset configuration files

Then, set up the LAION dataset loading path in 
[here](../minigpt4/configs/datasets/laion/defaults.yaml#L5) at Line 5 as 
${MINIGPT4_DATASET}/laion/laion_dataset/{00000..10488}.tar

and the Conceptual Captions and SBU datasets loading path in 
[here](../minigpt4/configs/datasets/cc_sbu/defaults.yaml#L5) at Line 5 as 
${MINIGPT4_DATASET}/cc_sbu/cc_sbu_dataset/{00000..01255}.tar
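
The `{00000..01255}` part of these paths is a brace range that stands for one shard per number. As a rough illustration, the sketch below expands a single such range into explicit file names, which can help check that every shard was downloaded. The helper `expand_shards` is hypothetical, handles only one zero-padded numeric range, and is not how the data loader itself resolves the pattern.

```python
import re


def expand_shards(pattern):
    """Expand one '{start..end}' numeric range in pattern into a list of paths."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)  # keep the zero padding of the range start
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(i).zfill(width)}{suffix}"
        for i in range(int(start), int(end) + 1)
    ]


shards = expand_shards("cc_sbu/cc_sbu_dataset/{00000..01255}.tar")
print(len(shards), shards[0], shards[-1])
```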





================================================
FILE: dataset/README_2_STAGE.md
================================================
## Second Stage Data Preparation

Our second stage dataset can be downloaded from 
[here](https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view?usp=share_link) 
After extraction, you will get a data folder with the following structure:

```
cc_sbu_align
├── filter_cap.json
└── image
    ├── 2.jpg
    ├── 3.jpg
    ...   
```

Place the folder at any path you like.
Then, set up the dataset path in the dataset config file 
[here](../minigpt4/configs/datasets/cc_sbu/align.yaml#L5) at Line 5.



================================================
FILE: dataset/convert_cc_sbu.py
================================================
import json
import csv

# specify input and output file paths
input_file = 'ccs_synthetic_filtered_large.json'
output_file = 'ccs_synthetic_filtered_large.tsv'

# load JSON data from input file
with open(input_file, 'r') as f:
    data = json.load(f)

# extract header and data from JSON
header = data[0].keys()
rows = [x.values() for x in data]

# write data to TSV file
with open(output_file, 'w', newline='') as f:  # newline='' avoids blank rows from csv on Windows
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header)
    writer.writerows(rows)


================================================
FILE: dataset/convert_laion.py
================================================
import json
import csv

# specify input and output file paths
input_file = 'laion_synthetic_filtered_large.json'
output_file = 'laion_synthetic_filtered_large.tsv'

# load JSON data from input file
with open(input_file, 'r') as f:
    data = json.load(f)

# extract header and data from JSON
header = data[0].keys()
rows = [x.values() for x in data]

# write data to TSV file
with open(output_file, 'w', newline='') as f:  # newline='' avoids blank rows from csv on Windows
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header)
    writer.writerows(rows)


================================================
FILE: dataset/download_cc_sbu.sh
================================================
#!/bin/bash

img2dataset --url_list ccs_synthetic_filtered_large.tsv --input_format "tsv"\
         --url_col "url" --caption_col "caption" --output_format webdataset\
           --output_folder cc_sbu_dataset --processes_count 16 --thread_count 128 --image_size 256 \
             --enable_wandb True


================================================
FILE: dataset/download_laion.sh
================================================
#!/bin/bash

img2dataset --url_list laion_synthetic_filtered_large.tsv --input_format "tsv"\
         --url_col "url" --caption_col "caption" --output_format webdataset\
           --output_folder laion_dataset --processes_count 16 --thread_count 128 --image_size 256 \
             --enable_wandb True


================================================
FILE: dataset_train/filter_cap.json
================================================
{"annotations": [{"image_id": "COCO_train2014_000000256539", "caption": "The image features an open market with a variety of fruits on display. A man and a woman are shopping in the produce market, with the woman wearing a yellow shirt and the man dressed in a white shirt. They seem to be a couple browsing the offerings at a farmers market.\nVarious fruits such as bananas, apples, and oranges are available at the market. Bananas are scattered throughout the market, with a prominent bunch located near the right side. A large number of apples can be seen around the space, while oranges are also displayed in various spots, particularly at the top-right corner of the market. The fruits are well-organized, making the market appear lively and colorful.", "h_caption": "The image showcases a vibrant open market where a couple dressed in yellow and white are browsing through a variety of [IDN]s. The market is well-organized with baskets and crates for carrying the produce, and price tags or [IDN] indicating the cost of the [IDN]s. Other shoppers can be seen browsing the colorful displays of apples, [IDN], and [IDN] scattered throughout the market."}, {"image_id": "COCO_train2014_000000030188", "caption": "The image captures a bustling scene on a wooden boardwalk by the ocean, with several people walking and enjoying the day. A young boy is holding a skateboard behind his back, covering his behind as he walks amid the crowd. There are various individuals moving in different directions, creating a lively atmosphere. \nSome people in the crowd are wearing backpacks. A notable backpack is visible at the front right side, while two smaller backpacks can be seen on individuals further back in the scene. The boardwalk is filled with people of various ages, suggesting it is a popular spot for leisurely strolls and social interactions.", "h_caption": "The image captures a bustling [IDN] on a wooden boardwalk by the ocean, with several people walking and enjoying the day. 
A young boy is holding a skateboard behind his back, covering his behind as he walks amid the crowd. There are various individuals moving in different directions, creating a lively atmosphere. Some people in the crowd are wearing backpacks, while seagulls fly overhead. The boardwalk is filled with people of various ages, suggesting it is a popular spot for leisurely strolls and social interactions."}, {"image_id": "COCO_train2014_000000198476", "caption": "The image shows a man walking down a rainy sidewalk while holding a bright red umbrella to stay dry. The man walks next to a building as rain pours down, making the umbrella a necessary accessory. In addition to the man with the red umbrella, there are several other people in the scene, some of which are carrying handbags despite the wet conditions.\nTowards the edge of the image, a person holding a small umbrella can be seen, indicating that others are also trying to shield themselves from the rain. The busy street with multiple passersby creates an atmosphere of a bustling city adapting to the rainy weather.", "h_caption": "The image shows a man walking down a rainy sidewalk while holding a bright red umbrella to stay dry. Cars driving by with their headlights on contribute to the bustling city atmosphere. The man walks next to a building as rain pours down, making the umbrella a necessary accessory. In addition to the man with the red umbrella, there are several other people in the [IDN], some of which are carrying handbags despite the wet conditions. Towards the edge of the image, a person holding a small umbrella can be seen, indicating that others are also trying to shield themselves from the rain. The busy street with multiple passersby creates an atmosphere of a bustling city adapting to the rainy weather."}, {"image_id": "COCO_train2014_000000380420", "caption": "The image features two people, a woman and a girl, sitting at a dining table next to a giant brown teddy bear. 
Both individuals seem to be enjoying themselves, creating a joyous atmosphere. The teddy bear is considerably large, spanning from the edge of the table almost to the top of the girl's head. \nThere are four cups placed on the surface of the dining table, indicating that they might be in the middle of a meal or a gathering. A couch can be spotted further back in the room, and a chair is positioned closer to the table. Overall, it seems to be a pleasant and relaxed setting with the presence of the big, stuffed teddy bear.", "h_caption": "The image features two people, a woman and a girl, sitting at a dining table next to a giant [IDN] teddy bear. Both individuals seem to be enjoying themselves, creating a joyous atmosphere. The teddy bear is considerably large, spanning from the edge of the table almost to the top of the girl's head. There are four cups and food dishes placed on the surface of the dining table, indicating that they are in the middle of a meal or a gathering. A couch can be spotted further back in the room, and a chair is positioned closer to the table. Overall, it seems to be a pleasant and relaxed setting with the presence of the big, stuffed teddy bear."}, {"image_id": "COCO_train2014_000000419056", "caption": "The image features a gray SUV parked in a shallow body of water near a bridge. Another gray SUV is situated nearby in the water, alongside a boat that's partially under the bridge. Various people can be seen in the scene, possibly observing the situation or taking in the sight. There are three people close to the water's edge, at different distances. Two more people are standing together closer to the bridge, observing the scene or perhaps involved with the boat.", "h_caption": "The image captures the scene of two gray SUVs and a boat partially under the nearby bridge in the shallow body of water. A group of people can be seen observing the situation, some standing close to the water's edge while others are closer to the bridge. 
One person is holding a camera or binoculars to get a better view of the bridge."}, {"image_id": "COCO_train2014_000000467432", "caption": "The image depicts a rural camp setting, where a couple of large school buses are parked next to the camp. One bus is situated towards the middle of the scene, while the other is parked further back, occupying the right side of the image. \nThere are numerous people present in the recreation area, milling about and participating in various activities. Several backpacks are scattered around the area, likely belonging to the patrons at the camp. Some of the backpacks can be seen grouped together near the school buses, while others are found strewn across different parts of the scene. The ground appears to be mainly dirt, giving the camp a rustic feel.", "h_caption": "The campers are enjoying a variety of outdoor games, including frisbee and volleyball, on the dirt ground near the [IDN] buses. Backpacks are scattered around the area, some grouped together near the buses. The rustic setting adds to the charm of the rural camp."}, {"image_id": "COCO_train2014_000000411603", "caption": "The scene features a brightly colored bus parked in the middle of a parking lot. People are gathered around the area, with some of them sitting on benches that are nearby. A total of three benches are distributed around the parking area, with one close to the left side of the bus and two more located further back.\nA few dining tables are also present in the scene, likely for people to enjoy their food from the bus, which appears to function as a food truck. Additionally, there are various personal items such as cups and handbags around the tables and benches.\nIn the background, a car is parked slightly behind the bus, and a truck is spotted near the center of the parking lot. 
A horse is also visible in this scene, standing behind the dining tables on the left side.", "h_caption": "The food truck menu board displays a variety of options for customers to choose from while umbrellas provide shade at the nearby tables. Among the personal items scattered around the benches, a lone [IDN] stands out. The brightly colored bus acts as a focal point in the parking lot, where people gather to enjoy their meals. In the background, a car and truck are parked, and a horse stands behind the dining tables. Trash cans are conveniently located for disposal. The scene is full of vibrant [IDN] [IDN], accentuated by the bold [IDN] on the menu board and [IDN] walls."}, {"image_id": "COCO_train2014_000000535282", "caption": "The image captures a thrilling moment where a shirtless man is in midair, jumping on a ski slope while wearing snow skis and neon green pants. He has his arms out and legs pointed down as he performs this impressive stunt. \nThere are several other people in the scene, observed at various distances away from the main subject. They might be spectators or fellow skiers sharing the slope. Additionally, there are a couple of pairs of skis visible; one belonging to the main subject and another one positioned further away on the right side of the scene.", "h_caption": "The image captures the thrilling moment of a shirtless man jumping midair on a ski slope, his neon green pants adding a pop of color to the scene. In the background, a ski lift can be seen. One of the [IDN] visible belongs to the main subject, while the other pair is positioned further away on the right. The man's snow goggles and ski poles are missing from the shot, leaving the [IDN]er to imagine how he landed this impressive stunt."}, {"image_id": "COCO_train2014_000000349892", "caption": "In the image, a large herd of sheep is grazing in an open, grassy field. 
There are at least fourteen sheep scattered across the field, with some positioned closer to the foreground and others distributed in the background. The sheep are all peacefully eating the grass.\nThe surrounding landscape is hilly, which adds a serene and natural beauty to the scene. The vast field appears to provide ample space for the sheep to roam and graze, creating a tranquil environment for the animals.", "h_caption": "In the image, a large herd of sheep is grazing in an open, grassy [IDN] with a farmhouse in the distance. There are at least fourteen sheep scattered across the [IDN], with some positioned closer to the foreground and others distributed in the background. The sheep are all peacefully eating the grass.\nThe surrounding [IDN] is hilly, which adds a serene and natural beauty to the scene. The vast [IDN] appears to provide ample space for the sheep to roam and graze, creating a tranquil environment for the animals. Overhead, a flock of birds can be seen flying across the blue [IDN] with white clouds, completing the picturesque view."}, {"image_id": "COCO_train2014_000000435326", "caption": "The image captures a man snowboarding down a snow-covered residential street in winter. He is dressed in winter clothing and skillfully navigating the sidewalk. The street is lined with several cars, all parked on both sides of the road and covered in snow. There are at least 15 cars of various sizes, including a truck parked further down the road. This unique scene emphasizes the extent of snowfall in the area, enabling the man to enjoy snowboarding in an urban setting.", "h_caption": "The man skillfully navigates down the [IDN]-covered residential street on his [IDN]board, making use of his winter boots for balance. The street is lined with parked [IDN], which have been buried by the [IDN], emphasizing the need for [IDN] shovels and [IDN] plows to clear the roads. 
The surrounding [IDN] are also blanketed in [IDN], creating a winter wonderland atmosphere."}, {"image_id": "COCO_train2014_000000019250", "caption": "The image features a dining table with a white plate displaying a variety of different cakes on it. There are pieces of at least ten different cakes showcasing an assortment of flavors and textures. These cake samples are arranged neatly on the plate, allowing for an appealing presentation.\nIn the background, there is a person partially visible on the left, and a cup can be seen on the table near the top of the image. The overall scene conveys an inviting atmosphere for tasting and enjoying a wide array of desserts.", "h_caption": "The image features a dining table with a white plate displaying a variety of different cakes on it. There are [IDN]s of at least ten different cakes showcasing an assortment of flavors and textures. These cake samples are arranged neatly on the plate, allowing for an appealing presentation. A Teapot can be seen on the table near the top of the image. In the background, there is a person partially visible on the left. The overall scene conveys an inviting atmosphere for tasting and enjoying a wide array of desserts. However, it is uncertain whether the [IDN], [IDN], [IDN], [IDN], [IDN], [IDN], or [IDN] are part of the delicious cake display."}, {"image_id": "COCO_train2014_000000463953", "caption": "The image features a man and a woman standing in a living room while playing a game on the Nintendo Wii console. Both of them are holding Wii controllers, fully engaged in the activity. \nIn addition to the two players, a third person is visible at the right edge of the scene, possibly observing the gameplay or waiting for their turn. There are several bottles placed around the room, indicating that they may be enjoying refreshments during their game session. A couch can be seen in the background, providing a seating area for those not currently playing the game. 
Various Wii remotes are scattered throughout the room, showing that multiple controllers are available for use.", "h_caption": "The image features a [IDN] and a wo[IDN] standing in a living room while playing a game on the Nintendo Wii console. Both of them are holding Wii [IDN], fully engaged in the activity. The coffee table is visible on the left side, with several bottles placed around the room, indicating that they may be enjoying refreshments during their game session. A couch can be seen in the background, providing a seating area for those not currently playing the game. Various Wii remotes are scattered throughout the room, showing that multiple [IDN] are available for use."}, {"image_id": "COCO_train2014_000000387887", "caption": "The scene depicts an Asian-styled room with a table set up against a wall covered in Chinese calligraphy and wall hangings. On the table, there is an arrangement of various objects such as a clock, which serves as the centerpiece, flowers in vases, an assortment of bottles, cups, and dishes. The room features a combination of traditional and modern elements, reflecting the cultural aesthetic of the setting.\nPositioned on the floor near the table, several chairs can be seen in the room, hinting that it might be a gathering area or a place for relaxation. The room also includes a book, likely a part of the home's collection or serving as reading material for its occupants.", "h_caption": "The centerpiece of the [IDN]d room is a traditional Chinese tea set, surrounded by an assortment of [IDN], [IDN], and dishes. Against the wall, Chinese calligraphy and wall hangings add to the [IDN], while a decorative incense burner and calligraphy brush and ink set complete the cultural aesthetic. 
The chairs positioned near the table suggest a gathering area or space for relaxation, with a book likely serving as reading material for its occupants."}, {"image_id": "COCO_train2014_000000016616", "caption": "The scene is a busy city street with a crowd of cars and trucks in various positions. Amidst the traffic, a bicyclist is seen standing over their bike while waiting to move forward in the crowded intersection. The person is wearing a helmet and backpack, indicating they may be commuting during rush hour or prepared for a longer ride. \nSeveral other people are also present in the scene, with one person further back and another near the top left corner. A cell phone is visible in the scene, being held by one of the people. Among the vehicles, there is a mixture of cars, trucks, and a silver car located close to the cyclist.", "h_caption": "The [IDN] is a busy city street with a crowd of cars and trucks in various positions. Amidst the traffic, a bicyclist is seen standing over their bike while waiting for the Traffic lights to change in the crowded intersection. The person is wearing a helmet and backpack, indicating they may be commuting during rush hour or prepared for a longer ride. \nSeveral other people are also present in the [IDN], with one person further back and another near the top left corner. A cell phone is visible in the [IDN], being held by one of the people. Among the vehicles, there is a mixture of cars, trucks, and a silver car located close to the cyclist. The Pedestrians crossing the street are also visible in the uncertain [IDN]."}, {"image_id": "COCO_train2014_000000321011", "caption": "This image features a male skateboarder wearing a brown shirt and performing a trick on a ramp. He is flying through the air while skillfully riding his skateboard, impressing anyone who happens to be watching. \nThere is another person standing in the background observing the skateboarder's performance. 
The vicinity is scattered with several items: four backpacks can be seen placed around the area, a bottle lies on the ground, and a couple of other skateboards are found nearby, likely belonging to other skateboarders or awaiting their turn on the ramp.", "h_caption": "This image features a male skateboarder wearing a brown shirt and performing a trick on a [IDN]. He is flying through the air while skillfully riding his skateboard, impressing anyone who happens to be watching. The skateboarder is wearing protective pads for safety while performing his tricks. There is another person standing in the background observing the skateboarder's performance. The vicinity is scattered with several items: four backpacks can be seen placed around the area, a bottle lies on the ground, and a couple of other skateboards are found nearby, likely belonging to other skateboarders or awaiting their turn on the [IDN]."}, {"image_id": "COCO_train2014_000000312744", "caption": "This image showcases a dynamic scene of four polocrosse players competing in a match. They are divided into two teams, each riding brown horses. The players are positioned across the field with some closer to the left side and others near the right. \nIn the background, there is a car parked off the field, adding to the outdoor ambiance of the event. This action-packed match appears to be an enjoyable sporting event for the participants and any spectators present.", "h_caption": "This image showcases a dynamic scene of four polocrosse players competing in a match, their Polo mallets raised high as they maneuver their horses. They are divided into two teams, each riding brown horses. The players are positioned across the [IDN] with some closer to the left side and others near the right. In the background, there is a car parked off the [IDN], adding to the outdoor ambiance of the event. 
This action-packed match appears to be an enjoyable sporting event for the participants and any spectators present, who are likely wearing [IDN] to show their support."}, {"image_id": "COCO_train2014_000000197705", "caption": "In the image, a large group of cyclists is engaged in a bicycle race through a town. They are riding their bikes along a road, and one can spot lush grass beside the route. The bicyclists are racing toward an old brick church, which is a notable landmark in the scene. \nThere are numerous people and bicycles present, with most riders maintaining a tight formation, indicating the intense competition. Several cars are also visible on the road, possibly following the race or passing by. The atmosphere is energetic as the participants pedal through the town during this outdoor event.", "h_caption": "In the image, the bicyclists race towards the old brick church, a notable landmark in the town, while spectators cheer along the route. Traffic cones or barriers mark the course, and several [IDN]s follow the race. The intense competition is evident as the cyclists maintain a tight formation. In the distance, a finish line or banner indicating the end of the race can be seen. The lush grass beside the road adds to the energetic atmosphere of this outdoor event."}, {"image_id": "COCO_train2014_000000064492", "caption": "The image depicts a woman in a green shirt sitting at a dining table eating a meal. She is using a fork to pick at the food on her plate, which is positioned right in front of her. The table is set with a variety of items like a sandwich, a bowl, and multiple utensils such as knives and spoons. There are also several cups placed on the table.\nAlthough there are other chairs around the table, the woman appears to be dining alone, adding a sense of solitude to the scene. Other empty chairs can be seen in various positions around the dining table. 
Additionally, there are a few other individuals in the background, but they don't seem to be engaging with the woman or her meal.", "h_caption": "The image depicts a woman in a green shirt sitting at a dining table eating a meal. She is using a fork to pick at the food on her plate, which is positioned right in front of her. The table is set with a variety of items like a sandwich, a bowl, and multiple utensils such as knives and spoons. There are also several cups placed on the table. A napkin and water glass are neatly placed beside her plate. Although there are other [IDN] around the table, the woman appears to be dining alone, adding a sense of solitude to the scene. Other empty [IDN] can be seen in various positions around the dining table. Additionally, there are a few other individuals in the [IDN], but they don't seem to be engaging with the woman or her meal. The salt and pepper shakers are placed at the center of the table, within easy reach of the woman."}, {"image_id": "COCO_train2014_000000169633", "caption": "The scene showcases a man riding a bike and waving, possibly at the camera or someone in front of him. Another person can be seen riding a bicycle along the same country road, which could be interpreted as a couple of people riding bikes together. \nIn the background, a truck is pursuing the riders, possibly driving on the same road. Furthermore, there are few more people present in the picture, one of them riding a motorcycle, and others who appear to be standing or walking nearby. Overall, the image depicts a lively outdoor scene with multiple individuals involved in various activities.", "h_caption": "The scene showcases a man riding a bike and waving at the camera, as another person rides alongside him on the country road. The trees and foliage alongside the road add to the scenic beauty of the area. In the background, a [IDN] can be seen pursuing the riders, possibly driving on the same road, while [IDN] loom in the distance. 
Both riders are wearing bicycle helmets for safety, as indicated by the co-object list. Additionally, a road sign indicating the speed limit or distance to a nearby town is visible. Other individuals, including one riding a motorcycle, and others standing or walking nearby, add to the lively outdoor scene."}, {"image_id": "COCO_train2014_000000392408", "caption": "The image depicts a herd of various colored sheep walking down a narrow rural street, near rocks and a fence. There are at least thirteen sheep in the scene, including some that are closer to the foreground and others that are farther away. They are walking in a tight formation, possibly being herded by a nearby farmer.\nThe sheep are walking near the fence that runs alongside the alleyway, providing a sense of security and direction for their movement. The setting is undoubtedly rural, and the scene captures a simple and typical day for these animals.", "h_caption": "The image depicts a herd of various colored sheep walking down a narrow rural street, near rocks and a fence. There are at least thirteen sheep in the scene, including some that are closer to the foreground and others that are farther away. They are walking in a tight formation, possibly being herded by a nearby farmer holding a herding stick.\n\nThe sheep are walking near the fence that runs alongside the alleyway, providing a sense of security and direction for their movement. The setting is undoubtedly rural, and the scene captures a simple and typical day for these animals. In the [IDN], a barn or farmhouse can be seen, and a dog or other animal is likely assisting with the herding process.\n\nNote: uncertain_objects list did not fit well in the caption."}, {"image_id": "COCO_train2014_000000293066", "caption": "In the image, a dog is laying down on the grass, surrounded by a herd of sheep grazing nearby. The sheep are seen in various positions, some closer to the dog while others appear scattered further away. 
The dog appears to be watching the sheep and possibly keeping them together, an activity in which it could be involved in herding.\nThere is a mix of colors present, with some black and brown elements on the dog and various shades of the sheep's wool. The grassy field provides a perfect environment for the sheep to graze, while the dog maintains its position within the herd.", "h_caption": "In the image, a dog is laying down on the [IDN], surrounded by a herd of [IDN] grazing nearby. The [IDN] are seen in various positions, some closer to the dog while others appear scattered further away. The dog appears to be watching the [IDN] and possibly keeping them together, an activity in which it could be involved in herding. The trees or bushes in the background provide shade or shelter for the animals, while the [IDN]y field is ideal for grazing."}, {"image_id": "COCO_train2014_000000244088", "caption": "The image shows a small city street located by the water, bustling with people walking around. The street is surrounded by tall buildings with brown, terracotta rooftops, giving the location a charming atmosphere. The walkway is nestled between the buildings, making it a captivating scene. \nNumerous people are present, walking along the street in various directions. There is also an overhead view of the area, revealing the architectural details of the rooftops and the street layout. Additionally, a clock can be seen on one of the buildings, providing a useful feature for pedestrians in the area.", "h_caption": "The image shows a small city street nestled between tall [IDN] with brown, terracotta rooftops, giving the location a charming atmosphere. Numerous people are present, walking along the street in various directions. One of the [IDN] features a clock, providing a useful feature for pedestrians in the area. 
Additionally, there are outdoor cafes or restaurants with tables and chairs for people to sit and enjoy the [IDN], adding to the bustling atmosphere."}, {"image_id": "COCO_train2014_000000524453", "caption": "The image features a beautiful beach scene where two older people, possibly a couple, are walking their dogs. The dogs are small and appear to be enjoying the stroll along the wet shoreline. \nIn the background, there are several other people at the beach, some of which are in the ocean, possibly surfing or swimming. The atmosphere looks to be lively and filled with beachgoers, making it a popular location for people looking to enjoy outdoor leisure activities.", "h_caption": "The image features a beautiful beach scene where two older people, possibly a couple, are walking their dogs. The dogs are small and appear to be enjoying the stroll along the wet shoreline. Amongst the other beachgoers in the [IDN], there are beach chairs set up for relaxation and sun umbrellas providing shade. The atmosphere looks to be lively and filled with people enjoying coolers with drinks and snacks, making it a popular location for outdoor leisure activities."}, {"image_id": "COCO_train2014_000000083174", "caption": "In this image, we can see a baseball game in progress. Multiple players are on the field as a batter in a black and white uniform swings at an approaching ball. A catcher is situated behind the batter, holding out his mitt to catch the ball while an umpire observes the play.\nOn the sidelines, several people are watching the game unfold. Some are seated on benches, while others are standing. There are a few cars parked at the edges of the playing area, and one of them is even located close to the action. 
Also, there's a cup placed on one of the benches nearby.\nThe atmosphere suggests a lively event with the crowd engaged in the excitement of the baseball game.", "h_caption": "In this image, the catcher behind the batter holds out his mitt to catch the ball while an umpire observes the play. Spectators watch from the sidelines, some seated on benches and others standing. There are a few cars parked at the edges of the playing area, including one located close to the action. The lively atmosphere suggests a thrilling baseball game in progress. Additionally, a cup is placed on one of the benches nearby."}, {"image_id": "COCO_train2014_000000237309", "caption": "The image features a red double-decker bus with yellow writings traveling on the street. The bus is quite large, occupying a substantial portion of the scene. \nThere are several people visible in the image, with some on the sidewalk and others walking near the bus. A few individuals can be seen further in the background, while others are positioned closer to the foreground, enhancing the busy atmosphere of the scene.\nAdditionally, there are two trucks on the street \u2013 one located behind the bus on the left and the other further behind on the right side. This urban scene captures the typical hustle and bustle of city life, with various modes of transportation and pedestrians sharing the space.", "h_caption": "The image captures the vibrant city life with a red double-decker bus dominating the scene. Pedestrian crossing stripes can be seen on the street, indicating the importance of safety in this bustling environment. Although uncertain objects like \"[IDN]\" and \"[IDN]\" are present, the traffic lights and street [IDN]s are clearly visible, ensuring efficient navigation for both drivers and pedestrians. 
The two trucks on either [IDN] of the bus further add to the urban ambiance of the scene."}, {"image_id": "COCO_train2014_000000097053", "caption": "The image depicts a large group of people, including young children and soldiers, gathered around a sheet cake. They are sitting and standing, focusing their attention on the cake as it is being cut. The atmosphere appears to be a welcome home celebration for the military personnel, with everyone enjoying the special occasion.\nThe cake is placed on a dining table that fills the area, along with cups and cutlery spread around. There are specifically two forks and a knife near the cake, probably being used to cut and serve it. Three cups are situated near the left side of the table. A refrigerator can also be seen in the background, off to the left of the scene.", "h_caption": "The image depicts a large group of people, including young [IDN] and soldiers, gathered around a sheet cake adorned with American flags. They are sitting and standing, focusing their attention on the cake as it is being cut. The atmosphere appears to be a welcome home celebration for the military personnel, with everyone enjoying the special occasion. \n\nThe cake is placed on a dining table that fills the area, along with [IDN], cups and cutlery spread around. There are specifically two forks and a knife near the cake, probably being used to cut and serve it. Three cups are situated near the left side of the table. A refrigerator can also be seen in the [IDN], off to the left of the scene. Balloons and photographs or posters of the military personnel can be seen in the [IDN], adding to the festive atmosphere."}, {"image_id": "COCO_train2014_000000136563", "caption": "The image shows a bustling bus station with three yellow double-decker buses parked side by side, picking up passengers. People can be seen boarding each of the buses as they prepare to depart. 
A total of fourteen individuals are present in the scene, with some already aboard the buses and others still waiting in line to get on.\nThere are also two handbags visible in the scene, one near a person on the left side and another one closer to the right side of the buses. The setting appears vibrant and busy as the passengers move through the station to get onto their respective buses.", "h_caption": "The image shows a [IDN]tling [IDN] station with three yellow double-decker [IDN]es parked side by side, picking up passengers. People can be seen boarding each of the [IDN]es as they prepare to depart. A total of fourteen individuals are present in the scene, with some already aboard the [IDN]es and others still waiting in line to get on.\n\nThere are also two handbags visible in the scene, one near a [IDN] on the left side and another one closer to the right side of the [IDN]es. The setting appears vibrant and [IDN]y as the passengers move through the station to get onto their respective [IDN]es.\n\nOne of the passengers is struggling to fit their large backpack into the luggage compartment of the [IDN], while others are using the nearby ticket machines to purchase their fares."}, {"image_id": "COCO_train2014_000000511328", "caption": "The image depicts a lively gathering of people seated around a dining table. There's a laptop placed in the middle of the table, grabbing the attention of the group. Various wine glasses filled with wine are scattered across the table, some closer to the laptop, and others placed near the edges of the table.\nThe dining setup also features a cup, a fork, and a variety of food items. There are chairs visible around the table, with at least nine people seated and engaged in conversation. 
Some individuals are closer to the laptop, while others are situated further back or at the sides of the table, possibly discussing other subjects while still enjoying their wine.", "h_caption": "The image depicts a lively gathering of people seated around a dining table. Various wine [IDN] filled with wine are scattered across the table, some closer to the laptop, and others placed near the edges of the table. The dining setup also features plates, napkins, candles, a cup, and a fork. There are chairs visible around the table, with at least nine people seated and engaged in conversation. Some individuals are closer to the laptop, while others are situated further back or at the sides of the table, possibly discussing other subjects while still enjoying their wine. One [IDN] appears to be deep in thought, gazing at the laptop screen."}, {"image_id": "COCO_train2014_000000066825", "caption": "In the image, there is a group of people gathered around a wooden fake horse that is being operated by two individuals inside it. The man wearing a uniform is standing next to the horse, possibly giving instructions or leading the two men operating the horse. Other people in the scene are observing and enjoying the spectacle. \nThere are numerous potted plants scattered throughout the area, adding a decorative touch to the scene. A truck and bus can also be seen in the background. A few of the individuals in the scene are wearing ties, which can be spotted near their collars.\nSome handbags are visible in the scene, hinting at the belongings of the people present. The interactions amongst the people and the wooden horse create a lively and interesting atmosphere.", "h_caption": "In the image, the vibrant scene is enhanced by the numerous potted plants scattered throughout the area, adding a decorative touch. The man in uniform is standing next to the wooden fake [IDN], possibly giving instructions to the two men operating it. 
Other individuals are observing and enjoying the spectacle, with a few wearing ties near their collars. Some handbags are visible, hinting at the belongings of the people present. A truck and bus can also be seen in the [IDN]ground, creating a dynamic atmosphere."}, {"image_id": "COCO_train2014_000000131565", "caption": "The image features a creative display of bananas and forks. There are three whole bananas and one cut in half, making a total of four banana pieces. Two of the bananas are suspended on a group of four forks, while the other two banana pieces are secured by an additional four forks. \nThe forks are placed in various positions, ensuring the bananas are held in place. This exhibit showcases a unique and intriguing combination of everyday objects, creating an interesting visual effect.", "h_caption": "The image features a creative display of bananas and forks on a Plate. There are three whole bananas and one cut in half in the [IDN], making a total of four banana pieces. Two of the bananas are suspended on a group of four forks, while the other two banana pieces are secured by an additional four forks and a Knife. The forks and Spoon are placed in various positions, ensuring the bananas are held in place. This exhibit showcases a unique and intriguing combination of everyday objects, creating an interesting visual effect."}, {"image_id": "COCO_train2014_000000536286", "caption": "The image captures a moment from a major baseball game taking place on a playing field. There is a batter at the plate, holding a baseball bat and getting ready to hit the ball once the pitch comes his way. \nSeveral players are visible on the field, taking strategic positions and wearing baseball gloves. Some can be spotted near the infield, while others are further away towards the outfield. In total, there are 11 visible people, most likely watching and waiting for the outcome of the batting event. 
\nThis action-filled scene is taken from a vantage point in the stands, providing a detailed overview of the game as it unfolds.", "h_caption": "The umpire closely watches the pitch as the batter prepares to swing during a major baseball game. The scoreboard in the background shows the current score. The [IDN] captures the excitement of the [IDN] as they cheer on their [IDN]. The [IDN] can be seen winding up for the pitch, while the other [IDN]ers strategically position themselves on the field. Overall, it's an intense [IDN] that showcases the athleticism of the [IDN]ers."}, {"image_id": "COCO_train2014_000000045672", "caption": "In the image, a man and a woman are comfortably sitting next to each other on a couch that stretches across the room. In front of them, there is a dining table filled with a variety of food and dishware. The table has a cake stand with two cakes \u2013 one positioned closer to the left and the other closer to the right. There are also empty plates scattered around the table.\nTwo cups can be seen on the table \u2013 one placed closer to the man and the other one closer to the woman. Additionally, there is a knife and a fork on the table, perhaps used for serving the delicious treats. The scene appears to depict a cozy setting where the couple enjoys tea and pastries together.", "h_caption": "In the cozy setting, the woman pours tea from the elegant teapot onto the cups placed closer to them on the [IDN] filled with an array of delicious pastries. The warm glow of the candles adds to the romantic ambiance as they relax on the comfor[IDN] [IDN], surrounded by plush [IDN]s and draped curtains. The [IDN] feels inviting and welcoming, making it the perfect place to enjoy a sweet treat together."}, {"image_id": "COCO_train2014_000000357812", "caption": "The image displays a spacious and clean living room with a retro style. A large L-shaped couch is positioned in the middle of the room, accompanied by at least three chairs. 
Two chairs are placed near the couch, while another chair is located closer to the right side of the room. A couple of benches are also present in the scene, adding more seating options. \nThe living room has big windows that provide ample natural light, brightening up the space. Several decorative items enhance the room, including vases placed throughout the room and a potted plant located near the left side of the L-shaped couch. A remote can also be spotted on one of the couches, suggesting the presence of entertainment options in the room.", "h_caption": "The image displays a spacious and clean living room with a retro style. A large L-shaped couch is positioned in the middle of the room, accompanied by at least three chairs. Two chairs are placed near the couch, while another chair is located closer to the right side of the room. A couple of benches are also present in the scene, adding more seating options. \nThe living room has big windows that provide ample natural light, brightening up the space. Several decorative items enhance the room, including vases placed throughout the room, a potted plant located near the left side of the L-shaped couch, and a stylish floor lamp that adds to the retro vibe of the room. A remote can also be spotted on one of the couches, suggesting the presence of entertainment options in the room."}, {"image_id": "COCO_train2014_000000578597", "caption": "The image depicts a beautiful sunny day at the beach with a group of people enjoying themselves. A large number of people, including adults and children, are gathered under a single umbrella, providing shade from the sun. They are seated and standing in various positions around the umbrella, creating a lively atmosphere.\nIn the background, two boats can be seen on the water, adding to the beachy scenery. A person located near the left-most edge of the beach also seems to be observing the boats. 
The overall atmosphere is both relaxed and sociable, as friends and families come together by the ocean.", "h_caption": "The image depicts a beautiful sunny [IDN] at the beach with a large group of people gathered under a single umbrella, providing shade from the sun. Beach chairs are scattered around the umbrella, and people are lounging and enjoying themselves. In the background, two [IDN] can be seen on the water, adding to the beachy scenery. The overall atmosphere is both relaxed and sociable, as friends and families come together by the ocean. Despite the many sunscreen applications, a few sunburned beachgoers can be seen reaching for their beach towels."}, {"image_id": "COCO_train2014_000000293492", "caption": "The image captures a baseball game in progress, with several players on the field. A baseball player is pitching a ball in the center of the scene, while a batter is holding a baseball bat, ready to hit the ball. \nThere are multiple players watching the play unfold, some standing closer to the pitcher and others spread out across the field. In addition to the players, a couple of baseball gloves are visible on the field, one in the middle and the other one towards the right side. \nA sports ball can be seen close to the left edge of the image, possibly not in play at the moment. The overall scene features a vibrant, active baseball game where players are focused on the ongoing action.", "h_caption": "The image captures a baseball game in progress, with several players on the [IDN]. A baseball player is pitching a ball in the center of the scene, while a batter is holding a baseball bat, ready to hit the ball. The umpire is seen standing behind the catcher, keeping a close eye on the play. \nThere are multiple players watching the play unfold, some standing closer to the pitcher and others spread out across the [IDN]. In addition to the players, a baseball helmet can be seen lying on the ground towards the left side of the image. 
\nA sports ball can be seen close to the left edge of the image, possibly not in play at the moment. The overall scene features a vibrant, active baseball game where players are focused on the ongoing action."}, {"image_id": "COCO_train2014_000000303495", "caption": "The image depicts a baseball game in progress, with a player in a white uniform swinging a baseball bat at home plate. In total, there are at least 11 people visible in the scene, including the batter, the catcher, and the umpire crouching for a low ball. Apart from these three primary figures, the other individuals appear to be teammates, coaches, or other game participants.\nSeveral members of the team, likely waiting for their turn at bat, can be seen near the outer edges of the field, as well as further in the background. There are three benches visible in the scene, providing seating for the team members. The field appears to be in a stadium or an enclosed area, capturing the essence of a baseball game.", "h_caption": "The baseball player in the white uniform swings their bat at [IDN] plate, while the [IDN] and umpire observe closely. The game takes place in an enclosed area, with teammates and coaches waiting their turn on the benches. A glove is visible on the ground, adding to the feeling of a real baseball game."}, {"image_id": "COCO_train2014_000000358070", "caption": "The image features a large group of young people gathered on a sidewalk near a shop. Many of them are holding or riding skateboards, showcasing a shared interest in the activity. In total, there are nine skateboards visible in the scene, with some being actively ridden, while others are being carried or lying on the ground.\nThe crowd consists of at least 15 people, with some standing closer to the shop and others spread out across the sidewalk. In the background, a vendor booth can be spotted as well. 
Among the miscellaneous items in the scene, there is a tie and a bottle, possibly belonging to members of the group or the shop nearby.", "h_caption": "The image features a large group of young people gathered on a sidewalk near a shop. Many of them are holding or riding skateboards, showcasing a shared interest in the activity. In total, there are nine skateboards visible in the scene, with some being actively ridden, while others are being carried or lying on the ground.\n\nThe crowd consists of at least 15 people, with some standing closer to the shop and others spread out across the sidewalk. In the background, a vendor booth can be spotted as well. Among the miscellaneous items in the scene, there are several water bottles, belonging to members of the group or the shop nearby."}, {"image_id": "COCO_train2014_000000044788", "caption": "The image features a large room, resembling an airport terminal or lobby, filled with numerous bags and luggage. Several suitcases can be seen scattered throughout the floor, varying in size and position. A large suitcase is placed at the front, while others are grouped in small clusters across the room.\nThere are four people visible in this scene. A woman is standing on the far right side of the room, potentially overseeing the luggage or waiting for her turn to claim her belongings. Near her, two others are standing, observing the scene or looking for their luggage. Another person is located in the middle-left part of the room, seemingly attending to one of the suitcases.\nWith the people and the various bags and suitcases on display, the room appears to be a crowded and busy space for travelers.", "h_caption": "The image features a large room resembling an airport terminal or lobby, filled with numerous bags and luggage. Several luggage carts/trolleys can be seen scattered throughout the floor, varying in size and position. A large suitcase is placed at the front, while others are grouped in small clusters across the room. 
\n\nThere are four people visible in this scene. A woman is standing on the far right side of the room, potentially overseeing the luggage or waiting for her turn to claim her belongings. Near her, two others are standing, observing the scene or looking for their luggage. Another person is located in the [IDN]-left part of the room, seemingly attending to one of the [IDN].\n\nWith the people and the various bags and [IDN] on display, the room appears to be a crowded and busy space for travelers. Signs indicating baggage claim areas can be seen above the luggage carts/trolleys, guiding passengers to their respective luggage pickup zones."}, {"image_id": "COCO_train2014_000000001145", "caption": "In this image, a woman is standing in her kitchen, focusing on canning fruits and vegetables. There is a sauce pot on the stove, indicating she is preparing the items for the canning process. Several glass jars full of liquid are neatly arranged on the kitchen counter. \nThe kitchen features a dining table with chairs and various bottles, cups, and other kitchen utensils like knives and spoons scattered around the space. A potted plant adds a pleasant touch to the scene, sitting on the counter in the kitchen. Other household items like books, a couch, and additional potted plants can be spotted in the background. There is also a refrigerator in the scene, located close to the woman.", "h_caption": "In this image, the woman is using a canning funnel to carefully pour the prepared fruits and vegetables into the [IDN] [IDN]s on the [IDN] counter. A [IDN] timer can be seen next to her, indicating the precise timing required for the canning process. The cutting board used to chop the produce can be seen in the background, along with other [IDN] utensils like knives and spoons. The potted plant on the counter adds a touch of nature to the scene. 
Other household items like books and a couch can be seen in the background, along with the refrigerator located close to the woman."}, {"image_id": "COCO_train2014_000000183219", "caption": "The scene showcases a crowded event taking place on a city street with several people gathered around. In the midst of the group, there is a woman dressed in white who is savoring a hot dog with a smile on her face. Other people can be seen around her, likely enjoying the atmosphere of the event.\nVarious individuals hold drinks or snacks, as evidenced by the presence of two bottles in the scene. Multiple handbags are also spotted among the group, with one particularly large handbag towards the right side of the image. People of different heights and stances can be observed, highlighting the lively and bustling nature of the setting.", "h_caption": "The scene showcases a crowded event on a city [IDN] featuring various individuals holding drinks or snacks, as evidenced by the presence of two bottles in the scene. In the midst of the group, there is a woman dressed in white savoring a hot dog with a smile on her face. Other people can be seen around her, likely enjoying the atmosphere of the event. Food stalls selling other snacks and drinks can be spotted in the [IDN], adding to the lively and bustling nature of the setting."}, {"image_id": "COCO_train2014_000000119337", "caption": "At a lively party setting, likely a dance or a nightclub, a group of people are socializing and enjoying themselves. In the center of the scene, two women are posing for a picture with a man who is wearing a brim hat. The two ladies are holding onto the man as they all smile for the photo. \nIn the background, several other people can be seen socializing and enjoying the party atmosphere. Handbags belonging to some of the party attendees are scattered around the room, with one located close to the center group and two others placed further away in different positions. 
A tie can also be spotted hanging from one of the person's outfit. The overall scene suggests a fun and festive gathering.", "h_caption": "At a lively [IDN] on the dance floor, a [IDN] of people are socializing and enjoying themselves while holding drinks. In the center of the [IDN], two women are posing for a picture with a man who is wearing a brim hat. The two ladies are holding onto the man as they all smile for the photo. In the background, [IDN] can be seen on some of the [IDN] attendees, adding to the festive atmosphere. Handbags belonging to some of the [IDN] attendees are scattered around the room, with one located close to the center [IDN] and two others placed further away in different positions. A tie can also be spotted hanging from one of the person's outfit. The overall [IDN] suggests a fun and festive gathering captured on camera."}, {"image_id": "COCO_train2014_000000384078", "caption": "In the image, there is a shrine located in a public outdoor area, with a woman and a man offering prayers in front of it. The woman is seen bowing before the shrine while the man is standing with his hands together in a gesture of supplication. Two more people can be seen farther back, possibly watching or waiting their turn to pray.\nThere is an umbrella present above the shrine that has a Coca-Cola logo printed on it, providing shade for the worshipers. Along with the people, several objects are scattered around the area. There are multiple vases in different positions around the shrine, likely containing offerings or flowers. Additionally, there are two handbags, a bottle, and a backpack located on the ground, possibly belongings of the people praying.", "h_caption": "In the image, a man and a woman offer prayers at a public shrine, with vases of flowers and an offering plate visible. The man stands with his hands together, while the woman bows before the shrine, holding prayer beads. In the background, two people can be seen waiting their turn to pray. 
The Coca-Cola umbrella provides shade for the worshipers, and there are two handbags, a bottle, and a backpack scattered on the ground. The incense burner adds to the peaceful atmosphere, while the [IDN] surrounding the shrine enhances its natural beauty."}, {"image_id": "COCO_train2014_000000572663", "caption": "The scene features a skateboarder in a gray shirt riding his skateboard and performing a trick on a ledge near a concrete stair covered with graffiti. There are several other people in the vicinity, some of whom are watching the skateboarder's actions.\nIn addition to the skateboarder, the area has a few bicycles parked off to the sides. Multiple cars are parked around the area as well, creating a lively and bustling atmosphere for the skateboarder and spectators.", "h_caption": "The scene features a skateboarder in a gray shirt riding his skateboard and performing a trick on a ledge near a concrete stair covered with graffiti cans. There are several other [IDN] in the vicinity, some of whom are watching the skateboarder's [IDN]. In addition to the skateboarder, the area has a few bicycles parked off to the [IDN]. Multiple cars are parked around the area as well, creating a lively and bustling atmosphere for the skateboarder and spectators."}, {"image_id": "COCO_train2014_000000237968", "caption": "The image portrays a personal kitchen with a clean and organized atmosphere. The kitchen features several cupboards on the wall and all-white appliances, including a refrigerator, sink, and oven. There is a wooden dining table surrounded by chairs in the room, creating a cozy dining area. \nA potted plant and a vase with flowers contribute to the lively ambiance, with the potted plant placed on a surface in the background, and the vase adorning the dining table. 
The combination of the clean environment with the touch of greenery makes this kitchen and dining room an inviting and pleasant space.", "h_caption": "The kitchen is well-equipped with all the necessary cooking tools, including a cutting board, kitchen utensils, and a dish rack. The clean and organized atmosphere is complemented by the addition of a cozy [IDN] area with a wooden table and chairs. A potted plant and vase with flowers add a touch of life to the room, making it a warm and inviting space. It's uncertain if the [IDN], [IDN], or [IDN] contribute to the overall ambiance, but they blend seamlessly into the aesthetic of the kitchen."}, {"image_id": "COCO_train2014_000000505701", "caption": "The image depicts a minor league baseball game being played on a grass field. A player is up to bat as they swing a baseball bat at a sports ball in mid-air. Several children are also present on the field, playing various positions during the game. There are people watching and enjoying the event from the sidelines. \nIn total, there are eight people visible on the field, all wearing uniforms and participating in the game. One person can be seen wearing a baseball glove, likely playing the role of the catcher or fielder. The overall atmosphere appears to be energetic and competitive, consistent with a typical baseball game.", "h_caption": "The scoreboard in the [IDN] displays the current score of the minor league base[IDN] game, as a player swings their base[IDN] bat at the [IDN] in mid-air. The [IDN] watches from the sidelines, while children play various positions on the grass field. One player is seen wearing a base[IDN] mitt, likely playing as the catcher or fielder. The atmosphere is energetic and competitive, with the concession stand nearby for refreshments."}, {"image_id": "COCO_train2014_000000247597", "caption": "The image shows a young man sitting on a pile of luggage while traveling on public transportation, likely a passenger train or a bus. 
He has a puzzled look on his face as he tries to manage his belongings on this crowded journey.\nThere are multiple suitcases, a handbag, and a backpack nearby, indicating that the man has a considerable amount of luggage with him. Apart from the man sitting on his luggage, there are a few other people in the scene as well, some sitting on benches while others stand in the space. The two benches available are located on either side of the man sitting with his luggage. Additionally, there are handbags placed on the floor in the same area, suggesting that other passengers also have their belongings with them.", "h_caption": "The image shows a young [IDN] with a puzzled look on his face as he tries to [IDN]age his multiple suitcases, handbag, and backpack while sitting on a pile of [IDN] during a crowded journey on public transportation. Other passengers are seen with their own handbags and belongings nearby. One of the passengers is seen holding a water bottle, while it is uncertain what the [IDN] in [IDN] of the young [IDN] is doing or why he is [IDN]."}, {"image_id": "COCO_train2014_000000399095", "caption": "In the image, a woman dressed in military clothing is shopping for fresh bananas. She is bending over to pick up a banana from a variety of bananas displayed in the store. There are several clusters of bananas in the scene, spread throughout the store in different locations. \nIn addition to the female soldier, there are two more people present in the store, one in the background to the left and the other closer to the right side. Various bottles can be seen placed around the store, and there is a TV mounted on the wall near the right side of the scene.", "h_caption": "In the image, the [IDN] dressed in military clothing is carefully placing fresh bananas into her shopping basket. Surrounding her are displays of other [IDN] and vegetables, as well as bottles scattered throughout the [IDN]. 
A cash register can be seen in the background to the left, while a TV is mounted on the wall to the right. The [IDN] lighting illuminates the clusters of bananas, creating a warm and inviting atmosphere."}, {"image_id": "COCO_train2014_000000237550", "caption": "The image displays a room filled with parked motorcycles, creating an atmosphere akin to a garage. The motorcycles, over a dozen in total, are neatly lined up beside one another along the walls, all around the room. Some motorcycles are even placed closer to the center of the room. The room itself appears to be clean and well-lit, emphasizing the attention given to the motorcycles.\nThere is also a person visible near the center of the room, likely admiring the motorcycles or attending to them.", "h_caption": "The image displays a well-equipped [IDN] garage, complete with a tool chest for repairs and maintenance. Over a dozen [IDN]s are parked neatly along the [IDN]s and center of the room, while a person attends to one of the bikes. The helmets on display suggest a focus on safety and protection for riders. Uncertain objects like the [IDN] of [IDN]s and parts hint at the diversity of the collection in this impressive space."}, {"image_id": "COCO_train2014_000000052288", "caption": "The image features a large grassy field filled with several people enjoying a sunny day by flying kites in the blue sky. Kites of various sizes can be seen soaring above the field, with people spread out across the scene, controlling their kites as they move in the breeze. Among the individuals participating in this leisure activity, there also appears to be a man and his dog in the field, taking part in the fun. The scene conveys a sense of outdoor recreation and camaraderie among the kite flyers.", "h_caption": "The image features a large grassy field filled with several people enjoying a sunny day by flying kites in the blue sky. 
Kites of various sizes can be seen soaring above the field, with people spread out across the scene, controlling their kites as they move in the breeze. Among the individuals participating in this leisure activity, there also appears to be a man and his dog in the field, taking part in the fun. The scene conveys a sense of outdoor recreation and camaraderie among the kite flyers. Coolers with drinks and snacks can be seen scattered throughout the field, providing refreshment for the participants."}, {"image_id": "COCO_train2014_000000440243", "caption": "A streamlined passenger train is passing by a crowded train station, with people waiting on the platform to board. The train fills most of the scene as it extends from left to right. Several individuals can be seen in the distance, standing on the platform in anticipation of their journey. \nAround the waiting passengers, there is an array of luggage: a suitcase is visible among the crowd, as well as a backpack and two handbags. The scene depicts the everyday hustle and bustle of a busy train station, with travelers preparing to embark on their trips.", "h_caption": "A passenger train speeds along the train tracks past a crowded station, where people are waiting on the platform to board. The train schedule board displays the various destinations and their corresponding departure times. Among the hustle and bustle, a [IDN] suitcase catches the eye, while the [IDN] checks his watch before signaling for departure. Windows on the train reveal the [IDN] of compartments, and the excited passengers are eager to embark on their journey."}, {"image_id": "COCO_train2014_000000099681", "caption": "The image depicts a factory with several workers busily performing their jobs. There are five people in total in the scene, with three mainly concentrated in the center of the factory workspace while the other two are positioned further apart. 
The workers are surrounded by tables loaded with various kinds of equipment and boxes.\nSeveral baseball bats can be observed throughout the workspace, likely indicating that the factory specializes in manufacturing this product. In addition to the baseball bats, there are two TVs, two clocks, a mouse, a keyboard, and a pair of scissors, indicating that the workers are utilizing a combination of tools and technology in their manufacturing processes.", "h_caption": "The workers in the [IDN] [IDN] are utilizing a variety of [IDN], including protective goggles and work boots, as they manufacture high-quality baseball bats. The [IDN] [IDN] and [IDN] in the scene suggest a focus on precision and efficiency."}, {"image_id": "COCO_train2014_000000105733", "caption": "The image features an assortment of delicious looking donuts placed on a dining table. The donuts are arranged on a plate and vary in type and appearance. Some of the donuts can be seen near the edge of the table, while others are more centrally positioned. \nIn addition to the scrumptious treats, a few cups are scattered across the table. One of the cups appears to be filled with iced coffee, and another contains chocolate milk. The other cups might contain beverages as well. This delightful spread is ideal for a gathering or to satisfy a craving for sweets and refreshing drinks.", "h_caption": "In addition to the delicious donuts, there are plates and utensils (forks, spoons, knives) on the dining table. Guests can grab napkins and plates or bowls for the drinks, which include iced coffee and chocolate [IDN]. It's uncertain what the other cups contain, but this spread is perfect for a gathering or a sweet treat any [IDN]."}, {"image_id": "COCO_train2014_000000512375", "caption": "The image features a large Delta Airlines airplane parked outside of a hangar during daytime. Multiple people are visible in various locations near the airplane, with some of them boarding the plane using stairs. 
A few passengers are carrying handbags with them as they approach the plane. \nThere are two trucks in the scene, one located close to the airplane's tail and the other further away on the left side of the image. A bus can also be seen on the right side next to some people. The busy scene at the airport suggests that preparations are being made for the plane to take off soon.", "h_caption": "The image shows passengers boarding the large Delta Airlines airplane using stairs while carrying their handbags. Luggage carts can also be seen in the scene, as well as a truck parked close to the airplane's tail and another one on the left side. A bus is visible on the right side next to some people. The busy scene suggests that preparations are being made for the plane to take off soon."}, {"image_id": "COCO_train2014_000000571039", "caption": "The scene features a clock tower standing in the middle of a busy area within a building, such as a plaza or hallway. The clock is placed on a pedestal and is centrally located in the space. Numerous people can be seen moving about, at least seven individuals are visible.\nSome of the people are situated closer to the clock tower, while others are standing further back in the scene. A chair can be found to the left of the clock tower, and a handbag is placed nearby on the ground. The overall atmosphere suggests a bustling environment as people go about their daily routines.", "h_caption": "The scene features a clock [IDN] standing on a pedestal in the middle of a bustling area, with at least seven people visible. Signboards are placed nearby to display information about the clock [IDN] and the surrounding area. A chair is situated to the left of the [IDN], while a handbag is placed on the ground. 
The overall atmosphere suggests a busy environment, with benches for people to sit on and trash bins to keep the area clean."}, {"image_id": "COCO_train2014_000000367754", "caption": "The image depicts an urban scene with a stop sign hanging near a busy parking lot. The stop sign is not at the top of the pole but rather lower, making it more visible to drivers. Several cars are parked and positioned throughout the parking lot, highlighting the crowded nature of the area.\nIn the background, various buildings can be seen, indicating that the scene is likely set in the heart of the city. The image suggests that traffic is stopped at the stop sign, creating a sense of order and safety for both drivers and pedestrians. The weather appears to be clear, providing good visibility and creating a pleasant atmosphere.", "h_caption": "The image depicts an urban scene with a stop sign hanging near a busy parking lot. The stop sign is not at the top of the pole but rather lower, making it more visible to drivers. Several cars are parked and positioned throughout the parking lot, highlighting the crowded nature of the area. Pedestrians crossing the street can be seen in the distance, adding to the bustling atmosphere. In the background, various [IDN] can be seen, indicating that the scene is likely set in the heart of the [IDN]. The image suggests that traffic is stopped at the stop sign, creating a sense of order and safety for both drivers and pedestrians. The weather appears to be clear, providing good visibility and creating a pleasant atmosphere."}, {"image_id": "COCO_train2014_000000431555", "caption": "The scene captures a city street with a unique, dual-colored yellow and pink bus driving down the road. The bus has a large mustache painted on its front, making it quite eye-catching. It appears to be approaching a bus stop where several pedestrians are waiting, some holding handbags and backpacks. 
\nIn addition to the bus, there are a few cars on the street, a traffic light nearby, and multiple bicycles parked or in use throughout the area. There is also a clock visible just above the waiting pedestrians, and a tall building can be seen in the background, completing the urban setting.", "h_caption": "The tall building in the background adds to the bustling city scene, as a unique yellow and pink bus with a mustache painted on its front approaches a bus stop where [IDN] wait with handbags and backpacks. Multiple [IDN]s parked or in use and a nearby traffic light add to the urban atmosphere."}, {"image_id": "COCO_train2014_000000197406", "caption": "The image features a man standing next to a tall giraffe, possibly a fake one, in a desert-like environment. The giraffe is prominently visible in the scene, taking up a substantial portion of the image. Several other people can be seen around the scene, either standing nearby, walking, or posing in various positions. The location appears to be quite popular and captivating due to the presence of the giraffe and the unique setting.", "h_caption": "The man in the image is holding a camera, capturing the magnificent sight of a tall giraffe standing beside him in the desert-like setting. Surrounding him are several people who are carrying water bottles to quench their thirst in the heat. The uncertain object of a [IDN] [IDN] can be seen in the background, possibly used for carrying sunscreen lotion to protect against the harsh sun."}, {"image_id": "COCO_train2014_000000085941", "caption": "The image shows a bright yellow recumbent three-wheeled bike parked at the edge of a dock near the water. The dock is a popular area, as several people can be seen walking and enjoying the view. 
There are also multiple boats of varying sizes visible in the water, conveying an atmosphere of leisure and recreation.\nThe tricycle stands out as a unique and interesting means of transportation against the backdrop of the boats and the bustling activity on the dock.", "h_caption": "The image shows a bright yellow recumbent three-wheeled bike parked at the edge of a dock near the water, surrounded by fishing poles and a tackle box. The dock is a popular area, as several people can be seen walking and enjoying the view. There are also multiple boats of varying sizes visible in the water, conveying an atmosphere of leisure and recreation. The tricycle stands out as a unique and interesting means of transportation against the backdrop of the bustling activity on the dock."}, {"image_id": "COCO_train2014_000000243173", "caption": "The image displays the inside of a refrigerator filled with various food and drinks. Several bottles are scattered throughout the fridge, including some on the top shelf, middle shelf, and even the bottom shelf. \nThere are multiple apples inside the refrigerator as well, with one near the top right corner and few others in the middle area on the right side. Additionally, a bowl filled with food is placed on the middle shelf towards the left.\nThe refrigerator is stocked, containing a mix of items such as milk, whipped cream, and many other food items and beverages.", "h_caption": "The image displays a well-stocked refrigerator filled with a variety of food and drinks. Among the items are a carton of milk on the top shelf and a bowl of food on the middle shelf. The fridge also contains yogurt, soda cans, [IDN], [IDN], [IDN], [IDN]s, and other items."}, {"image_id": "COCO_train2014_000000143030", "caption": "The image depicts a formal-looking room with elegant decoration. A grandfather clock, prominent in its elegance, stands on the left side of the room. 
In the middle of the room, there is a dining table with a few books neatly placed on top. Around the table, there are two chairs, one positioned closer to the front and the other one toward the back.\nAdditionally, there's a person visible on the left side of the room, likely admiring the aesthetics of the space or looking at the pictures on the wall. The combination of the chair, table, clock, and other accessories create a cozy atmosphere in the room.", "h_caption": "The image depicts a formal-looking room with elegant decoration. A grandfather clock, prominent in its elegance, stands on the left side of the room. In the middle of the room, there is a dining table with a few books neatly placed on top, and a vase of fresh flowers adding a pop of color. Around the table, there are two chairs, one positioned closer to the front and the other one toward the back. Additionally, there's a person visible on the left side of the room, likely admiring the aesthetics of the space or looking at the pictures on the wall. The combination of the chair, table, clock, and other accessories create a cozy atmosphere in the room. On the right side of the room, there's a [IDN] with a mantel decorated with family photos, adding to the room's warm and inviting ambiance."}, {"image_id": "COCO_train2014_000000529586", "caption": "The scene is set on a bustling city street, where a large red double-decker bus is moving through an intersection. Another black and red double-decker bus is visible further down the street. Multiple traffic lights can be seen in various locations as they guide the busy traffic.\nA group of people is standing on the corner, waiting to get on the bus as it passes by. Some of them are holding handbags or wearing backpacks, indicating that they may be commuters or travelers. Apart from the buses, there are also a couple of cars in the image, sharing the road with these city icons. 
The overall atmosphere is lively, with people and vehicles navigating the metropolitan area.", "h_caption": "The scene is set on a bustling city street, where a large red double-decker bus is moving through an intersection. Another black and red double-decker bus is visible further down the street. Pedestrians crossing the street can be seen in the foreground, while skyscrapers tower in the background. Apart from the buses, there are also a couple of [IDN] in the image, sharing the [IDN] with these city icons. The overall atmosphere is lively, with [IDN] and vehicles navigating the metropolitan area."}, {"image_id": "COCO_train2014_000000577052", "caption": "The image depicts a Japanese man, likely a sushi chef, standing behind a counter in a restaurant. He is preparing food and surrounded by various bowls and utensils. There are multiple bowls set on the counter, some near the chef and others further away. \nSeveral spoons and knives are also visible within the scene, with a knife located close to the chef and another one nearby. The chef appears to be focused on his work and is attentively looking at the other side of the counter. Additionally, there are a few bottles near the chef, likely containing ingredients or dressings for the dishes being prepared.", "h_caption": "The skilled sushi chef carefully slices raw fish on a cutting board, donning his traditional [IDN] and chef's hat. Surrounding him are various [IDN] and [IDN] filled with fresh ingredients."}, {"image_id": "COCO_train2014_000000178181", "caption": "The image features a city street with parked cars and a skateboarder riding along the sidewalk. There are several cars parked near one another on the street, including a truck on the left side and two smaller cars on the right side. The skateboarder appears to be enjoying their ride, gliding through the neighborhood. 
\nIn addition to the skateboarder, there are two other pedestrians visible in the scene \u2013 one located in the middle and the other nearer to the left side. There are also four parking meters along the street \u2013 two on the left side, and two others near the parked cars on the right side. This indicates that it might be a metered parking area within the city.", "h_caption": "The image features a [IDN] street with parked cars and a skateboarder riding along the sidewalk. There are several cars parked near one another on the street, including a truck on the left side and two smaller cars on the right side. The skateboarder appears to be enjoying their ride, gliding through the neighborhood. One street lamp can be seen on the left side of the image. In addition to the skateboarder, there are two other pedestrians visible in the scene \u2013 one located in the middle and the other nearer to the left side. There are also four parking meters along the street \u2013 two on the left side, and two others near the parked cars on the right side, indicating that it might be a metered parking area within the [IDN]."}, {"image_id": "COCO_train2014_000000310091", "caption": "In the image, a group of white and red birds can be seen walking through a lush green field. There are twelve birds in total, some closer to the water, while others are scattered around the grassy area. They appear to be moving in a manner that suggests they could be following each other, possibly forming a line or a loose flock. The scene evokes a sense of harmony, as the birds enjoy their time in the beautiful natural surroundings.", "h_caption": "In the image, a group of twelve white and red birds can be seen walking through a lush green field near a small pond or lake. They appear to be moving in a manner that suggests they could be following each other, possibly forming a line or a loose flock. 
The [IDN] evokes a sense of harmony, as the birds enjoy their time in the beautiful natural surroundings with the trees and blue [IDN] in the background."}, {"image_id": "COCO_train2014_000000339295", "caption": "The image showcases a shiny buffet filled with a variety of delicious food items, including hot dog sandwiches, burritos, beans, and peas. Several trays contain the hot dogs with different toppings and arrangements. The hot dogs are spread across the buffet, with some positioned in the front, middle, and back. \nIn addition to the main food items, there are a couple of spoons visible near the beginning of the buffet, likely for serving the beans and peas. The overall presentation creates an appetizing atmosphere for the patrons to enjoy this scrumptious meal at the cafeteria.", "h_caption": "The image showcases a shiny buffet filled with a variety of delicious food items, including hot dog sandwiches, burritos, beans, and peas. Several plates contain the hot dogs with different toppings and arrangements. The hot dogs are spread across the buffet, with some positioned in the front, middle, and back. \nIn addition to the main food items, there are serving tongs visible near the beginning of the buffet, likely for serving the beans and peas. The overall presentation creates an appetizing atmosphere for the patrons to enjoy this scrumptious meal at the [IDN]."}, {"image_id": "COCO_train2014_000000124935", "caption": "This is an image taken behind the plate during a baseball game, capturing the action on the field. A batter is standing at the plate, holding a baseball bat and preparing to hit the baseball. There are several other baseball players in various positions across the field, attentively watching the game.\nDefensive players such as the catcher, the pitcher, and infielders are dispersed. Another player on the field is wearing a baseball glove, suggesting he is an outfielder. 
Additional players, possibly from both teams, are visible along the edges of the field, observing the ongoing events.\nThe stadium is filled with excited spectators, creating an electric atmosphere for the New York Major League Baseball game.", "h_caption": "This is an image taken behind the plate during a baseball game, capturing the action on the field. A batter is standing at the plate, holding a baseball bat and preparing to hit the baseball. There are several other baseball players in various positions across the field, attentively [IDN] the game.\n\nThe scoreboard displays the current score, adding to the excitement of the New York Major League Baseball game. Defensive players such as the [IDN], the pitcher, and infielders are dispersed, while [IDN] can be seen in the background, giving instructions.\n\nAnother player on the field is wearing a baseball glove, suggesting he is an outfielder. Additional players, possibly from both teams, are visible along the edges of the field, observing the ongoing events. Meanwhile, the concession stand is bustling with fans eager to grab a snack and cheer on their favorite team.\n\nThe stadium is filled with excited spectators, creating an electric atmosphere for the game."}, {"image_id": "COCO_train2014_000000376919", "caption": "The image displays a dinner plate full of a delicious and nutritious meal. The plate contains various pieces of broccoli scattered across its surface. In addition to the broccoli, the plate includes meat, likely steak, as well as small potatoes or gnocchi.\nBeside the plate, a fork and knife are laid out, ready for use. This well-prepared dinner serves as a delectable combination of meat, vegetables, and a side dish to satisfy anyone's appetite.", "h_caption": "The image displays a dinner plate full of a delicious and nutritious meal. The plate contains various pieces of broccoli scattered across its surface. 
In addition to the broccoli, the plate includes meat, likely steak, as well as small potatoes or gnocchi.\nBeside the plate, a fork and knife are laid out, ready for use. The meal is complemented by a soft and warm bread roll, perfect for soaking up any remaining juices."}, {"image_id": "COCO_train2014_000000476431", "caption": "The image showcases several people enjoying a day outdoors flying kites in a field under a very cloudy sky. Among them, a man wearing a black shirt is prominently standing in the field, flying a blue and yellow kite high up in the sky.\nThere are many people visible in the scene. Some of them are spread out across the field, engaged in flying their kites, while others are closer to the man in the black shirt, possibly joining in the fun or watching the kites gracefully soar through the air.", "h_caption": "The man in the black shirt skillfully maneuvers his [IDN] and yellow kite through the cloudy sky, while others relax on a colorful picnic blanket nearby, sipping refreshing drinks from the cooler. The [IDN] is filled with laughter and joy as the kite string dances in the wind, trailing colorful [IDN] behind it."}, {"image_id": "COCO_train2014_000000283863", "caption": "The image is a spacious kitchen inside a home, featuring a wooden floor and a stand-alone counter. It is well-equipped with appliances like a stove, oven, and refrigerator, the latter of which is close to the right side of the frame. A sink is situated at the left part of the kitchen, with a marble countertop and cabinets above it.\nIn the middle of the room, there is a white dinette set with a dining table surrounded by multiple chairs. On the table, a bowl and several spoons can be seen, along with an apple placed nearby. The kitchen carries a welcoming and modern feel with its new flooring and marble countertops.", "h_caption": "The well-equipped kitchen features a stand-alone counter, with a sleek toaster and coffee maker sitting on top. 
The wooden floor and marble counter[IDN] give the space a modern feel. A white dinette set with chairs surrounds the dining table, where a bowl of fruit and spoons can be seen. The [IDN] and [IDN] provide ample natural light, completing the inviting atmosphere."}, {"image_id": "COCO_train2014_000000569560", "caption": "The image features two women playing tennis on a court. Both women are wearing red and white outfits, showcasing their teamwork. They are energetically engaged in the game, showcasing their athletic abilities.\nFour tennis rackets appear in the image, with several of them not directly held by the players. Each of the two women is actively holding one of the rackets while playing, with one woman on the left side of the court and the other on the right side.\nMoreover, there are two additional people in the center and background of the court, perhaps observing the game or waiting for their turn to play. Overall, the scene captures an exciting game of tennis.", "h_caption": "The two women on the tennis court are fully equipped with their tennis rackets, water bottles, and towels. They are in the midst of an intense match, as evidenced by their focused expressions and energetic movements. In the background, a pair of [IDN] can be seen resting on a bench, perhaps waiting for their owner to take a break. Overall, it's a thrilling game of tennis with all the necessary equipment at hand."}, {"image_id": "COCO_train2014_000000342374", "caption": "The image features a group of three little girls sitting around a dining table, enjoying a snack and having lunch together. Each girl is seated on a chair, and the table is laden with various food items, including bread and sandwiches. Two sandwiches are visible\u2014one is closer to the left side of the table, and the other is placed near the center of the table.\nThere are four cups on the table, two of which are on the left side and the other two closer to the right side. 
In addition to the cups, there is a bottle located around the middle of the table. Utensils, including two knives, can be seen on the table as the girls indulge in their snack time.", "h_caption": "The image features a group of three little girls sitting around a dining table, enjoying a snack and having lunch together. Each girl is seated on a chair, and the table is laden with various food items, including [IDN] and sandwiches. Two sandwiches are visible\u2014one is closer to the left side of the table, and the other is placed near the center of the table.\nThere are four [IDN]s on the table, two of which are on the left side and the other two closer to the right side. In addition to the [IDN]s, there is a [IDN] of juice pitcher located around the middle of the table. Utensils, including two knives, can be seen on the table as the girls indulge in their snack time. The table is also adorned with napkins and a [IDN] of fruits, making it a vibrant and fun lunchtime for the girls."}, {"image_id": "COCO_train2014_000000125351", "caption": "In the image, there is a woman running along a sidewalk on a city street. Next to her, a dog is standing on a skateboard, seemingly enjoying the ride. They seem to be engaged in a playful companion-like activity together. \nThe street is busy, with numerous cars parked or driving along. Some cars are in the foreground, while others are further down the path. Additionally, there are traffic lights visible, one positioned on a pole and another hanging, indicating that the scene takes place in an urban setting.\nIn the scene, there is another person standing close to the running woman, possibly watching the action or just passing by.", "h_caption": "In the image, the woman is running along a sidewalk on a city street with numerous cars parked or driving along. In the background, there are tall buildings or skyscrapers visible. Next to her, a dog is standing on a skateboard, seemingly enjoying the ride. 
They seem to be engaged in a playful companion-like activity together. Additionally, there are traffic lights visible, one positioned on a pole and another hanging, indicating that the scene takes place in an urban setting. On the sidewalk, there are pedestrians walking, some passing by and some watching the action."}, {"image_id": "COCO_train2014_000000004180", "caption": "The scene shows a lunch setting, likely at a school, with children sitting down to eat. Two main kids are visible at the dining table, possibly having lunch in a classroom or a designated eating area. A little boy is sitting in the foreground, mostly occupying the left side of the table, while another child is seated towards the far right end of the table.\nThe dining table is large and surrounded by numerous chairs, suggesting a communal eating space for the students. A donut can be spotted on the table as one of the food items being consumed. The children seem to be enjoying their meal and possibly posing for the camera capturing the moment.", "h_caption": "The [IDN] are using lunch trays or [IDN]s to enjoy their meal at the communal dining table, which is surrounded by numerous chairs. A little boy is sitting in the foreground, while another child is seated towards the far right end of the table. A donut can be spotted on the table as one of the food items being consumed. The kids seem to be having a great time, possibly showing off their water bottles or juice boxes."}, {"image_id": "COCO_train2014_000000225579", "caption": "The image features a showcase of motorcycles with a stunning red motorcycle taking the center stage on a white display floor. It seems to be attracting a lot of attention, as a crowd of people are gathered around it, admiring and taking photos. 
In addition to the main red motorcycle, two other motorcycles are visible in the scene.\nMany individuals are present in the image, including people standing in the front, near the motorcycles, as well as those at the back or sides of the room. Some of them are carrying handbags and backpacks, likely attending an indoor show featuring these Ducati motorcycles. A TV can also be seen mounted on the left side of the back wall.", "h_caption": "The image showcases a stunning red motorcycle on a white [IDN] floor, attracting a crowd of people admiring and taking photos. Many individuals are present, some carrying handbags and backpacks, likely attending an indoor show featuring these Ducati motorcycles. A TV remote control can be seen mounted on the left side of the back wall."}, {"image_id": "COCO_train2014_000000543166", "caption": "The image displays a city street with a parking meter situated next to a white car. The parking meter has 23 minutes remaining on the timer. The street is populated with multiple parked cars on both sides, showcasing different sizes and colors. \nThe white car that is parked next to the parking meter is the main focus of the scene. Due to its proximity to the meter, it suggests the car has recently pulled up and used the meter to pay for parking.", "h_caption": "The image displays a city street with a parking meter situated next to a white car. The pedestrians on the sidewalk can be seen passing by the row of parked [IDN] on both sides. The parking meter has 23 minutes remaining on the timer. The white car that is parked next to the parking meter is the main focus of the scene, suggesting that it has recently used the meter to pay for parking."}, {"image_id": "COCO_train2014_000000215582", "caption": "The image depicts a group of people gathered at a party, sitting around a dining table and enjoying various foods and drinks. A young man can be seen eating a piece of chocolate cake from a red plate. 
Multiple cakes are placed around the table, and several wine glasses and cups are also visible, indicating that the guests are drinking as well.\nThere are at least eight people present in the scene, with some seated at the table while others are standing nearby. A couple of men are seated together, eating chocolate cake and drinking. Chairs are positioned around the table to accommodate the guests. Forks are also placed close to the cakes, suggesting that more people might be sharing the dessert.\nEmpty bottles and wine glasses contribute to the festive atmosphere, indicating that this is likely a social gathering where friends and family are enjoying each other's company.", "h_caption": "The image depicts a group of people gathered at a party, sitting around a dining table and enjoying various foods and [IDN]. A young man can be seen eating a piece of [IDN] cake from a red plate. Multiple cakes are placed around the table, and several wine glasses and cups are also visible, indicating that the guests are drinking as well. Empty wine bottles and glasses contribute to the festive atmosphere, indicating that this is likely a social gathering where friends and family are enjoying each other's company. In the [IDN], there are [IDN] and a [IDN] visible."}, {"image_id": "COCO_train2014_000000418397", "caption": "The image captures a professional baseball game in progress. In the spotlight are a batter, a catcher, and an umpire. The batter is holding a baseball bat, ready to swing at an upcoming pitch, while the catcher is in position with a baseball glove to catch the ball. \nSurrounding the main action, several other players are seen dispersed around the field, in various positions and ready for any gameplay that may come their way. In total, there are thirteen people present, making it a busy and active baseball scene.", "h_caption": "The image showcases a thrilling base[IDN] game with a clear view of the scoreboard. 
The batter, wearing a base[IDN] helmet, grips his bat tightly, preparing for the pitch. The [IDN], in his uniform, crouches behind him with his glove open, eagerly awaiting the [IDN]. The umpire stands behind them, watching every move. Meanwhile, the [IDN] is filled with players in different positions, all ready for action. The [IDN] is not visible in the image, but its presence can be felt in the intensity of the moment."}, {"image_id": "COCO_train2014_000000374873", "caption": "The image captures a group of skiers standing at the top of a snow-covered mountain. There are approximately 15 people in the scene, with some more prominently visible than others. They are preparing to ski down the slope, admiring the breathtaking view, and conversing with each other. Some of the skiers can be seen posing for pictures on the snowy hill.\nMultiple pairs of skis are scattered around the area amongst the snow, along with a few backpacks belonging to the group. The backpacks are spread out across the mountaintop, with some located closer to the people and others slightly farther away. The image embodies the excitement and camaraderie of the group, ready for an exhilarating skiing experience.", "h_caption": "The skiers in the image are all equipped with ski poles, helmets, and goggles, as they stand at the top of the snow-covered mountain, ready to hit the slopes. Amongst the scattered skis and backpacks, a [IDN] in a [IDN] can be seen in the [IDN], enjoying the breathtaking view with the rest of the group."}, {"image_id": "COCO_train2014_000000500233", "caption": "The image features a wooden dining table set with a meal consisting of various dishes. A white square plate sits in the center of the table, containing meat and broccoli. Adjacent to the plate is a fancy bowl filled with rice. 
A bottle of wine and a wine glass are placed on the table as well, adding to the elegant atmosphere of the meal.\nIn addition, there are a couple of chairs placed around the table, suggesting a social dining occasion. Carrots are also visible on the plate, providing an extra touch of color and nutrition to the meal.", "h_caption": "The image features a wooden dining table set with a meal consisting of various [IDN]es. A white square plate sits in the [IDN] of the table, containing meat and broccoli. Adjacent to the plate is a fancy bowl filled with rice. A [IDN] of wine and a wine [IDN] are placed on the table as well, adding to the elegant atmosphere of the meal. A candlestick holder is also present, creating a warm and inviting ambiance. \n\nIn addition, there are a couple of chairs placed around the table, suggesting a social dining occasion. Carrots are also visible on the plate, providing an extra touch of color and nutrition to the meal."}, {"image_id": "COCO_train2014_000000057647", "caption": "The image features a young boy, dressed in orange and black, flying a kite high up in the sky in an open field. He appears to be enjoying his time outdoors, playing with his kite. There are several other people in the background, some standing closer and others farther away, probably spending time outside as well.\nAdditionally, there are multiple cars parked at various spots in the open area, suggesting that people have gathered here to enjoy outdoor activities. Some cars are positioned near the middle of the scene, while others can be found farther in the background.", "h_caption": "The image features a group of people gathered in an open field, enjoying outdoor activities. A young boy, dressed in orange and black, is seen flying a kite high up in the sky. In the background, several cars are parked, indicating that people have come here to spend time outside. 
Among the items scattered around, picnic blankets can be seen, suggesting that some may have brought food to enjoy."}, {"image_id": "COCO_train2014_000000542786", "caption": "The image features a desk in a room that is filled with various computer equipment. There are five screens on the table, including two monitors and three laptops. In addition, there are three keyboards and three mice scattered across the desk. The arrangement of the screens and other devices shows a multi-screen setup, possibly for use in work, gaming, or other computer-intensive activities. The large number of screens and devices emphasize a technology-focused environment.", "h_caption": "The image features a [IDN] chair in a room that is filled with various [IDN] equipment. There are five [IDN]s on the table, including two monitors and three laptops. In addition, there are three [IDN]s and three mice scattered across the [IDN]. The arrangement of the [IDN]s and other devices shows a multi-[IDN] setup, possibly for use in work, gaming, or other [IDN]-intensive activities. The large number of [IDN]s and devices emphasize a technology-focused environment. A printer can be seen on the [IDN], indicating the importance of hard copies in this workspace."}, {"image_id": "COCO_train2014_000000075516", "caption": "The image displays a bike shop with multiple bicycles in the process of being repaired or assembled. Four people can be seen working on the bikes, one on the left side, two near the center, and another one on the right side of the room.\nThere are various bicycles in the room, with one large rack occupying almost half of the left side of the space, and two more bikes in the right half of the room. 
One of these bikes is closer to the ground, while the other is raised and being worked on by one of the employees.\nA bottle is also visible on the left side of the scene, and there are two clocks present in the room, one on the left side and the other on the right side, possibly helping the workers manage time efficiently.", "h_caption": "The image displays a bike shop with multiple bicycles in the process of being repaired or assembled. Four people can be seen working on the bikes, one on the left side, two near the center, and another one on the right side of the room.\nThere are various bicycles in the room, with one large rack occupying almost half of the left side of the space, and two more bikes in the right half of the room. One of these bikes is closer to the ground, while the other is raised and being worked on by one of the employees.\nA bottle is also visible on the left side of the scene, and there are two clocks present in the room, one on the left side and the other on the right side, possibly helping the workers manage time efficiently.\nThe workbench or workstand is cluttered with tools such as screwdrivers, wrenches, and pliers, as well as bike parts such as tires and chains."}, {"image_id": "COCO_train2014_000000453512", "caption": "The image features a skate park where a group of young men are practicing their skateboard tricks. In the center of the scene, a person is skateboarding on a metal rail, impressively riding the board down the railing. Several other skateboarders are scattered throughout the park, with some standing on ramps or just in the process of getting on their skateboards. There are at least thirteen people of various positions and skill levels visible in the park, making for an active, lively atmosphere. 
Two skateboards can be seen prominently, with one being used by the person riding the metal rail and another carried by a skateboarder nearby.", "h_caption": "The image features a skate park where a group of young men, wearing protective gear including helmets, knee pads, and elbow pads, are practicing their skateboard tricks. In the center of the scene, a person is impressively riding their skateboard down a metal rail, while another skateboarder nearby carries a water bottle or energy drink. Several other skateboarders are scattered throughout the park, with some standing on ramps or just in the process of getting on their skateboards. There are at least thirteen [IDN] of various positions and skill levels visible in the park, making for an active, lively atmosphere. Cameras or smartphones can be seen out, ready to record the impressive skateboarding tricks."}, {"image_id": "COCO_train2014_000000233319", "caption": "In the image, there is a kitchen area with wooden cabinets and a dining table. The table is surrounded by chairs, with at least five chairs visible in the scene. A potted plant and a vase can be seen placed on the counter, adding a touch of decoration to the space. \nThree people are actively working in the kitchen: one person is standing close to a counter where multiple knives and bottles are placed, another person is situated near the sink, and the third person is standing close to the table. The group, including a woman and an elderly couple, appears to be preparing food together. \nOther notable objects in the kitchen include a refrigerator, microwave, and a handbag placed near one of the chairs. Additionally, there is a bowl on the counter, possibly being used in the food preparation process. Overall, it seems like a cozy and welcoming space where people enjoy spending time together.", "h_caption": "In the image, the group of people is seen actively preparing food together in the cozy and welcoming kitchen. 
One person is standing close to a counter where multiple knives and bottles are placed, while another person is situated near the sink. The third person is standing close to the [IDN] surrounded by chairs, where a bowl is placed, possibly being used in the food preparation process. Additionally, the kitchen is equipped with wooden cabinets, a refrigerator, microwave, a potted plant, and a vase. Notably, there are cutting boards available on the counter, along with mixing bowls and cooking [IDN] (spatulas, ladles, etc.) for the cooking process. A handbag is also placed near one of the chairs, adding a touch of personalization to the space."}, {"image_id": "COCO_train2014_000000310061", "caption": "The image features a dining table with a white plate full of food placed on it. The plate is filled with a variety of broccoli pieces and sliced meat, making it both colorful and delicious-looking. A fork is positioned close to the plate, ready for use, and a bowl is visible in the background. There is another bowl located in the top left corner of the dining table, making the setting quite inviting for a meal.", "h_caption": "The dining table is set with a white plate full of delicious-looking broccoli and [IDN], accompanied by a fork and a bowl in the background. A napkin and cutlery (knife and spoon) are also present, but the [IDN] and [IDN] from the uncertain objects list are missing."}, {"image_id": "COCO_train2014_000000074201", "caption": "The image features a dining table that is covered with various treats. The table is topped with blue frosted cupcakes, green frosted sugar cookies, and a plethora of orange carrots. There is a bowl positioned between the cupcakes and cookies containing even more carrots, reinforcing the colorful contrast in the arrangement of the food items.\nThe carrots are scattered on the table, some inside the bowl and others neatly placed, surrounding both the cupcakes and cookies. 
The presentation of the table creates an eye-catching and appetizing display.", "h_caption": "The image features a dining table that is covered with a beautiful tablecloth and set with elegant silverware and beverage glasses. The table is topped with blue frosted [IDN] and green frosted sugar cookies, displayed alongside a plethora of orange [IDN]. A bowl of additional [IDN] is positioned between the [IDN] and cookies, adding to the colorful contrast of the arrangement. The scattered [IDN], neatly placed around the desserts, create an eye-catching and appetizing display against the elegant backdrop."}, {"image_id": "COCO_train2014_000000087465", "caption": "The image features two young children, a boy and a girl, playing a Wii video game together in a room. They are both standing and holding their respective remote controllers as they engage in fun gameplay. \nThe room is furnished with a chair located near the children and a couch further back on the right side. Other objects in the scene include a suitcase, an umbrella, and a cell phone placed in various parts of the room. Moreover, there is a teddy bear sitting by the couch, adding a cozy touch to the setting.", "h_caption": "The image features two young children, a boy and a girl, playing a Wii [IDN] game together in a room. They are both standing and holding their respective remote controllers as they engage in fun gameplay. The room is furnished with a chair located near the children and a couch further back on the right side. Other objects in the scene include a suitcase, an umbrella, and a cell phone placed in various parts of the room. Moreover, there is a TV situated on the wall, providing an immersive gaming experience."}, {"image_id": "COCO_train2014_000000581277", "caption": "The image features a young boy at a baseball park, sitting in the stands and enjoying a hot dog in a bun. 
The boy is the focal point, and other people are visible around him, some beside or behind him, indicating a lively atmosphere during the game.\nA chair can be seen in the background near the right edge of the image. The baseball field itself is in the background, making it the perfect setting for eating a hot dog and watching the game unfold.", "h_caption": "The image captures the excitement of a baseball game, as a young boy savors a hot dog in the stands. The lively atmosphere is evident from the other people visible around him, and the baseball glove in hand suggests that he may be waiting for a chance to catch a fly ball. In the background, the scoreboard displays the current score, adding to the anticipation of the game. Despite the action on the field, the hot dog remains a highlight of the day, with its warm bun and savory flavor."}, {"image_id": "COCO_train2014_000000190667", "caption": "The image depicts a somewhat organized workstation at a desk, complete with a computer tower, a monitor, a mouse, and a keyboard. The keyboard is spread across the lower half of the desk, while the mouse is placed more towards the right side. The monitor can be seen covering the top left corner of the desk. \nThere are several books scattered around the workstation, both vertically and horizontally, creating a tightly packed environment. Some books are placed behind the keyboard beneath the monitor, while others are neatly stacked on the right side of the desk. This home computer desk setup appears well-utilized for work, study, or leisure.", "h_caption": "The image depicts a somewhat organized workstation at a desk, complete with a computer tower, a [IDN], a mouse, and a keyboard. The keyboard is spread across the lower half of the desk, while the mouse is placed more towards the right side. The [IDN] can be seen covering the top left corner of the desk. 
\nThere are several books scattered around the workstation, both vertically and horizontally, creating a tightly packed environment. Some books are placed behind the keyboard beneath the [IDN], while others are neatly stacked on the right side of the desk. This home computer desk setup appears well-utilized for work, study, or leisure.\nA sleek desk lamp is positioned on the left side of the desk, providing ample lighting for late-night work sessions. The [IDN] on the wall is marked with important dates and deadlines, keeping the user on track."}, {"image_id": "COCO_train2014_000000026294", "caption": "The image features a dining table topped with various bowls of food, with one large bowl in the center surrounded by several smaller ones. The dishes appear to be a mix of cuisine, including a bowl of chili and possibly curry, along with some soup. Next to the bowls, there is a plate with a quesadilla or a piece of roti, Indian flatbread. \nA spoon, likely a wooden ladle, is placed on the table alongside the bowls, indicating that the dishes are to be served. There is also a cup resting near the edge of the table. The arrangement of dishes suggests a diverse assortment of flavors and cultural influences.", "h_caption": "The wooden ladle placed beside the diverse array of dishes on the dining table suggests that they are ready to be served. Along with a cup, the table is adorned with various bowls of food, including a large one in the center surrounded by smaller ones, featuring a mix of cuisines like chili, soup, and possibly [IDN]. A plate with a quesadilla or a piece of roti, Indian flatbread, complements the dishes. However, the missing fork and napkin, along with the uncertainty of the [IDN] and [IDN], leave much to the imagination."}, {"image_id": "COCO_train2014_000000011025", "caption": "In the image, a young man is getting ready to ride a skateboard, as he approaches the board with the intention of mounting it. 
Several other people are present in the scene, some of them standing and others moving, possibly indicating a competition or gathering of skateboarders.\nTwo backpacks can be found on the ground, possibly belonging to some of the people present. The skateboard is placed right in the middle of the scene, drawing attention to the main action being performed by the skateboarder.", "h_caption": "In the image, the young [IDN] is wearing a helmet as he prepares to ride his skateboard. He approaches the board with the intention of mounting it, surrounded by a [IDN] of people, some standing and others moving, possibly indicating a competition or gathering of skateboarders. Two backpacks can be found on the ground, possibly belonging to some of the people present. The skateboard is placed right in the middle of the scene, drawing attention to the main action being performed by the skateboarder. The [IDN] has a water bottle with him, possibly to stay hydrated while performing tricks."}, {"image_id": "COCO_train2014_000000254821", "caption": "In the image, there is a round black table with a red laptop and a black laptop sitting next to each other with their lids open. Two people can be seen sitting nearby, with one person on the left side of the table and another person closer to the middle of the scene. A third person, only partially visible, is situated at the top of the image.\nA tie and a handbag are also present in the scene, with the tie located at the top left corner and the handbag positioned at the top center of the image. A chair can be spotted in the top right corner of the scene, and a TV is visible near the right side of the image.", "h_caption": "In the image, there are two laptops, a red one and a black one, sitting on a round black table with their lids open. Two people are seated nearby, with one person on the left side of the table and another person closer to the middle of the scene. 
A third person, only partially visible, is situated at the top of the image. A tie is located at the top left corner and a handbag is positioned at the top center of the image. One of the people is holding a pen, which is lying on the table. A chair can be spotted in the top right corner of the scene, and a TV is visible near the right side of the image."}, {"image_id": "COCO_train2014_000000264499", "caption": "In the image, a group of friends is gathering in a park for a group photo. There are 9 people in the picture, all smiling and enjoying their time together. A frisbee is being held by one of the individuals, indicating that they might have been playing together. Additionally, a dog is present in the scene, standing near the right side of the group and appearing to be part of the gathering.\nIn the background, there are a couple of cars parked at a distance, possibly belonging to the group members who arrived at the park together.", "h_caption": "In the image, a group of friends is posing for a photo on a picnic blanket in the park. There are 9 people in the picture, all smiling and enjoying their time together. One of the individuals is holding a cooler with drinks and snacks, indicating that they came prepared for a fun day. Additionally, a dog is present in the scene, standing near the right side of the group and appearing to be part of the gathering. In the background, a couple of cars are parked at a distance, possibly belonging to the group members who arrived at the park together."}, {"image_id": "COCO_train2014_000000126126", "caption": "In the image, a group of teenagers are gathered outdoors, having fun on their skateboards on the concrete sidewalk outside a commercial building. They are performing skateboard tricks and seem to enjoy each other's company. 
There are at least 14 people in the scene, some of whom are standing, and others are actively riding their skateboards.\nSeven skateboards are visible, with some positioned on the ground and others being ridden by the teenagers. The skateboards are spread out across the scene, indicating that the skaters enjoy using the entire concrete space to share their skills and passion for skateboarding.", "h_caption": "In the image, a group of teenagers are gathered outdoors, having fun on their skateboards on the concrete sidewalk outside a commercial building. They are performing skateboard [IDN]s and seem to enjoy each other's company. There are at least 14 people in the scene, some of whom are standing, and others are actively riding their skateboards. Helmets can be seen on some of the skaters, ensuring their safety as they perform their impressive [IDN]euvers. Seven skateboards are visible, with some positioned on the ground and others being ridden by the teenagers. The skateboards are spread out across the scene, indicating that the skaters enjoy using the entire concrete space to share their skills and passion for skateboarding."}, {"image_id": "COCO_train2014_000000170008", "caption": "The image depicts a living room where two middle-aged men are playing a game on a Nintendo Wii console. Both of them are holding white game controllers, with one man stretching out his arm and actively engaging with the video game. \nIn the room, various pieces of furniture can be observed, such as a couch located on the left side, and a chair closer to the right. There are two bottles in the scene possibly containing drinks, one near the center of the room and the other slightly to the left. In addition, a vase can be seen in the background on the far right.", "h_caption": "The image depicts a living room where two middle-aged men are playing a game on a Nintendo Wii console. 
Both of them are holding white game controllers, with one [IDN] stretching out his arm and actively engaging with the video game. A lamp is situated on the coffee table in front of the couch located on the left side of the room. There are two bottles in the scene possibly containing drinks, one near the center of the room and the other slightly to the left. In addition, a vase can be seen in the background on the far right. The [IDN]'s [IDN] appears to be blue in color."}, {"image_id": "COCO_train2014_000000140693", "caption": "The image displays a baseball field with several people, including baseball players and other individuals. In the scene, a professional baseball player stands on the dirt, gripping a bat and waiting to swing at an incoming pitch. Another player nearby holds a baseball glove, ready to field the ball. A sports ball is visible in the air, possibly representing the pitch thrown toward the batter.\nThere are other people gathered in the background, some standing and observing the game. In addition to the players, two handbags can be seen placed near the bottom left side of the scene, likely belonging to some of the onlookers. Finally, there's a bottle situated on the ground on the left side of the image.", "h_caption": "The image displays a baseball field with several people, including baseball [IDN] and other individuals. A scoreboard can be seen in the background, indicating the current score of the [IDN]. In the scene, a professional baseball player stands on the dirt, gripping a bat and waiting to swing at an incoming pitch. Another player nearby holds a baseball glove, ready to field the ball. A sports ball is visible in the air, possibly representing the pitch thrown toward the batter. There are other people gathered in the bleachers, some standing and observing the [IDN]. In addition to the [IDN], two handbags can be seen placed near the bottom left side of the scene, likely belonging to some of the [IDN]. 
Finally, there's a concession stand situated on the right side of the image, offering refreshments for the spectators."}, {"image_id": "COCO_train2014_000000005198", "caption": "The image features a family riding a motorcycle and passing by other vehicles on the road. The family consists of a man, his wife, and their child, all wearing helmets for safety. They ride a single motorcycle with the child sandwiched between the adults, making it quite an exciting sight. \nIn the scene, there are other people and vehicles present. There is a truck on the right side of the image, and another motorcycle can be seen in the background. A person is holding a bottle along with some other individuals who don't seem to ride any vehicles.\nOverall, the scene depicts a thrilling yet slow ride where the family shares a single motorcycle for their journey.", "h_caption": "The image features a family riding a motorcycle and passing by other [IDN] on the road. The family consists of a man, his wife, and their child, all wearing helmets for safety. They ride a single motorcycle with the child sandwiched between the adults, making it quite an exciting sight. In the scene, there are other people and [IDN] present. A truck on the right side of the image adds to the traffic. Some pedestrians on the sidewalk can also be seen alongside the road. Overall, the scene depicts a thrilling yet slow ride where the family shares a single motorcycle for their journey."}, {"image_id": "COCO_train2014_000000121785", "caption": "The image features a grassy field filled with numerous American flags staked into the ground, creating a patriotic scene. The flags are spread across the grass, and a fire hydrant is also visible nearby. \nThere are several people in the scene, scattered throughout the field as they likely admire and walk among the flags. 
Some are standing close to the flags, while others are further away, showing their interest and appreciation for the display.", "h_caption": "The image features a grassy field filled with numerous American [IDN] staked into the ground, creating a patriotic scene. The [IDN] are spread across the grass, and a fire hydrant is also visible nearby. The people in the scene have brought picnic blankets, sunglasses, and water bottles to enjoy the display. Some are standing close to the [IDN], while others are further away, showing their interest and appreciation for the display despite the [IDN] waving in the [IDN]."}, {"image_id": "COCO_train2014_000000351221", "caption": "The scene depicts a group of young men skateboarding in a makeshift skate park set up on a city street. There are two skateboards placed on the wooden ramps and railings, with one skateboarder riding up the side of a ramp. Several people are gathered around the skate park, either watching or waiting for their turn to ride.\nThe street contains parked cars and trucks, with some cars positioned behind the skateboarders and trucks on the left side of the scene. Traffic lights can be seen in the background, indicating that the skate park is situated in a busy urban area. Some pedestrians are also present further in the distance, observing the skateboarding activities.", "h_caption": "The scene depicts a group of young men skateboarding in a makeshift skate park set up on a city street. Safety helmets are visible on some of the [IDN] as they perform tricks on the ramps and railings. There are two [IDN] placed on the wooden structures, with one skateboarder riding up the side of a ramp. Several people are gathered around the skate park, either watching or waiting for their turn to ride. \n\nThe street contains parked cars and trucks, with some cars positioned behind the [IDN] and trucks on the left side of the scene. Graffiti art on nearby walls adds to the urban atmosphere. 
Traffic lights can be seen in the background, indicating that the skate park is situated in a busy urban area. Some pedestrians are also present further in the distance, observing the skateboarding activities."}, {"image_id": "COCO_train2014_000000517005", "caption": "The image captures a game of baseball being played between two opposing teams. Several players can be seen on the field, including a batter holding a baseball bat next to the home base, a pitcher, and an umpire. There are also other team members spread across the field, some holding baseball gloves preparing to catch any incoming balls.\nIn the midst of the game, the pitcher is in the process of throwing a sports ball, likely aiming to strike out the batter. The umpire and other players are closely observing the play, ready to respond accordingly.", "h_caption": "The image captures a game of baseball being played between two opposing teams. Several players can be seen on the field, including a batter holding a baseball bat next to the home base, a pitcher, and an [IDN]. There are also other team members spread across the field, some holding baseball gloves preparing to catch any incoming balls.\n\nIn the midst of the game, the batter is wearing a baseball helmet, ready to swing at the pitcher's ball. The [IDN] and other players are closely observing the play, ready to respond accordingly. The [IDN]'s [IDN] and [IDN] can be seen in the background."}, {"image_id": "COCO_train2014_000000348031", "caption": "The image features a large clock constructed from garden materials, such as grass and hedges, incorporated into its face. This unique display is situated within a garden area, attracting the attention of several tourists. There are nine people standing in front of the clock, observing and admiring the creative landscaping.\nSome of the tourists are tightly clustered together, while others are spread out across the clock's garden area. 
Among the crowd, a handbag can be seen, possibly belonging to one of the tourists exploring the intriguing garden clock.", "h_caption": "The garden clock, made from grass and hedges, stands tall among the flower beds and attracts the attention of tourists consulting their guidebooks. A couple sits on the nearby garden bench, admiring the unique display while others snap photos of the clock's intricate [IDN]. Amidst the clustered crowd, a woman searches for her lost handbag, possibly left behind while exploring the creative landscaping."}, {"image_id": "COCO_train2014_000000044291", "caption": "The image showcases a large collection of colorful umbrellas lined up along the sidewalk outside of a temple. These umbrellas are of various colors and sizes, creating a vibrant and visually appealing atmosphere. The umbrellas are placed on the ground next to each other, extending from one end of the building with pillars to the other, likely providing shade or decoration for those visiting the temple. The umbrellas create a unique and captivating scene indicative of cultural practices and artistic expression.", "h_caption": "The image showcases a large collection of colorful [IDN] lined up along the sidewalk outside of a temple. People walking by or standing under the [IDN] are enjoying the shade and vibrant atmosphere. The [IDN] are placed on the ground next to each other, extending from one end of the building with pillars to the other, likely providing shade or decoration for those visiting the temple. The [IDN] create a unique and captivating scene indicative of cultural practices and artistic expression."}, {"image_id": "COCO_train2014_000000161333", "caption": "The image features a group of kids playing a game of sports, likely rugby or football, on a field. The children are dressed in striped shirts and are actively running around as they play. One boy can be seen carrying a sports ball, likely a rugby ball, with other boys chasing him. 
\nThere are several children spread across the field engaged in the game, displaying a lively and energetic atmosphere. In the background, a couple of handbags are placed on the grass, possibly belonging to the children or attendees who are watching them play.", "h_caption": "The image features a group of kids playing a game of sports, likely rugby or football, on a field. The [IDN] are dressed in striped shirts and are actively running around as they play. One boy can be seen carrying a sports ball, likely a rugby ball, with other boys chasing him. Sports cones or markers used to outline the boundaries of the game are visible on the grass, ensuring the game is played within the designated area. \nThere are several [IDN] spread across the field engaged in the game, displaying a lively and energetic atmosphere. In the background, a couple of handbags are placed on the grass, possibly belonging to the attendees who are watching them play."}, {"image_id": "COCO_train2014_000000077136", "caption": "The image depicts a group of people gathered in a grassy backyard near a wall. They are standing at various distances from each other, engaging in a fun game of frisbee. A frisbee is visible near one of the players, airborne and in motion. \nIn total, there are eight people participating in the activity, making it a lively and entertaining scene. The backyard also has a potted plant located near the wall, adding a touch of greenery to the surroundings.", "h_caption": "The image depicts a group of people gathered in a grassy backyard near a wall. They are standing at various distances from each other, engaging in a fun game of frisbee. A frisbee is visible near one of the players, airborne and in motion. In total, there are eight people participating in the activity, making it a lively and entertaining scene. 
The backyard also has a cooler filled with drinks and snacks located near the wall, ensuring everyone stays hydrated and energized during the game."}, {"image_id": "COCO_train2014_000000032818", "caption": "The image captures a sandy beach on a cloudy day, featuring two concrete benches. One of the benches is located closer to the foreground and is positioned near a tree trunk. The other bench is farther back and closer to the center of the beach. \nTwo people are walking together on the shore near the water, enjoying the beach atmosphere. In addition to the beachgoers, there are multiple boats visible in the body of water beyond the shoreline, spread across the scene.", "h_caption": "The image captures a [IDN] beach on a cloudy day, featuring two concrete benches. One of the benches is located closer to the foreground and is positioned near a tree trunk. The other bench is farther back and closer to the center of the beach. Two people are walking together on the shore near the water, enjoying the beach atmosphere. In addition to the beachgoers, there are multiple boats visible in the body of water beyond the shoreline, spread across the scene. Seagulls can be seen flying overhead, adding to the serene ambiance of the beach."}, {"image_id": "COCO_train2014_000000137824", "caption": "The image captures a unique scene of two people riding an adult elephant down a busy street. The large elephant is hard to miss as it strolls down the road with its riders on its back. As they move along, several onlookers can be seen on the sidewalk observing the unusual sight, including a group of individuals both close to and slightly further away from the elephant.\nVarious modes of transportation are evident in the scene, such as multiple bicycles situated on the street behind the elephant, a car parked to the side of the road, and a truck occupying the far right side of the image. 
A person with an umbrella can also be spotted among other objects and people present in the unique street scene.", "h_caption": "The image captures a unique scene of two people riding an adult elephant down a busy street. The large elephant is hard to miss as it strolls down the road with its riders on its back. As they move along, several onlookers can be seen on the sidewalk observing the unusual sight, including a group of individuals both close to and slightly further away from the elephant.\nMultiple bicycles are situated on the street behind the elephant, adding to the diverse modes of transportation evident in the scene. A person with an umbrella can also be spotted among other objects and people present, although the [IDN] seems uncertain."}, {"image_id": "COCO_train2014_000000081200", "caption": "An elderly couple is celebrating a birthday at a dining table in a restaurant. The man is dressed in a suit and tie, and both of them are sitting on chairs. They are surrounded by various dinnerware items, including multiple cups, a few wine glasses, spoons, and bowls. The birthday cake with a candle is placed on the table in front of them. Glasses of wine can be found on the table as well, suggesting a special occasion. They appear to be enjoying a birthday dinner together.", "h_caption": "An elderly couple is celebrating a birthday at a dining table in a restaurant. The man is dressed in a suit and [IDN], and both of them are sitting on chairs. They are surrounded by various dinnerware items, including multiple cups, a few wine [IDN], spoons, and bowls. The waiter/waitress is bringing out the birthday cake with a [IDN] on it, and there is also a gift bag or wrapped present on the table. Glasses of wine can be found on the table as well, suggesting a special occasion. 
They appear to be enjoying a birthday dinner together."}, {"image_id": "COCO_train2014_000000067420", "caption": "The image features a delightful arrangement of kebab sticks with various desserts and fruits skewered on them. There are nine bananas on the kebab sticks, some of them placed together with other treats, creating a colorful and appetizing presentation.\nIn addition to the bananas, three doughnut holes are visible on different skewers, adding a surprising twist to the traditional dessert setup. A cake is also present in the scene, helping to create a diverse display of pastries and fruits for guests to enjoy.", "h_caption": "The image features a delightful arrangement of kebab [IDN] with various desserts and fruits skewered on them. There are nine bananas on the kebab [IDN], some of them placed together with other treats, creating a colorful and appetizing presentation. Pineapple chunks and strawberries are also included on the skewers, complementing the sweet treats. In addition to the fruit, three doughnut holes are visible on different skewers, adding a surprising twist to the traditional dessert setup. A cake is also present in the scene, helping to create a diverse display of pastries and fruits for guests to enjoy. The [IDN] used to hold the treats in place add a fun touch to this delicious display."}, {"image_id": "COCO_train2014_000000318468", "caption": "The image captures three young men playing with a white frisbee on a green grass field at the bottom of a hill. Two of them are seen closer together, while the third person is a bit further away, with all three actively participating in the game. \nIn the background, five cars are parked in a line, adding a sense of depth to the scene. A few additional people can be spotted near the parked cars, perhaps enjoying the sunny day in the park as well.", "h_caption": "The image captures three young [IDN] playing with a white frisbee on a green grass field at the bottom of a hill. 
Two of them are seen closer together, while the third person is a bit further away, with all three actively participating in the game. One of them is holding a water bottle, perhaps to stay hydrated during their game. \nIn the background, five cars are parked in a line, adding a sense of depth to the scene. A few additional people can be spotted near the parked cars, perhaps enjoying the sunny day in the park as well."}, {"image_id": "COCO_train2014_000000267815", "caption": "The image depicts a group of people in a room, having a fun time playing a Wii game. In the center of the scene, a young man stands while holding onto a Wii remote and touching his chest. A woman is standing behind him, along with a few other people scattered around the room. \nFurniture-wise, there are two dining tables present, with one towards the front left side of the room and the other in the back right. A couple of chairs can be found, one close to the first dining table and the other close to the second dining table. A few personal belongings are also visible in the area, including a bottle, a cell phone, a laptop, and a cup, signaling that the people might be taking a break from their daily activities to enjoy the gaming session.", "h_caption": "The image depicts a group of people gathered around a Television, having a fun time playing a Wii game. In the center of the scene, a young man stands while holding onto a Wii remote and touching his chest. A woman is standing behind him, along with a few other people scattered around the room. Snacks or food plates can be seen on the dining tables, signaling that the people might be taking a break from their daily activities to enjoy the gaming session."}, {"image_id": "COCO_train2014_000000022086", "caption": "The image displays two young children lying in bed together, covered with a colorful comforter or a sleeping bag. They are surrounded by several stuffed animals, including a total of seven teddy bears in various sizes. 
Some of the teddy bears are near the children's heads, while others are placed closer to their feet. The children seem to be not yet asleep, enjoying their cozy environment. The bed appears to be either a low bed or a mattress directly placed on the floor.", "h_caption": "The image displays two young children lying in bed together, covered with a colorful comforter or a sleeping bag. They are surrounded by several stuffed animals, including a total of seven teddy bears in various sizes. Some of the teddy bears are near the children's heads, while others are placed closer to their feet. The children seem to be not yet asleep, enjoying their cozy environment. The bed appears to be either a low bed or a mattress directly placed on the [IDN]. A night light is visible in the [IDN], creating a warm and comforting atmosphere."}, {"image_id": "COCO_train2014_000000287434", "caption": "The image shows a man performing a daring trick on his skateboard over a cement edge in an urban setting. He seems to be skillfully grinding the waxed ledge with his skateboard, displaying his talent and expertise in the sport.\nThe surrounding environment features numerous cars parked in various positions, emphasizing the fact that this skateboarding feat is taking place in a busy urban area. The numerous cars in the background add a bustling atmosphere to the scene, demonstrating the excitement and intensity of the activity.", "h_caption": "The image shows a man performing a daring [IDN] on his skateboard over a cement edge in an urban setting. Other skateboarders watching or cheering on the man can be seen in the background, adding to the excitement of the scene. He seems to be skillfully grinding the waxed ledge with his skateboard, displaying his talent and expertise in the sport. 
The surrounding environment features numerous cars parked in various positions, emphasizing the fact that this skateboarding feat is taking place in a busy urban area."}, {"image_id": "COCO_train2014_000000425582", "caption": "The image displays a stove top with a large pan full of various vegetables, creating a colorful and healthy assortment. Multiple carrots can be seen in the pan, some closer to the center, while others are placed on the outer edges. There is also a bowl with some other vegetables, including squash, placed nearby.\nThe stove is an electric one, providing ample cooking space for the pot with vegetables to be heated evenly. The combination of the stove and the pot along with the different types of vegetables creates a cozy cooking scene.", "h_caption": "The image displays a stove top with a large pan full of various vegetables, including carrots, being skillfully stirred with a spatula. A bowl of squash sits nearby, ready to be added to the mix. The cook uses a kitchen towel to handle the hot pan and ensure even cooking. The scene exudes warmth and comfort, promising a delicious and healthy meal."}, {"image_id": "COCO_train2014_000000459770", "caption": "The image shows a delicious pizza covered in cheese with numerous pieces of broccoli and pepperoni scattered across its surface. The well-cooked pizza exhibits a balance between the vibrant green of the vegetables and the red of the pepperoni, creating an appealing visual effect. The cheese is melted and spread evenly on top, complementing the toppings perfectly. This mouth-watering pizza fills the entire frame of the image, emphasizing its tastiness and tempting appearance.", "h_caption": "The image shows a delicious pizza covered in cheese with numerous pieces of broccoli and pepperoni scattered across its surface. The well-cooked pizza exhibits a balance between the vibrant green of the vegetables and the red of the pepperoni, creating an appealing visual effect. 
The cheese is melted and spread evenly on top, complementing the toppings perfectly. This mouth-watering pizza fills the entire frame of the image, emphasizing its tastiness and tempting appearance. A pizza cutter is placed on the side, ready to [IDN] into the [IDN] goodness."}, {"image_id": "COCO_train2014_000000048032", "caption": "The image captures a baseball player preparing to swing a baseball bat on a field. He is positioned with the bat behind his head, demonstrating an attempt to hit the ball. Several people are watching around the field, some closer to the batter and others further away. There are multiple baseball bats visible in the scene, but the focal point is the main player holding a bat and focusing on his swing. The spectators seem attentive and engaged in the ongoing game or practice.", "h_caption": "The image captures a baseball [IDN] preparing to swing a baseball bat on a field. He is positioned with the bat behind his head, demonstrating an attempt to hit the ball. Several people are watching around the field, some closer to the batter and others further away. There are multiple baseball bats visible in the scene, but the focal point is the main [IDN] holding a bat and focusing on his swing. The spectators seem attentive and engaged in the ongoing game or practice. One of the baseballs from the co_objects list can be seen flying through the air towards the batter, while the uncertain_objects list includes the [IDN] in the [IDN] and the [IDN] in the distance."}, {"image_id": "COCO_train2014_000000302489", "caption": "In the image, a group of people is gathered outdoors. In the center of the scene, there is an overturned umbrella lying upside down on the ground, probably due to a gust of wind or unintentional mishandling. The sun is shining on this umbrella, drawing attention to its unusual position. \nThere are many people surrounding the umbrella, with some sitting and some standing, as they engage in various activities or conversations. 
A handbag can be spotted near the right side of the scene, possibly belonging to one of the individuals in the group. The scene suggests a casual and relaxed outdoor gathering where the umbrella adds an unexpected element to the situation.", "h_caption": "In the image, a group of people is gathered outdoors on a picnic blanket. In the center of the scene, there is an overturned umbrella lying upside down on the ground, probably due to a gust of wind or unintentional mishandling. The sun is shining on this umbrella, drawing attention to its unusual position. \nThere are many people surrounding the umbrella, with some sitting and some standing, as they engage in various activities or conversations. A Frisbee can be seen in the background, suggesting that some of the individuals may have been playing with it earlier. The scene suggests a casual and relaxed outdoor gathering where the umbrella adds an unexpected element to the situation."}, {"image_id": "COCO_train2014_000000336075", "caption": "The image depicts a cozy living room featuring blue furniture, including two couches\u2014one placed in the middle of the room, and another one along the wall opposite to the flat screen TV mounted on the wall. A man is sitting on the couch in the middle of the room, using a laptop while likely watching TV.\nThere are several additional items decorating the room, such as two potted plants, a handbag, and multiple books scattered around. Furthermore, there is a wine glass, a cup, and a remote control placed close to the seating area, and an adorable cat is perched on the floor near the TV.\nA chair can be observed in front of the couch in the middle of the room, indicating that the space is well-furnished and comfortable for relaxing, socializing, or working.", "h_caption": "The cozy living room is illuminated by a warm lamp on the coffee table, which is adorned by a colorful throw pillow. 
Though the [IDN] on the wall is indistinct, the blue [IDN] stands out, providing ample seating for [IDN] to relax and enjoy the view from the [IDN]."}, {"image_id": "COCO_train2014_000000276639", "caption": "The scene depicts a group of children, specifically young boys wearing green shirts, participating in a competition at a rodeo or animal show. There are four boys in green, each displaying a sheep inside a pen. \nSeveral other individuals are present in the scene, some of which are also wearing green. They appear to be celebrating together and enjoying the event. Among the attendees, one person is holding a cell phone, possibly taking photos or communicating with others. The overall atmosphere suggests an engaging and fun event for the kids and spectators alike.", "h_caption": "The scene depicts a group of children, specifically young boys wearing green shirts, participating in a sheep competition at a rodeo or animal show. There are four boys in green, each displaying a sheep inside a pen while using sheep shearing equip[IDN]t. \n\nSeveral other individuals, including [IDN] wearing cowboy hats, are present in the scene. They appear to be celebrating together and enjoying the event, as evidenced by the rodeo banners in the background. Among the attendees, one person is holding a cell phone, possibly taking photos or communicating with others. The overall atmosphere suggests an engaging and fun event for the kids and spectators alike."}, {"image_id": "COCO_train2014_000000054696", "caption": "The image is of a vintage scene with a classic airplane on a grass field. The airplane has a propeller and is positioned in the middle of the scene. Many people are gathered around the airplane, some of them likely boarding the plane or preparing for departure.\nIn total, there are 13 people visible in the photograph, with some standing closer to the airplane on the right, and others scattered around the left side. 
The grass field around the airplane further enhances the nostalgic ambiance of the scene. Overall, the image tells the story of a time when air travel was a relatively new mode of transportation, with people attending to an event in this field.", "h_caption": "The vintage air[IDN] in the middle of the grass field is surrounded by a crowd of 13 [IDN], some of whom are loading up classic luggage onto the [IDN]. The nostalgic ambiance of the scene is enhanced by the presence of vintage cars parked nearby, adding to the sense of a bygone era."}, {"image_id": "COCO_train2014_000000573795", "caption": "The image features a snowy landscape with a large mountain in the background. In the foreground, a man is standing on a snowboard and enjoying the winter sport. Another snowboard rests on the snowy ground, along with multiple sets of skis scattered throughout the scene.\nSeveral people are present in the area, some of whom are observing the snowboarder and others engaged in snow sports themselves. There are at least ten people within the scene. Some individuals are closer to the foreground, while others can be seen further away, likely skiing or snowboarding their way down the slopes. Overall, the scene seems to be a popular winter sports destination.", "h_caption": "The snowboarder is wearing snow goggles as he carves his way down the mountain, while a [IDN] lift can be seen in the background. The scene is bustling with activity, with [IDN] [IDN]ing and snowboarding all around. Despite the cold weather, everyone is bundled up in [IDN]s, [IDN], and warm [IDN]s, enjoying the winter sports destination."}, {"image_id": "COCO_train2014_000000336663", "caption": "The scene shows a wooden dining table filled with plates of delicious food, likely prepared by a skilled chef. There are various breakfast dishes on the table, such as eggs, meat and potatoes, rice and gravy, roast beef, biscuits, and bacon. 
These plates are expertly arranged and spread across the table.\nIn addition to the food, there are some items placed on the table, such as a bottle, two cups, a fork, a knife, and three oranges scattered at different positions on the table. The dining table occupies most of the image and serves as the central focus of the scene, highlighting the sumptuous food displayed on it.", "h_caption": "The scene shows a wooden dining [IDN] filled with [IDN]s of delicious food, likely prepared by a skilled chef. There are various breakfast dishes on the [IDN], such as eggs, meat and [IDN], rice and gravy, roast beef, biscuits, and bacon. These [IDN]s are expertly arranged and spread across the [IDN]. Glasses for juice or [IDN] have been placed on the [IDN], along with a bottle, a fork, a knife, and three oranges scattered at different positions on the [IDN]. The dining [IDN] occupies most of the image and serves as the central focus of the scene, highlighting the sumptuous food displayed on it."}, {"image_id": "COCO_train2014_000000366414", "caption": "The image features a police officer in a yellow jacket riding a motorcycle down a street, appearing to be at a corner right behind a car. The motorcycle takes up a significant portion of the scene, highlighting its presence. There are several people on the sidewalk, watching the police officer on the motorcycle. \nIn addition, there is a truck and a bus in the nearby area, adding to the street's lively atmosphere. One of the bystanders is holding a handbag, further showcasing the daily activity in this urban setting.", "h_caption": "The police officer in the yellow [IDN] uses their stopwatch to time the [IDN] lights as they ride their motorcycle down the bustling city street. People on the sidewalk watch as the officer [IDN]euvers around a car, with a bus and truck adding to the lively atmosphere. 
A [IDN] in a striped [IDN] and [IDN] stands nearby, while a wo[IDN] holds a handbag, showcasing the daily activity in this urban setting."}, {"image_id": "COCO_train2014_000000396257", "caption": "A group of people is gathered around the door of a green bus, seemingly waiting to get on. Some of them are already in line while others are standing around the bus stop. Among the group, there are two women with spooky makeup on, possibly for a themed event or holiday, poised at the entrance of the bus.\nNearby, there are people wearing backpacks, with one positioned close to the center of the scene, and another one to the far right. A bench is also visible towards the right of the scene, providing a place for waiting passengers to sit. There is another person located on the left side of the image near the front of the bus.", "h_caption": "A green bus with a \"Bus [IDN] [IDN]\" on top is parked at the curb, with a [IDN] of people waiting to board. Two women with spooky makeup on are standing at the entrance, while others are in line or sitting on the \"[IDN]\" nearby. In the background, there are \"luggage bags\" and a \"ticket vending machine\" for passengers to use."}, {"image_id": "COCO_train2014_000000355175", "caption": "The image features a man standing in a kitchen, wearing an apron. He has a wide-eyed look on his face, possibly excited or surprised. The kitchen is well-equipped with various appliances, including a refrigerator, microwave, and oven. \nAround the man and on various surfaces in the kitchen, there are several bottles, possibly containing wine or other beverages. A bowl can also be seen, placed near the refrigerator. Additionally, a handbag is located on a surface in the kitchen, which may belong to the man or someone else in the household.", "h_caption": "The man in the apron is using a spatula to flip pancakes on the stove, while a cutting board and various cooking utensils are laid out nearby. 
In the [IDN], wine glasses sit on the counter, waiting to be filled."}, {"image_id": "COCO_train2014_000000342787", "caption": "In the image, a man is getting ready to skateboard on the concrete as he stands with one foot on a black skateboard. The scene is set outdoors on a road, and he appears to be focusing on his next move. There is a bench nearby, possibly serving as a resting spot or a potential obstacle to perform tricks.\nSeveral people and vehicles are present in the background, creating a lively atmosphere. Two people can be seen on the right side of the image, while another person is visible on the left side. Multiple cars are parked or moving along the road, with one on the left, another in the middle, and the third on the right. A motorcycle is also noticeable on the far right side of the scene. Another skateboard can be spotted near the center of the image, possibly belonging to someone else or serving as a spare for the skateboarder.", "h_caption": "In the image, the skateboarder is wearing a helmet for safety as he prepares to perform tricks on his black skateboard. The scene takes place outdoors on a road, with a bench nearby serving as a possible obstacle for him. In the background, a water bottle can be seen on the ground, possibly belonging to the skateboarder or someone else. Several people and vehicles add to the lively atmosphere, with multiple cars parked or moving along the road, and a motorcycle visible on the far right side of the scene. Among the uncertain objects, tall [IDN] provide a natural backdrop to the urban setting."}, {"image_id": "COCO_train2014_000000238007", "caption": "A large veggie tray is displayed on a dining table, filled with a delectable variety of vegetables including carrots, broccoli, and mushrooms. The vegetables are neatly arranged, with several carrots placed throughout the tray. 
In the center of the tray, there is a large bowl containing some ranch dip, which complements the vegetables perfectly.\nThe dining table occupies the majority of the image, stretching from the left edge to the right edge, and the focus is on the veggie tray and its contents. Overall, it is a very appetizing display that invites people to enjoy the healthy and delicious vegetables and dip.", "h_caption": "A large veggie tray is displayed on a dining table, filled with a delectable variety of vegetables including carrots, [IDN], and mushrooms. The vegetables are neatly arranged, with several carrots placed throughout the tray. In the center of the tray, there is a large bowl containing some ranch dip, which complements the vegetables perfectly. The plates and utensils are neatly arranged beside the tray, ready for guests to enjoy. Overall, it is a very appetizing display that invites [IDN] to indulge in the healthy and delicious vegetables and dip."}, {"image_id": "COCO_train2014_000000181155", "caption": "The image shows a stylish and bright living room with light-colored furniture. A large mirror is mounted on the wall above a couch, which is situated on the left side of the room. In the living room, there are two potted plants, one placed closer to the center of the room and the other on the right side.\nThe space also includes a dining room area with a table positioned near the right side of the room. Surrounding the dining table, there are several chairs, ensuring ample seating for guests. Apart from the dining area, there are additional chairs spread across the room, with one chair on the left side and another on the far right side. Overall, the living room and dining room offer a harmonious and welcoming atmosphere.", "h_caption": "The image shows a stylish and bright living room with light-colored furniture. A large mirror is mounted on the wall above a couch, which is situated on the left side of the room. 
In the living room, there are two potted plants, one placed closer to the center of the room and the other on the right side. A floor lamp adds a warm and cozy ambiance to the space.\n\nThe space also includes a [IDN] room area with a table positioned near the right side of the room. Surrounding the [IDN] table, there are several [IDN]s, ensuring ample seating for guests. Apart from the [IDN] area, there are additional [IDN]s spread across the room, with one [IDN] on the left side and another on the far right side. Overall, the living room and [IDN] room offer a harmonious and welcoming atmosphere. The decorative throw pillows on the couch add a pop of color to the room."}, {"image_id": "COCO_train2014_000000142822", "caption": "The image features a busy scene with multiple people and motorcycles. There is a motorcycle being checked by a pit crew member, as well as a person wearing a helmet sitting on another motorcycle. In total, there are two motorcycles in the scene - one on the left, and another on the right side of the image.\nThere are several people in the image as well who seem to be part of the pit crews or interacting with the riders. These people are scattered around the scene, some standing behind the rider on the motorcycle, while others are standing in the vicinity of the motorcycles, attending to various duties. Some of them can be seen close to the riders and others positioned further away in the background. This suggests a busy race environment where motorcycle racers are collaborating with their teams.", "h_caption": "The image captures the bustling atmosphere of a motorcycle racing event, with pit crew members attending to a motorcycle on the left side while a rider donning a racing helmet sits on another motorcycle on the right. In the background, flags and banners adorned with sponsor logos can be seen waving in the air. 
The [IDN] is filled with people, some close to the [IDN] and others further away, all working together to ensure a successful [IDN]. Amongst the tools and equipment scattered around, a toolbox can be spotted being used by one of the crew members."}, {"image_id": "COCO_train2014_000000171380", "caption": "The image features an intersection with a stop sign and street signs mounted on a pole. The stop sign is displayed in two languages, catering to a diverse population. The busy intersection has numerous cars and a truck occupying the street, indicating that it's a frequently traveled area.\nThere are at least 10 cars visible in different positions in the street, such as behind the stop sign or further back along the road. The truck is located near the back of the scene, adding to the variety of vehicles present at the intersection.", "h_caption": "The image features a stop sign and a traffic light mounted on a pole, catering to the diverse population of the area. The busy intersection has numerous [IDN] and a truck occupying the [IDN], indicating that it's a frequently traveled area. The pedestrian crossing sign is also visible, making it a safe place for pedestrians to cross the [IDN]."}, {"image_id": "COCO_train2014_000000500622", "caption": "The image depicts a lively scene on a crowded city street, filled with people gathered to celebrate an event or watch a parade. Many individuals in the crowd are holding umbrellas to shield themselves from the rain. Notably, one of the umbrellas is plaid, and it hovers above a person wearing a red raincoat.\nThe people are clustered closely together, standing shoulder-to-shoulder, with various groups dispersed throughout the scene. The atmosphere appears to be energetic and social, as the crowd of people collectively enjoy the outdoor gathering despite the rainy weather.", "h_caption": "The image depicts a lively scene on a crowded city street, filled with people gathered to celebrate an event or watch a parade. 
Many individuals in the crowd are holding umbrellas to shield themselves from the rain. Notably, one of the umbrellas is plaid, and it hovers above a person wearing a red raincoat. Food vendors selling hot drinks or snacks can be seen scattered throughout the scene, adding to the festive atmosphere. The people are clustered closely together, standing shoulder-to-shoulder, with various groups dispersed throughout the crowd. Despite the [IDN] [IDN], the energetic and social ambiance is enhanced by the musicians or performers entertaining the crowd with music or a show."}, {"image_id": "COCO_train2014_000000002703", "caption": "The image features a man wearing a blue shirt sitting on a toilet in a bathroom. His arms are raised in the air, possibly expressing surprise or excitement. Next to the man, a sink and a mirror can be seen, indicating a typical bathroom setup.\nAround the bathroom, there are several bottles of various sizes, some placed near the sink, and others scattered in different locations. In addition to the bottles, there are a few books that can be found around the room, which might be related to the man's activities while in the bathroom.", "h_caption": "The man in the blue shirt looks surprised as he sits on the toilet in a typical bathroom setup with a sink and mirror nearby. A toothbrush and toothpaste can be seen on the sink, along with various bottles of different sizes scattered around the room. In addition, a magazine or newspaper can be found on the floor, indicating the man's activities while in the bathroom."}, {"image_id": "COCO_train2014_000000275037", "caption": "The scene depicts a city street intersection where a group of people is crossing the street. There are two stop signs present in the image, one on the left side near the corner and another on the right side of the intersection. 
\nThere are several cars visible on the street, with one positioned near the left stop sign, another in the middle of the scene, and two more on the right side of the image. Among the group of people walking, there are five individuals in total, with one person closer to the right stop sign, and the other four are walking together towards the middle of the intersection.", "h_caption": "The scene depicts a city street intersection where a group of people is crossing the street. There are two stop signs present in the image, one on the left side near the corner and another on the right side of the intersection. The traffic lights are visible above the middle of the intersection. There are several [IDN] visible on the street, with one positioned near the left stop sign, another in the middle of the scene, and two more on the right side of the image. Among the group of people walking, there are five individuals in total, with one person closer to the right stop sign, and the other four are walking together towards the middle of the intersection."}, {"image_id": "COCO_train2014_000000142291", "caption": "The image captures a lively scene of three children playing a video game, possibly Nintendo Wii, together in a living room. One boy with glasses is sitting on a couch, while the other two boys sit on a pillow on the floor in front of the couch. \nEach of the children is holding white remote controllers, with one remote visible in the leftmost child's hand, another in the middle child's hand, and two more remotes by the boy on the couch. One can infer that they are enjoying a multiplayer game and sharing a fun bonding experience. The living room has two couches, one positioned along the left side of the image and the other placed along the right side.", "h_caption": "The image showcases three children having a blast playing a multiplayer video game, possibly Nintendo Wii, in a cozy living room. 
The boy on the [IDN] is fully engaged in the game, while the other two boys sit on a pillow on the floor in [IDN] of him. The white remote [IDN], visible in each child's hand, suggest they are all actively participating in the game. In the background, a TV and coffee table can be seen, but the main focus is on the lively gameplay."}, {"image_id": "COCO_train2014_000000191188", "caption": "The image features a herd of zebras and cows in an enclosure, likely at a zoo. Two zebras are standing close together near a fence that separates them from the two cows. Many people are gathered around the fences, admiring and observing the animals.\nThere are at least ten individuals of various ages looking at the animals, with some standing closer to the zebras, while others are near the cows. The zoo visitors are spread out along the enclosure, with some on the left, others in the middle, and a few more on the right side. A couple of people appear to be wearing ties, adding a touch of formality to the scene. Overall, this setting seems like an enjoyable, educational day at the zoo for the visitors.", "h_caption": "The image features a herd of [IDN]s and cows in an enclosure, likely at a zoo. Two [IDN]s are standing close together near a fence that separates them from the two cows. Many people are gathered around the fences, admiring and observing the [IDN]. Some visitors are pushing strollers, while others are holding ice cream cones. There are at least ten individuals of various ages looking at the [IDN], with some standing closer to the [IDN]s, while others are near the cows. The zoo visitors are spread out along the enclosure, with some on the left, others in the middle, and a few more on the right side. A couple of people appear to be wearing ties, adding a touch of formality to the scene. 
Overall, this setting seems like an enjoyable, educational day at the zoo for the visitors and their families."}, {"image_id": "COCO_train2014_000000109219", "caption": "The image portrays a very crowded beach filled with people sitting on numerous lawn chairs, sunbathing and enjoying their day on the sandy shoreline. Many of the people are spread out across the beach, with some situated close to the water and others further back. One large umbrella can be seen, providing shade for those relaxing beneath it.\nIn addition to the lawn chairs, there are a few items scattered throughout the scene like a book, a backpack, and a suitcase belonging to the beach-goers. Overall, it's a packed beach scene with people, lawn chairs, and personal belongings occupying most of the visible area.", "h_caption": "The image portrays a very crowded beach filled with people sitting on numerous lawn [IDN], sunbathing and enjoying their day on the sandy shoreline. Many of the people are spread out across the beach, with some situated close to the water and others further back. One large umbrella can be seen, providing shade for those relaxing beneath it. Coolers are scattered throughout the scene, keeping refreshments cold for the beach-goers. In addition to the lawn [IDN], there are a few uncertain objects like [IDN] and the [IDN] of the beach. Personal belongings such as beach towels, sunscreen bottles, and a suitcase are also visible, occupying most of the visible area."}, {"image_id": "COCO_train2014_000000169499", "caption": "The scene captures a bright red pickup truck in city traffic driving behind a tourist bus. The maroon-colored truck is positioned behind a red city bus on a busy road. In the foreground, there are several traffic lights at different heights, signaling the traffic flow.\nNumerous pedestrians can be seen scattered throughout the scene, walking alongside the road or waiting at traffic lights. One of them is carrying a backpack. 
The bustling urban environment, combined with the presence of different vehicles and individuals, creates an overall impression of a city in motion.", "h_caption": "The scene captures a bright red pickup truck in city traffic driving behind a tourist bus. The maroon-colored truck is positioned behind a red city bus on a busy [IDN]. In the foreground, there are several traffic lights at different heights, signaling the traffic flow. Numerous [IDN] can be seen scattered throughout the scene, walking alongside the [IDN] or waiting at traffic lights. One of them, carrying a backpack, is weaving through the taxi cabs. The bustling urban environment, combined with the presence of different vehicles and individuals, creates an overall impression of a city in motion."}, {"image_id": "COCO_train2014_000000430790", "caption": "The image features an almost empty road in a city, with several apartments lining the street. There are three motorcycles traveling down the road together, with each rider on their respective bikes. The first two motorcycles are closer together, while the third motorcycle is further towards the right side of the scene.\nAdditionally, there are four cars visible in the image, with two of them being located in the middle of the road and the other two closer to the background. There is also a person standing near the left side of the road, possibly a pedestrian observing the motorcyclists as they ride by.", "h_caption": "The image features an almost empty [IDN] in a city, with several apartments lining the street. There are three motorcycles traveling down the [IDN] together, with each rider on their respective bikes. The first two motorcycles are closer together, while the third motorcycle is further towards the right side of the scene.\nAdditionally, there are four cars visible in the image, with two of them being located in the middle of the [IDN] and the other two closer to the background. 
There is also a person standing near the left side of the [IDN], possibly a pedestrian observing the motorcyclists as they ride by. A traffic light is visible at the intersection ahead, indicating that the [IDN] is likely to become busier soon."}, {"image_id": "COCO_train2014_000000393176", "caption": "The image features a bookshelf filled with various books, occupying most of the available space. The books are arranged in different orientations, creating a diverse display. In addition to the books, there is a pair of scissors prominently placed in front of them, adding a unique touch to the bookshelf's contents.", "h_caption": "The image features a [IDN]helf filled with various [IDN], occupying most of the available space. The [IDN] are arranged in different orientations, creating a diverse display. In addition to the [IDN], there is a desk lamp prominently placed in front of them, illuminating the shelves with a warm glow."}, {"image_id": "COCO_train2014_000000201402", "caption": "The image shows a man sitting down at a dining table wearing a blue shirt. He is enjoying a meal consisting of pizza and beer. In front of the man, there is a half-empty tray of pizza and two drinking glasses placed on the table. The scene takes place in front of a brick wall that completes the setting.\nThere are some additional items on the table, including a knife near the man, another knife towards the right side, and a bowl in the center. Moreover, there are two more cups located closer to the table's edges - one on the left and the other on the right side.", "h_caption": "The image shows a man sitting down at a dining table wearing a blue shirt. He is enjoying a meal consisting of pizza and [IDN]. In front of the man, there is a half-empty tray of pizza and two drinking glasses placed on the table. 
The scene takes place in front of a brick wall that completes the setting.\n\nThere are some additional items on the table, including a knife near the man, another knife towards the right side, and a bowl in the center. Moreover, there are two more cups located closer to the table's edges - one on the left and the other on the right side. The salt and pepper shakers are also present on the table.\n\nThe [IDN] on the table is accompanied by a bottle opener, while the [IDN]'s overall ambiance is cozy and inviting."}, {"image_id": "COCO_train2014_000000260381", "caption": "In this urban scene, a young woman stands on a city street holding a striped umbrella and reading a newspaper. She appears to be intently focused on the content of the paper, possibly enjoying a moment of calm amidst the bustling environment. \nThe street has multiple cars, a stop sign, and several traffic lights in the background, representing a typical city landscape. In addition to the woman with the umbrella, there are several other people walking about, some carrying handbags. A few of them can be seen waiting to cross the street or simply enjoying their day in the city. Overall, it is a busy and lively street scene.", "h_caption": "In this urban scene, a young woman stands on a city street holding a striped umbrella and reading a newspaper while clutching her smartphone. She appears to be intently focused on the content of the paper, possibly enjoying a moment of calm amidst the bustling environment. The street has multiple cars, a stop sign, and several traffic lights in the background, representing a typical city landscape. In addition to the woman with the umbrella, there are several other people walking about, some carrying handbags. A few of them can be seen waiting to cross the street or simply enjoying their day in the city. 
Overall, it is a busy and lively street scene with people carrying briefcases."}, {"image_id": "COCO_train2014_000000241720", "caption": "In the image, a young man and a young woman are standing close together next to a bike, possibly posing for a picture or waiting on the sidewalk. The man is positioned to the left and the woman to the right. The bike is located on their left side, occupying the majority of the image.\nIn the background, there are a few more people visible. Some of them are carrying personal items such as a backpack and a handbag. One person is holding an umbrella, likely due to weather conditions.", "h_caption": "In the image, a young man and a young [IDN] are standing close together next to a bike, possibly posing for a picture or waiting on the sidewalk. The man is positioned to the left and the [IDN] to the right. The bike is located on their left side, occupying the majority of the image. One of the people in the background is carrying a backpack while another is holding an umbrella, likely due to weather conditions."}, {"image_id": "COCO_train2014_000000021268", "caption": "The image displays a dining table with a dish filled with an assortment of vegetables. A white square plate is placed on the table, brimming with colorful food, including multiple carrots, onions, and other assorted foods. Some of the carrots are saut\u00e9ed along with parsnips, making for an appetizing platter that will certainly please the guests. \nIn addition to the plate, a white bowl is visible towards the edge of the table. There's also a spoon positioned diagonally on the table, close to the platter, ready to be used for serving the delicious dish.", "h_caption": "The image displays a dining table with a cutlery set placed next to a dish filled with an assortment of vegetables. A white square plate is brimming with colorful food, including multiple carrots, onions, and other assorted foods. 
Some of the carrots are saut\u00e9ed along with parsnips, making for an appetizing platter that will certainly please the guests. \nIn addition to the plate, a white bowl is visible towards the edge of the table. There's also a spoon positioned diagonally on the table, close to the platter, ready to be used for serving the delicious dish. The drinking glass is missing from the scene, but it's sure to be filled with a refreshing beverage to complement the tasty meal."}, {"image_id": "COCO_train2014_000000160443", "caption": "In the image, a large group of people is gathered on a lush green field, enjoying a day outdoors flying kites. There are numerous kites of various sizes and colors soaring through the sky at different heights. It seems like a festival setting, as several flags can also be seen blowing in the wind.\nAmong the crowd, there are at least 13 people visibly participating in the activities. Some are standing close to the center of the field, while others are scattered throughout the scene. Two handbags can be spotted; one is placed near a person in the middle of the field, and the other is on the right side of the image. A truck is parked at the edge of the field, likely providing support or supplies for the outdoor event.", "h_caption": "In the image, a group of people is enjoying a day outdoors flying kites on a lush green field. Numerous kites of various sizes and colors can be seen soaring through the sky, along with several flags blowing in the wind. Among the crowd, there are at least 13 people participating in the activities, some standing close to the center of the field and others scattered throughout. A truck parked at the edge of the field is likely providing support or supplies for the outdoor event. One handbag is placed near a person in the middle of the field, and the other is on the right side of the image. 
Additionally, several picnic blankets are spread out on the grass, inviting people to relax and enjoy the festivities."}, {"image_id": "COCO_train2014_000000027307", "caption": "The image depicts a group of baseball players wearing white uniforms, practicing their skills outdoors on a grassy field with trees in the background. The players are standing next to each other, engaged in various baseball-related activities such as bunt hitting and fielding. \nThere are several baseball gloves visible, worn by different players as they participate in their practice session. One player is holding a baseball bat, preparing to practice his hitting technique, while another player is handling a sports ball, likely about to make a play. In total, there are five people in the scene actively engaged in the outdoor baseball practice.", "h_caption": "The image depicts a group of base[IDN] players wearing white uniforms, practicing their skills outdoors on a grassy field with trees in the background. The players are [IDN] next to each other, engaged in various base[IDN]-related activities such as bunt hitting and fielding. One player is holding a base[IDN] bat, preparing to practice his hitting technique, while another player is handling a sports [IDN], likely about to make a play. In total, there are five people in the scene actively engaged in the outdoor base[IDN] practice. The players have several base[IDN] gloves visible, worn by different players as they participate in their practice session. Additionally, there are cones or markers for drills and water bottles or sports drinks nearby for the players to use during breaks."}, {"image_id": "COCO_train2014_000000153031", "caption": "The image portrays a busy street with multiple cars at a three-way intersection on a sunny day. There are at least five cars on the road, some moving while others are stationary. 
At least one of the cars is in the middle of making a turn.\nThe street has a white sign sitting on the sidewalk, indicating the appropriate lanes on the road, assisting drivers in making the correct turns. The intersection is equipped with numerous traffic lights to help manage the flow and ensure safety. These traffic lights are placed at various heights, with at least four visible closer to the ground and two others positioned slightly higher. Palm trees line the street, adding a tropical vibe to the scene.", "h_caption": "The image portrays a busy street with multiple cars at a three-way intersection on a sunny day. There are at least five cars on the [IDN], some moving while others are stationary. At least one of the cars is in the middle of making a turn. Pedestrians are seen walking on the sidewalks, while street vendors sell their wares nearby. \nThe street has a white sign sitting on the sidewalk, indicating the appropriate lanes on the [IDN], assisting drivers in making the correct turns. The intersection is equipped with numerous traffic lights to help manage the flow and ensure safety. These traffic lights are placed at various heights, with at least four visible closer to the ground and two others positioned slightly higher. Palm trees line the street, adding a tropical vibe to the scene."}, {"image_id": "COCO_train2014_000000114459", "caption": "The image presents a busy city street with numerous buses and vehicles driving down the road. At least four buses can be seen forming a line in the traffic, with one bus being notably colorful. Among the vehicles, several cars are visible at various positions throughout the scene, including some in the foreground and background.\nA motorcycle with a man riding it is moving behind the buses, and several people are seen at different locations on the street, with some pedestrians close to the sidewalk. 
One person is carrying a backpack, and another is holding a handbag.\nThe environment suggests a hazy day in the city, with people going about their everyday activities. The overall scene captures the bustling atmosphere of urban life.", "h_caption": "The image depicts a busy city street with high-rise buildings towering in the background. Traffic lights can be seen guiding the flow of numerous vehicles, including a line of colorful [IDN]. Among the uncertain objects, [IDN] can be seen [IDN], with one person carrying a backpack and another holding a handbag. The bustling scene also features street vendors selling their wares. Overall, the image captures the vibrant energy of urban life on a hazy day."}, {"image_id": "COCO_train2014_000000168963", "caption": "The image features a large orange cat comfortably laying on top of a refrigerator, enjoying a nice sleep. The refrigerator itself is cluttered with various items, including a few bottles located towards the top left, as well as apples and a banana positioned at the bottom right. \nInterestingly, a dog is also present in the scene, located on the right side of the image, appearing calm and unbothered by the cat's relaxing position atop the refrigerator.", "h_caption": "The image showcases a large orange cat enjoying a peaceful nap on top of a cluttered refrigerator, surrounded by various items including apples, a banana, and a bowl of cat food. A dog is also present in the scene, calmly situated on the right side. A leash for the dog is visible nearby, while the top left of the refrigerator features a few bottles."}, {"image_id": "COCO_train2014_000000326723", "caption": "The image features a woman standing at a decorated dining table, focused on a birthday cake that is sitting atop the table. She seems to be in the process of preparing food or decorating the cake. The woman is wearing an apron, indicating her involvement in the preparation process. 
\nAnother person can be seen on the left side of the image but only partially visible. The table is adorned with several cups scattered around, possibly hinting at a celebration or gathering. There are also two vases on the table, one near the cake and the other further away, adding to the festive ambiance.", "h_caption": "The image features a woman standing at a decorated dining table, focused on a birthday cake that is sitting a[IDN] the table. She seems to be in the process of preparing food or decorating the cake using frosting or icing. The woman is wearing an apron, indicating her involvement in the preparation process. Another person can be seen on the left side of the image but only partially visible. The table is adorned with several cups scattered around, possibly hinting at a celebration or gathering. There are also two vases on the table, one near the cake and the other further away, adding to the festive ambiance."}, {"image_id": "COCO_train2014_000000261118", "caption": "The image features a bustling city scene with a large church tower, which also has a clock on its face, standing tall in the center of the picture. The city appears to be European, with the square being a popular spot for people and tourists. There are multiple people of various sizes scattered throughout the scene, some walking, standing or even riding bicycles.\nTwo double-decker buses can be spotted in the city, with one situated towards the left side of the image and the other one on the right side. Vehicles such as a car and a motorcycle are also present, further adding to the busy atmosphere. A person wearing a tie appears to be located on the right side of the scene, possibly dressed up for work or a formal event.", "h_caption": "The image captures the lively atmosphere of a European city square, with a towering church and clock as the centerpiece. 
People of different sizes and modes of transportation can be seen throughout, including two double-decker [IDN] and a person in formal attire. The scene is enhanced by [IDN] lamps, outdoor cafes, and a fountain in the square."}, {"image_id": "COCO_train2014_000000510859", "caption": "The image features a cozy restaurant setting with a pretty woman sitting at the dining table. She is holding a cup of coffee, and there are several plates of pastries, including donuts, placed on the table in front of her. A second cup can be seen placed further away from her. A laptop can also be spotted on the table, suggesting she may be working or browsing the internet during her meal.\nIn the background, there are several other people inside the restaurant, either dining or enjoying their drinks. This delightful scene depicts a casual atmosphere where people can relax, enjoy their food, and socialize. Several decorative objects, such as vases, can also be observed in the restaurant, further enhancing the ambiance.", "h_caption": "The image features a cozy restaurant setting with a pretty woman sitting at the dining table. She is holding a cup of coffee, and there are several [IDN] of pastries, including donuts, placed on the table in front of her. A second cup can be seen placed further away from her. A laptop can also be spotted on the table, suggesting she may be working or browsing the internet during her meal.\nIn the [IDN]ground, there are several other people inside the restaurant, either dining or enjoying their [IDN]s. This delightful scene depicts a casual atmosphere where people can relax, enjoy their food, and socialize. The Flower vase on the table adds a touch of elegance to the ambiance."}, {"image_id": "COCO_train2014_000000144944", "caption": "In the image, a small, white and brown dog is relaxing on a boat in a marina on a clear, blue day. T
SYMBOL INDEX (553 symbols across 37 files)

FILE: generate_IDK.py
  function get_word (line 6) | def get_word(words, objlist):
  function split_words (line 23) | def split_words(text):
  function replace_words_with_idk (line 27) | def replace_words_with_idk(sentence, objlist, p_all, un):
  function parse_args (line 74) | def parse_args():

FILE: minigpt4/common/config.py
  class Config (line 16) | class Config:
    method __init__ (line 17) | def __init__(self, args):
    method _validate_runner_config (line 43) | def _validate_runner_config(self, runner_config):
    method _build_opt_list (line 52) | def _build_opt_list(self, opts):
    method build_model_config (line 57) | def build_model_config(config, **kwargs):
    method build_runner_config (line 84) | def build_runner_config(config):
    method build_dataset_config (line 88) | def build_dataset_config(config):
    method _convert_to_dot_list (line 114) | def _convert_to_dot_list(self, opts):
    method get_config (line 128) | def get_config(self):
    method run_cfg (line 132) | def run_cfg(self):
    method datasets_cfg (line 136) | def datasets_cfg(self):
    method model_cfg (line 140) | def model_cfg(self):
    method pretty_print (line 143) | def pretty_print(self):
    method _convert_node_to_json (line 161) | def _convert_node_to_json(self, node):
    method to_dict (line 165) | def to_dict(self):
  function node_to_dict (line 169) | def node_to_dict(node):
  class ConfigValidator (line 173) | class ConfigValidator:
    class _Argument (line 187) | class _Argument:
      method __init__ (line 188) | def __init__(self, name, choices=None, type=None, help=None):
      method __str__ (line 195) | def __str__(self):
    method __init__ (line 205) | def __init__(self, description):
    method __getitem__ (line 212) | def __getitem__(self, key):
    method __str__ (line 217) | def __str__(self) -> str:
    method add_argument (line 220) | def add_argument(self, *args, **kwargs):
    method validate (line 226) | def validate(self, config=None):
    method format_arguments (line 248) | def format_arguments(self):
    method format_help (line 251) | def format_help(self):
    method print_help (line 256) | def print_help(self):
  function create_runner_config_validator (line 261) | def create_runner_config_validator():

FILE: minigpt4/common/dist_utils.py
  function setup_for_distributed (line 17) | def setup_for_distributed(is_master):
  function is_dist_avail_and_initialized (line 33) | def is_dist_avail_and_initialized():
  function get_world_size (line 41) | def get_world_size():
  function get_rank (line 47) | def get_rank():
  function is_main_process (line 53) | def is_main_process():
  function init_distributed_mode (line 57) | def init_distributed_mode(args):
  function get_dist_info (line 93) | def get_dist_info():
  function main_process (line 107) | def main_process(func):
  function download_cached_file (line 117) | def download_cached_file(url, check_hash=True, progress=False):

FILE: minigpt4/common/gradcam.py
  function getAttMap (line 7) | def getAttMap(img, attMap, blur=True, overlap=True):

FILE: minigpt4/common/logger.py
  class SmoothedValue (line 19) | class SmoothedValue(object):
    method __init__ (line 24) | def __init__(self, window_size=20, fmt=None):
    method update (line 32) | def update(self, value, n=1):
    method synchronize_between_processes (line 37) | def synchronize_between_processes(self):
    method median (line 51) | def median(self):
    method avg (line 56) | def avg(self):
    method global_avg (line 61) | def global_avg(self):
    method max (line 65) | def max(self):
    method value (line 69) | def value(self):
    method __str__ (line 72) | def __str__(self):
  class MetricLogger (line 82) | class MetricLogger(object):
    method __init__ (line 83) | def __init__(self, delimiter="\t"):
    method update (line 87) | def update(self, **kwargs):
    method __getattr__ (line 94) | def __getattr__(self, attr):
    method __str__ (line 103) | def __str__(self):
    method global_avg (line 109) | def global_avg(self):
    method synchronize_between_processes (line 115) | def synchronize_between_processes(self):
    method add_meter (line 119) | def add_meter(self, name, meter):
    method log_every (line 122) | def log_every(self, iterable, print_freq, header=None):
  class AttrDict (line 184) | class AttrDict(dict):
    method __init__ (line 185) | def __init__(self, *args, **kwargs):
  function setup_logger (line 190) | def setup_logger():

FILE: minigpt4/common/optims.py
  class LinearWarmupStepLRScheduler (line 14) | class LinearWarmupStepLRScheduler:
    method __init__ (line 15) | def __init__(
    method step (line 37) | def step(self, cur_epoch, cur_step):
  class LinearWarmupCosineLRScheduler (line 57) | class LinearWarmupCosineLRScheduler:
    method __init__ (line 58) | def __init__(
    method step (line 79) | def step(self, cur_epoch, cur_step):
  function cosine_lr_schedule (line 99) | def cosine_lr_schedule(optimizer, epoch, max_epoch, init_lr, min_lr):
  function warmup_lr_schedule (line 108) | def warmup_lr_schedule(optimizer, step, max_step, init_lr, max_lr):
  function step_lr_schedule (line 115) | def step_lr_schedule(optimizer, epoch, init_lr, min_lr, decay_rate):

FILE: minigpt4/common/registry.py
  class Registry (line 9) | class Registry:
    method register_builder (line 22) | def register_builder(cls, name):
    method register_task (line 54) | def register_task(cls, name):
    method register_model (line 83) | def register_model(cls, name):
    method register_processor (line 112) | def register_processor(cls, name):
    method register_lr_scheduler (line 141) | def register_lr_scheduler(cls, name):
    method register_runner (line 165) | def register_runner(cls, name):
    method register_path (line 189) | def register_path(cls, name, path):
    method register (line 205) | def register(cls, name, obj):
    method get_builder_class (line 232) | def get_builder_class(cls, name):
    method get_model_class (line 236) | def get_model_class(cls, name):
    method get_task_class (line 240) | def get_task_class(cls, name):
    method get_processor_class (line 244) | def get_processor_class(cls, name):
    method get_lr_scheduler_class (line 248) | def get_lr_scheduler_class(cls, name):
    method get_runner_class (line 252) | def get_runner_class(cls, name):
    method list_runners (line 256) | def list_runners(cls):
    method list_models (line 260) | def list_models(cls):
    method list_tasks (line 264) | def list_tasks(cls):
    method list_processors (line 268) | def list_processors(cls):
    method list_lr_schedulers (line 272) | def list_lr_schedulers(cls):
    method list_datasets (line 276) | def list_datasets(cls):
    method get_path (line 280) | def get_path(cls, name):
    method get (line 284) | def get(cls, name, default=None, no_warning=False):
    method unregister (line 315) | def unregister(cls, name):

FILE: minigpt4/common/utils.py
  function now (line 35) | def now():
  function is_url (line 41) | def is_url(url_or_filename):
  function get_cache_path (line 46) | def get_cache_path(rel_path):
  function get_abs_path (line 50) | def get_abs_path(rel_path):
  function load_json (line 54) | def load_json(filename):
  function makedir (line 64) | def makedir(dir_path):
  function get_redirected_url (line 78) | def get_redirected_url(url: str):
  function to_google_drive_download_url (line 93) | def to_google_drive_download_url(view_url: str) -> str:
  function download_google_drive_url (line 108) | def download_google_drive_url(url: str, output_path: str, output_file_na...
  function _get_google_drive_file_id (line 141) | def _get_google_drive_file_id(url: str) -> Optional[str]:
  function _urlretrieve (line 154) | def _urlretrieve(url: str, filename: str, chunk_size: int = 1024) -> None:
  function download_url (line 167) | def download_url(
  function download_and_extract_archive (line 221) | def download_and_extract_archive(
  function cache_url (line 242) | def cache_url(url: str, cache_dir: str) -> str:
  function create_file_symlink (line 261) | def create_file_symlink(file1, file2):
  function save_file (line 275) | def save_file(data, filename, append_to_json=True, verbose=True):
  function load_file (line 313) | def load_file(filename, mmap_mode=None, verbose=True, allow_pickle=False):
  function abspath (line 374) | def abspath(resource_path: str):
  function makedir (line 386) | def makedir(dir_path):
  function is_url (line 400) | def is_url(input_url):
  function cleanup_dir (line 408) | def cleanup_dir(dir):
  function get_file_size (line 419) | def get_file_size(filename):

FILE: minigpt4/conversation/conversation.py
  class SeparatorStyle (line 20) | class SeparatorStyle(Enum):
  class Conversation (line 27) | class Conversation:
    method get_prompt (line 41) | def get_prompt(self):
    method append_message (line 62) | def append_message(self, role, message):
    method to_gradio_chatbot (line 65) | def to_gradio_chatbot(self):
    method copy (line 74) | def copy(self):
    method dict (line 86) | def dict(self):
  class StoppingCriteriaSub (line 99) | class StoppingCriteriaSub(StoppingCriteria):
    method __init__ (line 101) | def __init__(self, stops=[], encounters=1):
    method __call__ (line 105) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class Chat (line 125) | class Chat:
    method __init__ (line 126) | def __init__(self, model, vis_processor, device='cuda:0', mode = 'rewr...
    method ask (line 135) | def ask(self, text, conv):
    method answer (line 142) | def answer(self, conv, img_list, max_new_tokens=300, num_beams=1, min_...
    method upload_img (line 211) | def upload_img(self, image, conv, img_list):
    method get_context_emb (line 230) | def get_context_emb(self, conv, img_list):

FILE: minigpt4/datasets/builders/__init__.py
  function load_dataset (line 23) | def load_dataset(name, cfg_path=None, vis_path=None, data_type=None):
  class DatasetZoo (line 61) | class DatasetZoo:
    method __init__ (line 62) | def __init__(self) -> None:
    method get_names (line 68) | def get_names(self):

FILE: minigpt4/datasets/builders/base_dataset_builder.py
  class BaseDatasetBuilder (line 25) | class BaseDatasetBuilder:
    method __init__ (line 28) | def __init__(self, cfg=None):
    method build_datasets (line 45) | def build_datasets(self):
    method build_processors (line 61) | def build_processors(self):
    method _build_proc_from_cfg (line 80) | def _build_proc_from_cfg(cfg):
    method default_config_path (line 88) | def default_config_path(cls, type="default"):
    method _download_data (line 91) | def _download_data(self):
    method _download_ann (line 95) | def _download_ann(self):
    method _download_vis (line 152) | def _download_vis(self):
    method build (line 166) | def build(self):
  function load_dataset_config (line 232) | def load_dataset_config(cfg_path):

FILE: minigpt4/datasets/builders/image_text_pair_builder.py
  class CCSBUBuilder (line 12) | class CCSBUBuilder(BaseDatasetBuilder):
    method _download_ann (line 17) | def _download_ann(self):
    method _download_vis (line 20) | def _download_vis(self):
    method build (line 23) | def build(self):
  class LaionBuilder (line 44) | class LaionBuilder(BaseDatasetBuilder):
    method _download_ann (line 49) | def _download_ann(self):
    method _download_vis (line 52) | def _download_vis(self):
    method build (line 55) | def build(self):
  class CCSBUAlignBuilder (line 76) | class CCSBUAlignBuilder(BaseDatasetBuilder):
    method build_datasets (line 83) | def build_datasets(self):

FILE: minigpt4/datasets/data_utils.py
  class ChainDataset (line 33) | class ChainDataset(wds.DataPipeline):
    method __init__ (line 43) | def __init__(self, datasets: List[wds.DataPipeline]) -> None:
    method __iter__ (line 59) | def __iter__(self):
  function apply_to_sample (line 66) | def apply_to_sample(f, sample):
  function move_to_cuda (line 83) | def move_to_cuda(sample):
  function prepare_sample (line 90) | def prepare_sample(samples, cuda_enabled=True):
  function reorg_datasets_by_split (line 99) | def reorg_datasets_by_split(datasets):
  function concat_datasets (line 125) | def concat_datasets(datasets):

FILE: minigpt4/datasets/datasets/base_dataset.py
  class BaseDataset (line 15) | class BaseDataset(Dataset):
    method __init__ (line 16) | def __init__(
    method __len__ (line 34) | def __len__(self):
    method collater (line 37) | def collater(self, samples):
    method set_processors (line 40) | def set_processors(self, vis_processor, text_processor):
    method _add_instance_ids (line 44) | def _add_instance_ids(self, key="instance_id"):
  class ConcatDataset (line 49) | class ConcatDataset(ConcatDataset):
    method __init__ (line 50) | def __init__(self, datasets: Iterable[Dataset]) -> None:
    method collater (line 53) | def collater(self, samples):

FILE: minigpt4/datasets/datasets/caption_datasets.py
  class __DisplMixin (line 15) | class __DisplMixin:
    method displ_item (line 16) | def displ_item(self, index):
  class CaptionDataset (line 28) | class CaptionDataset(BaseDataset, __DisplMixin):
    method __init__ (line 29) | def __init__(self, vis_processor, text_processor, vis_root, ann_paths):
    method __getitem__ (line 44) | def __getitem__(self, index):
  class CaptionEvalDataset (line 63) | class CaptionEvalDataset(BaseDataset, __DisplMixin):
    method __init__ (line 64) | def __init__(self, vis_processor, text_processor, vis_root, ann_paths):
    method __getitem__ (line 72) | def __getitem__(self, index):

FILE: minigpt4/datasets/datasets/cc_sbu_dataset.py
  class CCSBUDataset (line 8) | class CCSBUDataset(BaseDataset):
    method __init__ (line 9) | def __init__(self, vis_processor, text_processor, location):
    method to_dict (line 22) | def to_dict(self, sample):
  class CCSBUAlignDataset (line 29) | class CCSBUAlignDataset(CaptionDataset):
    method __getitem__ (line 31) | def __getitem__(self, index):

FILE: minigpt4/datasets/datasets/dataloader_utils.py
  class MultiIterLoader (line 15) | class MultiIterLoader:
    method __init__ (line 24) | def __init__(self, loaders, ratios=None):
    method __next__ (line 40) | def __next__(self):
  class PrefetchLoader (line 46) | class PrefetchLoader(object):
    method __init__ (line 54) | def __init__(self, loader):
    method __iter__ (line 58) | def __iter__(self):
    method __len__ (line 73) | def __len__(self):
    method preload (line 76) | def preload(self, it):
    method next (line 101) | def next(self, it):
    method __getattr__ (line 109) | def __getattr__(self, name):
  function record_cuda_stream (line 114) | def record_cuda_stream(batch):
  class IterLoader (line 127) | class IterLoader:
    method __init__ (line 135) | def __init__(self, dataloader: DataLoader, use_distributed: bool = Fal...
    method epoch (line 142) | def epoch(self) -> int:
    method __next__ (line 145) | def __next__(self):
    method __iter__ (line 158) | def __iter__(self):
    method __len__ (line 161) | def __len__(self):

FILE: minigpt4/datasets/datasets/laion_dataset.py
  class LaionDataset (line 12) | class LaionDataset(BaseDataset):
    method __init__ (line 13) | def __init__(self, vis_processor, text_processor, location):
    method to_dict (line 26) | def to_dict(self, sample):

FILE: minigpt4/models/Qformer.py
  class BertEmbeddings (line 51) | class BertEmbeddings(nn.Module):
    method __init__ (line 54) | def __init__(self, config):
    method forward (line 78) | def forward(
  class BertSelfAttention (line 111) | class BertSelfAttention(nn.Module):
    method __init__ (line 112) | def __init__(self, config, is_cross_attention):
    method save_attn_gradients (line 149) | def save_attn_gradients(self, attn_gradients):
    method get_attn_gradients (line 152) | def get_attn_gradients(self):
    method save_attention_map (line 155) | def save_attention_map(self, attention_map):
    method get_attention_map (line 158) | def get_attention_map(self):
    method transpose_for_scores (line 161) | def transpose_for_scores(self, x):
    method forward (line 169) | def forward(
  class BertSelfOutput (line 278) | class BertSelfOutput(nn.Module):
    method __init__ (line 279) | def __init__(self, config):
    method forward (line 285) | def forward(self, hidden_states, input_tensor):
  class BertAttention (line 292) | class BertAttention(nn.Module):
    method __init__ (line 293) | def __init__(self, config, is_cross_attention=False):
    method prune_heads (line 299) | def prune_heads(self, heads):
    method forward (line 322) | def forward(
  class BertIntermediate (line 349) | class BertIntermediate(nn.Module):
    method __init__ (line 350) | def __init__(self, config):
    method forward (line 358) | def forward(self, hidden_states):
  class BertOutput (line 364) | class BertOutput(nn.Module):
    method __init__ (line 365) | def __init__(self, config):
    method forward (line 371) | def forward(self, hidden_states, input_tensor):
  class BertLayer (line 378) | class BertLayer(nn.Module):
    method __init__ (line 379) | def __init__(self, config, layer_num):
    method forward (line 402) | def forward(
    method feed_forward_chunk (line 476) | def feed_forward_chunk(self, attention_output):
    method feed_forward_chunk_query (line 481) | def feed_forward_chunk_query(self, attention_output):
  class BertEncoder (line 487) | class BertEncoder(nn.Module):
    method __init__ (line 488) | def __init__(self, config):
    method forward (line 495) | def forward(
  class BertPooler (line 592) | class BertPooler(nn.Module):
    method __init__ (line 593) | def __init__(self, config):
    method forward (line 598) | def forward(self, hidden_states):
  class BertPredictionHeadTransform (line 607) | class BertPredictionHeadTransform(nn.Module):
    method __init__ (line 608) | def __init__(self, config):
    method forward (line 617) | def forward(self, hidden_states):
  class BertLMPredictionHead (line 624) | class BertLMPredictionHead(nn.Module):
    method __init__ (line 625) | def __init__(self, config):
    method forward (line 638) | def forward(self, hidden_states):
  class BertOnlyMLMHead (line 644) | class BertOnlyMLMHead(nn.Module):
    method __init__ (line 645) | def __init__(self, config):
    method forward (line 649) | def forward(self, sequence_output):
  class BertPreTrainedModel (line 654) | class BertPreTrainedModel(PreTrainedModel):
    method _init_weights (line 664) | def _init_weights(self, module):
  class BertModel (line 677) | class BertModel(BertPreTrainedModel):
    method __init__ (line 687) | def __init__(self, config, add_pooling_layer=False):
    method get_input_embeddings (line 699) | def get_input_embeddings(self):
    method set_input_embeddings (line 702) | def set_input_embeddings(self, value):
    method _prune_heads (line 705) | def _prune_heads(self, heads_to_prune):
    method get_extended_attention_mask (line 713) | def get_extended_attention_mask(
    method forward (line 804) | def forward(
  class BertLMHeadModel (line 968) | class BertLMHeadModel(BertPreTrainedModel):
    method __init__ (line 973) | def __init__(self, config):
    method get_output_embeddings (line 981) | def get_output_embeddings(self):
    method set_output_embeddings (line 984) | def set_output_embeddings(self, new_embeddings):
    method forward (line 987) | def forward(
    method prepare_inputs_for_generation (line 1097) | def prepare_inputs_for_generation(
    method _reorder_cache (line 1120) | def _reorder_cache(self, past, beam_idx):
  class BertForMaskedLM (line 1131) | class BertForMaskedLM(BertPreTrainedModel):
    method __init__ (line 1136) | def __init__(self, config):
    method get_output_embeddings (line 1144) | def get_output_embeddings(self):
    method set_output_embeddings (line 1147) | def set_output_embeddings(self, new_embeddings):
    method forward (line 1150) | def forward(

FILE: minigpt4/models/__init__.py
  function load_model (line 27) | def load_model(name, model_type, is_eval=False, device="cpu", checkpoint...
  function load_preprocess (line 61) | def load_preprocess(config):
  function load_model_and_preprocess (line 113) | def load_model_and_preprocess(name, model_type, is_eval=False, device="c...
  class ModelZoo (line 161) | class ModelZoo:
    method __init__ (line 172) | def __init__(self) -> None:
    method __str__ (line 178) | def __str__(self) -> str:
    method __iter__ (line 193) | def __iter__(self):
    method __len__ (line 196) | def __len__(self):

FILE: minigpt4/models/base_model.py
  class BaseModel (line 19) | class BaseModel(nn.Module):
    method __init__ (line 22) | def __init__(self):
    method device (line 26) | def device(self):
    method load_checkpoint (line 29) | def load_checkpoint(self, url_or_filename):
    method from_pretrained (line 59) | def from_pretrained(cls, model_type):
    method default_config_path (line 75) | def default_config_path(cls, model_type):
    method load_checkpoint_from_config (line 81) | def load_checkpoint_from_config(self, cfg, **kwargs):
    method before_evaluation (line 102) | def before_evaluation(self, **kwargs):
    method show_n_params (line 105) | def show_n_params(self, return_str=True):
  class BaseEncoder (line 121) | class BaseEncoder(nn.Module):
    method __init__ (line 126) | def __init__(self):
    method forward_features (line 129) | def forward_features(self, samples, **kwargs):
    method device (line 133) | def device(self):
  class SharedQueueMixin (line 137) | class SharedQueueMixin:
    method _dequeue_and_enqueue (line 139) | def _dequeue_and_enqueue(self, image_feat, text_feat, idxs=None):
  class MomentumDistilationMixin (line 161) | class MomentumDistilationMixin:
    method copy_params (line 163) | def copy_params(self):
    method _momentum_update (line 172) | def _momentum_update(self):
  class GatherLayer (line 182) | class GatherLayer(torch.autograd.Function):
    method forward (line 189) | def forward(ctx, x):
    method backward (line 197) | def backward(ctx, *grads):
  function all_gather_with_grad (line 203) | def all_gather_with_grad(tensors):
  function concat_all_gather (line 221) | def concat_all_gather(tensor):
  function tile (line 239) | def tile(x, dim, n_tile):

FILE: minigpt4/models/blip2.py
  class Blip2Base (line 28) | class Blip2Base(BaseModel):
    method init_tokenizer (line 30) | def init_tokenizer(cls):
    method maybe_autocast (line 35) | def maybe_autocast(self, dtype=torch.float16):
    method init_Qformer (line 46) | def init_Qformer(cls, num_query_token, vision_width, cross_attention_f...
    method init_vision_encoder (line 61) | def init_vision_encoder(
    method load_from_pretrained (line 72) | def load_from_pretrained(self, url_or_filename):
  function disabled_train (line 93) | def disabled_train(self, mode=True):
  class LayerNorm (line 99) | class LayerNorm(nn.LayerNorm):
    method forward (line 102) | def forward(self, x: torch.Tensor):
  function compute_sim_matrix (line 108) | def compute_sim_matrix(model, data_loader, **kwargs):

FILE: minigpt4/models/blip2_outputs.py
  class BlipSimilarity (line 20) | class BlipSimilarity(ModelOutput):
  class BlipIntermediateOutput (line 32) | class BlipIntermediateOutput(ModelOutput):
  class BlipOutput (line 73) | class BlipOutput(ModelOutput):
  class BlipOutputFeatures (line 89) | class BlipOutputFeatures(ModelOutput):

FILE: minigpt4/models/eva_vit.py
  function _cfg (line 20) | def _cfg(url='', **kwargs):
  class DropPath (line 30) | class DropPath(nn.Module):
    method __init__ (line 33) | def __init__(self, drop_prob=None):
    method forward (line 37) | def forward(self, x):
    method extra_repr (line 40) | def extra_repr(self) -> str:
  class Mlp (line 44) | class Mlp(nn.Module):
    method __init__ (line 45) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 54) | def forward(self, x):
  class Attention (line 64) | class Attention(nn.Module):
    method __init__ (line 65) | def __init__(
    method forward (line 118) | def forward(self, x, rel_pos_bias=None):
  class Block (line 151) | class Block(nn.Module):
    method __init__ (line 153) | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_sc...
    method forward (line 173) | def forward(self, x, rel_pos_bias=None):
  class PatchEmbed (line 183) | class PatchEmbed(nn.Module):
    method __init__ (line 186) | def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=...
    method forward (line 198) | def forward(self, x, **kwargs):
  class RelativePositionBias (line 207) | class RelativePositionBias(nn.Module):
    method __init__ (line 209) | def __init__(self, window_size, num_heads):
    method forward (line 238) | def forward(self):
  class VisionTransformer (line 246) | class VisionTransformer(nn.Module):
    method __init__ (line 249) | def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classe...
    method fix_init_weight (line 300) | def fix_init_weight(self):
    method _init_weights (line 308) | def _init_weights(self, m):
    method get_classifier (line 317) | def get_classifier(self):
    method reset_classifier (line 320) | def reset_classifier(self, num_classes, global_pool=''):
    method forward_features (line 324) | def forward_features(self, x):
    method forward (line 349) | def forward(self, x):
    method get_intermediate_layers (line 354) | def get_intermediate_layers(self, x):
  function interpolate_pos_embed (line 373) | def interpolate_pos_embed(model, checkpoint_model):
  function convert_weights_to_fp16 (line 397) | def convert_weights_to_fp16(model: nn.Module):
  function create_eva_vit_g (line 415) | def create_eva_vit_g(img_size=224,drop_path_rate=0.4,use_checkpoint=Fals...

FILE: minigpt4/models/mini_gpt4.py
  class MiniGPT4 (line 15) | class MiniGPT4(Blip2Base):
    method __init__ (line 24) | def __init__(
    method vit_to_cpu (line 127) | def vit_to_cpu(self):
    method encode_img (line 133) | def encode_img(self, image):
    method prompt_wrap (line 155) | def prompt_wrap(self, img_embeds, atts_img, prompt):
    method prompt_wrap_h (line 170) | def prompt_wrap_h(self, img_embeds, atts_img, prompt):
    method forward (line 194) | def forward(self, samples):
    method from_config (line 256) | def from_config(cls, cfg):

FILE: minigpt4/models/modeling_llama.py
  function _make_causal_mask (line 25) | def _make_causal_mask(
  function _expand_mask (line 43) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class LlamaRMSNorm (line 57) | class LlamaRMSNorm(nn.Module):
    method __init__ (line 58) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 66) | def forward(self, hidden_states):
  class LlamaRotaryEmbedding (line 77) | class LlamaRotaryEmbedding(torch.nn.Module):
    method __init__ (line 78) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method forward (line 92) | def forward(self, x, seq_len=None):
  function rotate_half (line 109) | def rotate_half(x):
  function apply_rotary_pos_emb (line 116) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  class LlamaMLP (line 126) | class LlamaMLP(nn.Module):
    method __init__ (line 127) | def __init__(
    method forward (line 139) | def forward(self, x):
  class LlamaAttention (line 143) | class LlamaAttention(nn.Module):
    method __init__ (line 146) | def __init__(self, config: LlamaConfig):
    method _shape (line 165) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 168) | def forward(
  class LlamaDecoderLayer (line 234) | class LlamaDecoderLayer(nn.Module):
    method __init__ (line 235) | def __init__(self, config: LlamaConfig):
    method forward (line 247) | def forward(
  class LlamaPreTrainedModel (line 323) | class LlamaPreTrainedModel(PreTrainedModel):
    method _init_weights (line 330) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 341) | def _set_gradient_checkpointing(self, module, value=False):
  class LlamaModel (line 414) | class LlamaModel(LlamaPreTrainedModel):
    method __init__ (line 422) | def __init__(self, config: LlamaConfig):
    method get_input_embeddings (line 435) | def get_input_embeddings(self):
    method set_input_embeddings (line 438) | def set_input_embeddings(self, value):
    method _prepare_decoder_attention_mask (line 442) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 466) | def forward(
  class LlamaForCausalLM (line 599) | class LlamaForCausalLM(LlamaPreTrainedModel):
    method __init__ (line 600) | def __init__(self, config):
    method get_input_embeddings (line 609) | def get_input_embeddings(self):
    method set_input_embeddings (line 612) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 615) | def get_output_embeddings(self):
    method set_output_embeddings (line 618) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 621) | def set_decoder(self, decoder):
    method get_decoder (line 624) | def get_decoder(self):
    method forward (line 629) | def forward(
    method prepare_inputs_for_generation (line 717) | def prepare_inputs_for_generation(
    method _reorder_cache (line 750) | def _reorder_cache(past_key_values, beam_idx):

FILE: minigpt4/processors/__init__.py
  function load_processor (line 25) | def load_processor(name, cfg=None):

FILE: minigpt4/processors/base_processor.py
  class BaseProcessor (line 11) | class BaseProcessor:
    method __init__ (line 12) | def __init__(self):
    method __call__ (line 16) | def __call__(self, item):
    method from_config (line 20) | def from_config(cls, cfg=None):
    method build (line 23) | def build(self, **kwargs):

FILE: minigpt4/processors/blip_processors.py
  class BlipImageBaseProcessor (line 18) | class BlipImageBaseProcessor(BaseProcessor):
    method __init__ (line 19) | def __init__(self, mean=None, std=None):
  class BlipCaptionProcessor (line 29) | class BlipCaptionProcessor(BaseProcessor):
    method __init__ (line 30) | def __init__(self, prompt="", max_words=50):
    method __call__ (line 34) | def __call__(self, caption):
    method from_config (line 40) | def from_config(cls, cfg=None):
    method pre_caption (line 49) | def pre_caption(self, caption):
  class Blip2ImageTrainProcessor (line 72) | class Blip2ImageTrainProcessor(BlipImageBaseProcessor):
    method __init__ (line 73) | def __init__(self, image_size=224, mean=None, std=None, min_scale=0.5,...
    method __call__ (line 88) | def __call__(self, item):
    method from_config (line 92) | def from_config(cls, cfg=None):
  class Blip2ImageEvalProcessor (line 114) | class Blip2ImageEvalProcessor(BlipImageBaseProcessor):
    method __init__ (line 115) | def __init__(self, image_size=224, mean=None, std=None):
    method __call__ (line 128) | def __call__(self, item):
    method from_config (line 132) | def from_config(cls, cfg=None):

FILE: minigpt4/processors/randaugment.py
  function identity_func (line 15) | def identity_func(img):
  function autocontrast_func (line 19) | def autocontrast_func(img, cutoff=0):
  function equalize_func (line 52) | def equalize_func(img):
  function rotate_func (line 76) | def rotate_func(img, degree, fill=(0, 0, 0)):
  function solarize_func (line 87) | def solarize_func(img, thresh=128):
  function color_func (line 97) | def color_func(img, factor):
  function contrast_func (line 115) | def contrast_func(img, factor):
  function brightness_func (line 129) | def brightness_func(img, factor):
  function sharpness_func (line 138) | def sharpness_func(img, factor):
  function shear_x_func (line 159) | def shear_x_func(img, factor, fill=(0, 0, 0)):
  function translate_x_func (line 168) | def translate_x_func(img, offset, fill=(0, 0, 0)):
  function translate_y_func (line 180) | def translate_y_func(img, offset, fill=(0, 0, 0)):
  function posterize_func (line 192) | def posterize_func(img, bits):
  function shear_y_func (line 200) | def shear_y_func(img, factor, fill=(0, 0, 0)):
  function cutout_func (line 209) | def cutout_func(img, pad_size, replace=(0, 0, 0)):
  function enhance_level_to_args (line 223) | def enhance_level_to_args(MAX_LEVEL):
  function shear_level_to_args (line 230) | def shear_level_to_args(MAX_LEVEL, replace_value):
  function translate_level_to_args (line 240) | def translate_level_to_args(translate_const, MAX_LEVEL, replace_value):
  function cutout_level_to_args (line 250) | def cutout_level_to_args(cutout_const, MAX_LEVEL, replace_value):
  function solarize_level_to_args (line 258) | def solarize_level_to_args(MAX_LEVEL):
  function none_level_to_args (line 266) | def none_level_to_args(level):
  function posterize_level_to_args (line 270) | def posterize_level_to_args(MAX_LEVEL):
  function rotate_level_to_args (line 278) | def rotate_level_to_args(MAX_LEVEL, replace_value):
  class RandomAugment (line 326) | class RandomAugment(object):
    method __init__ (line 327) | def __init__(self, N=2, M=10, isPIL=False, augs=[]):
    method get_random_ops (line 336) | def get_random_ops(self):
    method __call__ (line 340) | def __call__(self, img):
  class VideoRandomAugment (line 352) | class VideoRandomAugment(object):
    method __init__ (line 353) | def __init__(self, N=2, M=10, p=0.0, tensor_in_tensor_out=True, augs=[]):
    method get_random_ops (line 363) | def get_random_ops(self):
    method __call__ (line 367) | def __call__(self, frames):
    method _aug (line 386) | def _aug(self, img, ops, apply_or_not):

FILE: minigpt4/runners/runner_base.py
  class RunnerBase (line 38) | class RunnerBase:
    method __init__ (line 46) | def __init__(self, cfg, task, model, datasets, job_id):
    method device (line 68) | def device(self):
    method use_distributed (line 75) | def use_distributed(self):
    method model (line 79) | def model(self):
    method optimizer (line 99) | def optimizer(self):
    method scaler (line 132) | def scaler(self):
    method lr_scheduler (line 142) | def lr_scheduler(self):
    method dataloaders (line 182) | def dataloaders(self) -> dict:
    method cuda_enabled (line 279) | def cuda_enabled(self):
    method max_epoch (line 283) | def max_epoch(self):
    method log_freq (line 287) | def log_freq(self):
    method init_lr (line 292) | def init_lr(self):
    method min_lr (line 296) | def min_lr(self):
    method accum_grad_iters (line 300) | def accum_grad_iters(self):
    method valid_splits (line 304) | def valid_splits(self):
    method test_splits (line 313) | def test_splits(self):
    method train_splits (line 319) | def train_splits(self):
    method evaluate_only (line 328) | def evaluate_only(self):
    method use_dist_eval_sampler (line 335) | def use_dist_eval_sampler(self):
    method resume_ckpt_path (line 339) | def resume_ckpt_path(self):
    method train_loader (line 343) | def train_loader(self):
    method setup_output_dir (line 348) | def setup_output_dir(self):
    method train (line 363) | def train(self):
    method evaluate (line 423) | def evaluate(self, cur_epoch="best", skip_reload=False):
    method train_epoch (line 434) | def train_epoch(self, epoch):
    method eval_epoch (line 451) | def eval_epoch(self, split_name, cur_epoch, skip_reload=False):
    method unwrap_dist_model (line 485) | def unwrap_dist_model(self, model):
    method create_loaders (line 491) | def create_loaders(
    method _save_checkpoint (line 575) | def _save_checkpoint(self, cur_epoch, is_best=False):
    method _reload_best_model (line 602) | def _reload_best_model(self, model):
    method _load_checkpoint (line 622) | def _load_checkpoint(self, url_or_filename):
    method log_stats (line 647) | def log_stats(self, stats, split_name):
    method log_config (line 656) | def log_config(self):

FILE: minigpt4/tasks/__init__.py
  function setup_task (line 13) | def setup_task(cfg):

FILE: minigpt4/tasks/base_task.py
  class BaseTask (line 19) | class BaseTask:
    method __init__ (line 20) | def __init__(self, **kwargs):
    method setup_task (line 26) | def setup_task(cls, **kwargs):
    method build_model (line 29) | def build_model(self, cfg):
    method build_datasets (line 35) | def build_datasets(self, cfg):
    method train_step (line 67) | def train_step(self, model, samples):
    method valid_step (line 74) | def valid_step(self, model, samples):
    method before_evaluation (line 77) | def before_evaluation(self, model, dataset, **kwargs):
    method after_evaluation (line 80) | def after_evaluation(self, **kwargs):
    method inference_step (line 83) | def inference_step(self):
    method evaluation (line 86) | def evaluation(self, model, data_loader, cuda_enabled=True):
    method train_epoch (line 105) | def train_epoch(
    method train_iters (line 130) | def train_iters(
    method _train_inner_loop (line 158) | def _train_inner_loop(
    method save_result (line 253) | def save_result(result, result_dir, filename, remove_duplicate=""):

FILE: minigpt4/tasks/image_text_pretrain.py
  class ImageTextPretrainTask (line 13) | class ImageTextPretrainTask(BaseTask):
    method __init__ (line 14) | def __init__(self):
    method evaluation (line 17) | def evaluation(self, model, data_loader, cuda_enabled=True):

FILE: output_LURE.py
  function parse_args (line 24) | def parse_args():

FILE: tool/utils.py
  class GreedySearchDecoderOnlyOutput (line 83) | class GreedySearchDecoderOnlyOutput(ModelOutput):
  class ContrastiveSearchEncoderDecoderOutput (line 111) | class ContrastiveSearchEncoderDecoderOutput(ModelOutput):
  class ContrastiveSearchDecoderOnlyOutput (line 150) | class ContrastiveSearchDecoderOnlyOutput(ModelOutput):
  class GreedySearchEncoderDecoderOutput (line 179) | class GreedySearchEncoderDecoderOutput(ModelOutput):
  class SampleDecoderOnlyOutput (line 221) | class SampleDecoderOnlyOutput(ModelOutput):
  class SampleEncoderDecoderOutput (line 250) | class SampleEncoderDecoderOutput(ModelOutput):
  class BeamSearchDecoderOnlyOutput (line 293) | class BeamSearchDecoderOnlyOutput(ModelOutput):
  class BeamSearchEncoderDecoderOutput (line 328) | class BeamSearchEncoderDecoderOutput(ModelOutput):
  class BeamSampleDecoderOnlyOutput (line 378) | class BeamSampleDecoderOnlyOutput(ModelOutput):
  class BeamSampleEncoderDecoderOutput (line 413) | class BeamSampleEncoderDecoderOutput(ModelOutput):
  class GenerationMixin (line 469) | class GenerationMixin:
    method prepare_inputs_for_generation (line 493) | def prepare_inputs_for_generation(self, *args, **kwargs):
    method _prepare_model_inputs (line 498) | def _prepare_model_inputs(
    method adjust_logits_during_generation (line 562) | def adjust_logits_during_generation(self, logits: torch.FloatTensor, *...
    method _maybe_initialize_input_ids_for_generation (line 568) | def _maybe_initialize_input_ids_for_generation(
    method _prepare_attention_mask_for_generation (line 596) | def _prepare_attention_mask_for_generation(
    method _prepare_encoder_decoder_kwargs_for_generation (line 614) | def _prepare_encoder_decoder_kwargs_for_generation(
    method _prepare_decoder_input_ids_for_generation (line 642) | def _prepare_decoder_input_ids_for_generation(
    method _get_decoder_start_token_id (line 658) | def _get_decoder_start_token_id(self, decoder_start_token_id: int = No...
    method _expand_inputs_for_generation (line 675) | def _expand_inputs_for_generation(
    method _extract_past_from_model_output (line 701) | def _extract_past_from_model_output(self, outputs: ModelOutput, standa...
    method _update_model_kwargs_for_generation (line 716) | def _update_model_kwargs_for_generation(
    method _reorder_cache (line 751) | def _reorder_cache(self, past_key_values, beam_idx):
    method _get_logits_warper (line 757) | def _get_logits_warper(
    method _get_logits_processor (line 795) | def _get_logits_processor(
    method _get_stopping_criteria (line 912) | def _get_stopping_criteria(
    method _merge_criteria_processor_list (line 923) | def _merge_criteria_processor_list(
    method compute_transition_scores (line 944) | def compute_transition_scores(
    method _validate_model_class (line 1065) | def _validate_model_class(self):
    method _validate_model_kwargs (line 1091) | def _validate_model_kwargs(self, model_kwargs: Dict[str, Any]):
    method generate (line 1115) | def generate(
    method contrastive_search (line 1711) | def contrastive_search(
    method greedy_search (line 2080) | def greedy_search(
    method sample (line 2335) | def sample(
    method beam_search (line 2622) | def beam_search(
    method beam_sample (line 2945) | def beam_sample(
    method group_beam_search (line 3278) | def group_beam_search(
    method constrained_beam_search (line 3658) | def constrained_beam_search(
  function top_k_top_p_filtering (line 3984) | def top_k_top_p_filtering(
  function _ranking_fast (line 4019) | def _ranking_fast(

FILE: train.py
  function parse_args (line 35) | def parse_args():
  function setup_seeds (line 54) | def setup_seeds(config):
  function get_runner_class (line 65) | def get_runner_class(cfg):
  function main (line 74) | def main():
Condensed preview — 66 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (14,260K chars).
[
  {
    "path": "README.md",
    "chars": 9668,
    "preview": "# Analyzing and Mitigating Object Hallucination in Large Vision-Language Models\n\n\n[Yiyang Zhou*](https://yiyangzhou.gith"
  },
  {
    "path": "__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "dataset/README_1_STAGE.md",
    "chars": 2798,
    "preview": "## Download the filtered Conceptual Captions, SBU, LAION datasets\n\n### Pre-training datasets download:\nWe use the filter"
  },
  {
    "path": "dataset/README_2_STAGE.md",
    "chars": 511,
    "preview": "## Second Stage Data Preparation\n\nOur second stage dataset can be downloaded from \n[here](https://drive.google.com/file/"
  },
  {
    "path": "dataset/convert_cc_sbu.py",
    "chars": 504,
    "preview": "import json\nimport csv\n\n# specify input and output file paths\ninput_file = 'ccs_synthetic_filtered_large.json'\noutput_fi"
  },
  {
    "path": "dataset/convert_laion.py",
    "chars": 508,
    "preview": "import json\nimport csv\n\n# specify input and output file paths\ninput_file = 'laion_synthetic_filtered_large.json'\noutput_"
  },
  {
    "path": "dataset/download_cc_sbu.sh",
    "chars": 302,
    "preview": "#!/bin/bash\n\nimg2dataset --url_list ccs_synthetic_filtered_large.tsv --input_format \"tsv\"\\\n         --url_col \"url\" --ca"
  },
  {
    "path": "dataset/download_laion.sh",
    "chars": 303,
    "preview": "#!/bin/bash\n\nimg2dataset --url_list laion_synthetic_filtered_large.tsv --input_format \"tsv\"\\\n         --url_col \"url\" --"
  },
  {
    "path": "dataset_train/filter_cap.json",
    "chars": 6300658,
    "preview": "{\"annotations\": [{\"image_id\": \"COCO_train2014_000000256539\", \"caption\": \"The image features an open market with a variet"
  },
  {
    "path": "dataset_train/hallucination5k_train.jsonl",
    "chars": 7147910,
    "preview": "{\"id\": \"000000256539\", \"image\": \"COCO_train2014_000000256539.jpg\", \"value\": \"The image features an open market with a va"
  },
  {
    "path": "environment.yml",
    "chars": 1313,
    "preview": "name: LURE\nchannels:\n  - pytorch\n  - defaults\n  - anaconda\ndependencies:\n  - python=3.9\n  - cudatoolkit\n  - pip\n  - pyto"
  },
  {
    "path": "eval_configs/minigpt4_eval.yaml",
    "chars": 523,
    "preview": "model:\n  arch: mini_gpt4\n  model_type: pretrain_vicuna\n  freeze_vit: True\n  freeze_qformer: True\n  max_txt_len: 128\n  en"
  },
  {
    "path": "generate_IDK.py",
    "chars": 3528,
    "preview": "import json\r\nimport numpy as np\r\nimport re\r\nimport argparse\r\n\r\ndef get_word(words, objlist):\r\n    if words in objlist:\r\n"
  },
  {
    "path": "minigpt4/__init__.py",
    "chars": 951,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "minigpt4/common/config.py",
    "chars": 15079,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/dist_utils.py",
    "chars": 3620,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/gradcam.py",
    "chars": 815,
    "preview": "import numpy as np\nfrom matplotlib import pyplot as plt\nfrom scipy.ndimage import filters\nfrom skimage import transform "
  },
  {
    "path": "minigpt4/common/logger.py",
    "chars": 6001,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/optims.py",
    "chars": 3516,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/registry.py",
    "chars": 9915,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/common/utils.py",
    "chars": 13807,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/configs/datasets/cc_sbu/align.yaml",
    "chars": 134,
    "preview": "datasets:\n  cc_sbu_align:\n    data_type: images\n    build_info:\n      storage: The path where your data is stored(/xxx/d"
  },
  {
    "path": "minigpt4/configs/datasets/cc_sbu/defaults.yaml",
    "chars": 116,
    "preview": "datasets:\n  cc_sbu:\n    data_type: images\n    build_info:\n      storage: /path/to/cc_sbu_dataset/{00000..01255}.tar\n"
  },
  {
    "path": "minigpt4/configs/datasets/laion/defaults.yaml",
    "chars": 114,
    "preview": "datasets:\n  laion:\n    data_type: images\n    build_info:\n      storage: /path/to/laion_dataset/{00000..10488}.tar\n"
  },
  {
    "path": "minigpt4/configs/default.yaml",
    "chars": 141,
    "preview": "env:\n  # For default users\n  # cache_root: \"cache\"\n  # For internal use with persistent storage\n  cache_root: \"/export/h"
  },
  {
    "path": "minigpt4/configs/models/minigpt4.yaml",
    "chars": 592,
    "preview": "model:\n  arch: mini_gpt4\n\n  # vit encoder\n  image_size: 224\n  drop_path_rate: 0\n  use_grad_checkpoint: False\n  vit_preci"
  },
  {
    "path": "minigpt4/conversation/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "minigpt4/conversation/conversation.py",
    "chars": 9006,
    "preview": "import argparse\nimport time\nfrom PIL import Image\n\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCaus"
  },
  {
    "path": "minigpt4/datasets/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "minigpt4/datasets/builders/__init__.py",
    "chars": 1897,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/datasets/builders/base_dataset_builder.py",
    "chars": 8105,
    "preview": "\"\"\"\n This file is from\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-C"
  },
  {
    "path": "minigpt4/datasets/builders/image_text_pair_builder.py",
    "chars": 2999,
    "preview": "import os\nimport logging\nimport warnings\n\nfrom minigpt4.common.registry import registry\nfrom minigpt4.datasets.builders."
  },
  {
    "path": "minigpt4/datasets/data_utils.py",
    "chars": 6281,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/datasets/datasets/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "minigpt4/datasets/datasets/base_dataset.py",
    "chars": 2067,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/datasets/datasets/caption_datasets.py",
    "chars": 2601,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/datasets/datasets/cc_sbu_dataset.py",
    "chars": 1685,
    "preview": "import os\nfrom PIL import Image\nimport webdataset as wds\nfrom minigpt4.datasets.datasets.base_dataset import BaseDataset"
  },
  {
    "path": "minigpt4/datasets/datasets/dataloader_utils.py",
    "chars": 5258,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/datasets/datasets/laion_dataset.py",
    "chars": 1174,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/models/Qformer.py",
    "chars": 48386,
    "preview": "\"\"\"\n * Copyright (c) 2023, salesforce.com, inc.\n * All rights reserved.\n * SPDX-License-Identifier: BSD-3-Clause\n * For "
  },
  {
    "path": "minigpt4/models/__init__.py",
    "chars": 5754,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/models/base_model.py",
    "chars": 7865,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/models/blip2.py",
    "chars": 7717,
    "preview": "\"\"\"\n Copyright (c) 2023, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/models/blip2_outputs.py",
    "chars": 4153,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/models/eva_vit.py",
    "chars": 19529,
    "preview": "# Based on EVA, BEIT, timm and DeiT code bases\n# https://github.com/baaivision/EVA\n# https://github.com/rwightman/pytorc"
  },
  {
    "path": "minigpt4/models/mini_gpt4.py",
    "chars": 12649,
    "preview": "import logging\nimport random\n\nimport torch\nfrom torch.cuda.amp import autocast as autocast\nimport torch.nn as nn\n\nfrom m"
  },
  {
    "path": "minigpt4/models/modeling_llama.py",
    "chars": 33326,
    "preview": "# This script is based on https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_l"
  },
  {
    "path": "minigpt4/output/__init__.py",
    "chars": 1,
    "preview": "\n"
  },
  {
    "path": "minigpt4/output/minigpt4_stage2_finetune/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "minigpt4/processors/__init__.py",
    "chars": 823,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/processors/base_processor.py",
    "chars": 610,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/processors/blip_processors.py",
    "chars": 4003,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/processors/randaugment.py",
    "chars": 11298,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/runners/__init__.py",
    "chars": 306,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/runners/runner_base.py",
    "chars": 23050,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/tasks/__init__.py",
    "chars": 736,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/tasks/base_task.py",
    "chars": 9150,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "minigpt4/tasks/image_text_pretrain.py",
    "chars": 538,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "output_LURE.py",
    "chars": 4897,
    "preview": "import argparse\r\nimport os\r\nimport random\r\nimport sys\r\nimport json\r\nfrom tqdm import tqdm\r\nimport numpy as np\r\nimport to"
  },
  {
    "path": "prompts/alignment.txt",
    "chars": 286,
    "preview": "<Img><ImageHere></Img> Describe this image in detail.\n<Img><ImageHere></Img> Take a look at this image and describe what"
  },
  {
    "path": "tool/to_chair.py",
    "chars": 900,
    "preview": "import json\r\n\r\n\r\nwith open(\"./data/[your input].jsonl\", 'r') as jsonl_file:\r\n    lines = jsonl_file.readlines()\r\n\r\novera"
  },
  {
    "path": "tool/utils.py",
    "chars": 218359,
    "preview": "# coding=utf-8\n# Copyright 2020 The Google AI Language Team Authors, Facebook AI Research authors and The HuggingFace In"
  },
  {
    "path": "train.py",
    "chars": 2677,
    "preview": "\"\"\"\n Copyright (c) 2022, salesforce.com, inc.\n All rights reserved.\n SPDX-License-Identifier: BSD-3-Clause\n For full lic"
  },
  {
    "path": "train_configs/minigpt4_stage1_pretrain.yaml",
    "chars": 983,
    "preview": "model:\n  arch: mini_gpt4\n  model_type: pretrain_vicuna\n  freeze_vit: True\n  freeze_qformer: True\n\n\ndatasets:\n  laion:\n  "
  },
  {
    "path": "train_configs/minigpt4_stage2_finetune.yaml",
    "chars": 936,
    "preview": "model:\n  arch: mini_gpt4\n  model_type: pretrain_vicuna\n  freeze_vit: True\n  freeze_qformer: True\n  max_txt_len: 160\n  en"
  }
]
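
The condensed preview above is a JSON array in which each entry carries a `path`, a `chars` count, and a `preview` snippet. As a minimal sketch of how such a listing could be summarized, the Python below parses a three-entry sample of the same shape. The `summarize` helper and the inline `sample` string are illustrative assumptions, not part of the repository; the paths and character counts are copied from the listing, and the preview strings are shortened.

```python
import json

# Three entries with the same shape as the condensed preview
# (path, chars, preview). Paths and counts come from the listing;
# preview strings are shortened for illustration.
sample = """
[
  {"path": "README.md", "chars": 9668, "preview": "# Analyzing and Mitigating ..."},
  {"path": "__init__.py", "chars": 0, "preview": ""},
  {"path": "tool/utils.py", "chars": 218359, "preview": "# coding=utf-8"}
]
"""


def summarize(entries):
    """Return (total character count, paths of empty files, path of largest file)."""
    total = sum(e["chars"] for e in entries)
    empty = [e["path"] for e in entries if e["chars"] == 0]
    largest = max(entries, key=lambda e: e["chars"])["path"]
    return total, empty, largest


entries = json.loads(sample)
total, empty, largest = summarize(entries)
print(total, empty, largest)
```

Applied to the full downloaded `.json` file, the same helper would presumably show which files dominate the extraction size, assuming the download keeps this entry shape.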

About this extraction

This page contains the full source code of the YiyangZhou/LURE GitHub repository, extracted and formatted as plain text. The extraction includes 66 files (13.3 MB), approximately 3.5M tokens, and a symbol index with 553 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
