Repository: VividLe/awesome-weakly-supervised-action-localization
Branch: master
Commit: e86943a85504
Files: 4
Total size: 44.9 KB
Directory structure:
gitextract_ywnz0pl2/
├── Other_Settings.md
├── README.md
├── Spatiotemporal.md
└── Supervised.md
================================================
FILE CONTENTS
================================================
================================================
FILE: Other_Settings.md
================================================
# Action Localization Benchmarks
### Learning from action count supervision
* **OCL:** Julien Schroeter, Kirill Sidorov, David Marshall.<br />
"Weakly-Supervised Temporal Localization via Occurrence Count Learning" ICML 2019.
[[paper](https://arxiv.org/pdf/1905.07293.pdf)]
[[code](https://github.com/SchroeterJulien/ICML-2019-Weakly-Supervised-Temporal-Localization-via-Occurrence-Count-Learning)]
### Action Segment, Transformer
* Action Modifiers: Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen.<br />
"Action Modifiers: Learning from Adverbs in Instructional Videos." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Doughty_Action_Modifiers_Learning_From_Adverbs_in_Instructional_Videos_CVPR_2020_paper.pdf)]
### Action Segment, Self-Supervised Learning
* Action Segmentation: Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira.<br />
"Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Action_Segmentation_With_Joint_Self-Supervised_Temporal_Domain_Adaptation_CVPR_2020_paper.pdf)]
### Action Segment, Transformer
* SCT: Mohsen Fayyaz, Juergen Gall.<br />
"SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fayyaz_SCT_Set_Constrained_Temporal_Transformer_for_Set_Supervised_Action_Segmentation_CVPR_2020_paper.pdf)]
[[code](https://github.com/MohsenFayyaz89/SCT/)]
### Unintentional Action
A new task. Interesting.
* Oops: Dave Epstein, Boyuan Chen, Carl Vondrick.<br />
"Oops! Predicting Unintentional Action in Video." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Epstein_Oops_Predicting_Unintentional_Action_in_Video_CVPR_2020_paper.pdf)]
[[project](https://oops.cs.columbia.edu/)]
### ActionBytes
Learning from trimmed videos.
* ActionBytes: Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.<br />
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]
### METAL
ActivityNet v1.2, mAP@0.5: 41.9 [1-shot], 45.0 [5-shot]
THUMOS14, mAP@0.5: 14.3 [1-shot], 16.2 [5-shot]
* METAL : Da Zhang, Xiyang Dai, and Yuan-Fang Wang.<br />
"METAL : Minimum Effort Temporal Activity Localization in Untrimmed Videos." CVPR (2020).
[[paper](https://sites.cs.ucsb.edu/~yfwang/papers/cvpr2020.pdf)]
### Hierarchical Action Search
A new task. A learned space where videos are positioned in entailment cones formed by different subtrees.
* Uncertain: Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek.<br />
"Searching for Actions on the Hyperbole." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Long_Searching_for_Actions_on_the_Hyperbole_CVPR_2020_paper.pdf)]
### Fine-grained Action Recognition and Localization
A new task. Hierarchical annotation for recognition and localization.
* FineGym: Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />
"FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." (CVPR 2020, oral, 3 strong accepts).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)]
[[project](https://sdolivia.github.io/FineGym/)]
### Mining undefined sub-actions
A new task. Find the common sub-actions from multiple videos.
* TAPOS: Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />
"Intra- and Inter-Action Understanding via Temporal Action Parsing." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_Intra-_and_Inter-Action_Understanding_via_Temporal_Action_Parsing_CVPR_2020_paper.pdf)]
[[project](https://sdolivia.github.io/TAPOS/)]
================================================
FILE: README.md
================================================
# Action Localization Benchmarks
Papers and Results of Temporal Action Localization
**Weakly Supervised Performance on THUMOS'14 dataset.**
- The detectors are sorted by the mAP with threshold 0.5.
- "c" indicates whether code is released: yes (Y) or no (N).
- "e" indicates the evaluation code used: THUMOS (T), ActivityNet (A), or the authors' own implementation (I).
| Detector | Pub |c|e| 0.1 | 0.2 | 0.3 |0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |avg | info |
| :---------: |:------:|-|-|:---:|:----|:----|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net | arXiv-20-12-11 |N|A|65.6 |60.0 |52.1 |43.3 |35.9 | - | - | - | - | - | Same authors as 3C-Net |
| Lee et al | AAAI21 |Y|A|67.5 |61.2 |52.3 |43.4 |33.7 | 22.9 | 12.1 | - | - | - | Same authors as BaSNet |
| HAM-Net | AAAI21 |N|A|65.9 |59.6 |52.2 |43.1 |32.6 | 21.9 | 12.5 | - | - | - | - |
| ACSNet | AAAI21 |N|A| - | - |51.4 |42.7 |32.4 | 22.0 | 11.7 | - | - | - | - |
| EM-MIL | ECCV20 |N|A|59.1 |52.7 |45.5 |36.8 |30.5 | 22.7 | 16.4 | - | - | - | Uses existing classification results |
| A2CL-PT | ECCV20 |Y|A| 61.2 | 56.1 |48.1 | 39.0 |30.1 |19.2 |10.6 | 4.8 | 1.0 | 30.0 | Report unsupervised performance as well |
| ACL | CVPR20 |N|A| - | - |46.9 |38.9 |30.1 |19.8 |10.4 | - | - | - | Report unsupervised performance as well |
| Liu et al. | AAAI21 |N|A| 61.7 | 58.0 |50.8 | 41.7 |29.6 |20.1 |10.7 | 4.3 | 0.5 | - | - |
| WSTAL | WACV20 |-| |62.3 |- |46.8 |- |29.6 |- |9.7 |- |- |- | - |
| ActionBytes | CVPR20 |N|A| - | - |43.0 |37.5 |29.0 | - |9.5 | - | - | - | - |
| DGAM | CVPR20 |Y|A| 60.0|54.2 |46.8 |38.2 |28.8 |19.8 |11.4 |3.6 |0.4 | - | - |
| TSCN | ECCV20 |N|A| 63.4|57.6 |47.8 |37.7 |28.7 |19.4 |10.2 |3.9 |0.7 | - | - |
| BaSNet-I3D | AAAI20 |Y|A| 58.2|52.3 |44.6 |36.0 |27.0 |18.6 |10.4 |3.9 |0.5 | - | - |
| BaSNet-UNT | AAAI20 |Y|A| 56.2|50.3 |42.8 |34.7 |25.1 |17.1 |9.3 |3.7 |0.5 | - | - |
| WSBM | ICCV19 |N|A| 60.4|56.0 |46.6 |37.5 |26.8 |17.6 |9.0 |3.3 |0.4 | - | - |
| 3C-Net | ICCV19 |Y|I| 59.1|53.5 |44.2 |34.1 |26.6 | - |8.1 | - | - | - | - |
| ASSG | ACM 19 |N| | 65.6|59.4 |50.4 |38.7 |25.4 |15.0 |6.6 | - | - | - | - |
| TSM | ICCV19 |N|T| - | - |39.5 |31.9 |24.5 |13.8 |7.1 | - | - |23.4 | - |
| CleanNet | ICCV19 |N|T| - | - |37.0 |30.9 |23.9 |13.9 |7.1 | - | - | - | - |
| CMCS-I3D | CVPR19 |Y|T| 57.4|50.8 |41.2 |32.1 |23.1 |15.0 |7.0 | - | - | - |report avg-mAP|
| CMCS-UNT | CVPR19 |Y|T| 53.5|46.8 |37.5 |29.1 |19.9 |12.3 |6.0 | - | - | - | - |
| STARNet | AAAI19 |N|A|68.8 |60.0 |48.7 |34.7 |23.0 |- |- | - | - |- | - |
| W-TALC | ECCV18 |Y|I|55.2 |49.6 |40.1 |31.1 |22.8 |- |- | - | - |7.6 | - |
| AutoLoc | ECCV18 |Y|T|- |- |35.8 |29.0 |21.2 |13.4 |5.8 | - | - |- | - |
| MAAN | ICLR19 |Y|A|59.8 |50.8 |41.1 |30.6 |20.3 |12.0 |6.9 |2.6 |0.2 |24.9 | - |
| LTSR | AAAI19 |N|T|55.9 |46.9 |38.3 |28.1 |18.6 |11.0 |5.59 |2.19 |0.29 |- | - |
| WSGN | WACV20 |-|T|51.1 |44.4 |34.9 |26.3 |18.1 |11.6 |6.5 |- |- |- | - |
| STPN | CVPR18 |I|A|52.0 |44.7 |35.5 |25.8 |16.9 |9.9 |4.3 |1.2 |0.1 |- | - |
| CPMN | ACCV18 |N|T|47.1 |41.6 |32.8 |24.7 |16.1 |10.1 |5.5 |- |- |- | - |
| S-O-C | ACM18 |N|T|45.8 |39.0 |31.1 |22.5 |15.9 |- |- |- |- |- | - |
|UntrimmedNets| CVPR17 |Y|T|44.4 |37.7 |28.2 |21.1 |13.7 |- |- |- |- |- | - |
| H&S | ICCV17 |Y|T|36.44|27.84|19.49|12.66|6.84 |- |- |- |- |- | - |
|LPAT-I3D+TEM | arXiv |-| |- |- |46.9 |37.4 |28.0 |16.6 |9.2 |- |- |27.6 | - |
| LPAT-I3D | arXiv |-| |- |- |46.7 |37.5 |27.9 |17.6 |9.2 |- |- |27.6 | - |
| LPAT-U | arXiv |-| |- |- |39.9 |31.5 |22.6 |14.2 |7.9 |- |- |27.6 | - |
|RefineLoc-I3D| arXiv |-|T|- |- |40.8 |- |23.1 |- |5.3 |- |- |- | - |
|RefineLoc-TSN| arXiv |-|T|- |- |36.1 |- |22.6 |- |5.8 |- |- |- | - |
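
The column headers 0.1 to 0.9 in the table above are temporal IoU (tIoU) thresholds: a predicted segment counts as correct at a given threshold only if its overlap with a ground-truth segment reaches that value. A minimal sketch of that overlap test follows. The function names and the example segments are hypothetical, and the official THUMOS and ActivityNet evaluation code additionally handles class labels, score ranking, and one-to-one matching, none of which is shown here.

```python
def temporal_iou(pred, gt):
    """Return the temporal IoU of two (start, end) segments given in seconds."""
    # Length of the overlap between the two segments (0 if they are disjoint).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Length covered by either segment.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold):
    """A prediction matches a ground-truth segment when tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical segments, in seconds (not taken from any dataset).
prediction = (2.0, 8.0)
ground_truth = (4.0, 10.0)

print(temporal_iou(prediction, ground_truth))           # 0.5
print(is_true_positive(prediction, ground_truth, 0.5))  # True
print(is_true_positive(prediction, ground_truth, 0.6))  # False
```

The same prediction is a hit at threshold 0.5 and a miss at 0.6, which is why mAP typically drops as the threshold moves to the right in the table.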
**Weakly Supervised Performance on ActivityNet v1.2 dataset.**
| Detector | Pub |c| 0.5 | 0.55|0.60 | 0.65|0.70 |0.75 | 0.80|0.85 |0.90 |0.95 | avg |test | info |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net | arXiv-20-12-11 |N|42.3 | - | - | - | - |25.5 | - | - | - | 5.8 | 26.0| - | - |
| ACSNet | AAAI21 |N|40.1 | - | - | - | - |26.1 | - | - | - | 6.8 | 26.0| - | - |
| Lee et al | AAAI21 |N|41.2 | - | - | - | - |25.6 | - | - | - | 6.0 | 25.9| - | - |
| Liu et al. | AAAI21 |N|39.2 |- |- |- | - |25.6 |- |- |- | 6.8 | 25.5| - | - |
| HAM-Net | AAAI21 |N|41.0 | - | - | - | - |24.8 | - | - | - | 5.3 | 25.1| - | - |
| BaSNet | AAAI20 |Y|38.5 | - | - | - | - |24.2 | - | - | - | 5.6 | 24.3| - | - |
| TSCN | ECCV20 |N|37.6 |- |- |- | - |23.7 |- |- |- | 5.7 | 23.6| - | - |
| CMCS | CVPR19 |Y|36.8 |- |- |- | - |22.0 |- |- |- | 5.6 | 22.4| - | - |
| 3C-Net | ICCV19 |Y|35.4 | - | - | - | - |22.9 | - | - | - | 8.5 | 21.1| - | - |
| TSM | ICCV19 |N|28.3 |26.0 |23.6 |21.2 | 18.9|17.0 |14.0 |11.1 |7.5 | 3.5 | - | - | - |
| CleanNet | ICCV19 |N|37.1 |33.4 |29.9 |26.7 | 23.4|20.3 |17.2 |13.9 |9.2 | 5.0 | 21.6| - | - |
| EM-MIL | ECCV20 |N|37.4 |- |- |- | 23.1|- |- |- |2.0 | - | 20.3| - | - |
| W-TALC | ECCV18 |Y|37.0 |- |- |- | 14.6|- |- |- |- | - | 18.0| - | - |
| AutoLoc | ECCV18 |Y|27.3 |24.9 |22.5 |19.9 | 17.5|15.1 |13.0 |10.0 |6.8 | 3.3 | 16.0| - | - |
|RefineLoc-I3D| arXiv |-|38.7 |- |- |- | - |22.6 |- |- |- | 5.5 | 23.2| - | - |
|RefineLoc-TSN| arXiv |-|38.8 |- |- |- | - |22.2 |- |- |- | 5.3 | 23.2| - | - |
| LPAT | arXiv |-|37.6 |34.6 |31.6 |28.7 | 25.6|22.6 |19.6 |15.3 |10.9 | 4.9 | 23.1| - | - |
| WSTAL | arXiv |-|35.2 |- |- |- | 16.3|- |- |- |- | - | - | - | - |
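
For rows that report all ten thresholds, the "avg" column in the table above is consistent with the plain mean of the mAP values at tIoU 0.50, 0.55, ..., 0.95. The sketch below checks this for three rows copied from the table; the helper name `average_map` is hypothetical, and rows that report only a subset of thresholds cannot be checked this way.

```python
def average_map(per_threshold_map):
    """Mean of the mAP values reported at each tIoU threshold."""
    return sum(per_threshold_map) / len(per_threshold_map)


# mAP at tIoU 0.50, 0.55, ..., 0.95, copied from the table above.
rows = {
    "CleanNet": [37.1, 33.4, 29.9, 26.7, 23.4, 20.3, 17.2, 13.9, 9.2, 5.0],
    "AutoLoc": [27.3, 24.9, 22.5, 19.9, 17.5, 15.1, 13.0, 10.0, 6.8, 3.3],
    "LPAT": [37.6, 34.6, 31.6, 28.7, 25.6, 22.6, 19.6, 15.3, 10.9, 4.9],
}

for name, values in rows.items():
    # Round to one decimal place, matching the precision used in the table.
    print(name, round(average_map(values), 1))
```

The rounded results (21.6, 16.0, and 23.1) match the "avg" entries listed for CleanNet, AutoLoc, and LPAT.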
**Weakly Supervised Performance on ActivityNet v1.3 dataset.**
| Detector | Pub |c| 0.5 |0.75 |0.95 |avg |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|
| ACSNet | AAAI21 |N|36.3 |24.2 |5.8 |23.9 |
| Lee et al.| AAAI21 |Y|37.0 |23.9 |5.7 |23.7 |
| Liu et al.| AAAI21 |N|35.1 |23.7 |5.6 |23.2 |
| A2CL-PT | ECCV20 |Y|36.8 |22.0 |5.2 |22.5 |
| BaSNet-I3D | AAAI20 |Y|34.5 |22.5 |4.9 |22.2 |
| TSCN | ECCV20 |N|35.3 |21.4 |5.3 |21.7 |
| WSBM | ICCV19 |N|36.4 |19.2 | 2.9 |- |
| ASSG | ACM 19 |N|32.3 |20.1 | 4.0 |- |
| TSM | ICCV19 |N|30.0 |19.0 | 4.5 |- |
| CMCS | CVPR19 |Y|34.0 |20.9 | 5.7 |21.2 |
| STARNet | AAAI19 |N|31.1 |18.8 | 4.7 |- |
| MAAN | ICLR19 |Y|33.7 |21.9 | 5.5 |- |
| LTSR | AAAI19 |N|33.1 |18.7 |3.32 |21.78|
| STPN | CVPR18 |I|29.3 |16.9 |2.6 |- |
| CPMN | ACCV18 |N|39.29|24.09|6.71 |24.42|
| S-O-C | ACM18 |N|27.3 |14.7 |2.9 |15.6 |
### Weakly Supervised Temporal Action Localization
* **D2-Net:** Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao.<br />
"D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations" arXiv:2012.06440.
[[paper](https://arxiv.org/pdf/2012.06440.pdf)]
* **Lee et al:** Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun.<br />
"Weakly-supervised Temporal Action Localization by Uncertainty Modeling" AAAI 2021.
[[paper](https://arxiv.org/pdf/2006.07006.pdf)]
[[code](https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling)]
* **HAM-Net:** Ashraful Islam, Chengjiang Long, Richard J. Radke.<br />
"A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization" AAAI 2021.
[[paper](https://arxiv.org/pdf/2101.00545.pdf)]
[[code](https://github.com/asrafulashiq/hamnet)]
* **ACSNet:** Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />
"ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization" AAAI 2021.
[[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **Liu et al.:** Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />
"Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context" AAAI 2021.
[[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **EM-MIL:** Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu.<br />
"Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning" ECCV 2020.
[[paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740715.pdf)]
* **A2CL-PT:** Kyle Min, Jason J. Corso.<br />
"Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization." ECCV 2020.
[[paper](https://link.springer.com/chapter/10.1007%2F978-3-030-58568-6_17)]
[[code](https://github.com/MichiganCOG/A2CL-PT)]
* **ACL:** Guoqiang Gong, Xinghan Wang, Yadong Mu, Qi Tian.<br />
"Learning Temporal Co-Attention Models for Unsupervised Video Action Localization." CVPR 2020, oral.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.pdf)]
* **WSTAL:** Ashraful Islam, Richard J. Radke.<br />
"Weakly Supervised Temporal Action Localization Using Deep Metric Learning" WACV 2020.
[[paper](https://arxiv.org/pdf/2001.07793.pdf)]
[[code](https://github.com/asrafulashiq/wsad)]
* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.<br />
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR 2020.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]
* **DGAM:** Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang.<br />
"Weakly-Supervised Action Localization by Generative Attention Modeling." CVPR 2020.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf)]
[[code](https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization)]
* **TSCN:** Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua.<br />
"Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization." ECCV 2020.
[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf)]
* **BaSNet:** Pilhyeon Lee, Youngjung Uh, Hyeran Byun.<br />
"Background Suppression Networks for Weakly-supervised Temporal Action Localization." AAAI 2020.
[[paper](https://arxiv.org/pdf/1911.09963.pdf)]
[[code](https://github.com/Pilhyeon/BaSNet-pytorch)]
* **3C-Net:** Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao.<br />
"3C-Net : Category Count and Center Loss for Weakly-Supervised Action Localization." ICCV 2019.
[[paper](https://arxiv.org/pdf/1908.08216.pdf)]
[[code](https://github.com/naraysa/3c-net)]
* **CMCS:** Daochang Liu, Tingting Jiang, Yizhou Wang.<br />
"Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization." CVPR 2019.
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.pdf)]
[[code](https://github.com/Finspire13/CMCS-Temporal-Action-Localization)]
* **ASSG:** Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou.<br />
"Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization" ACM MM 2019.
[[paper](https://arxiv.org/pdf/1908.02422.pdf)]
* **AutoLoc:** Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang.<br />
"AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos" ECCV 2018.
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Zheng_Shou_AutoLoc_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/zhengshou/AutoLoc)]
* **CPMN:** Haisheng Su, Xu Zhao, Tianwei Lin.<br />
"Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization" ACCV 2018.
[[paper](https://arxiv.org/pdf/1810.11794.pdf)]
* **H&S:** Krishna Kumar Singh, Yong Jae Lee.<br />
"Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization" ICCV 2017.
[[paper](https://arxiv.org/pdf/1704.04232.pdf)]
[[code](https://github.com/goddoe/hide-and-seek)]
* **LTSR:** Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan.<br />
"Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision" AAAI 2019.
[[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4958/4831)]
* **WSGN:** Basura Fernando, Cheston Tan Yin Chet.<br />
"Weakly Supervised Gaussian Networks for Action Detection" WACV 2020.
[[paper](https://basurafernando.github.io/papers/wacv2020_wsgn.pdf)]
* **MAAN:** Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung.<br />
"Marginalized Average Attentional Network for Weakly-Supervised Learning" ICLR 2019.
[[paper](https://arxiv.org/pdf/1905.08586.pdf)]
[[code](https://github.com/yyuanad/MAAN)]
* **S-O-C:** Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li.<br />
"Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector" ACM MM 2018.
[[paper](https://arxiv.org/pdf/1807.02929.pdf)]
* **STARNet:** Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu.<br />
"Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection" AAAI 2019.
[[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4939/4812)]
* **TSM:** Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan.<br />
"Temporal Structure Mining for Weakly Supervised Action Detection" ICCV 2019.
[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf)]
* **UntrimmedNets:** Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool.<br />
"UntrimmedNets for Weakly Supervised Action Recognition and Detection" CVPR 2017.
[[paper](https://wanglimin.github.io/papers/WangXLV_CVPR17.pdf)]
[[code](https://github.com/wanglimin/UntrimmedNet)]
* **WSBM:** Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes.<br />
"Weakly-supervised Action Localization with Background Modeling" ICCV 2019.
[[paper](https://arxiv.org/pdf/1908.06552.pdf)]
* **CleanNet:** Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua.<br />
"Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks" ICCV 2019.
[[paper](https://qilin-zhang.github.io/_pages/pdfs/Weakly_Supervised_Temporal_Action_Localization_through_Contrast_based_Evaluation_Networks.pdf)]
* **STPN:** Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han.<br />
"Weakly Supervised Action Localization by Sparse Temporal Pooling Network" CVPR 2018.
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Weakly_Supervised_Action_CVPR_2018_paper.pdf)]
[[code](https://github.com/bellos1203/STPN)]
* **W-TALC:** Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury.<br />
"W-TALC: Weakly-supervised Temporal Activity Localization and Classification" ECCV 2018.
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Sujoy_Paul_W-TALC_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/sujoyp/wtalc-pytorch)]
* **LPAT:** Xudong Lin, Zheng Shou, Shih-Fu Chang.<br />
"LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization" arXiv 2019.
[[paper](https://arxiv.org/pdf/1910.11285.pdf)]
* **RefineLoc:** Humam Alwassel, Alejandro Pardo, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem.<br />
"RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization" arXiv 2019.
[[paper](https://arxiv.org/pdf/1904.00227.pdf)]
### Expecting for paper
* **lvr:** Xingyu Liu, Joon-Young Lee, Hailin Jin.<br />
"Learning Video Representations from Correspondence Proposals." CVPR 2019 **oral**.
## Dataset
* **THUMOS'14:** Yu-Gang Jiang, Jingen Liu, Amir R. Zamir, George Toderici.<br />
"THUMOS Challenge 2014"
[[project](https://www.crcv.ucf.edu/THUMOS14/home.html)]
* **ActivityNet:** Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia.<br />
"ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding"
[[project](http://activity-net.org/index.html)]
* **THUMOS'15:** Alexander Gorban, Haroon Idrees, Yu-Gang Jiang, Amir R. Zamir.<br />
"THUMOS Challenge 2015"
[[project](http://www.thumos.info/)]
* **COIN:** Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou.<br />
"COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis." CVPR 2019.
[[paper](https://arxiv.org/pdf/1903.02874.pdf)]
[[project](https://coin-dataset.github.io/)]
================================================
FILE: Spatiotemporal.md
================================================
# Spatio-Temporal Action Detection
Papers and Results for Spatio-Temporal Action Detection
## Spatio-Temporal Action Localization
**Performance on AVA v2.1 dataset.**
- Metric: mAP with threshold 0.5.
| Detector | val | test |
| :---------: | :-----: | :-------: |
| LFB | 27.70 | 27.20 |
| VATN* | 25.00 | 24.93 |
| SMAD* | 22.20 | - |
| STEP | 18.60 | - |
| ACRN | 17.40 | - |
| AVA | 15.80 | - |
| SlowFast | 27.30 | 27.10 |
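
For the AVA table above, the 0.5 threshold is assumed here to be a spatial IoU between a predicted person box and a ground-truth box on a keyframe, as in the standard frame-level AVA protocol. The sketch below shows only that overlap test; `box_iou` and the example boxes are hypothetical, and the official evaluation additionally ranks detections by score and averages precision over action classes.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Width and height of the intersection rectangle (0 if boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes in pixel coordinates (not taken from AVA).
ground_truth = (0.0, 0.0, 2.0, 3.0)
good_prediction = (0.0, 0.0, 2.0, 2.0)     # overlaps most of the ground truth
shifted_prediction = (1.0, 0.0, 3.0, 2.0)  # overlaps only a small part of it

print(box_iou(good_prediction, ground_truth) >= 0.5)     # True
print(box_iou(shifted_prediction, ground_truth) >= 0.5)  # False
```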
## Spatio-temporal Action Detection
* **PSCS:** Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu.<br />
"Improving Action Localization by Progressive Cross-stream Cooperation." CVPR (2019).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Su_Improving_Action_Localization_by_Progressive_Cross-Stream_Cooperation_CVPR_2019_paper.pdf)]
* **SlowFast:** Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He.<br />
"SlowFast Networks for Video Recognition." arXiv 1812.03982.
[[paper](https://arxiv.org/pdf/1812.03982.pdf)]
[[unofficial_code](https://github.com/r1ch88/SlowFastNetworks)]
[[unofficial_code](https://github.com/Guocode/SlowFast-Networks)]
* **DwF:** Jiaojiao Zhao, Cees G.M. Snoek.<br />
"Dance with Flow: Two-in-One Stream Action Detection." CVPR (2019).
[[paper](https://arxiv.org/pdf/1904.00696.pdf)]
* **STEP:** Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz.<br />
"STEP: Spatio-Temporal Progressive Learning for Video Action Detection." CVPR (2019 **oral**).
[[paper](https://arxiv.org/pdf/1904.09288.pdf)]
* **LFB:** Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick.<br />
"Long-Term Feature Banks for Detailed Video Understanding." CVPR (2019).
[[paper](https://arxiv.org/pdf/1812.05038.pdf)]
[[project](https://github.com/facebookresearch/video-long-term-feature-banks)]
* **VATN*:** Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman. <br />
"Video Action Transformer Network." CVPR (2019 **oral**).
[[paper](https://arxiv.org/pdf/1812.02707.pdf)]
[[project](https://rohitgirdhar.github.io/ActionTransformer/)]
* **LAEO:** Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman. <br />
"LAEO-Net: revisiting people Looking At Each Other in videos." CVPR (2019).
[[paper](http://www.robots.ox.ac.uk/~vgg/research/laeonet/cvpr2019LAEO.pdf)]
[[code](https://github.com/AVAuco/laeonet/)]
[[project](http://www.robots.ox.ac.uk/~vgg/research/laeonet/)]
* **SMAD*:** Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid. <br />
"A Structured Model For Action Detection." CVPR (2019).
[[paper](https://arxiv.org/pdf/1812.03544.pdf)]
* **TACNet:** Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun. <br />
"TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection." CVPR (2019).
[[paper](http://www.skicyyu.org/Paper/CVPR2019_TACNET.pdf)]
* **AVA:** Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. <br />
"AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions." CVPR (2018).
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gu_AVA_A_Video_CVPR_2018_paper.pdf)]
[[project](https://research.google.com/ava/)]
* **ACRN:** Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid. <br />
"Actor-Centric Relation Network." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.pdf)]
* **T-CNN:** Rui Hou, Chen Chen, Mubarak Shah. <br />
"Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hou_Tube_Convolutional_Neural_ICCV_2017_paper.pdf)]
[[code](https://www.crcv.ucf.edu/projects/TCNN/#Code)]
[[project](https://www.crcv.ucf.edu/projects/TCNN/)]
## Dataset
* **YouTube-8M-Segments:** Ke Chen, Julia Elliott, Nisarg Kothari, Hanhan Li, et al.<br />
"YouTube-8M Segments Dataset"
[[project](https://research.google.com/youtube8m/)]
## Distinguished Researchers & Teams
[WILLOW](https://www.di.ens.fr/willow/publications/YearOnly/publications.html)
[Ivan Laptev](https://www.di.ens.fr/~laptev/#Publications)
[Christoph Feichtenhofer](https://feichtenhofer.github.io/)
================================================
FILE: Supervised.md
================================================
# Action Detection Benchmarks
Papers and Results of Temporal Action Localization
## Temporal Action Localization
**Performance on THUMOS'14 dataset.**
- The detectors are ordered by the mAP with threshold 0.5.
- `Deep Learning`: deep learning related method.
| Detector | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |Deep Learning|Comment |
| :---------: | :-----: |:-------:|:-------:|:-------:|:-------:|:-------:| :------: | :--------: | :--: |
| TGM | - | - | - | - | 53.5 | - | - | Y | - |
| C-TCN | 72.2 | 71.4 | 68.0 | 62.3 | 52.1 | - | - | Y | - |
| RTD-Net | - | - | 68.3 | 62.3 | 51.9 | 38.8 | 23.7 | Y | based on P-GCN |
| PCG-TAL | 71.2 | 68.9 | 65.1 | 59.5 | 51.2 | - | - | Y | based on P-GCN |
| DPP | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - |
| PGCN | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - |
| BMN | - | - | 56.0 | 47.4 | 38.8 | 29.7 | 20.5 | Y | - |
| D-SSAD | - | - | 60.2 | 54.1 | 44.2 | 32.3 | 19.1 | Y | - |
| TAL-Net | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 | Y | - |
| FRTS | - | - | 53.5 | 50.2 | 44.2 | 33.9 | 22.7 | Y | - |
| GTAN | 69.1 | 63.7 | 57.8 | 47.2 | 38.8 | - | - | Y | - |
| AGCN | 59.3 | 59.6 | 57.1 | 51.6 | 38.6 | 28.9 | 17.0 | Y | - |
| MGG | - | - | 53.9 | 46.8 | 37.4 | 29.5 | 21.3 | Y | - |
| BSN | - | - | 53.5 | 45.0 | 36.9 | 28.4 | 20.0 | Y | - |
| FSN | - | - | 51.8 | 41.5 | 32.1 | 22.9 | 14.7 | Y | - |
| CBR | 60.1 | 56.7 | 50.1 | 41.3 | 31.0 | 19.1 | 9.9 | Y | - |
| ASSA* | - | - | 51.8 | 42.4 | 30.8 | 20.2 | 11.1 | Y | - |
| CTAP | - | - | - | - | 29.9 | - | - | Y | - |
| SS-TAD | - | - | 45.7 | - | 29.2 | - | 9.6 | Y |701 fps(GTX Titan X (Maxwell))|
| SSN | 60.3 | 56.2 | 50.6 | 40.8 | 29.1 | - | - | Y | - |
| R-C3D | 54.5 | 51.5 | 44.8 | 35.6 | 28.9 | - | - | Y |569 fps|
| TAG | 64.1 | 57.7 | 48.7 | 39.8 | 28.2 | - | - | Y | - |
| TPC | - | - | 44.1 | 37.1 | 28.2 | 20.6 | 12.7 | Y |250 fps (GTX Titan X)|
| TURN | 54.0 | 50.9 | 44.1 | 34.9 | 25.6 | - | - | Y |129.4 fps (GTX Titan X)|
| SSAD | 50.1 | 47.8 | 43.0 | 35.0 | 24.6 | - | - | Y | - |
| CDC | - | - | 40.1 | 29.4 | 23.3 | 13.1 | 7.9 | Y |500 fps (GTX Titan X)|
| SST | - | - | 37.8 | - | 23.0 | - | - | Y |308 fps Titan-X|
| S-CNN | 47.7 | 43.5 | 36.3 | 28.7 | 19.0 | 10.3 | 5.3 | Y |60 fps(GeForce GTX 980)|
| PSDF* | 51.4 | 42.6 | 33.6 | 26.1 | 18.8 | - | - | Y | - |
| SMS* | 51.0 | 45.2 | 36.5 | 27.8 | 17.8 | - | - | N | - |
| ADFG* | 48.9 | 44.0 | 36.0 | 26.4 | 17.1 | - | - | Y | - |
| TCN | - | - | 33.3 | 25.6 | 15.9 | 9.0 | - | Y | - |
| DAP | - | - | - | - | 13.9 | - | - | Y |134.1 fps Titan-X|
| G-TAD | - | - | 54.5 | 47.6 | 40.2 | 30.8 | 23.4 | Y | - |
| PTAL-ETP | - | - | 48.2 | 42.4 | 34.2 | 23.4 | 13.9 | Y | - |
| CTR_AL | - | - | 53.9 | 50.7 | 45.4 | 38.0 | 28.5 | Y | - |
| LS-TD | 63.5 | 61.0 | 56.7 | 50.6 | 42.6 | 32.5 | 21.4 | - | arXiv |
| SRG | - | - | 54.5 | 46.9 | 39.1 | 31.4 | 22.2 | - | arXiv |
| IDU | - | - | - | - | - | - | - | - | arXiv; under a different metric |
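
The table above is meant to be ordered by mAP at threshold 0.5, but some rows (for example BMN at 38.8 above D-SSAD at 44.2) are currently out of that order. A minimal sketch of re-sorting rows when adding a new entry is given below. It uses a handful of values copied from the table; the helper name `sort_by_map50` and the choice to place entries without a value last are assumptions, not a convention stated by the list.

```python
# (detector, mAP@0.5) pairs copied from the table above; None marks a missing entry.
rows = [
    ("BMN", 38.8),
    ("D-SSAD", 44.2),
    ("TGM", 53.5),
    ("IDU", None),
    ("G-TAD", 40.2),
    ("C-TCN", 52.1),
]


def sort_by_map50(rows):
    """Order rows by descending mAP@0.5, placing rows without a value last."""
    # The first key element pushes missing values to the end;
    # the second sorts the remaining rows from highest to lowest score.
    return sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0.0)))


for name, score in sort_by_map50(rows):
    print(name, "-" if score is None else score)
```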
**Performance on ActivityNet v1.3 dataset.**
- The left half reports scores on the ActivityNet v1.3 validation set; the right half reports scores on the ActivityNet v1.3 test set.
- `Deep Learning`: deep learning related method.
| Detector | 0.50 | 0.75 | 0.95 | @Avg | 0.50 | 0.75 | 0.95 | @Avg |Deep Learning|Speed |
| :---------: | :-----: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :------: | :--------: | :--: |
| BMN | 50.07 | 34.78 | 8.29 | 33.85 | - | - | - | 36.42 | Y | - |
| GTAN | 52.61 | 34.14 | 8.91 | 34.31 | - | - | - | 35.54 | Y | - |
| PGCN | 48.26 | 33.16 | 3.27 | 31.11 | - | - | - | - | Y | - |
| RTD-Net | 46.43 | 30.45 | 8.64 | 30.46 | - | - | - | - | Y | - |
| BSN_ori | 46.45 | 29.96 | 8.02 | 30.03 | - | - | - | 32.84 | Y | - |
| BSN_new | 52.50 | 33.53 | 8.85 | 33.72 | - | - | - | 34.42 | Y | - |
| C-TCN | 47.6 | 31.9 | 6.2 | 31.1 | - | - | - | - | Y | - |
| PCG-TAL | 44.31 | 29.85 | 5.47 | 28.85 | - | - | - | - | Y | based on P-GCN |
| SSN | - | - | - | - | 43.26 | 28.70 | 5.63 | 28.28 | Y | - |
| TAG | 39.12 | 23.48 | 5.49 | 23.98 | 40.69 | 26.02 | 6.67 | 26.05 | Y | - |
| CDC | 45.30 | 26.00 | 0.20 | 23.80 | - | - | - | - | Y |500 fps (GTX Titan X)|
| TCN | 36.44 | 21.15 | 3.90 | - | 37.49 | 23.47 | 4.47 | 23.58 | Y | - |
| TAL-Net | 38.23 | 18.30 | 1.30 | 20.22 | - | - | - | - | Y | - |
| AGCN | 30.4 | - | - | - | - | - | - | - | Y | - |
| SCC | - | - | - | - | 39.90 | 18.70 | 4.70 | 19.30 | Y | 35.9 fps |
| R-C3D | 26.80 | - | - | 12.70 | - | - | - | 13.10 | Y |569 fps (GTX Titan X (Maxwell)) <br>1030 fps (Titan X Pascal)|
| G-TAD | 50.36 | 34.6 | 9.02 | 34.09 | - | - | - | - | Y | - |
| CTR_AL | 43.47 | 33.91 | 9.21 | 30.12 | - | - | - | - | Y | - |
| SRG | 46.53 | 29.98 | 4.83 | 29.72 | - | - | - | - | - | - |
**Performance on ActivityNet v1.2 dataset.**
| Detector | 0.5 | 0.75 | 0.95 | avg |
| :---------: |:---:|:---:|:---:|:---:|
| LS-TD | 50.4 | 34.9 | 8.0 | 33.6 |
### Temporal Action Localization
* **TGM:** AJ Piergiovanni, Michael S. Ryoo.<br />
"Temporal Gaussian Mixture Layer for Videos." ICML (2019).
[[paper](https://arxiv.org/pdf/1803.06316.pdf)]
[[code](https://github.com/piergiaj/tgm-icml19)]
* **RTD-Net:** Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu.<br />
"Relaxed Transformer Decoders for Direct Action Prop." arXiv 2102.01894.
[[paper](https://arxiv.org/pdf/2102.01894.pdf)]
* **DPP:** Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu.<br />
"Deep Point-wise Prediction for Action Temporal Proposal." arXiv 1909.07725.
[[paper](https://arxiv.org/pdf/1909.07725.pdf)]
[[code](https://github.com/liluxuan1997/DPP)]
* **C-TCN:** Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li.<br />
"Deep Concept-wise Temporal Convolutional Networks for Action Localization." arXiv 1908.09442.
[[paper](https://arxiv.org/pdf/1908.09442.pdf)]
[[code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleVideo/models/ctcn/README.md)]
* **PCG-TAL:** Rui Su, Dong Xu, Lu Sheng, Wanli Ouyang.<br />
"PCG-TAL: Progressive Cross-granularity Cooperation for Temporal Action Localization." TIP 2020.
[[paper](https://ieeexplore.ieee.org/document/9298475)]
* **BMN:** Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen.<br />
"BMN : Boundary-Matching Network for Temporal Action Proposal Generation." ICCV (2019).
[[paper](https://arxiv.org/pdf/1907.09702.pdf)]
* **PGCN:** Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan.<br />
"Graph Convolutional Networks for Temporal Action Localization." ICCV (2019).
[[paper](https://arxiv.org/pdf/1909.03252.pdf)]
[[code](https://github.com/Alvin-Zeng/PGCN)]
* **GTAN:** Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei.<br />
"Gaussian Temporal Awareness Networks for Action Localization." CVPR (2019 **oral**).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Long_Gaussian_Temporal_Awareness_Networks_for_Action_Localization_CVPR_2019_paper.pdf)]
* **AGCN:** Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song.<br />
"Graph Attention based Proposal 3D ConvNets for Action Detection." AAAI (2020).
[[paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf)]
* **MGG:** Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang.<br />
"Multi-granularity Generator for Temporal Action Proposal." CVPR (2019).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Multi-Granularity_Generator_for_Temporal_Action_Proposal_CVPR_2019_paper.pdf)]
* **D-SSAD:** Yupan Huang, Qi Dai, Yutong Lu.<br />
"Decoupling Localization and Classification in Single Shot Temporal Action Detection." ICME (2019).
[[paper](https://arxiv.org/pdf/1904.07442.pdf)]
[[code](https://github.com/HYPJUDY/Decouple-SSAD)]
* **TAL-Net:** Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar.<br />
"Rethinking the Faster R-CNN Architecture for Temporal Action Localization." CVPR (2018).
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Rethinking_the_Faster_CVPR_2018_paper.pdf)]
* **FRTS:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras.<br />
"Exploring Feature Representation and Training strategies in Temporal Action Localization." ICIP (2019).
[[paper](https://arxiv.org/pdf/1905.10608.pdf)]
* **BSN:** Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang.<br />
"BSN: Boundary Sensitive Network for Temporal Action Proposal Generation." ECCV (2018).
[[paper](https://arxiv.org/pdf/1806.02964.pdf)]
[[code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)]
* **FSN:** Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou.<br />
"Exploring frame segmentation networks for temporal action localization." ECCV (2018).
[[paper](https://arxiv.org/pdf/1902.05488.pdf)]
* **CBR:** Jiyang Gao, Zhenheng Yang, Ram Nevatia.<br />
"Cascaded Boundary Regression for Temporal Action Detection." BMVC (2017).
[[paper](https://arxiv.org/pdf/1705.01180.pdf)]
[[code](https://github.com/jiyanggao/CBR)]
* **ASSA*:** Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem.<br />
"Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Humam_Alwassel_Action_Search_Spotting_ECCV_2018_paper.pdf)]
* **CTAP:** Jiyang Gao, Kan Chen, Ram Nevatia.<br />
"CTAP: Complementary Temporal Action Proposal Generation." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiyang_Gao_CTAP_Complementary_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/jiyanggao/CTAP)]
* **SS-TAD:** Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles.<br />
"End-to-end, single-stream temporal action detection in untrimmed videos." BMVC (2017).
[[paper](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)]
[[code](https://github.com/shyamal-b/ss-tad)]
* **SSN:** Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin.<br />
"Temporal Action Detection with Structured Segment Networks." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhao_Temporal_Action_Detection_ICCV_2017_paper.pdf)]
[[code](https://github.com/yjxiong/action-detection)]
* **R-C3D:** Huijuan Xu, Abir Das, Kate Saenko.<br />
"R-C3D: Region Convolutional 3D Network for Temporal Activity Detection." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Xu_R-C3D_Region_Convolutional_ICCV_2017_paper.pdf)]
[[code](https://github.com/VisionLearningGroup/R-C3D)]
* **TAG:** Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang.<br />
"A Pursuit of Temporal Accuracy in General Activity Detection." arXiv (1703).
[[paper](https://arxiv.org/pdf/1703.02716.pdf)]
* **TPC:** Ke Yang, Peng Qiao, Dongsheng Li, Shaohe Lv, Yong Dou.<br />
"Exploring Temporal Preservation Networks for Precise Temporal Action Localization." AAAI (2018).
[[paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16164/16347)]
* **TURN:** Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia.<br />
"TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals." ICCV (2017).
[[paper](https://arxiv.org/pdf/1703.06189.pdf)]
[[code](https://github.com/jiyanggao/TURN-TAP)]
* **SSAD:** Tianwei Lin, Xu Zhao, Zheng Shou.<br />
"Single shot temporal action detection." ACM MM (2017).
[[paper](https://arxiv.org/pdf/1710.06236.pdf)]
* **CDC:** Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.<br />
"CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shou_CDC_Convolutional-De-Convolutional_Networks_CVPR_2017_paper.pdf)]
[[code](https://github.com/ColumbiaDVMM/CDC)]
[[project](http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/)]
* **SST:** Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles.<br />
"Single-stream temporal action proposals." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Buch_SST_Single-Stream_Temporal_CVPR_2017_paper.pdf)]
[[code](https://github.com/shyamal-b/sst)]
* **SCC:** Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem.<br />
"SCC: Semantic context cascade for efficient action detection." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)]
[[project](https://ivul.kaust.edu.sa/Pages/pub-scc-efficient-action-detection.aspx)]
* **S-CNN:** Zheng Shou, Dongang Wang, Shih-Fu Chang.<br />
"Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shou_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
[[code](https://github.com/zhengshou/scnn)]
* **PSDF*:** Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim.<br />
"Temporal Action Localization with Pyramid of Score Distribution Features." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
* **SMS*:** Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng.<br />
"Temporal Action Localization by Structured Maximal Sums." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yuan_Temporal_Action_Localization_CVPR_2017_paper.pdf)]
* **ADFG*:** Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei.<br />
"End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yeung_End-To-End_Learning_of_CVPR_2016_paper.pdf)]
[[code](https://github.com/syyeung/frameglimpses)]
* **TCN:** Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen.<br />
"Temporal Context Network for Activity Localization in Videos." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Temporal_Context_Network_ICCV_2017_paper.pdf)]
[[code](https://github.com/vdavid70619/TCN)]
* **DAP:** Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem.<br />
"DAPs: Deep Action Proposals for Action Understanding." ECCV (2016).
[[paper](https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf)]
[[code](https://github.com/escorciav/daps)]
* **G-TAD:** Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem.<br />
"G-TAD: Sub-Graph Localization for Temporal Action Detection." arXiv (2019).
[[paper](https://arxiv.org/pdf/1911.11462.pdf)]
* **PTAL-ETP:** Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He.<br />
"Precise Temporal Action Localization by Evolving Temporal Proposals." arXiv (2019).
[[paper](https://arxiv.org/pdf/1804.04803.pdf)]
* **CTR_AL:** Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian.<br />
"Constraining Temporal Relationship for Action Localization." arXiv (2019).
[[paper](https://arxiv.org/pdf/2002.07358.pdf)]
* **SRG:** Hyunjun Eun, Sumin Lee, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />
"SRG: Snippet Relatedness-based Temporal Action Proposal Generator." arXiv (2019).
[[paper](https://arxiv.org/pdf/1911.11306.pdf)]
* **LS-TD:** Yuan Zhou, Hongru Li, Sun-Yuan Kung.<br />
"Temporal Action Localization using Long Short-Term Dependency." arXiv (2019).
[[paper](https://arxiv.org/pdf/1911.01060.pdf)]
* **IDU:** Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />
"Learning to Discriminate Information for Online Action Detection." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Eun_Learning_to_Discriminate_Information_for_Online_Action_Detection_CVPR_2020_paper.pdf)]