Repository: VividLe/awesome-weakly-supervised-action-localization
Branch: master
Commit: e86943a85504
Files: 4
Total size: 44.9 KB
Directory structure:
gitextract_ywnz0pl2/
├── Other_Settings.md
├── README.md
├── Spatiotemporal.md
└── Supervised.md
================================================
FILE CONTENTS
================================================
================================================
FILE: Other_Settings.md
================================================
# Action Localization Benchmarks
### Learning from action count supervision
* **OCL:** Julien Schroeter, Kirill Sidorov, David Marshall.
"Weakly-Supervised Temporal Localization via Occurrence Count Learning" ICML 2019.
[[paper](https://arxiv.org/pdf/1905.07293.pdf)]
[[code](https://github.com/SchroeterJulien/ICML-2019-Weakly-Supervised-Temporal-Localization-via-Occurrence-Count-Learning)]
### Action Segment, Transformer
* **Action Modifiers:** Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen.
"Action Modifiers: Learning from Adverbs in Instructional Videos." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Doughty_Action_Modifiers_Learning_From_Adverbs_in_Instructional_Videos_CVPR_2020_paper.pdf)]
### Action Segment, Self-Supervised Learning
* **Action Segmentation:** Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira.
"Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Action_Segmentation_With_Joint_Self-Supervised_Temporal_Domain_Adaptation_CVPR_2020_paper.pdf)]
### Action Segment, Transformer
* **SCT:** Mohsen Fayyaz, Juergen Gall.
"SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fayyaz_SCT_Set_Constrained_Temporal_Transformer_for_Set_Supervised_Action_Segmentation_CVPR_2020_paper.pdf)]
[[code](https://github.com/MohsenFayyaz89/SCT/)]
### Unintentional Action
A new task. Interesting.
* **Oops:** Dave Epstein, Boyuan Chen, Carl Vondrick.
"Oops! Predicting Unintentional Action in Video." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Epstein_Oops_Predicting_Unintentional_Action_in_Video_CVPR_2020_paper.pdf)]
[[project](https://oops.cs.columbia.edu/)]
### ActionBytes
Learning from trimmed videos.
* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]
### METAL
ActivityNet v1.2, mAP@0.5: 41.9 [1-shot], 45.0 [5-shot]
THUMOS14, mAP@0.5: 14.3 [1-shot], 16.2 [5-shot]
* **METAL:** Da Zhang, Xiyang Dai, Yuan-Fang Wang.
"METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos." CVPR (2020).
[[paper](https://sites.cs.ucsb.edu/~yfwang/papers/cvpr2020.pdf)]
### Hierarchical Action Search
A new task. A learned space where videos are positioned in entailment cones formed by different subtrees.
* **Uncertain:** Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek.
"Searching for Actions on the Hyperbole." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Long_Searching_for_Actions_on_the_Hyperbole_CVPR_2020_paper.pdf)]
### Fine-grained Action Recognition and Localization
A new task. Hierarchical annotation for recognition and localization.
* **FineGym:** Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.
"FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." (CVPR 2020, oral, 3 strong accepts).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)]
[[project](https://sdolivia.github.io/FineGym/)]
### Mining undefined sub-actions
A new task. Find the common sub-actions from multiple videos.
* **TAPOS:** Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.
"Intra- and Inter-Action Understanding via Temporal Action Parsing." (CVPR 2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_Intra-_and_Inter-Action_Understanding_via_Temporal_Action_Parsing_CVPR_2020_paper.pdf)]
[[project](https://sdolivia.github.io/TAPOS/)]
================================================
FILE: README.md
================================================
# Action Localization Benchmarks
Papers and Results of Temporal Action Localization
**Weakly Supervised Performance on THUMOS'14 dataset.**
- The detectors are sorted by the mAP with threshold 0.5.
- "c" indicates whether the code has been released: yes (Y) or no (N).
- "e" indicates the evaluation code used: THUMOS (T), ActivityNet (A), or the authors' own implementation (I).
| Detector | Pub |c|e| 0.1 | 0.2 | 0.3 |0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |avg | info |
| :---------: |:------:|-|-|:---:|:----|:----|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net | arXiv-20-12-11 |N|A|65.6 |60.0 |52.1 |43.3 |35.9 | - | - | - | - | - | Same authors as 3C-Net |
| Lee et al | AAAI21 |Y|A|67.5 |61.2 |52.3 |43.4 |33.7 | 22.9 | 12.1 | - | - | - | Same authors as BaSNet |
| HAM-Net | AAAI21 |Y|A|65.9 |59.6 |52.2 |43.1 |32.6 | 21.9 | 12.5 | - | - | - | - |
| ACSNet | AAAI21 |N|A| - | - |51.4 |42.7 |32.4 | 22.0 | 11.7 | - | - | - | - |
| EM-MIL | ECCV20 |N|A|59.1 |52.7 |45.5 |36.8 |30.5 | 22.7 | 16.4 | - | - | - | Uses existing classification results |
| A2CL-PT | ECCV20 |Y|A| 61.2 | 56.1 |48.1 | 39.0 |30.1 |19.2 |10.6 | 4.8 | 1.0 | 30.0 | Also reports unsupervised performance |
| ACL | CVPR20 |N|A| - | - |46.9 |38.9 |30.1 |19.8 |10.4 | - | - | - | Also reports unsupervised performance |
| Liu et al. | AAAI21 |N|A| 61.7 | 58.0 |50.8 | 41.7 |29.6 |20.1 |10.7 | 4.3 | 0.5 | - | - |
| WSTAL | WACV20 |-| |62.3 |- |46.8 |- |29.6 |- |9.7 |- |- |- | - |
| ActionBytes | CVPR20 |N|A| - | - |43.0 |37.5 |29.0 | - |9.5 | - | - | - | - |
| DGAM | CVPR20 |Y|A| 60.0|54.2 |46.8 |38.2 |28.8 |19.8 |11.4 |3.6 |0.4 | - | - |
| TSCN | ECCV20 |N|A| 63.4|57.6 |47.8 |37.7 |28.7 |19.4 |10.2 |3.9 |0.7 | - | - |
| BaSNet-I3D | AAAI20 |Y|A| 58.2|52.3 |44.6 |36.0 |27.0 |18.6 |10.4 |3.9 |0.5 | - | - |
| BaSNet-UNT | AAAI20 |Y|A| 56.2|50.3 |42.8 |34.7 |25.1 |17.1 |9.3 |3.7 |0.5 | - | - |
| WSBM | ICCV19 |N|A| 60.4|56.0 |46.6 |37.5 |26.8 |17.6 |9.0 |3.3 |0.4 | - | - |
| 3C-Net | ICCV19 |Y|I| 59.1|53.5 |44.2 |34.1 |26.6 | - |8.1 | - | - | - | - |
| ASSG | ACM 19 |N| | 65.6|59.4 |50.4 |38.7 |25.4 |15.0 |6.6 | - | - | - | - |
| TSM | ICCV19 |N|T| - | - |39.5 |31.9 |24.5 |13.8 |7.1 | - | - |23.4 | - |
| CleanNet | ICCV19 |N|T| - | - |37.0 |30.9 |23.9 |13.9 |7.1 | - | - | - | - |
| CMCS-I3D | CVPR19 |Y|T| 57.4|50.8 |41.2 |32.1 |23.1 |15.0 |7.0 | - | - | - |report avg-mAP|
| CMCS-UNT | CVPR19 |Y|T| 53.5|46.8 |37.5 |29.1 |19.9 |12.3 |6.0 | - | - | - | - |
| STARNet | AAAI19 |N|A|68.8 |60.0 |48.7 |34.7 |23.0 |- |- | - | - |- | - |
| W-TALC | ECCV18 |Y|I|55.2 |49.6 |40.1 |31.1 |22.8 |- |7.6 | - | - |- | - |
| AutoLoc | ECCV18 |Y|T|- |- |35.8 |29.0 |21.2 |13.4 |5.8 | - | - |- | - |
| MAAN | ICLR19 |Y|A|59.8 |50.8 |41.1 |30.6 |20.3 |12.0 |6.9 |2.6 |0.2 |24.9 | - |
| LTSR | AAAI19 |N|T|55.9 |46.9 |38.3 |28.1 |18.6 |11.0 |5.59 |2.19 |0.29 |- | - |
| WSGN | WACV20 |-|T|51.1 |44.4 |34.9 |26.3 |18.1 |11.6 |6.5 |- |- |- | - |
| STPN | CVPR18 |I|A|52.0 |44.7 |35.5 |25.8 |16.9 |9.9 |4.3 |1.2 |0.1 |- | - |
| CPMN | ACCV18 |N|T|47.1 |41.6 |32.8 |24.7 |16.1 |10.1 |5.5 |- |- |- | - |
| S-O-C | ACM18 |N|T|45.8 |39.0 |31.1 |22.5 |15.9 |- |- |- |- |- | - |
|UntrimmedNets| CVPR17 |Y|T|44.4 |37.7 |28.2 |21.1 |13.7 |- |- |- |- |- | - |
| H&S | ICCV17 |Y|T|36.44|27.84|19.49|12.66|6.84 |- |- |- |- |- | - |
|LPAT-I3D+TEM | arXiv |-| |- |- |46.9 |37.4 |28.0 |16.6 |9.2 |- |- |27.6 | - |
| LPAT-I3D | arXiv |-| |- |- |46.7 |37.5 |27.9 |17.6 |9.2 |- |- |27.6 | - |
| LPAT-U | arXiv |-| |- |- |39.9 |31.5 |22.6 |14.2 |7.9 |- |- |27.6 | - |
|RefineLoc-I3D| arXiv |-|T|- |- |40.8 |- |23.1 |- |5.3 |- |- |- | - |
|RefineLoc-TSN| arXiv |-|T|- |- |36.1 |- |22.6 |- |5.8 |- |- |- | - |
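
The threshold columns in the table above (0.1 to 0.9) are temporal IoU (tIoU) thresholds: a predicted segment is counted as correct only if its overlap with a ground-truth segment reaches the threshold. The snippet below is a minimal sketch of that check, not the official THUMOS or ActivityNet evaluation code. The function names and the `(start, end)` tuple layout are illustrative assumptions.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    # Length of the overlapping part (0 if the segments do not overlap).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Assumed convention: a match needs tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical example: 8 s of overlap over a 12 s union.
pred, gt = (10.0, 20.0), (12.0, 22.0)
print(temporal_iou(pred, gt))           # about 0.667
print(is_true_positive(pred, gt, 0.5))  # correct at threshold 0.5
print(is_true_positive(pred, gt, 0.7))  # rejected at threshold 0.7
```

The same prediction can therefore count as a hit in the 0.5 column and a miss in the 0.7 column, which is why mAP drops from left to right in every row.
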
**Weakly Supervised Performance on ActivityNet v1.2 dataset.**
| Detector | Pub |c| 0.5 | 0.55|0.60 | 0.65|0.70 |0.75 | 0.80|0.85 |0.90 |0.95 | avg |test | info |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net | arXiv-20-12-11 |N|42.3 | - | - | - | - |25.5 | - | - | - | 5.8 | 26.0| - | - |
| ACSNet | AAAI21 |N|40.1 | - | - | - | - |26.1 | - | - | - | 6.8 | 26.0| - | - |
| Lee et al | AAAI21 |Y|41.2 | - | - | - | - |25.6 | - | - | - | 6.0 | 25.9| - | - |
| Liu et al. | AAAI21 |N|39.2 |- |- |- | - |25.6 |- |- |- | 6.8 | 25.5| - | - |
| HAM-Net | AAAI21 |Y|41.0 | - | - | - | - |24.8 | - | - | - | 5.3 | 25.1| - | - |
| BaSNet | AAAI20 |Y|38.5 | - | - | - | - |24.2 | - | - | - | 5.6 | 24.3| - | - |
| TSCN | ECCV20 |N|37.6 |- |- |- | - |23.7 |- |- |- | 5.7 | 23.6| - | - |
| CMCS | CVPR19 |Y|36.8 |- |- |- | - |22.0 |- |- |- | 5.6 | 22.4| - | - |
| 3C-Net | ICCV19 |Y|35.4 | - | - | - | - |22.9 | - | - | - | 8.5 | 21.1| - | - |
| TSM | ICCV19 |N|28.3 |26.0 |23.6 |21.2 | 18.9|17.0 |14.0 |11.1 |7.5 | 3.5 | - | - | - |
| CleanNet | ICCV19 |N|37.1 |33.4 |29.9 |26.7 | 23.4|20.3 |17.2 |13.9 |9.2 | 5.0 | 21.6| - | - |
| EM-MIL | ECCV20 |N|37.4 |- |- |- | 23.1|- |- |- |2.0 | - | 20.3| - | - |
| W-TALC | ECCV18 |Y|37.0 |- |- |- | 14.6|- |- |- |- | - | 18.0| - | - |
| AutoLoc | ECCV18 |Y|27.3 |24.9 |22.5 |19.9 | 17.5|15.1 |13.0 |10.0 |6.8 | 3.3 | 16.0| - | - |
|RefineLoc-I3D| arXiv |-|38.7 |- |- |- | - |22.6 |- |- |- | 5.5 | 23.2| - | - |
|RefineLoc-TSN| arXiv |-|38.8 |- |- |- | - |22.2 |- |- |- | 5.3 | 23.2| - | - |
| LPAT | arXiv |-|37.6 |34.6 |31.6 |28.7 | 25.6|22.6 |19.6 |15.3 |10.9 | 4.9 | 23.1| - | - |
| WSTAL | arXiv |-|35.2 |- |- |- | 16.3|- |- |- |- | - | - | - | - |
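
The `avg` column is not defined in this file. A reasonable reading, assumed here, is that it is the plain mean of mAP over the ten tIoU thresholds 0.50:0.05:0.95. The rows above that list all ten values are consistent with that reading, as this short check shows (values are copied from the CleanNet and AutoLoc rows; the helper name is illustrative).

```python
from statistics import mean

# tIoU thresholds 0.50, 0.55, ..., 0.95, as in the column headers above.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Per-threshold mAP copied from the table above.
ROWS = {
    "CleanNet": [37.1, 33.4, 29.9, 26.7, 23.4, 20.3, 17.2, 13.9, 9.2, 5.0],
    "AutoLoc": [27.3, 24.9, 22.5, 19.9, 17.5, 15.1, 13.0, 10.0, 6.8, 3.3],
}


def average_map(per_threshold_map):
    """Mean mAP over all thresholds (assumed definition of the avg column)."""
    if len(per_threshold_map) != len(THRESHOLDS):
        raise ValueError("expected one mAP value per threshold")
    return mean(per_threshold_map)


for name, values in ROWS.items():
    # Prints 21.6 for CleanNet and 16.0 for AutoLoc, matching the avg column.
    print(name, round(average_map(values), 1))
```

Rows that report only 0.5, 0.75 and 0.95 cannot be checked this way, because the remaining seven values are not listed.
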
**Weakly Supervised Performance on ActivityNet v1.3 dataset.**
| Detector | Pub |c| 0.5 |0.75 |0.95 |avg |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|
| ACSNet | AAAI21 |N|36.3 |24.2 |5.8 |23.9 |
| Lee et al.| AAAI21 |Y|37.0 |23.9 |5.7 |23.7 |
| Liu et al.| AAAI21 |N|35.1 |23.7 |5.6 |23.2 |
| A2CL-PT | ECCV20 |Y|36.8 |22.0 |5.2 |22.5 |
| BaSNet-I3D | AAAI20 |Y|34.5 |22.5 |4.9 |22.2 |
| TSCN | ECCV20 |N|35.3 |21.4 |5.3 |21.7 |
| WSBM | ICCV19 |N|36.4 |19.2 | 2.9 |- |
| ASSG | ACM 19 |N|32.3 |20.1 | 4.0 |- |
| TSM | ICCV19 |N|30.0 |19.0 | 4.5 |- |
| CMCS | CVPR19 |Y|34.0 |20.9 | 5.7 |21.2 |
| STARNet | AAAI19 |N|31.1 |18.8 | 4.7 |- |
| MAAN | ICLR19 |Y|33.7 |21.9 | 5.5 |- |
| LTSR | AAAI19 |N|33.1 |18.7 |3.32 |21.78|
| STPN | CVPR18 |I|29.3 |16.9 |2.6 |- |
| CPMN | ACCV18 |N|39.29|24.09|6.71 |24.42|
| S-O-C | ACM18 |N|27.3 |14.7 |2.9 |15.6 |
### Weakly Supervised Temporal Action Localization
* **D2-Net:** Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao.
"D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations" arXiv:2012.06440.
[[paper](https://arxiv.org/pdf/2012.06440.pdf)]
* **Lee et al:** Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun.
"Weakly-supervised Temporal Action Localization by Uncertainty Modeling" AAAI 2021.
[[paper](https://arxiv.org/pdf/2006.07006.pdf)]
[[code](https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling)]
* **HAM-Net:** Ashraful Islam, Chengjiang Long , Richard J. Radke.
"A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization" AAAI 2021.
[[paper](https://arxiv.org/pdf/2101.00545.pdf)]
[[code](https://github.com/asrafulashiq/hamnet)]
* **ACSNet:** Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.
"ACSNet : Action-Context Separation Network for Weakly Supervised Temporal Action Localization" AAAI 2021.
[[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **Liu et al.:** Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.
"Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context" AAAI 2021.
[[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **EM-MIL:** Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu.
"Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning" ECCV 2020.
[[paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740715.pdf)]
* **A2CL-PT:** Kyle Min, Jason J. Corso.
"Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization." ECCV 2020.
[[paper](https://link.springer.com/chapter/10.1007%2F978-3-030-58568-6_17)]
[[code](https://github.com/MichiganCOG/A2CL-PT)]
* **ACL:** Guoqiang Gong, Xinghan Wang, Yadong Mu, Qi Tian.
"Learning Temporal Co-Attention Models for Unsupervised Video Action Localization." CVPR 2020, oral.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.pdf)]
* **WSTAL:** Ashraful Islam, Richard J. Radke.
"Weakly Supervised Temporal Action Localization Using Deep Metric Learning" WACV 2020.
[[paper](https://arxiv.org/pdf/2001.07793.pdf)]
[[code](https://github.com/asrafulashiq/wsad)]
* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR 2020.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]
* **DGAM:** Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang.
"Weakly-Supervised Action Localization by Generative Attention Modeling." CVPR 2020.
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf)]
[[code](https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization)]
* **TSCN:** Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua.
"Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization." ECCV 2020.
[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf)]
* **BaSNet:** Pilhyeon Lee, Youngjung Uh, Hyeran Byun.
"Background Suppression Networks for Weakly-supervised Temporal Action Localization." AAAI 2020.
[[paper](https://arxiv.org/pdf/1911.09963.pdf)]
[[code](https://github.com/Pilhyeon/BaSNet-pytorch)]
* **3C-Net:** Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao.
"3C-Net : Category Count and Center Loss for Weakly-Supervised Action Localization." ICCV 2019.
[[paper](https://arxiv.org/pdf/1908.08216.pdf)]
[[code](https://github.com/naraysa/3c-net)]
* **CMCS:** Daochang Liu, Tingting Jiang, Yizhou Wang.
"Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization." CVPR 2019.
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.pdf)]
[[code](https://github.com/Finspire13/CMCS-Temporal-Action-Localization)]
* **ASSG:** Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou.
"Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization" ACM MM 2019.
[[paper](https://arxiv.org/pdf/1908.02422.pdf)]
* **AutoLoc:** Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang.
"AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos" ECCV 2018.
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Zheng_Shou_AutoLoc_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/zhengshou/AutoLoc)]
* **CPMN:** Haisheng Su, Xu Zhao, Tianwei Lin.
"Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization" ACCV 2018.
[[paper](https://arxiv.org/pdf/1810.11794.pdf)]
* **H&S:** Krishna Kumar Singh, Yong Jae Lee.
"Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization" ICCV 2017.
[[paper](https://arxiv.org/pdf/1704.04232.pdf)]
[[code](https://github.com/goddoe/hide-and-seek)]
* **LTSR:** Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan.
"Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision" AAAI 2019.
[[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4958/4831)]
* **WSGN:** Basura Fernando, Cheston Tan Yin Chet.
"Weakly Supervised Gaussian Networks for Action Detection" WACV 2020.
[[paper](https://basurafernando.github.io/papers/wacv2020_wsgn.pdf)]
* **MAAN:** Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung.
"Marginalized Average Attentional Network for Weakly-Supervised Learning" ICLR 2019.
[[paper](https://arxiv.org/pdf/1905.08586.pdf)]
[[code](https://github.com/yyuanad/MAAN)]
* **S-O-C:** Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li.
"Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector" ACM MM 2018.
[[paper](https://arxiv.org/pdf/1807.02929.pdf)]
* **STARNet:** Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu.
"Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection" AAAI 2019.
[[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4939/4812)]
* **TSM:** Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan.
"Temporal Structure Mining for Weakly Supervised Action Detection" ICCV 2019.
[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf)]
* **UntrimmedNets:** Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool.
"UntrimmedNets for Weakly Supervised Action Recognition and Detection" CVPR 2017.
[[paper](https://wanglimin.github.io/papers/WangXLV_CVPR17.pdf)]
[[code](https://github.com/wanglimin/UntrimmedNet)]
* **WSBM:** Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes.
"Weakly-supervised Action Localization with Background Modeling" ICCV 2019.
[[paper](https://arxiv.org/pdf/1908.06552.pdf)]
* **CleanNet:** Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua.
"Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks" ICCV 2019.
[[paper](https://qilin-zhang.github.io/_pages/pdfs/Weakly_Supervised_Temporal_Action_Localization_through_Contrast_based_Evaluation_Networks.pdf)]
* **STPN:** Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han.
"Weakly Supervised Action Localization by Sparse Temporal Pooling Network" CVPR 2018.
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Weakly_Supervised_Action_CVPR_2018_paper.pdf)]
[[code](https://github.com/bellos1203/STPN)]
* **W-TALC:** Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury.
"W-TALC: Weakly-supervised Temporal Activity Localization and Classification" ECCV 2018.
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Sujoy_Paul_W-TALC_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/sujoyp/wtalc-pytorch)]
* **LPAT:** Xudong Lin, Zheng Shou, Shih-Fu Chang.
"LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization" arXiv 2019.
[[paper](https://arxiv.org/pdf/1910.11285.pdf)]
* **RefineLoc:** Humam Alwassel, Alejandro Pardo, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem.
"RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization" arXiv 2019.
[[paper](https://arxiv.org/pdf/1904.00227.pdf)]
### Awaiting paper
* **lvr:** Xingyu Liu, Joon-Young Lee, Hailin Jin.
"Learning Video Representations from Correspondence Proposals." CVPR 2019 **oral**.
## Dataset
* **THUMOS'14:** Yu-Gang Jiang, Jingen Liu, Amir R. Zamir, George Toderici.
"THUMOS Challenge 2014"
[[project](https://www.crcv.ucf.edu/THUMOS14/home.html)]
* **ActivityNet:** Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia.
"A Large-Scale Video Benchmark for Human Activity Understanding"
[[project](http://activity-net.org/index.html)]
* **THUMOS'15:** Alexander Gorban, Haroon Idrees, Yu-Gang Jiang, Amir R. Zamir.
"THUMOS Challenge 2015"
[[project](http://www.thumos.info/)]
* **COIN:** Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou.
"COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis." CVPR 2019.
[[paper](https://arxiv.org/pdf/1903.02874.pdf)]
[[project](https://coin-dataset.github.io/)]
================================================
FILE: Spatiotemporal.md
================================================
# Spatio-Temporal Action Detection
Papers and Results for Spatio-Temporal Action Detection
## Spatio-Temporal Action Localization
**Performance on AVA v2.1 dataset.**
- Metric: mAP with threshold 0.5.
| Detector | val | test |
| :---------: | :-----: | :-------: |
| LFB | 27.70 | 27.20 |
| VATN* | 25.00 | 24.93 |
| SMAD* | 22.20 | - |
| STEP | 18.60 | - |
| ACRN | 17.40 | - |
| AVA | 15.80 | - |
| SlowFast | 27.30 | 27.10 |
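
For spatio-temporal detection, the 0.5 threshold above is assumed here to be a spatial IoU between a predicted and a ground-truth person box on a frame; this file does not spell that out. Under that assumption, the matching test can be sketched as follows. The function name and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from any evaluation toolkit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: two 10x10 squares shifted by 5 pixels horizontally.
pred_box = (0.0, 0.0, 10.0, 10.0)
gt_box = (5.0, 0.0, 15.0, 10.0)
iou = box_iou(pred_box, gt_box)
print(iou)         # 50 / 150, about 0.333
print(iou >= 0.5)  # this detection would not count at threshold 0.5
```
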
## Spatio-temporal Action Detection
* **PSCS:** Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu.
"Improving Action Localization by Progressive Cross-stream Cooperation." CVPR (2019).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Su_Improving_Action_Localization_by_Progressive_Cross-Stream_Cooperation_CVPR_2019_paper.pdf)]
* **SlowFast:** Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He.
"SlowFast Networks for Video Recognition." arXiv (1812).
[[paper](https://arxiv.org/pdf/1812.03982.pdf)]
[[unofficial_code](https://github.com/r1ch88/SlowFastNetworks)]
[[unofficial_code](https://github.com/Guocode/SlowFast-Networks)]
* **DwF:** Jiaojiao Zhao, Cees G.M. Snoek.
"Dance with Flow: Two-in-One Stream Action Detection." CVPR (2019).
[[paper](https://arxiv.org/pdf/1904.00696.pdf)]
* **STEP:** Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz.
"STEP: Spatio-Temporal Progressive Learning for Video Action Detection." CVPR (2019 **oral**).
[[paper](https://arxiv.org/pdf/1904.09288.pdf)]
* **LFB:** Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick.
"Long-Term Feature Banks for Detailed Video Understanding." CVPR (2019).
[[paper](https://arxiv.org/pdf/1812.05038.pdf)]
[[project](https://github.com/facebookresearch/video-long-term-feature-banks)]
* **VATN*:** Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman.
"Video Action Transformer Network." CVPR (2019 **oral**).
[[paper](https://arxiv.org/pdf/1812.02707.pdf)]
[[project](https://rohitgirdhar.github.io/ActionTransformer/)]
* **LAEO:** Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman.
"LAEO-Net: revisiting people Looking At Each Other in videos." CVPR (2019).
[[paper](http://www.robots.ox.ac.uk/~vgg/research/laeonet/cvpr2019LAEO.pdf)]
[[code](https://github.com/AVAuco/laeonet/)]
[[project](http://www.robots.ox.ac.uk/~vgg/research/laeonet/)]
* **SMAD*:** Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid.
"A Structured Model For Action Detection." CVPR (2019).
[[paper](https://arxiv.org/pdf/1812.03544.pdf)]
* **TACNet:** Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun.
"TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection." CVPR (2019).
[[paper](http://www.skicyyu.org/Paper/CVPR2019_TACNET.pdf)]
* **AVA:** Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik.
"AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions." CVPR (2018).
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gu_AVA_A_Video_CVPR_2018_paper.pdf)]
[[project](https://research.google.com/ava/)]
* **ACRN:** Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid.
"Actor-Centric Relation Network." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.pdf)]
* **T-CNN:** Rui Hou, Chen Chen, Mubarak Shah.
"Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hou_Tube_Convolutional_Neural_ICCV_2017_paper.pdf)]
[[code](https://www.crcv.ucf.edu/projects/TCNN/#Code)]
[[project](https://www.crcv.ucf.edu/projects/TCNN/)]
## Dataset
* **YouTube-8M-Segments:** Ke Chen, Julia Elliott, Nisarg Kothari, Hanhan Li, et al.
"YouTube-8M Segments Dataset"
[[project](https://research.google.com/youtube8m/)]
## Distinguished Researchers & Teams
[WILLOW](https://www.di.ens.fr/willow/publications/YearOnly/publications.html)
[Ivan Laptev](https://www.di.ens.fr/~laptev/#Publications)
[Christoph Feichtenhofer](https://feichtenhofer.github.io/)
================================================
FILE: Supervised.md
================================================
# Action Detection Benchmarks
Papers and Results of Temporal Action Localization
## Temporal Action Localization
**Performance on THUMOS'14 dataset.**
- The detectors are ordered by the mAP with threshold 0.5.
- `Deep Learning`: whether the method is based on deep learning, yes (Y) or no (N).
| Detector | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |Deep Learning|Comment |
| :---------: | :-----: |:-------:|:-------:|:-------:|:-------:|:-------:| :------: | :--------: | :--: |
| TGM | - | - | - | - | 53.5 | - | - | Y | - |
| C-TCN | 72.2 | 71.4 | 68.0 | 62.3 | 52.1 | - | - | Y | - |
| RTD-Net | - | - | 68.3 | 62.3 | 51.9 | 38.8 | 23.7 | Y | based on P-GCN |
| PCG-TAL | 71.2 | 68.9 | 65.1 | 59.5 | 51.2 | - | - | Y | based on P-GCN |
| DPP | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - |
| PGCN | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - |
| BMN | - | - | 56.0 | 47.4 | 38.8 | 29.7 | 20.5 | Y | - |
| D-SSAD | - | - | 60.2 | 54.1 | 44.2 | 32.3 | 19.1 | Y | - |
| TAL-Net | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 | Y | - |
| FRTS | - | - | 53.5 | 50.2 | 44.2 | 33.9 | 22.7 | Y | - |
| GTAN | 69.1 | 63.7 | 57.8 | 47.2 | 38.8 | - | - | Y | - |
| AGCN | 59.3 | 59.6 | 57.1 | 51.6 | 38.6 | 28.9 | 17.0 | Y | - |
| MGG | - | - | 53.9 | 46.8 | 37.4 | 29.5 | 21.3 | Y | - |
| BSN | - | - | 53.5 | 45.0 | 36.9 | 28.4 | 20.0 | Y | - |
| FSN | - | - | 51.8 | 41.5 | 32.1 | 22.9 | 14.7 | Y | - |
| CBR | 60.1 | 56.7 | 50.1 | 41.3 | 31.0 | 19.1 | 9.9 | Y | - |
| ASSA* | - | - | 51.8 | 42.4 | 30.8 | 20.2 | 11.1 | Y | - |
| CTAP | - | - | - | - | 29.9 | - | - | Y | - |
| SS-TAD | - | - | 45.7 | - | 29.2 | - | 9.6 | Y |701 fps(GTX Titan X (Maxwell))|
| SSN | 60.3 | 56.2 | 50.6 | 40.8 | 29.1 | - | - | Y | - |
| R-C3D | 54.5 | 51.5 | 44.8 | 35.6 | 28.9 | - | - | Y |569 fps|
| TAG | 64.1 | 57.7 | 48.7 | 39.8 | 28.2 | - | - | Y | - |
| TPC | - | - | 44.1 | 37.1 | 28.2 | 20.6 | 12.7 | Y |250 fps (GTX Titan X)|
| TURN | 54.0 | 50.9 | 44.1 | 34.9 | 25.6 | - | - | Y |129.4 fps (GTX Titan X)|
| SSAD | 50.1 | 47.8 | 43.0 | 35.0 | 24.6 | - | - | Y | - |
| CDC | - | - | 40.1 | 29.4 | 23.3 | 13.1 | 7.9 | Y |500 fps (GTX Titan X)|
| SST | - | - | 37.8 | - | 23.0 | - | - | Y |308 fps Titan-X|
| S-CNN | 47.7 | 43.5 | 36.3 | 28.7 | 19.0 | 10.3 | 5.3 | Y |60 fps(GeForce GTX 980)|
| PSDF* | 51.4 | 42.6 | 33.6 | 26.1 | 18.8 | - | - | Y | - |
| SMS* | 51.0 | 45.2 | 36.5 | 27.8 | 17.8 | - | - | N | - |
| ADFG* | 48.9 | 44.0 | 36.0 | 26.4 | 17.1 | - | - | Y | - |
| TCN | - | - | 33.3 | 25.6 | 15.9 | 9.0 | - | Y | - |
| DAP | - | - | - | - | 13.9 | - | - | Y |134.1 fps Titan-X|
| G-TAD | - | - | 54.5 | 47.6 | 40.2 | 30.8 | 23.4 | Y | - |
| PTAL-ETP | - | - | 48.2 | 42.4 | 34.2 | 23.4 | 13.9 | Y | - |
| CTR_AL | - | - | 53.9 | 50.7 | 45.4 | 38.0 | 28.5 | Y | - |
| LS-TD | 63.5 | 61.0 | 56.7 | 50.6 | 42.6 | 32.5 | 21.4 | - | arXiv |
| SRG | - | - | 54.5 | 46.9 | 39.1 | 31.4 | 22.2 | - | arXiv |
| IDU | - | - | - | - | - | - | - | - | arXiv; under a different metric |
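
Each cell in the table above is an mAP: the mean over action classes of the average precision (AP) at the given tIoU threshold. The snippet below is a simplified sketch of AP for one class in one video. It uses greedy matching by descending score and non-interpolated AP; the official THUMOS and ActivityNet toolkits differ in details such as interpolation and per-video bookkeeping. All names and the toy data are hypothetical.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, threshold=0.5):
    """preds: list of (score, start, end); gts: list of (start, end)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    ap, true_positives = 0.0, 0
    # Visit predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, start, end) in enumerate(ranked, start=1):
        # Find the best ground-truth segment that is still unmatched.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = temporal_iou((start, end), gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold:
            # True positive: each ground truth can be matched only once.
            matched[best_j] = True
            true_positives += 1
            ap += true_positives / rank  # precision at this recall point
    return ap / len(gts)


gts = [(0.0, 10.0), (20.0, 30.0)]
preds = [(0.9, 1.0, 10.0), (0.8, 50.0, 60.0), (0.7, 21.0, 29.0)]
print(average_precision(preds, gts, threshold=0.5))   # (1/1 + 2/3) / 2
print(average_precision(preds, gts, threshold=0.85))  # only the first prediction matches
```

Averaging this quantity over all classes gives the mAP at one threshold; repeating it per threshold fills one row of the table.
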
**Performance on ActivityNet v1.3 dataset.**
- The left four mAP columns are scores on the ActivityNet v1.3 validation set; the right four are scores on the ActivityNet v1.3 test set.
- `Deep Learning`: whether the method is based on deep learning, yes (Y) or no (N).
| Detector | 0.50 | 0.75 | 0.95 | @Avg | 0.50 | 0.75 | 0.95 | @Avg |Deep Learning|Speed |
| :---------: | :-----: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :------: | :--------: | :--: |
| BMN | 50.07 | 34.78 | 8.29 | 33.85 | - | - | - | 36.42 | Y | - |
| GTAN | 52.61 | 34.14 | 8.91 | 34.31 | - | - | - | 35.54 | Y | - |
| PGCN | 48.26 | 33.16 | 3.27 | 31.11 | - | - | - | - | Y | - |
| RTD-Net | 46.43 | 30.45 | 8.64 | 30.46 | - | - | - | - | Y | - |
| BSN_ori | 46.45 | 29.96 | 8.02 | 30.03 | - | - | - | 32.84 | Y | - |
| BSN_new | 52.50 | 33.53 | 8.85 | 33.72 | - | - | - | 34.42 | Y | - |
| C-TCN | 47.6 | 31.9 | 6.2 | 31.1 | - | - | - | - | Y | - |
| PCG-TAL | 44.31 | 29.85 | 5.47 | 28.85 | - | - | - | - | Y | based on P-GCN |
| SSN | - | - | - | - | 43.26 | 28.70 | 5.63 | 28.28 | Y | - |
| TAG | 39.12 | 23.48 | 5.49 | 23.98 | 40.69 | 26.02 | 6.67 | 26.05 | Y | - |
| CDC | 45.30 | 26.00 | 0.20 | 23.80 | - | - | - | - | Y |500 fps (GTX Titan X)|
| TCN | 36.44 | 21.15 | 3.90 | - | 37.49 | 23.47 | 4.47 | 23.58 | Y | - |
| TAL-Net | 38.23 | 18.30 | 1.30 | 20.22 | - | - | - | - | Y | - |
| AGCN | 30.4 | - | - | - | - | - | - | - | Y | - |
| SCC | - | - | - | - | 39.90 | 18.70 | 4.70 | 19.30 | Y | 35.9 fps |
| R-C3D | 26.80 | - | - | 12.70 | - | - | - | 13.10 | Y |569 fps (GTX Titan X (Maxwell)); 1030 fps (Titan X Pascal)|
| G-TAD | 50.36 | 34.6 | 9.02 | 34.09 | - | - | - | - | Y | - |
| CTR_AL | 43.47 | 33.91 | 9.21 | 30.12 | - | - | - | - | Y | - |
| SRG | 46.53 | 29.98 | 4.83 | 29.72 | - | - | - | - | - | - |
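**Performance on ActivityNet v1.2 dataset.**
| Detector | 0.5 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | avg | test | info |
| :---------: |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| LS-TD | 50.4 | - | - | - | - | 34.9 | - | - | - | 8.0 | 33.6 | - | - |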
### Temporal Action Localization
* **TGM:** AJ Piergiovanni, Michael S. Ryoo.
"Temporal Gaussian Mixture Layer for Videos." ICML (2019).
[[paper](https://arxiv.org/pdf/1803.06316.pdf)]
[[code](https://github.com/piergiaj/tgm-icml19)]
* **RTD-Net:** Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu.
"Relaxed Transformer Decoders for Direct Action Prop." arXiv 2102.01894.
[[paper](https://arxiv.org/pdf/2102.01894.pdf)]
* **DPP:** Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu.
"Deep Point-wise Prediction for Action Temporal Proposal." ArXiv 1909.07725.
[[paper](https://arxiv.org/pdf/1909.07725.pdf)]
[[code](https://github.com/liluxuan1997/DPP)]
* **C-TCN:** Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li.
"Deep Concept-wise Temporal Convolutional Networks for Action Localization." ArXiv 1908.09442.
[[paper](https://arxiv.org/pdf/1908.09442.pdf)]
[[code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleVideo/models/ctcn/README.md)]
* **PCG-TAL:** Rui Su, Dong Xu, Lu Sheng, Wanli Ouyang.
"PCG-TAL: Progressive Cross-granularity Cooperation for Temporal Action Localization." TIP 2020.
[[paper](https://ieeexplore.ieee.org/document/9298475)]
* **BMN:** Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen.
"BMN : Boundary-Matching Network for Temporal Action Proposal Generation." ICCV (2019).
[[paper](https://arxiv.org/pdf/1907.09702.pdf)]
* **PGCN:** Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan.
"Graph Convolutional Networks for Temporal Action Localization." ICCV (2019).
[[paper](https://arxiv.org/pdf/1909.03252.pdf)]
[[code](https://github.com/Alvin-Zeng/PGCN)]
* **GTAN:** Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei.
"Gaussian Temporal Awareness Networks for Action Localization." CVPR (2019 **oral**).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Long_Gaussian_Temporal_Awareness_Networks_for_Action_Localization_CVPR_2019_paper.pdf)]
* **AGCN:** Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song.
"Graph Attention based Proposal 3D ConvNets for Action Detection." AAAI (2020).
[[paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf)]
* **MGG:** Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang.
"Multi-granularity Generator for Temporal Action Proposal." CVPR (2019).
[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Multi-Granularity_Generator_for_Temporal_Action_Proposal_CVPR_2019_paper.pdf)]
* **D-SSAD:** Yupan Huang, Qi Dai, Yutong Lu.
"Decoupling Localization and Classification in Single Shot Temporal Action Detection." ICME (2019).
[[paper](https://arxiv.org/pdf/1904.07442.pdf)]
[[code](https://github.com/HYPJUDY/Decouple-SSAD)]
* **TAL-Net:** Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar.
"Rethinking the Faster R-CNN Architecture for Temporal Action Localization." CVPR (2018).
[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Rethinking_the_Faster_CVPR_2018_paper.pdf)]
* **FRTS:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras.
"Exploring Feature Representation and Training Strategies in Temporal Action Localization." ICIP (2019).
[[paper](https://arxiv.org/pdf/1905.10608.pdf)]
* **BSN:** Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang.
"BSN: Boundary Sensitive Network for Temporal Action Proposal Generation." ECCV (2018).
[[paper](https://arxiv.org/pdf/1806.02964.pdf)]
[[code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)]
* **FSN:** Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou.
"Exploring frame segmentation networks for temporal action localization." ECCV (2018).
[[paper](https://arxiv.org/pdf/1902.05488.pdf)]
* **CBR:** Jiyang Gao, Zhenheng Yang, Ram Nevatia.
"Cascaded Boundary Regression for Temporal Action Detection." BMVC (2017).
[[paper](https://arxiv.org/pdf/1705.01180.pdf)]
[[code](https://github.com/jiyanggao/CBR)]
* **ASSA*:** Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem.
"Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Humam_Alwassel_Action_Search_Spotting_ECCV_2018_paper.pdf)]
* **CTAP:** Jiyang Gao, Kan Chen, Ram Nevatia.
"CTAP: Complementary Temporal Action Proposal Generation." ECCV (2018).
[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiyang_Gao_CTAP_Complementary_Temporal_ECCV_2018_paper.pdf)]
[[code](https://github.com/jiyanggao/CTAP)]
* **SS-TAD:** Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles.
"End-to-end, single-stream temporal action detection in untrimmed videos." BMVC (2017).
[[paper](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)]
[[code](https://github.com/shyamal-b/ss-tad)]
* **SSN:** Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin.
"Temporal Action Detection with Structured Segment Networks." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhao_Temporal_Action_Detection_ICCV_2017_paper.pdf)]
[[code](https://github.com/yjxiong/action-detection)]
* **R-C3D:** Huijuan Xu, Abir Das, Kate Saenko.
"R-C3D: Region Convolutional 3D Network for Temporal Activity Detection." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Xu_R-C3D_Region_Convolutional_ICCV_2017_paper.pdf)]
[[code](https://github.com/VisionLearningGroup/R-C3D)]
* **TAG:** Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang.
"A Pursuit of Temporal Accuracy in General Activity Detection." arXiv (1703).
[[paper](https://arxiv.org/pdf/1703.02716.pdf)]
* **TPC:** Ke Yang, Peng Qiao, Dongsheng Li, Shaohe Lv, Yong Dou.
"Exploring Temporal Preservation Networks for Precise Temporal Action Localization." AAAI (2018).
[[paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16164/16347)]
* **TURN:** Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia.
"TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals." ICCV (2017).
[[paper](https://arxiv.org/pdf/1703.06189.pdf)]
[[code](https://github.com/jiyanggao/TURN-TAP)]
* **SSAD:** Tianwei Lin, Xu Zhao, Zheng Shou.
"Single shot temporal action detection." ACM MM (2017).
[[paper](https://arxiv.org/pdf/1710.06236.pdf)]
* **CDC:** Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.
"CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shou_CDC_Convolutional-De-Convolutional_Networks_CVPR_2017_paper.pdf)]
[[code](https://github.com/ColumbiaDVMM/CDC)]
[[project](http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/)]
* **SST:** Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles.
"Single-stream temporal action proposals." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Buch_SST_Single-Stream_Temporal_CVPR_2017_paper.pdf)]
[[code](https://github.com/shyamal-b/sst)]
* **SCC:** Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem.
"SCC: Semantic context cascade for efficient action detection." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)]
[[project](https://ivul.kaust.edu.sa/Pages/pub-scc-efficient-action-detection.aspx)]
* **S-CNN:** Zheng Shou, Dongang Wang, Shih-Fu Chang.
"Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shou_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
[[code](https://github.com/zhengshou/scnn)]
* **PSDF*:** Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim.
"Temporal Action Localization with Pyramid of Score Distribution Features." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
* **SMS*:** Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng.
"Temporal Action Localization by Structured Maximal Sums." CVPR (2017).
[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yuan_Temporal_Action_Localization_CVPR_2017_paper.pdf)]
* **ADFG*:** Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei.
"End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR (2016).
[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yeung_End-To-End_Learning_of_CVPR_2016_paper.pdf)]
[[code](https://github.com/syyeung/frameglimpses)]
* **TCN:** Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen.
"Temporal Context Network for Activity Localization in Videos." ICCV (2017).
[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Temporal_Context_Network_ICCV_2017_paper.pdf)]
[[code](https://github.com/vdavid70619/TCN)]
* **DAP:** Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem.
"DAPs: Deep Action Proposals for Action Understanding." ECCV (2016).
[[paper](https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf)]
[[code](https://github.com/escorciav/daps)]
* **G-TAD:** Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem.
"G-TAD: Sub-Graph Localization for Temporal Action Detection." ArXiv (2019).
[[paper](https://arxiv.org/pdf/1911.11462.pdf)]
* **PTAL-ETP:** Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He.
"Precise Temporal Action Localization by Evolving Temporal Proposals." ArXiv (2019).
[[paper](https://arxiv.org/pdf/1804.04803.pdf)]
* **CTR_AL:** Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian.
"Constraining Temporal Relationship for Action Localization." ArXiv (2019).
[[paper](https://arxiv.org/pdf/2002.07358.pdf)]
* **SRG:** Hyunjun Eun, Sumin Lee, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.
"SRG: Snippet Relatedness-based Temporal Action Proposal Generator." ArXiv (2019).
[[paper](https://arxiv.org/pdf/1911.11306.pdf)]
* **LS-TD:** Yuan Zhou, Hongru Li, Sun-Yuan Kung.
"Temporal Action Localization using Long Short-Term Dependency." ArXiv (2019).
[[paper](https://arxiv.org/pdf/1911.01060.pdf)]
* **IDU:** Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.
"Learning to Discriminate Information for Online Action Detection." CVPR (2020).
[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Eun_Learning_to_Discriminate_Information_for_Online_Action_Detection_CVPR_2020_paper.pdf)]