Repository: VividLe/awesome-weakly-supervised-action-localization
Branch: master
Commit: e86943a85504
Files: 4
Total size: 44.9 KB

Directory structure:
gitextract_ywnz0pl2/
├── Other_Settings.md
├── README.md
├── Spatiotemporal.md
└── Supervised.md

================================================
FILE CONTENTS
================================================

================================================
FILE: Other_Settings.md
================================================
# Action Localization Benchmarks

### Learning from action count supervision

* **OCL:** Julien Schroeter, Kirill Sidorov, David Marshall.
"Weakly-Supervised Temporal Localization via Occurrence Count Learning" ICML 2019. [[paper](https://arxiv.org/pdf/1905.07293.pdf)] [[code](https://github.com/SchroeterJulien/ICML-2019-Weakly-Supervised-Temporal-Localization-via-Occurrence-Count-Learning)] ### Action Segment, Transformer * Action Modifiers :Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen.
"Action Modifiers: Learning from Adverbs in Instructional Videos." (CVPR 2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Doughty_Action_Modifiers_Learning_From_Adverbs_in_Instructional_Videos_CVPR_2020_paper.pdf)] ### Action Segment, Self-Supervised Learning * Action Segmentation : Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira.
"Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation." (CVPR 2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Action_Segmentation_With_Joint_Self-Supervised_Temporal_Domain_Adaptation_CVPR_2020_paper.pdf)] ### Action Segment, Transformer * SCT : Mohsen Fayyaz,Juergen Gall.
"SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation." (CVPR 2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fayyaz_SCT_Set_Constrained_Temporal_Transformer_for_Set_Supervised_Action_Segmentation_CVPR_2020_paper.pdf)] [[code](https://github.com/MohsenFayyaz89/SCT/)] ### Unintentional Action A new task. Interesting. * Oops : Dave Epstein Boyuan Chen Carl Vondrick.
"Oops! Predicting Unintentional Action in Video." (CVPR 2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Epstein_Oops_Predicting_Unintentional_Action_in_Video_CVPR_2020_paper.pdf)] [[project](https://oops.cs.columbia.edu/)] ### ActionBytes Learning from trimmed videos. * ActoinBytes: Mihir Jain1, Amir Ghodrati, Cees G. M. Snoek.
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR (2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)] ### METAL ActivityNet v1.2, mAP@0.5: 41.9 [1-shot], 45.0 [5-shot] THUMOS14, mAP@0.5: 14.3 [1-shot], 16.2 [5-shot] * METAL : Da Zhang, Xiyang Dai, and Yuan-Fang Wang.
"METAL : Minimum Effort Temporal Activity Localization in Untrimmed Videos." CVPR (2020). [[paper](https://sites.cs.ucsb.edu/~yfwang/papers/cvpr2020.pdf)] ### Hierarchical Action Search A new task. A learned space where videos are positioned in entailment cones formed by different subtrees. * Uncertain :Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek.
"Searching for Actions on the Hyperbole." CVPR (2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Long_Searching_for_Actions_on_the_Hyperbole_CVPR_2020_paper.pdf)] ### Fine-grained Action Recognition and Localization A new task. Hierarchical annotation for recognition and localization. * FineGym :Dian Shao Yue Zhao Bo Dai Dahua Lin.
"FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." (CVPR 2020, oral, 3 strong accepts). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)] [[project](https://sdolivia.github.io/FineGym/)] ### Mining undefined sub-actions A new task. Find the common sub-actions from multiple videos. * TAPOs :Dian Shao Yue Zhao Bo Dai Dahua Lin.
"Intra- and Inter-Action Understanding via Temporal Action Parsing." (CVPR 2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_Intra-_and_Inter-Action_Understanding_via_Temporal_Action_Parsing_CVPR_2020_paper.pdf)] [[project](https://sdolivia.github.io/TAPOS/)] ================================================ FILE: README.md ================================================ # Action Localization Benchmarks Papers and Results of Temporal Action Localization **Weakly Supervised Performance on THUMOS'14 dataset.** - The detectors are sorted by the mAP with threshold 0.5. - "c" indicates whether release code, yes (Y) or no (N). - "e" indicates the evaluation code, THUMOS (T), ActivityNet (A) or implemented by themselves. | Detector | Pub |c|e| 0.1 | 0.2 | 0.3 |0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |avg | info | | :---------: |:------:|-|-|:---:|:----|:----|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: | | D2-Net | arXiv-20-12-11 |N|A|65.6 |60.0 |52.1 |43.3 |35.9 | - | - | - | - | - | The same author with 3C-Net | | Lee et al | AAAI21 |Y|A|67.5 |61.2 |52.3 |43.4 |33.7 | 22.9 | 12.1 | - | - | - | The same author with BaS-Net | | HAM-Net | AAAI21 |N|A|65.9 |59.6 |52.2 |43.1 |32.6 | 21.9 | 12.5 | - | - | - | - | | ACSNet | AAAI21 |N|A| - | - |51.4 |42.7 |32.4 | 22.0 | 11.7 | - | - | - | - | | EM-MIL | ECCV20 |N|A|59.1 |52.7 |45.5 |36.8 |30.5 | 22.7 | 16.4 | - | - | - | Use existing classifiation results | | A2CL-PT | ECCV20 |Y|A| 61.2 | 56.1 |48.1 | 39.0 |30.1 |19.2 |10.6 | 4.8 | 1.0 | 30.0 | Report unsupervised performance as well | | ACL | CVPR20 |N|A| - | - |46.9 |38.9 |30.1 |19.8 |10.4 | - | - | - | Report unsupervised performance as well | | Liu et al. 
| AAAI21 |N|A| 61.7 | 58.0 |50.8 | 41.7 |29.6 |20.1 |10.7 | 4.3 | 0.5 | - | - | | WSTAL | WACV20 |-| |62.3 |- |46.8 |- |29.6 |- |9.7 |- |- |- | - | | ActionBytes | CVPR20 |N|A| - | - |43.0 |37.5 |29.0 | - |9.5 | - | - | - | - | | DGAM | CVPR20 |Y|A| 60.0|54.2 |46.8 |38.2 |28.8 |19.8 |11.4 |3.6 |0.4 | - | - | | TSCN | ECCV20 |N|A| 63.4|57.6 |47.8 |37.7 |28.7 |19.4 |10.2 |3.9 |0.7 | - | - | | BaSNet-I3D | AAAI20 |Y|A| 58.2|52.3 |44.6 |36.0 |27.0 |18.6 |10.4 |3.9 |0.5 | - | - | | BaSNet-UNT | AAAI20 |Y|A| 56.2|50.3 |42.8 |34.7 |25.1 |17.1 |9.3 |3.7 |0.5 | - | - | | WSBM | ICCV19 |N|A| 60.4|56.0 |46.6 |37.5 |26.8 |17.6 |9.0 |3.3 |0.4 | - | - | | 3C-Net | ICCV19 |Y|I| 59.1|53.5 |44.2 |34.1 |26.6 | - |8.1 | - | - | - | - | | ASSG | ACM 19 |N| | 65.6|59.4 |50.4 |38.7 |25.4 |15.0 |6.6 | - | - | - | - | | TSM | ICCV19 |N|T| - | - |39.5 |31.9 |24.5 |13.8 |7.1 | - | - |23.4 | - | | CleanNet | ICCV19 |N|T| - | - |37.0 |30.9 |23.9 |13.9 |7.1 | - | - | - | - | | CMCS-I3D | CVPR19 |Y|T| 57.4|50.8 |41.2 |32.1 |23.1 |15.0 |7.0 | - | - | - |report avg-mAP| | CMCS-UNT | CVPR19 |Y|T| 53.5|46.8 |37.5 |29.1 |19.9 |12.3 |6.0 | - | - | - | - | | STARNet | AAAI19 |N|A|68.8 |60.0 |48.7 |34.7 |23.0 |- |- | - | - |- | - | | W-TALC | ECCV18 |Y|I|55.2 |49.6 |40.1 |31.1 |22.8 |- |- | - | - |7.6 | - | | AutoLoc | ECCV18 |Y|T|- |- |35.8 |29.0 |21.2 |13.4 |5.8 | - | - |- | - | | MAAN | ICLR19 |Y|A|59.8 |50.8 |41.1 |30.6 |20.3 |12.0 |6.9 |2.6 |0.2 |24.9 | - | | LTSR | AAAI19 |N|T|55.9 |46.9 |38.3 |28.1 |18.6 |11.0 |5.59 |2.19 |0.29 |- | - | | WSGN | WACV20 |-|T|51.1 |44.4 |34.9 |26.3 |18.1 |11.6 |6.5 |- |- |- | - | | STPN | CVPR18 |I|A|52.0 |44.7 |35.5 |25.8 |16.9 |9.9 |4.3 |1.2 |0.1 |- | - | | CPMN | ACCV18 |N|T|47.1 |41.6 |32.8 |24.7 |16.1 |10.1 |5.5 |- |- |- | - | | S-O-C | ACM18 |N|T|45.8 |39.0 |31.1 |22.5 |15.9 |- |- |- |- |- | - | |UntrimmedNets| CVPR17 |Y|T|44.4 |37.7 |28.2 |21.1 |13.7 |- |- |- |- |- | - | | H&S | ICCV17 |Y|T|36.44|27.84|19.49|12.66|6.84 |- |- |- |- |- | - | |LPAT-I3D+TEM | 
arXiv |-| |- |- |46.9 |37.4 |28.0 |16.6 |9.2 |- |- |27.6 | - | | LPAT-I3D | arXiv |-| |- |- |46.7 |37.5 |27.9 |17.6 |9.2 |- |- |27.6 | - | | LPAT-U | arXiv |-| |- |- |39.9 |31.5 |22.6 |14.2 |7.9 |- |- |27.6 | - | |RefineLoc-I3D| arXiv |-|T|- |- |40.8 |- |23.1 |- |5.3 |- |- |- | - | |RefineLoc-TSN| arXiv |-|T|- |- |36.1 |- |22.6 |- |5.8 |- |- |- | - | **Weakly Supervised Performance on ActivityNet v1.2 dataset.** | Detector | Pub |c| 0.5 | 0.55|0.60 | 0.65|0.70 |0.75 | 0.80|0.85 |0.90 |0.95 | avg |test | info | | :---------: |:------:|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: | | D2-Net | arXiv-20-12-11 |N|42.3 | - | - | - | - |25.5 | - | - | - | 5.8 | 26.0| - | - | | ACSNet | AAAI21 |N|40.1 | - | - | - | - |26.1 | - | - | - | 6.8 | 26.0| - | - | | Lee et al | AAAI21 |N|41.2 | - | - | - | - |25.6 | - | - | - | 6.0 | 25.9| - | - | | Liu at al. | AAAI21 |N|39.2 |- |- |- | - |25.6 |- |- |- | 6.8 | 25.5| - | - | | HAM-Net | AAAI21 |N|41.0 | - | - | - | - |24.8 | - | - | - | 5.3 | 25.1| - | - | | BaSNet | AAAI20 |Y|38.5 | - | - | - | - |24.2 | - | - | - | 5.6 | 24.3| - | - | | TSCN | ECCV20 |N|37.6 |- |- |- | - |23.7 |- |- |- | 5.7 | 23.6| - | - | | CMCS | CVPR19 |Y|36.8 |- |- |- | - |22.0 |- |- |- | 5.6 | 22.4| - | - | | 3C-Net | ICCV19 |Y|35.4 | - | - | - | - |22.9 | - | - | - | 8.5 | 21.1| - | - | | TSM | ICCV19 |N|28.3 |26.0 |23.6 |21.2 | 18.9|17.0 |14.0 |11.1 |7.5 | 3.5 | - | - | - | | CleanNet | ICCV19 |N|37.1 |33.4 |29.9 |26.7 | 23.4|20.3 |17.2 |13.9 |9.2 | 5.0 | 21.6| - | - | | EM-MIL | ECCV20 |N|37.4 |- |- |- | 23.1|- |- |- |2.0 | - | 20.3| - | - | | W-TALC | ECCV18 |Y|37.0 |- |- |- | 14.6|- |- |- |- | - | 18.0| - | - | | AutoLoc | ECCV18 |Y|27.3 |24.9 |22.5 |19.9 | 17.5|15.1 |13.0 |10.0 |6.8 | 3.3 | 16.0| - | - | |RefineLoc-I3D| arXiv |-|38.7 |- |- |- | - |22.6 |- |- |- | 5.5 | 23.2| - | - | |RefineLoc-TSN| arXiv |-|38.8 |- |- |- | - |22.2 |- |- |- | 5.3 | 23.2| - | - | | LPAT | arXiv |-|37.6 |34.6 |31.6 |28.7 | 
25.6|22.6 |19.6 |15.3 |10.9 | 4.9 | 23.1| - | - | | WSTAL | arXiv |-|35.2 |- |- |- | 16.3|- |- |- |- | - | - | - | - | **Weakly Supervised Performance on ActivityNet v1.3 dataset.** | Detector | Pub |c| 0.5 |0.75 |0.95 |avg | | :---------: |:------:|-|:---:|:---:|:---:|:---:| | ACSNet | AAAI21 |N|36.3 |24.2 |5.8 |23.9 | | Lee et al.| AAAI21 |Y|37.0 |23.9 |5.7 |23.7 | | Liu et al.| AAAI21 |N|35.1 |23.7 |5.6 |23.2 | | A2CL-PT | ECCV20 |Y|36.8 |22.0 |5.2 |22.5 | | BaSNet-I3D | AAAI20 |Y|34.5 |22.5 |4.9 |22.2 | | TSCN | ECCV20 |N|35.3 |21.4 |5.3 |21.7 | | WSBM | ICCV19 |N|36.4 |19.2 | 2.9 |- | | ASSG | ACM 19 |N|32.3 |20.1 | 4.0 |- | | TSM | ICCV19 |N|30.0 |19.0 | 4.5 |- | | CMCS | CVPR19 |Y|34.0 |20.9 | 5.7 |21.2 | | STARNet | AAAI19 |N|31.1 |18.8 | 4.7 |- | | MAAN | ICLR19 |Y|33.7 |21.9 | 5.5 |- | | LTSR | AAAI19 |N|33.1 |18.7 |3.32 |21.78| | STPN | CVPR18 |I|29.3 |16.9 |2.6 |- | | CPMN | ACCV18 |N|39.29|24.09|6.71 |24.42| | S-O-C | ACM18 |N|27.3 |14.7 |2.9 |15.6 | ### Weakly Supervised Temporal Action Localization * **D2-Net:** Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao.
"D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations" arXiv:2012.06440. [[paper](https://arxiv.org/pdf/2012.06440.pdf)] * **Lee et al:** Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun.
"Weakly-supervised Temporal Action Localization by Uncertainty Modeling" AAAI 2021. [[paper](https://arxiv.org/pdf/2006.07006.pdf)] [[code](https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling)] * **HAM-Net:** Ashraful Islam, Chengjiang Long , Richard J. Radke.
"A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization" AAAI 2021. [[paper](https://arxiv.org/pdf/2101.00545.pdf)] [[code](https://github.com/asrafulashiq/hamnet)] * **Liu et al.:** Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.
"ACSNet : Action-Context Separation Network for Weakly Supervised Temporal Action Localization" AAAI 2021. [[paper](http://gr.xjtu.edu.cn/web/lewang)] * **Liu at al.:** Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.
"Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context" AAAI 2021. [[paper](http://gr.xjtu.edu.cn/web/lewang)] * **EM-MIL:** Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu.
"Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning" ECCV 2020. [[paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740715.pdf)] * **A2CL-PT:** Kyle Min, Jason J. Corso.
"Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization." ECCV 2020. [[paper](https://link.springer.com/chapter/10.1007%2F978-3-030-58568-6_17)] [[code](https://github.com/MichiganCOG/A2CL-PT)] * **ACL:** Guoqiang Gong, Xinghan Wang, Yadong Mu, Qi Tian.
"Learning Temporal Co-Attention Models for Unsupervised Video Action Localization." CVPR 2020, oral. [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.pdf)] * **WSTAL**Ashraful Islam, Richard J. Radke.
"Weakly Supervised Temporal Action Localization Using Deep Metric Learning" WACV 2020. [[paper](https://arxiv.org/pdf/2001.07793.pdf)] [[code](https://github.com/asrafulashiq/wsad)] * **ActoinBytes:** Mihir Jain1, Amir Ghodrati, Cees G. M. Snoek.
"ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR 2020. [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)] * **DGAM:** Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang.
"Weakly-Supervised Action Localization by Generative Attention Modeling." CVPR 2020. [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf)] [[code](https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization)] * **TSCN:** Zhai, Yuanhao and Wang, Le and Tang, Wei and Zhang, Qilin and Yuan, Junsong and Hua, Gang.
"Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization." ECCV 2020. [[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf)] * **BaSNet:** Pilhyeon Lee, Youngjung Uh, Hyeran Byun.
"Background Suppression Networks for Weakly-supervised Temporal Action Localization." AAAI 2020. [[paper](https://arxiv.org/pdf/1911.09963.pdf)] [[code](https://github.com/Pilhyeon/BaSNet-pytorch)] * **3C-Net:** Sanath Narayan, Hisham Cholakkal, Fahad Shabaz Khan, Ling Shao.
"3C-Net : Category Count and Center Loss for Weakly-Supervised Action Localization." ICCV 2019. [[paper](https://arxiv.org/pdf/1908.08216.pdf)] [[code](https://github.com/naraysa/3c-net)] * **CMCS:** Daochang Liu, Tingting Jiang, Yizhou Wang.
"Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization." CVPR 2019. [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.pdf)] [[code](https://github.com/Finspire13/CMCS-Temporal-Action-Localization)] * **ASSG:** Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang, Pu Fei Wu, Futai Zou.
"Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization" ACM MM 2019. [[paper](https://arxiv.org/pdf/1908.02422.pdf)] * **AutoLoc:**Zheng Shou, Hang Gao, Lei Zhang, KazuyukiMiyazawa, Shih-Fu Chang.
"AutoLoc Weakly-supervised Temporal Action Localization in Untrimmed Videos"ECCV 2018. [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Zheng_Shou_AutoLoc_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)] [[code](https://github.com/zhengshou/AutoLoc)] * **CPMN:**Haisheng Su, Xu Zhao, Tianwei Lin.
"Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization"ACCV 2018. [[paper](https://arxiv.org/pdf/1810.11794.pdf)] * **H&S:**Krishna Kumar Singh, Yong Jae Lee.
"Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization"ICCV 2017. [[paper](https://arxiv.org/pdf/1704.04232.pdf)] [[code](https://github.com/goddoe/hide-and-seek)] * **LTSR:**Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan.
"Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision"AAAI 2019. [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4958/4831)] * **WSGN:**Basura Fernando, Cheston Tan Yin Chet.
"Weakly Supervised Gaussian Networks for Action Detection" WACV(2020) * **MAAN:**Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung.
"MARGINALIZED AVERAGE ATTENTIONAL NETWORK FOR WEAKLY-SUPERVISED LEARNING"ICLR 2019. [[paper](https://arxiv.org/pdf/1905.08586.pdf)] [[code](https://github.com/yyuanad/MAAN)] * **S-O-C:**Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li.
" Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector"ACM MM 2018. [[paper](https://arxiv.org/pdf/1807.02929.pdf)] * **STARNet:**Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu.
"Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection"AAAI 2019. [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4939/4812)] * **TSM:**Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan.
"Temporal Structure Mining for Weakly Supervised Action Detection"ICCV(2019). [[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf)] * **UntrimmedNets:**Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool.
"UntrimmedNets for Weakly Supervised Action Recognition and Detection"CVPR 2017. [[paper](https://wanglimin.github.io/papers/WangXLV_CVPR17.pdf)] [[code](https://github.com/wanglimin/UntrimmedNet)] * **WSBM:**Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes.
"Weakly-supervised Action Localization with Background Modeling"ICCV 2019. [[paper](https://arxiv.org/pdf/1908.06552.pdf)] * **CleanNet:**Ziyi Liu, Le Wang1∗ Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua.
"Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks"ICCV 2019. [[paper](https://qilin-zhang.github.io/_pages/pdfs/Weakly_Supervised_Temporal_Action_Localization_through_Contrast_based_Evaluation_Networks.pdf)] * **STPN**Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han.
"Weakly Supervised Action Localization by Sparse Temporal Pooling Network" CVPR 2018. [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Weakly_Supervised_Action_CVPR_2018_paper.pdf)] [[code](https://github.com/bellos1203/STPN)] * **W-TALC**Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury.
"W-TALC: Weakly-supervised Temporal Activity Localization and Classification" ECCV 2018. [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Sujoy_Paul_W-TALC_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)] [[code](https://github.com/sujoyp/wtalc-pytorch)] * **LPAT**Xudong, Lin Zheng, Shou Shih-Fu Chang.
"LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization" arXiv 2019. [[paper](https://arxiv.org/pdf/1910.11285.pdf)] * **RefineLoc:**Humam Alwassel1, Alejandro Pardo1, Fabian Caba Heilbron, Ali Thabet1 Bernard Ghanem1.
"RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization"Arxiv(2019) [[paper](https://arxiv.org/pdf/1904.00227.pdf)] [[paper](https://basurafernando.github.io/papers/wacv2020_wsgn.pdf)] ### Expecting for paper * **lvr:** Xingyu Liu, Joon-Young Lee, Hailin Jin.
"Learning Video Representations from Correspondence Proposals." CVPR 2019 **oral**. ## Dataset * **THUMOS'14:** Yu-Gang Jiang, Jingen Liu, Amir R. Zamir, George Toderici.
"THUMOS Challenge 2014" [[project](https://www.crcv.ucf.edu/THUMOS14/home.html)] * **Activity:** Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia.
"A Large-Scale Video Benchmark for Human Activity Understanding" [[project](http://activity-net.org/index.html)] * **THUMOS'15:** Alexander Gorban, Haroon Idrees, Yu-Gang Jiang, Amir R. Zamir.
"THUMOS Challenge 2015" [[project](http://www.thumos.info/)] * **COIN:** Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou.
"COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis." CVPR 2019. [[paper](https://arxiv.org/pdf/1903.02874.pdf)] [[project](https://coin-dataset.github.io/)] ================================================ FILE: Spatiotemporal.md ================================================ # Spatio-Temporal Action Detection Papers and Results for Spatio-Temporal Action Detection ## Spatio-Temporal Action Localization **Performance on AVA v2.1 dataset.** - Metric: mAP with threshold 0.5. | Detector | val | test | | :---------: | :-----: | :-------: | | LFB | 27.70 | 27.20 | | VATN* | 25.00 | 24.93 | | SMAD* | 22.20 | - | | STEP | 18.60 | - | | ACRN | 17.40 | - | | AVA | 15.80 | - | | SlowFast | 27.30 | 27.10 | ## Spatio-temporal Action Detection * **PSCS:** Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu.
"Improving Action Localization by Progressive Cross-stream Cooperation." CVPR (2019). [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Su_Improving_Action_Localization_by_Progressive_Cross-Stream_Cooperation_CVPR_2019_paper.pdf)] * **SlowFast:** Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He.
"SlowFast Networks for Video Recognition." arXiv (1812). [[paper](https://arxiv.org/pdf/1812.05038.pdf)] [[unofficial_code](https://github.com/r1ch88/SlowFastNetworks)] [[unofficial_code](https://github.com/Guocode/SlowFast-Networks)] * **DwF:** Jiaojiao Zhao, Cees G.M. Snoek.
"Dance with Flow: Two-in-One Stream Action Detection." CVPR (2019). [[paper](https://arxiv.org/pdf/1904.00696.pdf)] * **STEP:** Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz.
"STEP: Spatio-Temporal Progressive Learning for Video Action Detection." CVPR (2019 **oral**). [[paper](https://arxiv.org/pdf/1904.09288.pdf)] * **LFB:** Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick.
"Long-Term Feature Banks for Detailed Video Understanding." CVPR (2019). [[paper](https://arxiv.org/pdf/1812.05038.pdf)] [[project](https://github.com/facebookresearch/video-long-term-feature-banks)] * **VATN*:** Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman.
"Video Action Transformer Network." CVPR (2019 **oral**). [[paper](https://arxiv.org/pdf/1812.02707.pdf)] [[project](https://rohitgirdhar.github.io/ActionTransformer/)] * **LAEO:** Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman.
"LAEO-Net: revisiting people Looking At Each Other in videos." CVPR (2019). [[paper](http://www.robots.ox.ac.uk/~vgg/research/laeonet/cvpr2019LAEO.pdf)] [[code](https://github.com/AVAuco/laeonet/)] [[project](http://www.robots.ox.ac.uk/~vgg/research/laeonet/)] * **SMAD*:** Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid.
"A Structured Model For Action Detection." CVPR (2019). [[paper](https://arxiv.org/pdf/1812.03544.pdf)] * **TACNet:** Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun.
"TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection." CVPR (2019). [[paper](http://www.skicyyu.org/Paper/CVPR2019_TACNET.pdf)] * **AVA:** Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik.
"AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions." CVPR (2018). [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gu_AVA_A_Video_CVPR_2018_paper.pdf)] [[project](https://research.google.com/ava/)] * **ACRN:** Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid.
"Actor-Centric Relation Network." ECCV (2018). [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.pdf)] * **T-CNN:** Rui Hou, Chen Chen, Mubarak Shah.
"Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos." ICCV (2017). [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hou_Tube_Convolutional_Neural_ICCV_2017_paper.pdf)] [[code](https://www.crcv.ucf.edu/projects/TCNN/#Code)] [[project](https://www.crcv.ucf.edu/projects/TCNN/)] ## Dataset * **YouTube-8M-Segments:** Ke Chen, Julia Elliott, Nisarg Kothari, Hanhan Li, et.al.
"YouTube-8M Segments Dataset" [[project](https://research.google.com/youtube8m/)] ## Distinguished Researchers & Teams [WILLOW](https://www.di.ens.fr/willow/publications/YearOnly/publications.html) [Ivan Laptev](https://www.di.ens.fr/~laptev/#Publications) [Christoph Feichtenhofer](https://feichtenhofer.github.io/) ================================================ FILE: Supervised.md ================================================ # Action Detection Benchmarks Papers and Results of Temporal Action Localization ## Temporal Action Localization **Performance on THUMOS'14 dataset.** - The detectors are ordered by the mAP with threshold 0.5. - `Deep Learning`: deep learning related method. | Detector | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |Deep Learning|Comment | | :---------: | :-----: |:-------:|:-------:|:-------:|:-------:|:-------:| :------: | :--------: | :--: | | TGM | - | - | - | - | 53.5 | - | - | Y | - | | C-TCN | 72.2 | 71.4 | 68.0 | 62.3 | 52.1 | - | - | Y | - | | RTD-Net | - | - | 68.3 | 62.3 | 51.9 | 38.8 | 23.7 | Y | based on P-GCN | | PGC-TAL | 71.2 | 68.9 | 65.1 | 59.5 | 51.2 | - | - | Y | based on P-GCN | | DPP | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - | | PGCN | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - | Y | - | | BMN | - | - | 56.0 | 47.4 | 38.8 | 29.7 | 20.5 | Y | - | | D-SSAD | - | - | 60.2 | 54.1 | 44.2 | 32.3 | 19.1 | Y | - | | TAL-Net | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 | Y | - | | FRTS | - | - | 53.5 | 50.2 | 44.2 | 33.9 | 22.7 | Y | - | | GTAN | 69.1 | 63.7 | 57.8 | 47.2 | 38.8 | - | - | Y | - | | AGCN | 59.3 | 59.6 | 57.1 | 51.6 | 38.6 | 28.9 | 17.0 | Y | - | | MGG | - | - | 53.9 | 46.8 | 37.4 | 29.5 | 21.3 | Y | - | | BSN | - | - | 53.5 | 45.0 | 36.9 | 28.4 | 20.0 | Y | - | | FSN | - | - | 51.8 | 41.5 | 32.1 | 22.9 | 14.7 | Y | - | | CBR | 60.1 | 56.7 | 50.1 | 41.3 | 31.0 | 19.1 | 9.9 | Y | - | | ASSA* | - | - | 51.8 | 42.4 | 30.8 | 20.2 | 11.1 | Y | - | | CTAP | - | - | - | - | 29.9 | - | - | Y | - | | SS-TAD | - | 
- | 45.7 | - | 29.2 | - | 9.6 | Y |701 fps(GTX Titan X (Maxwell))| | SSN | 60.3 | 56.2 | 50.6 | 40.8 | 29.1 | - | - | Y | - | | R-C3D | 54.5 | 51.5 | 44.8 | 35.6 | 28.9 | - | - | Y |569 fps| | TAG | 64.1 | 57.7 | 48.7 | 39.8 | 28.2 | - | - | Y | - | | TPC | - | - | 44.1 | 37.1 | 28.2 | 20.6 | 12.7 | Y |250 fps (GTX Titan X)| | TURN | 54.0 | 50.9 | 44.1 | 34.9 | 25.6 | - | - | Y |129.4 fps (GTX Titan X)| | SSAD | 50.1 | 47.8 | 43.0 | 35.0 | 24.6 | - | - | Y | - | | CDC | - | - | 40.1 | 29.4 | 23.3 | 13.1 | 7.9 | Y |500 fps (GTX Titan X)| | SST | - | - | 37.8 | - | 23.0 | - | - | Y |308 fps Titan-X| | S-CNN | 47.7 | 43.5 | 36.3 | 28.7 | 19.0 | 10.3 | 5.3 | Y |60 fps(GeForce GTX 980)| | PSDF* | 51.4 | 42.6 | 33.6 | 26.1 | 18.8 | - | - | Y | - | | SMS* | 51.0 | 45.2 | 36.5 | 27.8 | 17.8 | - | - | N | - | | ADFG* | 48.9 | 44.0 | 36.0 | 26.4 | 17.1 | - | - | Y | - | | TCN | - | - | 33.3 | 25.6 | 15.9 | 9.0 | - | Y | - | | DAP | - | - | - | - | 13.9 | - | - | Y |134.1 fps Titan-X| | G-TAD | - | - | 54.5 | 47.6 | 40.2 | 30.8 | 23.4 | Y | - | | PTAL-ETP | - | - | 48.2 | 42.4 | 34.2 | 23.4 | 13.9 | Y | - | | CTR_AL | - | - | 53.9 | 50.7 | 45.4 | 38.0 | 28.5 | Y | - | | LS-TD | arXiv |-|63.5 |61.0 |56.7 |50.6 |42.6 |32.5 |21.4 |- |- |- | - | | SRG | arXiv |-|- |- |54.5 |46.9 |39.1 |31.4 |22.2 |- |- |- | - | | IDU | arXiv |-|- |- |- |- |- |- |- |- |- |Under a different metric | - | **Performance on ActivityNet v1.3 dataset.** - The left half is score on ActivityNet v1.3 validation dataset. The right half is score on ActivityNet v1.3 testing dataset. - `Deep Learning`: deep learning related method. 
| Detector | 0.50 | 0.75 | 0.95 | @Avg | 0.50 | 0.75 | 0.95 | @Avg |Deep Learning|Speed | | :---------: | :-----: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :------: | :--------: | :--: | | BMN | 50.07 | 34.78 | 8.29 | 33.85 | - | - | - | 36.42 | Y | - | | GTAN | 52.61 | 34.14 | 8.91 | 34.31 | - | - | - | 35.54 | Y | - | | PGCN | 48.26 | 33.16 | 3.27 | 31.11 | - | - | - | - | Y | - | | RTD-Net | 46.43 | 30.45 | 8.64 | 30.46 | - | - | - | - | Y | - | | BSN_ori | 46.45 | 29.96 | 8.02 | 30.03 | - | - | - | 32.84 | Y | - | | BSN_new | 52.50 | 33.53 | 8.85 | 33.72 | - | - | - | 34.42 | Y | - | | C-TCN | 47.6 | 31.9 | 6.2 | 31.1 | - | - | - | - | Y | - | | PGC-TAL | 44.31 | 29.85 | 5.47 | 28.85 | - | - | - | - | Y | based on P-GCN | | SSN | - | - | - | - | 43.26 | 28.70 | 5.63 | 28.28 | Y | - | | TAG | 39.12 | 23.48 | 5.49 | 23.98 | 40.69 | 26.02 | 6.67 | 26.05 | Y | - | | CDC | 45.30 | 26.00 | 0.20 | 23.80 | - | - | - | - | Y |500 fps (GTX Titan X)| | TCN | 36.44 | 21.15 | 3.90 | - | 37.49 | 23.47 | 4.47 | 23.58 | Y | - | | TAL-Net | 38.23 | 18.30 | 1.30 | 20.22 | - | - | - | - | Y | - | | AGCN | 30.4 | - | - | - | - | - | - | - | Y | - | | SCC | - | - | - | - | 39.90 | 18.70 | 4.70 | 19.30 | Y | 35.9 fps | | R-C3D | 26.80 | - | - | 12.70 | - | - | - | 13.10 | Y |569 fps (GTX Titan X (Maxwell))
1030 fps (Titan X Pascal)| | G-TAD | 50.36 | 34.6 | 9.02 | 34.09 | - | - | - | - | Y | - | | CTR_AL | 43.47 | 33.91 | 9.21 | 30.12 | - | - | - | - | Y | - | | SRG | 46.53 | - | - | - | - | 29.98 | - | - | - | 4.83 | 29.72 | - | - | **Performance on ActivityNet v1.2 dataset.** | LS-TD | 50.4 | - | - | - | - | 34.9 | - | - | - | 8.0 | 33.6 | - | - | ### Temporal Action Localization * **TGM:** AJ Piergiovanni, Michael S. Ryoo.
"Temporal Gaussian Mixture Layer for Videos." ICML (2019). [[paper](https://arxiv.org/pdf/1803.06316.pdf)] [[code](https://github.com/piergiaj/tgm-icml19)] * **RTD-Net:** Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu.
"Relaxed Transformer Decoders for Direct Action Prop." arXiv 2102.01894. [[paper](https://arxiv.org/pdf/2102.01894.pdf)] * **DPP:** Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu.
"Deep Point-wise Prediction for Action Temporal Proposal." ArXiv 1909.07725. [[paper](https://arxiv.org/pdf/1909.07725.pdf)] [[code](https://github.com/liluxuan1997/DPP)] * **C-TCN:** Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li.
"Deep Concept-wise Temporal Convolutional Networks for Action Localization." ArXiv 1908.09442. [[paper](https://arxiv.org/pdf/1908.09442.pdf)] [[code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleVideo/models/ctcn/README.md)] * **PGC-TAL:** Rui Su, Dong Xu, Lu Sheng, Wangli Ouyang.
"PCG-TAL: Progressive Cross-granularity Cooperation for Temporal Action Localization." TIP 2020. [[paper](https://ieeexplore.ieee.org/document/9298475)]] * **BMN:** Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen.
"BMN : Boundary-Matching Network for Temporal Action Proposal Generation." ICCV (2019). [[paper](https://arxiv.org/pdf/1907.09702.pdf)] * **PGCN:** Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan.
"Graph Convolutional Networks for Temporal Action Localization." ICCV (2019). [[paper](https://arxiv.org/pdf/1909.03252.pdf)] [[code](https://github.com/Alvin-Zeng/PGCN)]
* **GTAN:** Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei.
"Gaussian Temporal Awareness Networks for Action Localization." CVPR (2019 **oral**). [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Long_Gaussian_Temporal_Awareness_Networks_for_Action_Localization_CVPR_2019_paper.pdf)]
* **AGCN:** Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song.
"Graph Attention based Proposal 3D ConvNets for Action Detection." AAAI (2020). [[paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf)]
* **MGG:** Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang.
"Multi-granularity Generator for Temporal Action Proposal." CVPR (2019). [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Multi-Granularity_Generator_for_Temporal_Action_Proposal_CVPR_2019_paper.pdf)]
* **D-SSAD:** Yupan Huang, Qi Dai, Yutong Lu.
"Decoupling Localization and Classification in Single Shot Temporal Action Detection." ICME (2019). [[paper](https://arxiv.org/pdf/1904.07442.pdf)] [[code](https://github.com/HYPJUDY/Decouple-SSAD)]
* **TAL-Net:** Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar.
"Rethinking the Faster R-CNN Architecture for Temporal Action Localization." CVPR (2018). [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Rethinking_the_Faster_CVPR_2018_paper.pdf)]
* **FRTS:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras.
"Exploring Feature Representation and Training strategies in Temporal Action Localization." ICIP (2019). [[paper](https://arxiv.org/pdf/1905.10608.pdf)]
* **BSN:** Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang.
"BSN: Boundary Sensitive Network for Temporal Action Proposal Generation." ECCV (2018). [[paper](https://arxiv.org/pdf/1806.02964.pdf)] [[code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)]
* **FSN:** Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou.
"Exploring frame segmentation networks for temporal action localization." ECCV (2018). [[paper](https://arxiv.org/pdf/1902.05488.pdf)]
* **CBR:** Jiyang Gao, Zhenheng Yang, Ram Nevatia.
"Cascaded Boundary Regression for Temporal Action Detection." BMVC (2017). [[paper](https://arxiv.org/pdf/1705.01180.pdf)] [[code](https://github.com/jiyanggao/CBR)]
* **ASSA*:** Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem.
"Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization." ECCV (2018). [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Humam_Alwassel_Action_Search_Spotting_ECCV_2018_paper.pdf)]
* **CTAP:** Jiyang Gao, Kan Chen, Ram Nevatia.
"CTAP: Complementary Temporal Action Proposal Generation." ECCV (2018). [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiyang_Gao_CTAP_Complementary_Temporal_ECCV_2018_paper.pdf)] [[code](https://github.com/jiyanggao/CTAP)]
* **SS-TAD:** Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles.
"End-to-end, single-stream temporal action detection in untrimmed videos." BMVC (2017). [[paper](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)] [[code](https://github.com/shyamal-b/ss-tad)]
* **SSN:** Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin.
"Temporal Action Detection with Structured Segment Networks." ICCV (2017). [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhao_Temporal_Action_Detection_ICCV_2017_paper.pdf)] [[code](https://github.com/yjxiong/action-detection)]
* **R-C3D:** Huijuan Xu, Abir Das, Kate Saenko.
"R-C3D: Region Convolutional 3D Network for Temporal Activity Detection." ICCV (2017). [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Xu_R-C3D_Region_Convolutional_ICCV_2017_paper.pdf)] [[code](https://github.com/VisionLearningGroup/R-C3D)]
* **TAG:** Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang.
"A Pursuit of Temporal Accuracy in General Activity Detection." arXiv (1703). [[paper](https://arxiv.org/pdf/1703.02716.pdf)]
* **TPC:** Ke Yang, Peng Qiao, Dongsheng Li, Shaohe Lv, Yong Dou.
"Exploring Temporal Preservation Networks for Precise Temporal Action Localization." AAAI (2018). [[paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16164/16347)]
* **TURN:** Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia.
"TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals." ICCV (2017). [[paper](https://arxiv.org/pdf/1703.06189.pdf)] [[code](https://github.com/jiyanggao/TURN-TAP)]
* **SSAD:** Tianwei Lin, Xu Zhao, Zheng Shou.
"Single shot temporal action detection." ACM MM (2017). [[paper](https://arxiv.org/pdf/1710.06236.pdf)]
* **CDC:** Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.
"CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos." CVPR (2017). [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shou_CDC_Convolutional-De-Convolutional_Networks_CVPR_2017_paper.pdf)] [[code](https://github.com/ColumbiaDVMM/CDC)] [[project](http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/)]
* **SST:** Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles.
"Single-stream temporal action proposals." CVPR (2017). [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Buch_SST_Single-Stream_Temporal_CVPR_2017_paper.pdf)] [[code](https://github.com/shyamal-b/sst)]
* **SCC:** Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem.
"SCC: Semantic context cascade for efficient action detection." CVPR (2017). [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)] [[project](https://ivul.kaust.edu.sa/Pages/pub-scc-efficient-action-detection.aspx)]
* **S-CNN:** Zheng Shou, Dongang Wang, Shih-Fu Chang.
"Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR (2016). [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shou_Temporal_Action_Localization_CVPR_2016_paper.pdf)] [[code](https://github.com/zhengshou/scnn)]
* **PSDF*:** Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim.
"Temporal Action Localization with Pyramid of Score Distribution Features." CVPR (2016). [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
* **SMS*:** Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng.
"Temporal Action Localization by Structured Maximal Sums." CVPR (2017). [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yuan_Temporal_Action_Localization_CVPR_2017_paper.pdf)]
* **ADFG*:** Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei.
"End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR (2016). [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yeung_End-To-End_Learning_of_CVPR_2016_paper.pdf)] [[code](https://github.com/syyeung/frameglimpses)]
* **TCN:** Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen.
"Temporal Context Network for Activity Localization in Videos." ICCV (2017). [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Temporal_Context_Network_ICCV_2017_paper.pdf)] [[code](https://github.com/vdavid70619/TCN)]
* **DAP:** Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem.
"DAPs: Deep Action Proposals for Action Understanding." ECCV (2016). [[paper](https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf)] [[code](https://github.com/escorciav/daps)]
* **G-TAD:** Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem.
"G-TAD: Sub-Graph Localization for Temporal Action Detection." arXiv (2019). [[paper](https://arxiv.org/pdf/1911.11462.pdf)]
* **PTAL-ETP:** Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He.
"Precise Temporal Action Localization by Evolving Temporal Proposals." arXiv (2018). [[paper](https://arxiv.org/pdf/1804.04803.pdf)]
* **CTR_AL:** Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian.
"Constraining Temporal Relationship for Action Localization." arXiv (2020). [[paper](https://arxiv.org/pdf/2002.07358.pdf)]
* **SRG:** Hyunjun Eun, Sumin Lee, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.
"SRG: Snippet Relatedness-based Temporal Action Proposal Generator." arXiv (2019). [[paper](https://arxiv.org/pdf/1911.11306.pdf)]
* **LS-TD:** Yuan Zhou, Hongru Li, Sun-Yuan Kung.
"Temporal Action Localization using Long Short-Term Dependency." arXiv (2019). [[paper](https://arxiv.org/pdf/1911.01060.pdf)]
* **IDU:** Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.
"Learning to Discriminate Information for Online Action Detection." CVPR (2020). [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Eun_Learning_to_Discriminate_Information_for_Online_Action_Detection_CVPR_2020_paper.pdf)]