[
  {
    "path": "Other_Settings.md",
    "content": "# Action Localization Benchmarks\n\n### Learning from action count supervision\n* **OCL:** Julien Schroeter, Kirill Sidorov, David Marshall.<br />\n  \"Weakly-Supervised Temporal Localization via Occurrence Count Learning\" ICML 2019.\n  [[paper](https://arxiv.org/pdf/1905.07293.pdf)]\n  [[code](https://github.com/SchroeterJulien/ICML-2019-Weakly-Supervised-Temporal-Localization-via-Occurrence-Count-Learning)]\n\n### Action Segment, Transformer\n\n* **Action Modifiers:** Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen.<br />\n  \"Action Modifiers: Learning from Adverbs in Instructional Videos.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Doughty_Action_Modifiers_Learning_From_Adverbs_in_Instructional_Videos_CVPR_2020_paper.pdf)]\n\n### Action Segment, Self-Supervised Learning\n\n* **Action Segmentation:** Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira.<br />\n  \"Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Action_Segmentation_With_Joint_Self-Supervised_Temporal_Domain_Adaptation_CVPR_2020_paper.pdf)]\n\n### Action Segment, Transformer\n\n* **SCT:** Mohsen Fayyaz, Juergen Gall.<br />\n  \"SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fayyaz_SCT_Set_Constrained_Temporal_Transformer_for_Set_Supervised_Action_Segmentation_CVPR_2020_paper.pdf)]\n  [[code](https://github.com/MohsenFayyaz89/SCT/)]\n\n\n### Unintentional Action\n\nA new task. Interesting.\n\n* **Oops:** Dave Epstein, Boyuan Chen, Carl Vondrick.<br />\n  \"Oops! Predicting Unintentional Action in Video.\" CVPR 2020.
\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Epstein_Oops_Predicting_Unintentional_Action_in_Video_CVPR_2020_paper.pdf)]\n  [[project](https://oops.cs.columbia.edu/)]\n\n### ActionBytes\n\nLearning from trimmed videos.\n* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.<br />\n  \"ActionBytes: Learning from Trimmed Videos to Localize Actions.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]\n\n### METAL\n\nActivityNet v1.2, mAP@0.5: 41.9 [1-shot], 45.0 [5-shot]\n\nTHUMOS'14, mAP@0.5: 14.3 [1-shot], 16.2 [5-shot]\n\n* **METAL:** Da Zhang, Xiyang Dai, Yuan-Fang Wang.<br />\n  \"METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos.\" CVPR 2020.\n  [[paper](https://sites.cs.ucsb.edu/~yfwang/papers/cvpr2020.pdf)]\n\n### Hierarchical Action Search\n\nA new task. Videos are embedded in a learned hyperbolic space, where they are positioned in entailment cones formed by different subtrees of the action hierarchy.\n\n* **Long et al.:** Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek.<br />\n  \"Searching for Actions on the Hyperbole.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Long_Searching_for_Actions_on_the_Hyperbole_CVPR_2020_paper.pdf)]\n\n\n### Fine-grained Action Recognition and Localization\n\nA new task. Hierarchical annotation for recognition and localization.\n\n* **FineGym:** Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />\n  \"FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding.\" CVPR 2020 (oral, 3 strong accepts).\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)]\n  [[project](https://sdolivia.github.io/FineGym/)]\n\n\n### Mining undefined sub-actions\n\nA new task. 
Find the common sub-actions across multiple videos.\n\n* **TAPOS:** Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />\n  \"Intra- and Inter-Action Understanding via Temporal Action Parsing.\" CVPR 2020.\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_Intra-_and_Inter-Action_Understanding_via_Temporal_Action_Parsing_CVPR_2020_paper.pdf)]\n  [[project](https://sdolivia.github.io/TAPOS/)]\n"
  },
  {
    "path": "README.md",
    "content": "# Action Localization Benchmarks\nPapers and Results of Temporal Action Localization\n\n**Weakly Supervised Performance on THUMOS'14 dataset.**\n\n- The detectors are sorted by mAP at a temporal IoU (tIoU) threshold of 0.5.\n- \"c\" indicates whether code is released: yes (Y) or no (N).\n- \"e\" indicates the evaluation code: THUMOS (T), ActivityNet (A), or implemented by the authors themselves (I).\n\n\n|  Detector   |   Pub  |c|e| 0.1 | 0.2 | 0.3 |0.4  | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |avg  |   info   |\n| :---------: |:------:|-|-|:---:|:----|:----|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |\n| D2-Net  | arXiv-20-12-11 |N|A|65.6 |60.0 |52.1 |43.3 |35.9 | - | - | - | - | -   | Shares authors with 3C-Net |\n| Lee et al.  | AAAI21 |Y|A|67.5 |61.2 |52.3 |43.4 |33.7 | 22.9 | 12.1 | - | - | -   | Shares authors with BaSNet |\n| HAM-Net  | AAAI21 |N|A|65.9 |59.6 |52.2 |43.1 |32.6 | 21.9 | 12.5 | - | - | -   | - |\n| ACSNet  | AAAI21 |N|A| - | - |51.4 |42.7 |32.4 | 22.0 | 11.7 | - | - | -   | - |\n| EM-MIL  | ECCV20 |N|A|59.1 |52.7 |45.5 |36.8 |30.5 | 22.7 | 16.4 | - | - | -   | Uses existing classification results |\n| A2CL-PT  | ECCV20 |Y|A| 61.2 | 56.1 |48.1 | 39.0 |30.1 |19.2 |10.6 | 4.8 | 1.0 | 30.0 | Also reports unsupervised performance |\n| ACL  | CVPR20 |N|A| - | - |46.9 |38.9 |30.1 |19.8 |10.4 | -  | -  | -   | Also reports unsupervised performance |\n| Liu et al.  
| AAAI21 |N|A| 61.7 | 58.0 |50.8 | 41.7 |29.6 |20.1 |10.7 | 4.3 | 0.5 | -   | - |\n| WSTAL       | WACV20  |-| |62.3 |-    |46.8 |-    |29.6 |-    |9.7  |-    |-    |-    |    -     |\n| ActionBytes  | CVPR20 |N|A| - | - |43.0 |37.5 |29.0 | - |9.5 | -  | -  | -   | - |\n| DGAM  | CVPR20 |Y|A| 60.0|54.2 |46.8 |38.2 |28.8 |19.8 |11.4 |3.6  |0.4  | -   |    -     |\n| TSCN  | ECCV20 |N|A| 63.4|57.6 |47.8 |37.7 |28.7 |19.4 |10.2 |3.9  |0.7  | -   |    -     |\n| BaSNet-I3D  | AAAI20 |Y|A| 58.2|52.3 |44.6 |36.0 |27.0 |18.6 |10.4 |3.9  |0.5  | -   |    -     |\n| BaSNet-UNT  | AAAI20 |Y|A| 56.2|50.3 |42.8 |34.7 |25.1 |17.1 |9.3  |3.7  |0.5  | -   |    -     |\n|   WSBM      | ICCV19 |N|A| 60.4|56.0 |46.6 |37.5 |26.8 |17.6 |9.0  |3.3  |0.4  | -   |    -     |\n|   3C-Net    | ICCV19 |Y|I| 59.1|53.5 |44.2 |34.1 |26.6 |  -  |8.1  |  -  | -   | -   |    -     |\n|    ASSG     | ACM 19 |N| | 65.6|59.4 |50.4 |38.7 |25.4 |15.0 |6.6  |  -  |  -  | -   |    -     |\n|    TSM      | ICCV19 |N|T|  -  | -   |39.5 |31.9 |24.5 |13.8 |7.1  |  -  |  -  |23.4 |    -     |\n|  CleanNet   | ICCV19 |N|T|  -  | -   |37.0 |30.9 |23.9 |13.9 |7.1  |  -  |  -  | -   |    -     |\n|  CMCS-I3D   | CVPR19 |Y|T| 57.4|50.8 |41.2 |32.1 |23.1 |15.0 |7.0  |  -  | -   | -   |report avg-mAP|\n|  CMCS-UNT   | CVPR19 |Y|T| 53.5|46.8 |37.5 |29.1 |19.9 |12.3 |6.0  |  -  | -   | -   |    -     |\n|  STARNet    | AAAI19 |N|A|68.8 |60.0 |48.7 |34.7 |23.0 |-    |-    |  -  |  -  |-    |    -     |\n|  W-TALC     | ECCV18 |Y|I|55.2 |49.6 |40.1 |31.1 |22.8 |-    |-    |  -  |  -  |7.6  |    -     |\n|  AutoLoc    | ECCV18 |Y|T|-    |-    |35.8 |29.0 |21.2 |13.4 |5.8  |  -  |  -  |-    |    -     |\n|  MAAN       | ICLR19 |Y|A|59.8 |50.8 |41.1 |30.6 |20.3 |12.0 |6.9  |2.6  |0.2  |24.9 |    -     |\n|  LTSR       | AAAI19 |N|T|55.9 |46.9 |38.3 |28.1 |18.6 |11.0 |5.59 |2.19 |0.29 |-    |    -     |\n| WSGN        | WACV20 |-|T|51.1 |44.4 |34.9 |26.3 |18.1 |11.6 |6.5  |-    |-    |-    |    -     |\n|  STPN       | 
CVPR18 |I|A|52.0 |44.7 |35.5 |25.8 |16.9 |9.9  |4.3  |1.2  |0.1  |-    |    -     |\n|  CPMN       | ACCV18 |N|T|47.1 |41.6 |32.8 |24.7 |16.1 |10.1 |5.5  |-    |-    |-    |    -     |\n|  S-O-C      | ACM18  |N|T|45.8 |39.0 |31.1 |22.5 |15.9 |-    |-    |-    |-    |-    |    -     |\n|UntrimmedNets| CVPR17 |Y|T|44.4 |37.7 |28.2 |21.1 |13.7 |-    |-    |-    |-    |-    |    -     |\n|  H&S        | ICCV17 |Y|T|36.44|27.84|19.49|12.66|6.84 |-    |-    |-    |-    |-    |    -     |\n|LPAT-I3D+TEM | arXiv  |-| |-    |-    |46.9 |37.4 |28.0 |16.6 |9.2  |-    |-    |27.6 |    -     |\n| LPAT-I3D    | arXiv  |-| |-    |-    |46.7 |37.5 |27.9 |17.6 |9.2  |-    |-    |27.6 |    -     |\n| LPAT-U      | arXiv  |-| |-    |-    |39.9 |31.5 |22.6 |14.2 |7.9  |-    |-    |27.6 |    -     |\n|RefineLoc-I3D| arXiv  |-|T|-    |-    |40.8 |-    |23.1 |-    |5.3  |-    |-    |-    |    -     |\n|RefineLoc-TSN| arXiv  |-|T|-    |-    |36.1 |-    |22.6 |-    |5.8  |-    |-    |-    |    -     |\n\n\n**Weakly Supervised Performance on ActivityNet v1.2 dataset.**\n\n|  Detector   |  Pub   |c| 0.5 | 0.55|0.60 | 0.65|0.70 |0.75 | 0.80|0.85 |0.90 |0.95 | avg |test |   info   |\n| :---------: |:------:|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |\n| D2-Net | arXiv-20-12-11 |N|42.3 |   - | -   |  -  | -   |25.5 |  -  | -   | -   | 5.8 | 26.0|   - |     -    |\n| ACSNet | AAAI21 |N|40.1 |   - | -   |  -  | -   |26.1 |  -  | -   | -   | 6.8 | 26.0|   - |     -    |\n| Lee et al. | AAAI21 |Y|41.2 |   - | -   |  -  | -   |25.6 |  -  | -   | -   | 6.0 | 25.9|   - |     -    |\n| Liu et al.        
| AAAI21 |N|39.2 |-    |-    |-    | -   |25.6 |-    |-    |-    | 6.8 | 25.5|   - |     -    |\n| HAM-Net | AAAI21 |N|41.0 |   - | -   |  -  | -   |24.8 |  -  | -   | -   | 5.3 | 25.1|   - |     -    |\n| BaSNet  | AAAI20 |Y|38.5 |   - | -   |  -  | -   |24.2 |  -  | -   | -   | 5.6 | 24.3|   - |     -    |\n| TSCN        | ECCV20 |N|37.6 |-    |-    |-    | -   |23.7 |-    |-    |-    | 5.7 | 23.6|   - |     -    |\n| CMCS        | CVPR19 |Y|36.8 |-    |-    |-    | -   |22.0 |-    |-    |-    | 5.6 | 22.4|   - |     -    |\n| 3C-Net      | ICCV19 |Y|35.4 |   - | -   |  -  | -   |22.9 |  -  | -   | -   | 8.5 | 21.1|   - |     -    |\n| TSM         | ICCV19 |N|28.3 |26.0 |23.6 |21.2 | 18.9|17.0 |14.0 |11.1 |7.5  | 3.5 | -   |   - |     -    |\n| CleanNet    | ICCV19 |N|37.1 |33.4 |29.9 |26.7 | 23.4|20.3 |17.2 |13.9 |9.2  | 5.0 | 21.6|   - |     -    |\n| EM-MIL      | ECCV20 |N|37.4 |-    |-    |-    | 23.1|-    |-    |-    |2.0  |  -  | 20.3|   - |     -    |\n| W-TALC      | ECCV18 |Y|37.0 |-    |-    |-    | 14.6|-    |-    |-    |-    | -   | 18.0|   - |     -    |\n| AutoLoc     | ECCV18 |Y|27.3 |24.9 |22.5 |19.9 | 17.5|15.1 |13.0 |10.0 |6.8  | 3.3 | 16.0| -   |     -    |\n|RefineLoc-I3D| arXiv  |-|38.7 |-    |-    |-    | -   |22.6 |-    |-    |-    | 5.5 | 23.2| -   |     -    |\n|RefineLoc-TSN| arXiv  |-|38.8 |-    |-    |-    | -   |22.2 |-    |-    |-    | 5.3 | 23.2| -   |     -    |\n| LPAT        | arXiv  |-|37.6 |34.6 |31.6 |28.7 | 25.6|22.6 |19.6 |15.3 |10.9 | 4.9 | 23.1| -   |     -    |\n|   WSTAL     | arXiv  |-|35.2 |-    |-    |-    | 16.3|-    |-    |-    |-    | -   | -   | -   |     -    |\n\n\n**Weakly Supervised Performance on ActivityNet v1.3 dataset.**\n\n|  Detector   |  Pub   |c| 0.5 |0.75 |0.95 |avg  |\n| :---------: |:------:|-|:---:|:---:|:---:|:---:|\n|   ACSNet    | AAAI21 |N|36.3 |24.2 |5.8  |23.9 |\n|   Lee et al.| AAAI21 |Y|37.0 |23.9 |5.7  |23.7 |\n|   Liu et al.| AAAI21 |N|35.1 |23.7 |5.6  |23.2 |\n| A2CL-PT     | ECCV20 
|Y|36.8 |22.0 |5.2  |22.5 |\n| BaSNet-I3D  | AAAI20 |Y|34.5 |22.5 |4.9  |22.2 |\n| TSCN        | ECCV20 |N|35.3 |21.4 |5.3  |21.7 |\n| WSBM        | ICCV19 |N|36.4 |19.2 | 2.9 |-    |\n| ASSG        | ACM 19 |N|32.3 |20.1 | 4.0 |-    |\n| TSM         | ICCV19 |N|30.0 |19.0 | 4.5 |-    |\n| CMCS        | CVPR19 |Y|34.0 |20.9 | 5.7 |21.2 |\n| STARNet     | AAAI19 |N|31.1 |18.8 | 4.7 |-    |\n| MAAN        | ICLR19 |Y|33.7 |21.9 | 5.5 |-    |\n| LTSR        | AAAI19 |N|33.1 |18.7 |3.32 |21.78|\n| STPN        | CVPR18 |I|29.3 |16.9 |2.6  |-    |\n| CPMN        | ACCV18 |N|39.29|24.09|6.71 |24.42|\n| S-O-C       | ACM18  |N|27.3 |14.7 |2.9  |15.6 |\n\n\n### Weakly Supervised Temporal Action Localization\n* **D2-Net:** Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao.<br />\n  \"D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations\" arXiv:2012.06440.\n  [[paper](https://arxiv.org/pdf/2012.06440.pdf)]\n* **Lee et al.:** Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun.<br />\n  \"Weakly-supervised Temporal Action Localization by Uncertainty Modeling\" AAAI 2021.\n  [[paper](https://arxiv.org/pdf/2006.07006.pdf)]\n  [[code](https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling)]\n* **HAM-Net:** Ashraful Islam, Chengjiang Long, Richard J. 
Radke.<br />\n  \"A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization\" AAAI 2021.\n  [[paper](https://arxiv.org/pdf/2101.00545.pdf)]\n  [[code](https://github.com/asrafulashiq/hamnet)]\n* **ACSNet:** Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />\n  \"ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization\" AAAI 2021.\n  [[paper](http://gr.xjtu.edu.cn/web/lewang)]\n* **Liu et al.:** Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />\n  \"Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context\" AAAI 2021.\n  [[paper](http://gr.xjtu.edu.cn/web/lewang)]\n* **EM-MIL:** Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu.<br />\n  \"Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning\" ECCV 2020.\n  [[paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740715.pdf)]\n* **A2CL-PT:** Kyle Min, Jason J. Corso.<br />\n  \"Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization.\" ECCV 2020. \n  [[paper](https://link.springer.com/chapter/10.1007%2F978-3-030-58568-6_17)]\n  [[code](https://github.com/MichiganCOG/A2CL-PT)]\n* **ACL:** Guoqiang Gong, Xinghan Wang, Yadong Mu, Qi Tian.<br />\n  \"Learning Temporal Co-Attention Models for Unsupervised Video Action Localization.\" CVPR 2020, oral. \n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.pdf)]\n* **WSTAL:** Ashraful Islam, Richard J. Radke.<br />\n  \"Weakly Supervised Temporal Action Localization Using Deep Metric Learning\" WACV 2020.\n  [[paper](https://arxiv.org/pdf/2001.07793.pdf)]\n  [[code](https://github.com/asrafulashiq/wsad)]\n* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. 
Snoek.<br />\n  \"ActionBytes: Learning from Trimmed Videos to Localize Actions.\" CVPR 2020. \n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]\n* **DGAM:** Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang.<br />\n  \"Weakly-Supervised Action Localization by Generative Attention Modeling.\" CVPR 2020. \n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf)]\n  [[code](https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization)]\n* **TSCN:** Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua.<br />\n  \"Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization.\" ECCV 2020. \n  [[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf)]\n* **BaSNet:** Pilhyeon Lee, Youngjung Uh, Hyeran Byun.<br />\n  \"Background Suppression Networks for Weakly-supervised Temporal Action Localization.\" AAAI 2020. \n  [[paper](https://arxiv.org/pdf/1911.09963.pdf)]\n  [[code](https://github.com/Pilhyeon/BaSNet-pytorch)]\n* **3C-Net:** Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao.<br />\n  \"3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization.\" ICCV 2019. \n  [[paper](https://arxiv.org/pdf/1908.08216.pdf)]\n  [[code](https://github.com/naraysa/3c-net)]\n* **CMCS:** Daochang Liu, Tingting Jiang, Yizhou Wang.<br />\n  \"Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization.\" CVPR 2019. 
\n  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.pdf)]\n  [[code](https://github.com/Finspire13/CMCS-Temporal-Action-Localization)]\n* **ASSG:** Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou.<br />\n  \"Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization\" ACM MM 2019. \n  [[paper](https://arxiv.org/pdf/1908.02422.pdf)]\n* **AutoLoc:** Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang.<br />\n  \"AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos\" ECCV 2018.\n  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Zheng_Shou_AutoLoc_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]\n  [[code](https://github.com/zhengshou/AutoLoc)]\n* **CPMN:** Haisheng Su, Xu Zhao, Tianwei Lin.<br />\n  \"Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization\" ACCV 2018.\n  [[paper](https://arxiv.org/pdf/1810.11794.pdf)]\n* **H&S:** Krishna Kumar Singh, Yong Jae Lee.<br />\n  \"Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization\" ICCV 2017.\n  [[paper](https://arxiv.org/pdf/1704.04232.pdf)]\n  [[code](https://github.com/goddoe/hide-and-seek)]\n* **LTSR:** Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan.<br />\n  \"Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision\" AAAI 2019.\n  [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4958/4831)]\n* **WSGN:** Basura Fernando, Cheston Tan Yin Chet.<br />\n  \"Weakly Supervised Gaussian Networks for Action Detection\" WACV 2020.\n  [[paper](https://basurafernando.github.io/papers/wacv2020_wsgn.pdf)]\n* **MAAN:** Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. 
Tsang, Dit-Yan Yeung.<br />\n  \"Marginalized Average Attentional Network for Weakly-Supervised Learning\" ICLR 2019.\n  [[paper](https://arxiv.org/pdf/1905.08586.pdf)]\n  [[code](https://github.com/yyuanad/MAAN)]\n* **S-O-C:** Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li.<br />\n  \"Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector\" ACM MM 2018.\n  [[paper](https://arxiv.org/pdf/1807.02929.pdf)]\n* **STARNet:** Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu.<br />\n  \"Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection\" AAAI 2019.\n  [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4939/4812)]\n* **TSM:** Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan.<br />\n  \"Temporal Structure Mining for Weakly Supervised Action Detection\" ICCV 2019.\n  [[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf)]\n* **UntrimmedNets:** Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool.<br />\n  \"UntrimmedNets for Weakly Supervised Action Recognition and Detection\" CVPR 2017.\n  [[paper](https://wanglimin.github.io/papers/WangXLV_CVPR17.pdf)]\n  [[code](https://github.com/wanglimin/UntrimmedNet)]\n* **WSBM:** Phuc Xuan Nguyen, Deva Ramanan, Charless C. 
Fowlkes.<br />\n  \"Weakly-supervised Action Localization with Background Modeling\" ICCV 2019.\n  [[paper](https://arxiv.org/pdf/1908.06552.pdf)]\n* **CleanNet:** Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua.<br />\n  \"Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks\" ICCV 2019.\n  [[paper](https://qilin-zhang.github.io/_pages/pdfs/Weakly_Supervised_Temporal_Action_Localization_through_Contrast_based_Evaluation_Networks.pdf)]\n* **STPN:** Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han.<br />\n  \"Weakly Supervised Action Localization by Sparse Temporal Pooling Network\" CVPR 2018.\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Weakly_Supervised_Action_CVPR_2018_paper.pdf)]\n  [[code](https://github.com/bellos1203/STPN)]\n* **W-TALC:** Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury.<br />\n  \"W-TALC: Weakly-supervised Temporal Activity Localization and Classification\" ECCV 2018.\n  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Sujoy_Paul_W-TALC_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]\n  [[code](https://github.com/sujoyp/wtalc-pytorch)]\n* **LPAT:** Xudong Lin, Zheng Shou, Shih-Fu Chang.<br />\n  \"LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization\" arXiv 2019.\n  [[paper](https://arxiv.org/pdf/1910.11285.pdf)]\n* **RefineLoc:** Humam Alwassel, Alejandro Pardo, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem.<br />\n  \"RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization\" arXiv 2019.\n  [[paper](https://arxiv.org/pdf/1904.00227.pdf)]\n\n\n### Awaiting Paper\n* **lvr:** Xingyu Liu, Joon-Young Lee, Hailin Jin.<br />\n  \"Learning Video Representations from Correspondence Proposals.\" CVPR 2019 **oral**.\n\n\n## Dataset\n* **THUMOS'14:** Yu-Gang Jiang, Jingen Liu, Amir R. 
Zamir, George Toderici.<br />\n  \"THUMOS Challenge 2014\" \n  [[project](https://www.crcv.ucf.edu/THUMOS14/home.html)]\n\n* **ActivityNet:** Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia.<br />\n  \"ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding\" \n  [[project](http://activity-net.org/index.html)]\n\n* **THUMOS'15:** Alexander Gorban, Haroon Idrees, Yu-Gang Jiang, Amir R. Zamir.<br />\n  \"THUMOS Challenge 2015\" \n  [[project](http://www.thumos.info/)]\n\n* **COIN:** Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou.<br />\n  \"COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis.\" CVPR 2019. \n  [[paper](https://arxiv.org/pdf/1903.02874.pdf)]\n  [[project](https://coin-dataset.github.io/)]\n"
  },
  {
    "path": "Spatiotemporal.md",
    "content": "# Spatio-Temporal Action Detection\r\nPapers and Results for Spatio-Temporal Action Detection\r\n\r\n## Spatio-Temporal Action Localization\r\n\r\n**Performance on AVA v2.1 dataset.**\r\n\r\n- Metric: frame-level mAP at an IoU threshold of 0.5.\r\n\r\n|  Detector   |   val   |   test    |\r\n| :---------: | :-----: | :-------: |\r\n|    LFB      |\t27.70 \t|\t27.20 \t|\r\n|  SlowFast   |\t27.30 \t|\t27.10\t|\r\n|   VATN*     |\t25.00 \t|\t24.93 \t|\r\n|   SMAD*     |\t22.20 \t|\t  -  \t|\r\n|   STEP      |\t18.60 \t|\t  -  \t|\r\n|   ACRN      |\t17.40 \t|\t  -  \t|\r\n|    AVA      |\t15.80 \t|\t  -  \t|\r\n\r\n## Spatio-temporal Action Detection\r\n\r\n* **PSCS:** Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu.<br />\r\n  \"Improving Action Localization by Progressive Cross-stream Cooperation.\" CVPR (2019).\r\n  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Su_Improving_Action_Localization_by_Progressive_Cross-Stream_Cooperation_CVPR_2019_paper.pdf)]\r\n\r\n* **SlowFast:** Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He.<br />\r\n  \"SlowFast Networks for Video Recognition.\" ICCV (2019). \r\n  [[paper](https://arxiv.org/pdf/1812.03982.pdf)]\r\n  [[unofficial_code](https://github.com/r1ch88/SlowFastNetworks)]\r\n  [[unofficial_code](https://github.com/Guocode/SlowFast-Networks)]\r\n\r\n* **DwF:** Jiaojiao Zhao, Cees G.M. Snoek.<br />\r\n  \"Dance with Flow: Two-in-One Stream Action Detection.\" CVPR (2019). \r\n  [[paper](https://arxiv.org/pdf/1904.00696.pdf)]\r\n\r\n* **STEP:** Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz.<br />\r\n  \"STEP: Spatio-Temporal Progressive Learning for Video Action Detection.\" CVPR (2019 **oral**). \r\n  [[paper](https://arxiv.org/pdf/1904.09288.pdf)]\r\n\r\n* **LFB:** Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick.<br />\r\n  \"Long-Term Feature Banks for Detailed Video Understanding.\" CVPR (2019). 
\r\n  [[paper](https://arxiv.org/pdf/1812.05038.pdf)]\r\n  [[project](https://github.com/facebookresearch/video-long-term-feature-banks)]\r\n\r\n* **VATN*:** Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman. <br />\r\n  \"Video Action Transformer Network.\" CVPR (2019 **oral**). \r\n  [[paper](https://arxiv.org/pdf/1812.02707.pdf)]\r\n  [[project](https://rohitgirdhar.github.io/ActionTransformer/)]\r\n\r\n* **LAEO:** Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman. <br />\r\n  \"LAEO-Net: revisiting people Looking At Each Other in videos.\" CVPR (2019). \r\n  [[paper](http://www.robots.ox.ac.uk/~vgg/research/laeonet/cvpr2019LAEO.pdf)]\r\n  [[code](https://github.com/AVAuco/laeonet/)]\r\n  [[project](http://www.robots.ox.ac.uk/~vgg/research/laeonet/)]\r\n\r\n* **SMAD*:** Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid. <br />\r\n  \"A Structured Model For Action Detection.\" CVPR (2019). \r\n  [[paper](https://arxiv.org/pdf/1812.03544.pdf)]\r\n\r\n* **TACNet:** Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun. <br />\r\n  \"TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection.\" CVPR (2019). \r\n  [[paper](http://www.skicyyu.org/Paper/CVPR2019_TACNET.pdf)]\r\n\r\n* **AVA:** Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. <br />\r\n  \"AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions.\" CVPR (2018). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gu_AVA_A_Video_CVPR_2018_paper.pdf)]\r\n  [[project](https://research.google.com/ava/)]\r\n\r\n* **ACRN:** Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid. <br />\r\n  \"Actor-Centric Relation Network.\" ECCV (2018). 
\r\n  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.pdf)]\r\n\r\n* **T-CNN:** Rui Hou, Chen Chen, Mubarak Shah. <br />\r\n  \"Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos.\" ICCV (2017). \r\n  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hou_Tube_Convolutional_Neural_ICCV_2017_paper.pdf)]\r\n  [[code](https://www.crcv.ucf.edu/projects/TCNN/#Code)]\r\n  [[project](https://www.crcv.ucf.edu/projects/TCNN/)]\r\n\r\n\r\n## Dataset\r\n* **YouTube-8M-Segments:** Ke Chen, Julia Elliott, Nisarg Kothari, Hanhan Li, et al.<br />\r\n  \"YouTube-8M Segments Dataset\" \r\n  [[project](https://research.google.com/youtube8m/)]\r\n\r\n## Distinguished Researchers & Teams\r\n* [WILLOW](https://www.di.ens.fr/willow/publications/YearOnly/publications.html)\r\n* [Ivan Laptev](https://www.di.ens.fr/~laptev/#Publications)\r\n* [Christoph Feichtenhofer](https://feichtenhofer.github.io/)\r\n"
  },
  {
    "path": "Supervised.md",
    "content": "# Action Detection Benchmarks\r\nPapers and Results of Temporal Action Localization\r\n\r\n## Temporal Action Localization\r\n\r\n**Performance on THUMOS'14 dataset.**\r\n\r\n- The detectors are roughly ordered by mAP at a temporal IoU (tIoU) threshold of 0.5.\r\n- `Deep Learning`: whether the method is based on deep learning (Y) or not (N).\r\n\r\n|  Detector   |   0.1   |    0.2  |    0.3  |    0.4  |   0.5   |    0.6  |    0.7   |Deep Learning|Comment      |\r\n| :---------: | :-----: |:-------:|:-------:|:-------:|:-------:|:-------:| :------: | :--------:  | :--:      |\r\n|     TGM     |    -  \t|    -  \t|    -  \t|    -  \t|   53.5 \t|    -  \t|    -     |      Y      |     -     |\r\n|   C-TCN     |   72.2 \t|   71.4 \t|   68.0 \t|   62.3 \t|   52.1 \t|    -   \t|    -   \t |      Y      |     -     |\r\n|   RTD-Net     |    -   \t|    -  \t|   68.3 \t|   62.3  \t|   51.9 \t|   38.8   \t|    23.7 \t |      Y      |  based on P-GCN    |\r\n|   PGC-TAL     |   71.2  \t|   68.9 \t|   65.1 \t|   59.5 \t|   51.2 \t|    -   \t|    -   \t |      Y      |  based on P-GCN    |\r\n|    DPP      |   69.5 \t|   67.8 \t|   63.6 \t|   57.8 \t|   49.1 \t|    -   \t|    -   \t |      Y      |     -     |\r\n|    PGCN     |   69.5 \t|   67.8 \t|   63.6 \t|   57.8 \t|   49.1 \t|    -   \t|    -   \t |      Y      |     -     |\r\n|     BMN     |    -  \t|    -  \t|   56.0 \t|   47.4 \t|   38.8 \t|   29.7 \t|   20.5   |      Y      |     -     |\r\n|   D-SSAD    |    -  \t|    -  \t|   60.2 \t|   54.1 \t|   44.2 \t|   32.3 \t|   19.1   |      Y      |     -     |\r\n|   TAL-Net   |   59.8 \t|   57.1 \t|   53.2 \t|   48.5 \t|   42.8 \t|   33.8 \t|   20.8   |      Y      |     -     |\r\n|    FRTS     |    -  \t|    -  \t|   53.5 \t|   50.2 \t|   44.2 \t|   33.9 \t|   22.7   |      Y      |     -     |\r\n|    GTAN     |   69.1 \t|   63.7 \t|   57.8 \t|   47.2 \t|   38.8 \t|    -   \t|    -     |      Y      |     -     |\r\n|    AGCN     |   59.3 \t|   59.6 \t|   57.1 \t|   51.6 \t|   38.6 \t|    28.9 \t|   17.0   |  
    Y      |     -     |\r\n|    MGG      |    -  \t|    -  \t|   53.9 \t|   46.8 \t|   37.4 \t|   29.5 \t|   21.3   |      Y      |     -     |\r\n|    BSN      |    -  \t|    -  \t|   53.5 \t|   45.0 \t|   36.9 \t|   28.4 \t|   20.0   |      Y      |     -     |\r\n|    FSN      |    -  \t|    -  \t|   51.8 \t|   41.5 \t|   32.1 \t|   22.9 \t|   14.7   |      Y      |     -     |\r\n|    CBR      |   60.1 \t|   56.7 \t|   50.1 \t|   41.3 \t|   31.0 \t|   19.1 \t|   9.9    |      Y      |     -     |\r\n|   ASSA*     |    -  \t|    -  \t|   51.8 \t|   42.4 \t|   30.8 \t|   20.2 \t|   11.1   |      Y      |     -     |\r\n|   CTAP      |    -  \t|    -  \t|    -  \t|    -  \t|   29.9 \t|    -  \t|    -     |      Y      |     -     |\r\n|  SS-TAD     |    -  \t|    -  \t|   45.7 \t|    -  \t|   29.2 \t|    -  \t|   9.6    |      Y      |701 fps (GTX Titan X (Maxwell))|\r\n|    SSN      |   60.3 \t|   56.2 \t|   50.6 \t|   40.8 \t|   29.1 \t|    -  \t|    -     |      Y      |     -     |\r\n|   R-C3D     |   54.5 \t|   51.5 \t|   44.8 \t|   35.6 \t|   28.9 \t|    -  \t|    -     |      Y      |569 fps|\r\n|    TAG      |   64.1 \t|   57.7 \t|   48.7 \t|   39.8 \t|   28.2 \t|    -  \t|    -     |      Y      |     -     |\r\n|    TPC      |    -  \t|    -  \t|   44.1 \t|   37.1 \t|   28.2 \t|   20.6 \t|   12.7   |      Y      |250 fps (GTX Titan X)|\r\n|   TURN      |   54.0 \t|   50.9 \t|   44.1 \t|   34.9 \t|   25.6 \t|    -  \t|    -     |      Y      |129.4 fps (GTX Titan X)|\r\n|   SSAD      |   50.1 \t|   47.8 \t|   43.0 \t|   35.0 \t|   24.6 \t|    -  \t|    -     |      Y      |     -     |\r\n|    CDC      |    -  \t|    -  \t|   40.1 \t|   29.4 \t|   23.3 \t|   13.1 \t|   7.9    |      Y      |500 fps (GTX Titan X)|\r\n|    SST      |    -  \t|    -  \t|   37.8 \t|    -  \t|   23.0 \t|    -  \t|    -     |      Y      |308 fps (Titan X)|\r\n|   S-CNN     |   47.7 \t|   43.5 \t|   36.3 \t|   28.7 \t|   19.0 \t|   10.3 \t|   5.3    |      Y      |60 fps (GeForce 
GTX 980)|\r\n|   PSDF*     |   51.4 \t|   42.6 \t|   33.6 \t|   26.1 \t|   18.8 \t|    -  \t|    -     |      Y      |     -     |\r\n|   SMS*      |   51.0 \t|   45.2 \t|   36.5 \t|   27.8 \t|   17.8 \t|    -  \t|    -     |      N      |     -     |\r\n|   ADFG*     |   48.9 \t|   44.0 \t|   36.0 \t|   26.4 \t|   17.1 \t|    -  \t|    -     |      Y      |     -     |\r\n|    TCN      |    -  \t|    -  \t|   33.3 \t|   25.6 \t|   15.9 \t|   9.0 \t|    -     |      Y      |     -     |\r\n|    DAP      |    -  \t|    -  \t|    -  \t|    -  \t|   13.9 \t|    -  \t|    -     |      Y      |134.1 fps (Titan X)|\r\n|  G-TAD      |   -   \t|    -   \t|   54.5\t|    47.6\t|   40.2 \t|   30.8 \t|   23.4   |      Y      |     -     |\r\n|  PTAL-ETP   |   -   \t|    -   \t|   48.2\t|    42.4\t|   34.2 \t|   23.4 \t|   13.9   |      Y      |     -     |\r\n|  CTR_AL     |   -   \t|    -   \t|   53.9\t|    50.7\t|   45.4 \t|   38.0 \t|   28.5   |      Y      |     -     |\r\n|   LS-TD     |   63.5 \t|   61.0 \t|   56.7 \t|   50.6 \t|   42.6 \t|   32.5 \t|   21.4   |      -      |   arXiv   |\r\n|    SRG      |    -  \t|    -  \t|   54.5 \t|   46.9 \t|   39.1 \t|   31.4 \t|   22.2   |      -      |   arXiv   |\r\n|    IDU      |    -  \t|    -  \t|    -  \t|    -  \t|    -  \t|    -  \t|    -     |      -      |arXiv; evaluated under a different metric|\r\n\r\n\r\n**Performance on ActivityNet v1.3 dataset.**\r\n\r\n- The left half reports scores on the ActivityNet v1.3 validation set; the right half reports scores on the test set. 
\r\n- `@Avg`: mAP averaged over the tIoU thresholds 0.50:0.05:0.95.\r\n- `Deep Learning`: whether the method is based on deep learning (Y/N).\r\n\r\n|  Detector   |   0.50  |    0.75   |    0.95   |    @Avg   |   0.50    |   0.75    |   0.95    |    @Avg   |Deep Learning|Speed  |\r\n| :---------: | :-----: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :------:  | :--------:  | :--:  |\r\n|     BMN     |\t50.07 \t|\t   34.78 \t|\t8.29 \t    |\t33.85 \t  |\t -      \t|\t -  \t    |\t -       \t|\t36.42\t    |      Y      | -     |\r\n|    GTAN     |\t52.61 \t|\t   34.14 \t|\t8.91 \t    |\t34.31 \t  |\t -      \t|\t -  \t    |\t -  \t    |\t35.54\t    |      Y      | -     |\r\n|    PGCN     |\t48.26 \t|\t  33.16 \t|\t3.27 \t    |\t31.11   \t|\t -      \t|\t -  \t    |\t -      \t|\t  -  \t    |      Y      | -     |\r\n|   RTD-Net   |\t46.43 \t|\t  30.45  \t|\t8.64      |\t30.46   \t|\t -      \t|\t -  \t    |\t -      \t|\t  -  \t    |      Y      | -     |\r\n|   BSN_ori   |\t46.45 \t|\t  29.96 \t|\t8.02    \t|\t30.03   \t|\t -      \t|\t -      \t|\t -      \t|\t32.84    \t|      Y      | -     |\r\n|   BSN_new   |\t52.50 \t|\t  33.53 \t|\t8.85 \t    |\t33.72   \t|\t -      \t|\t -      \t|\t -      \t|\t34.42\t    |      Y      | -     |\r\n|    C-TCN    |\t 47.6 \t|  \t31.9  \t|\t6.2      \t|\t 31.1   \t|\t -      \t|\t -  \t    |\t -      \t|\t  -     \t|      Y      | -     |\r\n|    PGC-TAL    |\t44.31 \t|  \t29.85  \t|\t5.47  \t|\t 28.85   \t|\t -      \t|\t -  \t    |\t -      \t|\t  -     \t|      Y      | based on P-GCN  |\r\n|    SSN      |\t -    \t|\t    -   \t|\t -  \t    |\t -  \t    |\t43.26    \t|\t28.70    \t|\t5.63     \t|\t28.28\t    |      Y      | -     |\r\n|    TAG      |\t39.12 \t| \t23.48 \t|\t5.49    \t|\t23.98    \t|\t40.69 \t  |\t26.02   \t|\t6.67 \t    |\t26.05\t    |      Y      | -     |\r\n|    CDC      |\t45.30 \t|\t  26.00 \t|\t0.20 \t    |\t23.80   \t|\t -      \t|\t -      \t|\t -      \t|\t -  \t    |      Y      |500 fps (GTX Titan X)|\r\n|    TCN      |\t36.44 \t|\t  21.15 \t|\t3.90 \t 
   |\t -  \t    |\t37.49   \t|\t23.47   \t|\t4.47    \t|\t23.58\t    |      Y      | -     |\r\n|   TAL-Net   |\t38.23 \t|\t  18.30 \t|\t1.30 \t    |\t20.22   \t|\t -  \t    |\t -  \t    |\t -  \t    |\t -  \t    |      Y      | -     |\r\n|   AGCN   |\t30.4 \t|\t   -  \t\t|\t -  \t    |\t -  \t   \t|\t -  \t    |\t -  \t    |\t -  \t    |\t -  \t    |      Y      | -     |\r\n|    SCC      |\t   -  \t|\t -      \t|\t -  \t    |\t -  \t    |\t39.90    \t|\t18.70   \t|\t4.70    \t|\t19.30\t    |      Y      |  35.9 fps  |\r\n|   R-C3D     |\t26.80 \t|\t -      \t|\t -  \t    |\t12.70   \t|\t -  \t    |\t -  \t    |\t -      \t|\t13.10    \t|      Y      |569 fps (GTX Titan X (Maxwell))  <br>1030 fps (Titan X Pascal)|\r\n| G-TAD       |  50.36  |   34.6    |  9.02     |  34.09    |  -        |     -     |\t -      \t|\t-         |      Y      | -     |\r\n| CTR_AL      |  43.47  |   33.91   |  9.21     |  30.12    |  -        |     -     |\t -      \t|\t-         |      Y      | -     |\r\n| SRG         |  46.53  |   29.98   |  4.83     |  29.72    |  -        |     -     |\t -      \t|\t-         |      -      | -     |\r\n\r\n\r\n**Performance on ActivityNet v1.2 dataset.**\r\n\r\n|  Detector   |   0.50  |    0.75   |    0.95   |    @Avg   |Deep Learning|Speed  |\r\n| :---------: | :-----: | :-------: | :-------: | :-------: | :--------:  | :--:  |\r\n|   LS-TD     |  50.4   |   34.9    |   8.0     |  33.6     |      -      | -     |\r\n\r\n### Temporal Action Localization\r\n* **TGM:** AJ Piergiovanni, Michael S. Ryoo.<br />\r\n  \"Temporal Gaussian Mixture Layer for Videos.\" ICML (2019). \r\n  [[paper](https://arxiv.org/pdf/1803.06316.pdf)]\r\n  [[code](https://github.com/piergiaj/tgm-icml19)]\r\n  \r\n* **RTD-Net:** Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu.<br />\r\n  \"Relaxed Transformer Decoders for Direct Action Proposal Generation.\" arXiv 2102.01894. \r\n  [[paper](https://arxiv.org/pdf/2102.01894.pdf)]\r\n\r\n* **DPP:** Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu.<br />\r\n  \"Deep Point-wise Prediction for Action Temporal Proposal.\" arXiv 1909.07725. 
\r\n  [[paper](https://arxiv.org/pdf/1909.07725.pdf)]\r\n  [[code](https://github.com/liluxuan1997/DPP)]\r\n  \r\n* **C-TCN:** Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li.<br />\r\n  \"Deep Concept-wise Temporal Convolutional Networks for Action Localization.\" arXiv 1908.09442.\r\n  [[paper](https://arxiv.org/pdf/1908.09442.pdf)]\r\n  [[code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleVideo/models/ctcn/README.md)]\r\n\r\n* **PGC-TAL:** Rui Su, Dong Xu, Lu Sheng, Wanli Ouyang.<br />\r\n  \"PCG-TAL: Progressive Cross-granularity Cooperation for Temporal Action Localization.\" TIP (2020).\r\n  [[paper](https://ieeexplore.ieee.org/document/9298475)]\r\n\r\n* **BMN:** Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen.<br />\r\n  \"BMN: Boundary-Matching Network for Temporal Action Proposal Generation.\" ICCV (2019). \r\n  [[paper](https://arxiv.org/pdf/1907.09702.pdf)]\r\n\r\n* **PGCN:** Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan.<br />\r\n  \"Graph Convolutional Networks for Temporal Action Localization.\" ICCV (2019). \r\n  [[paper](https://arxiv.org/pdf/1909.03252.pdf)]\r\n  [[code](https://github.com/Alvin-Zeng/PGCN)]\r\n\r\n* **GTAN:** Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei.<br />\r\n  \"Gaussian Temporal Awareness Networks for Action Localization.\" CVPR (2019 **oral**). 
\r\n  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Long_Gaussian_Temporal_Awareness_Networks_for_Action_Localization_CVPR_2019_paper.pdf)]\r\n\r\n* **AGCN:** Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song.<br />\r\n  \"Graph Attention based Proposal 3D ConvNets for Action Detection.\" AAAI (2020).\r\n  [[paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf)]\r\n\r\n* **MGG:** Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang.<br />\r\n  \"Multi-granularity Generator for Temporal Action Proposal.\" CVPR (2019). \r\n  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Multi-Granularity_Generator_for_Temporal_Action_Proposal_CVPR_2019_paper.pdf)]\r\n\r\n* **D-SSAD:** Yupan Huang, Qi Dai, Yutong Lu.<br />\r\n  \"Decoupling Localization and Classification in Single Shot Temporal Action Detection.\" ICME (2019). \r\n  [[paper](https://arxiv.org/pdf/1904.07442.pdf)]\r\n  [[code](https://github.com/HYPJUDY/Decouple-SSAD)]\r\n\r\n* **TAL-Net:** Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar.<br />\r\n  \"Rethinking the Faster R-CNN Architecture for Temporal Action Localization.\" CVPR (2018). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Rethinking_the_Faster_CVPR_2018_paper.pdf)]\r\n\r\n* **FRTS:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras.<br />\r\n  \"Exploring Feature Representation and Training strategies in Temporal Action Localization.\" ICIP (2019). \r\n  [[paper](https://arxiv.org/pdf/1905.10608.pdf)]\r\n\r\n* **BSN:** Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang.<br />\r\n  \"BSN: Boundary Sensitive Network for Temporal Action Proposal Generation.\" ECCV (2018). 
\r\n  [[paper](https://arxiv.org/pdf/1806.02964.pdf)]\r\n  [[code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)]\r\n\r\n* **FSN:** Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou.<br />\r\n  \"Exploring frame segmentation networks for temporal action localization.\" ECCV (2018). \r\n  [[paper](https://arxiv.org/pdf/1902.05488.pdf)]\r\n\r\n* **CBR:** Jiyang Gao, Zhenheng Yang, Ram Nevatia.<br />\r\n  \"Cascaded Boundary Regression for Temporal Action Detection.\" BMVC (2017). \r\n  [[paper](https://arxiv.org/pdf/1705.01180.pdf)]\r\n  [[code](https://github.com/jiyanggao/CBR)]\r\n\r\n* **ASSA*:** Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem.<br />\r\n  \"Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization.\" ECCV (2018). \r\n  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Humam_Alwassel_Action_Search_Spotting_ECCV_2018_paper.pdf)]\r\n\r\n* **CTAP:** Jiyang Gao, Kan Chen, Ram Nevatia.<br />\r\n  \"CTAP: Complementary Temporal Action Proposal Generation.\" ECCV (2018). \r\n  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiyang_Gao_CTAP_Complementary_Temporal_ECCV_2018_paper.pdf)]\r\n  [[code](https://github.com/jiyanggao/CTAP)]\r\n\r\n* **SS-TAD:** Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles.<br />\r\n  \"End-to-end, single-stream temporal action detection in untrimmed videos.\" BMVC (2017). \r\n  [[paper](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)]\r\n  [[code](https://github.com/shyamal-b/ss-tad)]\r\n\r\n* **SSN:** Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin.<br />\r\n  \"Temporal Action Detection with Structured Segment Networks.\" ICCV (2017). 
\r\n  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhao_Temporal_Action_Detection_ICCV_2017_paper.pdf)]\r\n  [[code](https://github.com/yjxiong/action-detection)]\r\n\r\n* **R-C3D:** Huijuan Xu, Abir Das, Kate Saenko.<br />\r\n  \"R-C3D: Region Convolutional 3D Network for Temporal Activity Detection.\" ICCV (2017). \r\n  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Xu_R-C3D_Region_Convolutional_ICCV_2017_paper.pdf)]\r\n  [[code](https://github.com/VisionLearningGroup/R-C3D)]\r\n\r\n* **TAG:** Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang.<br />\r\n  \"A Pursuit of Temporal Accuracy in General Activity Detection.\" arXiv 1703.02716. \r\n  [[paper](https://arxiv.org/pdf/1703.02716.pdf)]\r\n\r\n* **TPC:** Ke Yang, Peng Qiao, Dongsheng Li, Shaohe Lv, Yong Dou.<br />\r\n  \"Exploring Temporal Preservation Networks for Precise Temporal Action Localization.\" AAAI (2018). \r\n  [[paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16164/16347)]\r\n\r\n* **TURN:** Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia.<br />\r\n  \"TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals.\" ICCV (2017). \r\n  [[paper](https://arxiv.org/pdf/1703.06189.pdf)]\r\n  [[code](https://github.com/jiyanggao/TURN-TAP)]\r\n\r\n* **SSAD:** Tianwei Lin, Xu Zhao, Zheng Shou.<br />\r\n  \"Single shot temporal action detection.\" ACM MM (2017). \r\n  [[paper](https://arxiv.org/pdf/1710.06236.pdf)]\r\n\r\n* **CDC:** Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.<br />\r\n  \"CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos.\" CVPR (2017). 
\r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shou_CDC_Convolutional-De-Convolutional_Networks_CVPR_2017_paper.pdf)]\r\n  [[code](https://github.com/ColumbiaDVMM/CDC)]\r\n  [[project](http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/)]\r\n\r\n* **SST:** Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles.<br />\r\n  \"Single-stream temporal action proposals.\" CVPR (2017). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Buch_SST_Single-Stream_Temporal_CVPR_2017_paper.pdf)]\r\n  [[code](https://github.com/shyamal-b/sst)]\r\n\r\n* **SCC:** Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem.<br />\r\n  \"SCC: Semantic context cascade for efficient action detection.\" CVPR (2017). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)]\r\n  [[project](https://ivul.kaust.edu.sa/Pages/pub-scc-efficient-action-detection.aspx)]\r\n\r\n* **S-CNN:** Zheng Shou, Dongang Wang, Shih-Fu Chang.<br />\r\n  \"Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs.\" CVPR (2016). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shou_Temporal_Action_Localization_CVPR_2016_paper.pdf)]\r\n  [[code](https://github.com/zhengshou/scnn)]\r\n\r\n* **PSDF*:** Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim.<br />\r\n  \"Temporal Action Localization with Pyramid of Score Distribution Features.\" CVPR (2016). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)]\r\n\r\n* **SMS*:** Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng.<br />\r\n  \"Temporal Action Localization by Structured Maximal Sums.\" CVPR (2017). 
\r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yuan_Temporal_Action_Localization_CVPR_2017_paper.pdf)]\r\n\r\n* **ADFG*:** Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei.<br />\r\n  \"End-to-end Learning of Action Detection from Frame Glimpses in Videos.\" CVPR (2016). \r\n  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yeung_End-To-End_Learning_of_CVPR_2016_paper.pdf)]\r\n  [[code](https://github.com/syyeung/frameglimpses)]\r\n\r\n* **TCN:** Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen.<br />\r\n  \"Temporal Context Network for Activity Localization in Videos.\" ICCV (2017). \r\n  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Temporal_Context_Network_ICCV_2017_paper.pdf)]\r\n  [[code](https://github.com/vdavid70619/TCN)]\r\n\r\n* **DAP:** Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem.<br />\r\n  \"DAPs: Deep Action Proposals for Action Understanding.\" ECCV (2016). \r\n  [[paper](https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf)]\r\n  [[code](https://github.com/escorciav/daps)]\r\n\r\n* **G-TAD:** Mengmeng Xu, Chen Zhao, David S. 
Rojas, Ali Thabet, Bernard Ghanem.<br />\r\n  \"G-TAD: Sub-Graph Localization for Temporal Action Detection.\" arXiv 1911.11462.\r\n  [[paper](https://arxiv.org/pdf/1911.11462.pdf)]\r\n\r\n* **PTAL-ETP:** Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He.<br />\r\n  \"Precise Temporal Action Localization by Evolving Temporal Proposals.\" arXiv 1804.04803.\r\n  [[paper](https://arxiv.org/pdf/1804.04803.pdf)]\r\n\r\n* **CTR_AL:** Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian.<br />\r\n  \"Constraining Temporal Relationship for Action Localization.\" arXiv 2002.07358.\r\n  [[paper](https://arxiv.org/pdf/2002.07358.pdf)]\r\n\r\n* **SRG:** Hyunjun Eun, Sumin Lee, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />\r\n  \"SRG: Snippet Relatedness-based Temporal Action Proposal Generator.\" arXiv 1911.11306.\r\n  [[paper](https://arxiv.org/pdf/1911.11306.pdf)]\r\n\r\n* **LS-TD:** Yuan Zhou, Hongru Li, Sun-Yuan Kung.<br />\r\n  \"Temporal Action Localization using Long Short-Term Dependency.\" arXiv 1911.01060.\r\n  [[paper](https://arxiv.org/pdf/1911.01060.pdf)]\r\n\r\n* **IDU:** Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />\r\n  \"Learning to Discriminate Information for Online Action Detection.\" CVPR (2020).\r\n  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Eun_Learning_to_Discriminate_Information_for_Online_Action_Detection_CVPR_2020_paper.pdf)]\r\n"
  }
]