Repository: VividLe/awesome-weakly-supervised-action-localization
Branch: master
Commit: e86943a85504
Files: 4
Total size: 44.9 KB

Directory structure:
gitextract_ywnz0pl2/

├── Other_Settings.md
├── README.md
├── Spatiotemporal.md
└── Supervised.md

================================================
FILE CONTENTS
================================================

================================================
FILE: Other_Settings.md
================================================
# Action Localization Benchmarks

### Learning from action count supervision
* **OCL:** Julien Schroeter, Kirill Sidorov, David Marshall.<br />
  "Weakly-Supervised Temporal Localization via Occurrence Count Learning" ICML 2019.
  [[paper](https://arxiv.org/pdf/1905.07293.pdf)]
  [[code](https://github.com/SchroeterJulien/ICML-2019-Weakly-Supervised-Temporal-Localization-via-Occurrence-Count-Learning)]

### Action Modifiers

* Action Modifiers: Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen.<br />
  "Action Modifiers: Learning from Adverbs in Instructional Videos." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Doughty_Action_Modifiers_Learning_From_Adverbs_in_Instructional_Videos_CVPR_2020_paper.pdf)]

### Action Segment, Self-Supervised Learning

* Action Segmentation: Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira.<br />
  "Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Action_Segmentation_With_Joint_Self-Supervised_Temporal_Domain_Adaptation_CVPR_2020_paper.pdf)]

### Action Segment, Transformer

* SCT: Mohsen Fayyaz, Juergen Gall.<br />
  "SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fayyaz_SCT_Set_Constrained_Temporal_Transformer_for_Set_Supervised_Action_Segmentation_CVPR_2020_paper.pdf)]
  [[code](https://github.com/MohsenFayyaz89/SCT/)]


### Unintentional Action

A new task. Interesting.

* Oops: Dave Epstein, Boyuan Chen, Carl Vondrick.<br />
  "Oops! Predicting Unintentional Action in Video." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Epstein_Oops_Predicting_Unintentional_Action_in_Video_CVPR_2020_paper.pdf)]
  [[project](https://oops.cs.columbia.edu/)]

### ActionBytes

Learning from trimmed videos.
* ActionBytes: Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.<br />
  "ActionBytes: Learning from Trimmed Videos to Localize Actions." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]

### METAL

ActivityNet v1.2, mAP@0.5: 41.9 [1-shot], 45.0 [5-shot]

THUMOS14, mAP@0.5: 14.3 [1-shot], 16.2 [5-shot]

* METAL: Da Zhang, Xiyang Dai, Yuan-Fang Wang.<br />
  "METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos." (CVPR 2020).
  [[paper](https://sites.cs.ucsb.edu/~yfwang/papers/cvpr2020.pdf)]

### Hierarchical Action Search

A new task. A learned space where videos are positioned in entailment cones formed by different subtrees.

* Uncertain: Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek.<br />
  "Searching for Actions on the Hyperbole." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Long_Searching_for_Actions_on_the_Hyperbole_CVPR_2020_paper.pdf)]


### Fine-grained Action Recognition and Localization

A new task. Hierarchical annotation for recognition and localization.

* FineGym: Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />
  "FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." (CVPR 2020, oral, 3 strong accepts).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)]
  [[project](https://sdolivia.github.io/FineGym/)]


### Mining undefined sub-actions

A new task. Find the common sub-actions from multiple videos.

* TAPOS: Dian Shao, Yue Zhao, Bo Dai, Dahua Lin.<br />
  "Intra- and Inter-Action Understanding via Temporal Action Parsing." (CVPR 2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_Intra-_and_Inter-Action_Understanding_via_Temporal_Action_Parsing_CVPR_2020_paper.pdf)]
  [[project](https://sdolivia.github.io/TAPOS/)]


================================================
FILE: README.md
================================================
# Action Localization Benchmarks
Papers and Results of Temporal Action Localization

**Weakly Supervised Performance on THUMOS'14 dataset.**

- The detectors are sorted by mAP at tIoU threshold 0.5.
- "c" indicates whether code has been released: yes (Y) or no (N).
- "e" indicates the evaluation code used: THUMOS (T), ActivityNet (A), or the authors' own implementation (I).


|  Detector   |   Pub  |c|e| 0.1 | 0.2 | 0.3 |0.4  | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |avg  |   info   |
| :---------: |:------:|-|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net  | arXiv-20-12-11 |N|A|65.6 |60.0 |52.1 |43.3 |35.9 | - | - | - | - | -   | Shares authors with 3C-Net |
| Lee et al.  | AAAI21 |Y|A|67.5 |61.2 |52.3 |43.4 |33.7 | 22.9 | 12.1 | - | - | -   | Shares authors with BaSNet |
| HAM-Net  | AAAI21 |N|A|65.9 |59.6 |52.2 |43.1 |32.6 | 21.9 | 12.5 | - | - | -   | - |
| ACSNet  | AAAI21 |N|A| - | - |51.4 |42.7 |32.4 | 22.0 | 11.7 | - | - | -   | - |
| EM-MIL  | ECCV20 |N|A|59.1 |52.7 |45.5 |36.8 |30.5 | 22.7 | 16.4 | - | - | -   | Uses existing classification results |
| A2CL-PT  | ECCV20 |Y|A| 61.2 | 56.1 |48.1 | 39.0 |30.1 |19.2 |10.6 | 4.8 | 1.0 | 30.0 | Also reports unsupervised performance |
| ACL  | CVPR20 |N|A| - | - |46.9 |38.9 |30.1 |19.8 |10.4 | -  | -  | -   | Also reports unsupervised performance |
| Liu et al.  | AAAI21 |N|A| 61.7 | 58.0 |50.8 | 41.7 |29.6 |20.1 |10.7 | 4.3 | 0.5 | -   | - |
| WSTAL       | WACV20  |-| |62.3 |-    |46.8 |-    |29.6 |-    |9.7  |-    |-    |-    |    -     |
| ActionBytes  | CVPR20 |N|A| - | - |43.0 |37.5 |29.0 | - |9.5 | -  | -  | -   | - |
| DGAM  | CVPR20 |Y|A| 60.0|54.2 |46.8 |38.2 |28.8 |19.8 |11.4 |3.6  |0.4  | -   |    -     |
| TSCN  | ECCV20 |N|A| 63.4|57.6 |47.8 |37.7 |28.7 |19.4 |10.2 |3.9  |0.7  | -   |    -     |
| BaSNet-I3D  | AAAI20 |Y|A| 58.2|52.3 |44.6 |36.0 |27.0 |18.6 |10.4 |3.9  |0.5  | -   |    -     |
| BaSNet-UNT  | AAAI20 |Y|A| 56.2|50.3 |42.8 |34.7 |25.1 |17.1 |9.3  |3.7  |0.5  | -   |    -     |
|   WSBM      | ICCV19 |N|A| 60.4|56.0 |46.6 |37.5 |26.8 |17.6 |9.0  |3.3  |0.4  | -   |    -     |
|   3C-Net    | ICCV19 |Y|I| 59.1|53.5 |44.2 |34.1 |26.6 |  -  |8.1  |  -  | -   | -   |    -     |
|    ASSG     | ACM 19 |N| | 65.6|59.4 |50.4 |38.7 |25.4 |15.0 |6.6  |  -  |  -  | -   |    -     |
|    TSM      | ICCV19 |N|T|  -  | -   |39.5 |31.9 |24.5 |13.8 |7.1  |  -  |  -  |23.4 |    -     |
|  CleanNet   | ICCV19 |N|T|  -  | -   |37.0 |30.9 |23.9 |13.9 |7.1  |  -  |  -  | -   |    -     |
|  CMCS-I3D   | CVPR19 |Y|T| 57.4|50.8 |41.2 |32.1 |23.1 |15.0 |7.0  |  -  | -   | -   |report avg-mAP|
|  CMCS-UNT   | CVPR19 |Y|T| 53.5|46.8 |37.5 |29.1 |19.9 |12.3 |6.0  |  -  | -   | -   |    -     |
|  STARNet    | AAAI19 |N|A|68.8 |60.0 |48.7 |34.7 |23.0 |-    |-    |  -  |  -  |-    |    -     |
|  W-TALC     | ECCV18 |Y|I|55.2 |49.6 |40.1 |31.1 |22.8 |-    |-    |  -  |  -  |7.6  |    -     |
|  AutoLoc    | ECCV18 |Y|T|-    |-    |35.8 |29.0 |21.2 |13.4 |5.8  |  -  |  -  |-    |    -     |
|  MAAN       | ICLR19 |Y|A|59.8 |50.8 |41.1 |30.6 |20.3 |12.0 |6.9  |2.6  |0.2  |24.9 |    -     |
|  LTSR       | AAAI19 |N|T|55.9 |46.9 |38.3 |28.1 |18.6 |11.0 |5.59 |2.19 |0.29 |-    |    -     |
| WSGN        | WACV20 |-|T|51.1 |44.4 |34.9 |26.3 |18.1 |11.6 |6.5  |-    |-    |-    |    -     |
|  STPN       | CVPR18 |I|A|52.0 |44.7 |35.5 |25.8 |16.9 |9.9  |4.3  |1.2  |0.1  |-    |    -     |
|  CPMN       | ACCV18 |N|T|47.1 |41.6 |32.8 |24.7 |16.1 |10.1 |5.5  |-    |-    |-    |    -     |
|  S-O-C      | ACM18  |N|T|45.8 |39.0 |31.1 |22.5 |15.9 |-    |-    |-    |-    |-    |    -     |
|UntrimmedNets| CVPR17 |Y|T|44.4 |37.7 |28.2 |21.1 |13.7 |-    |-    |-    |-    |-    |    -     |
|  H&S        | ICCV17 |Y|T|36.44|27.84|19.49|12.66|6.84 |-    |-    |-    |-    |-    |    -     |
|LPAT-I3D+TEM | arXiv  |-| |-    |-    |46.9 |37.4 |28.0 |16.6 |9.2  |-    |-    |27.6 |    -     |
| LPAT-I3D    | arXiv  |-| |-    |-    |46.7 |37.5 |27.9 |17.6 |9.2  |-    |-    |27.6 |    -     |
| LPAT-U      | arXiv  |-| |-    |-    |39.9 |31.5 |22.6 |14.2 |7.9  |-    |-    |27.6 |    -     |
|RefineLoc-I3D| arXiv  |-|T|-    |-    |40.8 |-    |23.1 |-    |5.3  |-    |-    |-    |    -     |
|RefineLoc-TSN| arXiv  |-|T|-    |-    |36.1 |-    |22.6 |-    |5.8  |-    |-    |-    |    -     |


**Weakly Supervised Performance on ActivityNet v1.2 dataset.**

|  Detector   |  Pub   |c| 0.5 | 0.55|0.60 | 0.65|0.70 |0.75 | 0.80|0.85 |0.90 |0.95 | avg |test |   info   |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| :------: |
| D2-Net | arXiv-20-12-11 |N|42.3 |   - | -   |  -  | -   |25.5 |  -  | -   | -   | 5.8 | 26.0|   - |     -    |
| ACSNet | AAAI21 |N|40.1 |   - | -   |  -  | -   |26.1 |  -  | -   | -   | 6.8 | 26.0|   - |     -    |
| Lee et al. | AAAI21 |N|41.2 |   - | -   |  -  | -   |25.6 |  -  | -   | -   | 6.0 | 25.9|   - |     -    |
| Liu et al.        | AAAI21 |N|39.2 |-    |-    |-    | -   |25.6 |-    |-    |-    | 6.8 | 25.5|   - |     -    |
| HAM-Net | AAAI21 |N|41.0 |   - | -   |  -  | -   |24.8 |  -  | -   | -   | 5.3 | 25.1|   - |     -    |
| BaSNet  | AAAI20 |Y|38.5 |   - | -   |  -  | -   |24.2 |  -  | -   | -   | 5.6 | 24.3|   - |     -    |
| TSCN        | ECCV20 |N|37.6 |-    |-    |-    | -   |23.7 |-    |-    |-    | 5.7 | 23.6|   - |     -    |
| CMCS        | CVPR19 |Y|36.8 |-    |-    |-    | -   |22.0 |-    |-    |-    | 5.6 | 22.4|   - |     -    |
| 3C-Net      | ICCV19 |Y|35.4 |   - | -   |  -  | -   |22.9 |  -  | -   | -   | 8.5 | 21.1|   - |     -    |
| TSM         | ICCV19 |N|28.3 |26.0 |23.6 |21.2 | 18.9|17.0 |14.0 |11.1 |7.5  | 3.5 | -   |   - |     -    |
| CleanNet    | ICCV19 |N|37.1 |33.4 |29.9 |26.7 | 23.4|20.3 |17.2 |13.9 |9.2  | 5.0 | 21.6|   - |     -    |
| EM-MIL      | ECCV20 |N|37.4 |-    |-    |-    | 23.1|-    |-    |-    |2.0  |  -  | 20.3|   - |     -    |
| W-TALC      | ECCV18 |Y|37.0 |-    |-    |-    | 14.6|-    |-    |-    |-    | -   | 18.0|   - |     -    |
| AutoLoc     | ECCV18 |Y|27.3 |24.9 |22.5 |19.9 | 17.5|15.1 |13.0 |10.0 |6.8  | 3.3 | 16.0| -   |     -    |
|RefineLoc-I3D| arXiv  |-|38.7 |-    |-    |-    | -   |22.6 |-    |-    |-    | 5.5 | 23.2| -   |     -    |
|RefineLoc-TSN| arXiv  |-|38.8 |-    |-    |-    | -   |22.2 |-    |-    |-    | 5.3 | 23.2| -   |     -    |
| LPAT        | arXiv  |-|37.6 |34.6 |31.6 |28.7 | 25.6|22.6 |19.6 |15.3 |10.9 | 4.9 | 23.1| -   |     -    |
|   WSTAL     | arXiv  |-|35.2 |-    |-    |-    | 16.3|-    |-    |-    |-    | -   | -   | -   |     -    |
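
The `avg` column can be checked against rows that report every threshold. A small sketch, assuming `avg` is the unweighted mean of mAP over the ten tIoU thresholds 0.50, 0.55, ..., 0.95 (the CleanNet, AutoLoc and LPAT rows are consistent with this reading); the variable names are illustrative:

```python
# mAP (%) per tIoU threshold, copied from the CleanNet row of the table above.
cleannet = [37.1, 33.4, 29.9, 26.7, 23.4, 20.3, 17.2, 13.9, 9.2, 5.0]

# Thresholds 0.50, 0.55, ..., 0.95, built from integers to avoid float drift.
thresholds = [round(0.05 * k, 2) for k in range(10, 20)]
assert len(thresholds) == len(cleannet)

# Unweighted mean over all ten thresholds.
avg_map = sum(cleannet) / len(cleannet)
print(thresholds[0], thresholds[-1], round(avg_map, 1))  # 0.5 0.95 21.6
```

The result matches the 21.6 listed for CleanNet. Rows that report only 0.5, 0.75 and 0.95 cannot be re-derived this way, because the intermediate thresholds are missing.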


**Weakly Supervised Performance on ActivityNet v1.3 dataset.**

|  Detector   |  Pub   |c| 0.5 |0.75 |0.95 |avg  |
| :---------: |:------:|-|:---:|:---:|:---:|:---:|
|   ACSNet    | AAAI21 |N|36.3 |24.2 |5.8  |23.9 |
|   Lee et al.| AAAI21 |Y|37.0 |23.9 |5.7  |23.7 |
|   Liu et al.| AAAI21 |N|35.1 |23.7 |5.6  |23.2 |
| A2CL-PT     | ECCV20 |Y|36.8 |22.0 |5.2  |22.5 |
| BaSNet-I3D  | AAAI20 |Y|34.5 |22.5 |4.9  |22.2 |
| TSCN        | ECCV20 |N|35.3 |21.4 |5.3  |21.7 |
| WSBM        | ICCV19 |N|36.4 |19.2 | 2.9 |-    |
| ASSG        | ACM 19 |N|32.3 |20.1 | 4.0 |-    |
| TSM         | ICCV19 |N|30.0 |19.0 | 4.5 |-    |
| CMCS        | CVPR19 |Y|34.0 |20.9 | 5.7 |21.2 |
| STARNet     | AAAI19 |N|31.1 |18.8 | 4.7 |-    |
| MAAN        | ICLR19 |Y|33.7 |21.9 | 5.5 |-    |
| LTSR        | AAAI19 |N|33.1 |18.7 |3.32 |21.78|
| STPN        | CVPR18 |I|29.3 |16.9 |2.6  |-    |
| CPMN        | ACCV18 |N|39.29|24.09|6.71 |24.42|
| S-O-C       | ACM18  |N|27.3 |14.7 |2.9  |15.6 |


### Weakly Supervised Temporal Action Localization
* **D2-Net:** Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao.<br />
  "D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations" arXiv:2012.06440.
  [[paper](https://arxiv.org/pdf/2012.06440.pdf)]
* **Lee et al.:** Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun.<br />
  "Weakly-supervised Temporal Action Localization by Uncertainty Modeling" AAAI 2021.
  [[paper](https://arxiv.org/pdf/2006.07006.pdf)]
  [[code](https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling)]
* **HAM-Net:** Ashraful Islam, Chengjiang Long, Richard J. Radke.<br />
  "A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization" AAAI 2021.
  [[paper](https://arxiv.org/pdf/2101.00545.pdf)]
  [[code](https://github.com/asrafulashiq/hamnet)]
* **ACSNet:** Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />
  "ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization" AAAI 2021.
  [[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **Liu et al.:** Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua.<br />
  "Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context" AAAI 2021.
  [[paper](http://gr.xjtu.edu.cn/web/lewang)]
* **EM-MIL:** Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu.<br />
  "Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning" ECCV 2020.
  [[paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740715.pdf)]
* **A2CL-PT:** Kyle Min, Jason J. Corso.<br />
  "Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization." ECCV 2020. 
  [[paper](https://link.springer.com/chapter/10.1007%2F978-3-030-58568-6_17)]
  [[code](https://github.com/MichiganCOG/A2CL-PT)]
* **ACL:** Guoqiang Gong, Xinghan Wang, Yadong Mu, Qi Tian.<br />
  "Learning Temporal Co-Attention Models for Unsupervised Video Action Localization." CVPR 2020, oral. 
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.pdf)]
* **WSTAL:** Ashraful Islam, Richard J. Radke.<br />
  "Weakly Supervised Temporal Action Localization Using Deep Metric Learning" WACV 2020.
  [[paper](https://arxiv.org/pdf/2001.07793.pdf)]
  [[code](https://github.com/asrafulashiq/wsad)]
* **ActionBytes:** Mihir Jain, Amir Ghodrati, Cees G. M. Snoek.<br />
  "ActionBytes: Learning from Trimmed Videos to Localize Actions." CVPR 2020. 
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf)]
* **DGAM:** Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang.<br />
  "Weakly-Supervised Action Localization by Generative Attention Modeling." CVPR 2020. 
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf)]
  [[code](https://github.com/bfshi/DGAM-Weakly-Supervised-Action-Localization)]
* **TSCN:** Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua.<br />
  "Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization." ECCV 2020. 
  [[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf)]
* **BaSNet:** Pilhyeon Lee, Youngjung Uh, Hyeran Byun.<br />
  "Background Suppression Networks for Weakly-supervised Temporal Action Localization." AAAI 2020. 
  [[paper](https://arxiv.org/pdf/1911.09963.pdf)]
  [[code](https://github.com/Pilhyeon/BaSNet-pytorch)]
* **3C-Net:** Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao.<br />
  "3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization." ICCV 2019.
  [[paper](https://arxiv.org/pdf/1908.08216.pdf)]
  [[code](https://github.com/naraysa/3c-net)]
* **CMCS:** Daochang Liu, Tingting Jiang, Yizhou Wang.<br />
  "Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization." CVPR 2019. 
  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.pdf)]
  [[code](https://github.com/Finspire13/CMCS-Temporal-Action-Localization)]
* **ASSG:** Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou.<br />
  "Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization" ACM MM 2019. 
  [[paper](https://arxiv.org/pdf/1908.02422.pdf)]
* **AutoLoc:** Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang.<br />
  "AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos" ECCV 2018.
  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Zheng_Shou_AutoLoc_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
  [[code](https://github.com/zhengshou/AutoLoc)]
* **CPMN:** Haisheng Su, Xu Zhao, Tianwei Lin.<br />
  "Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization" ACCV 2018.
  [[paper](https://arxiv.org/pdf/1810.11794.pdf)]
* **H&S:** Krishna Kumar Singh, Yong Jae Lee.<br />
  "Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization" ICCV 2017.
  [[paper](https://arxiv.org/pdf/1704.04232.pdf)]
  [[code](https://github.com/goddoe/hide-and-seek)]
* **LTSR:** Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan.<br />
  "Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision" AAAI 2019.
  [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4958/4831)]
* **WSGN:** Basura Fernando, Cheston Tan Yin Chet.<br />
  "Weakly Supervised Gaussian Networks for Action Detection" WACV 2020.
  [[paper](https://basurafernando.github.io/papers/wacv2020_wsgn.pdf)]
* **MAAN:** Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung.<br />
  "Marginalized Average Attentional Network for Weakly-Supervised Learning" ICLR 2019.
  [[paper](https://arxiv.org/pdf/1905.08586.pdf)]
  [[code](https://github.com/yyuanad/MAAN)]
* **S-O-C:** Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li.<br />
  "Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector" ACM MM 2018.
  [[paper](https://arxiv.org/pdf/1807.02929.pdf)]
* **STARNet:** Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu.<br />
  "Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection" AAAI 2019.
  [[paper](https://www.aaai.org/ojs/index.php/AAAI/article/download/4939/4812)]
* **TSM:** Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan.<br />
  "Temporal Structure Mining for Weakly Supervised Action Detection" ICCV 2019.
  [[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf)]
* **UntrimmedNets:** Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool.<br />
  "UntrimmedNets for Weakly Supervised Action Recognition and Detection" CVPR 2017.
  [[paper](https://wanglimin.github.io/papers/WangXLV_CVPR17.pdf)]
  [[code](https://github.com/wanglimin/UntrimmedNet)]
* **WSBM:** Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes.<br />
  "Weakly-supervised Action Localization with Background Modeling" ICCV 2019.
  [[paper](https://arxiv.org/pdf/1908.06552.pdf)]
* **CleanNet:** Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua.<br />
  "Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks" ICCV 2019.
  [[paper](https://qilin-zhang.github.io/_pages/pdfs/Weakly_Supervised_Temporal_Action_Localization_through_Contrast_based_Evaluation_Networks.pdf)]
* **STPN:** Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han.<br />
  "Weakly Supervised Action Localization by Sparse Temporal Pooling Network" CVPR 2018.
  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Weakly_Supervised_Action_CVPR_2018_paper.pdf)]
  [[code](https://github.com/bellos1203/STPN)]
* **W-TALC:** Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury.<br />
  "W-TALC: Weakly-supervised Temporal Activity Localization and Classification" ECCV 2018.
  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Sujoy_Paul_W-TALC_Weakly-supervised_Temporal_ECCV_2018_paper.pdf)]
  [[code](https://github.com/sujoyp/wtalc-pytorch)]
* **LPAT:** Xudong Lin, Zheng Shou, Shih-Fu Chang.<br />
  "LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization" arXiv 2019.
  [[paper](https://arxiv.org/pdf/1910.11285.pdf)]
* **RefineLoc:** Humam Alwassel, Alejandro Pardo, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem.<br />
  "RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization" arXiv 2019.
  [[paper](https://arxiv.org/pdf/1904.00227.pdf)]


### Awaiting Paper Release
* **lvr:** Xingyu Liu, Joon-Young Lee, Hailin Jin.<br />
  "Learning Video Representations from Correspondence Proposals." CVPR 2019 **oral**.


## Dataset
* **THUMOS'14:** Yu-Gang Jiang, Jingen Liu, Amir R. Zamir, George Toderici.<br />
  "THUMOS Challenge 2014" 
  [[project](https://www.crcv.ucf.edu/THUMOS14/home.html)]

* **ActivityNet:** Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia.<br />
  "A Large-Scale Video Benchmark for Human Activity Understanding" 
  [[project](http://activity-net.org/index.html)]
  
* **THUMOS'15:** Alexander Gorban, Haroon Idrees, Yu-Gang Jiang, Amir R. Zamir.<br />
  "THUMOS Challenge 2015" 
  [[project](http://www.thumos.info/)]

* **COIN:** Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou.<br />
  "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis." CVPR 2019. 
  [[paper](https://arxiv.org/pdf/1903.02874.pdf)]
  [[project](https://coin-dataset.github.io/)]


================================================
FILE: Spatiotemporal.md
================================================
# Spatio-Temporal Action Detection
Papers and Results for Spatio-Temporal Action Detection

## Spatio-Temporal Action Localization

**Performance on AVA v2.1 dataset.**

- Metric: mAP with threshold 0.5.
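
For spatio-temporal detection the threshold applies to box overlap rather than segment overlap. A minimal sketch of the spatial IoU test, assuming axis-aligned boxes in `(x1, y1, x2, y2)` form; the helper name `box_iou` and the example boxes are illustrative and are not part of the AVA evaluation toolkit:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (0.0, 0.0, 2.0, 2.0)  # hypothetical predicted person box
gt = (1.0, 0.0, 3.0, 2.0)    # hypothetical ground-truth box
# Intersection is 2, union is 6, so IoU is 1/3 and the 0.5 test fails.
print(box_iou(pred, gt) >= 0.5)  # False
```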

|  Detector   |   val   |   test    |
| :---------: | :-----: | :-------: |
|    LFB      |	27.70 	|	27.20 	|
|   VATN*     |	25.00 	|	24.93 	|
|   SMAD*     |	22.20 	|	  -  	|
|   STEP      |	18.60 	|	  -  	|
|   ACRN      |	17.40 	|	  -  	|
|    AVA      |	15.80 	|	  -  	|
|  SlowFast   |	27.30 	|	27.10	|

## Spatio-temporal Action Detection

* **PSCS:** Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu.<br />
  "Improving Action Localization by Progressive Cross-stream Cooperation." CVPR (2019).
  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Su_Improving_Action_Localization_by_Progressive_Cross-Stream_Cooperation_CVPR_2019_paper.pdf)]

* **SlowFast:** Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He.<br />
  "SlowFast Networks for Video Recognition." arXiv:1812.03982.
  [[paper](https://arxiv.org/pdf/1812.03982.pdf)]
  [[unofficial_code](https://github.com/r1ch88/SlowFastNetworks)]
  [[unofficial_code](https://github.com/Guocode/SlowFast-Networks)]

* **DwF:** Jiaojiao Zhao, Cees G.M. Snoek.<br />
  "Dance with Flow: Two-in-One Stream Action Detection." CVPR (2019). 
  [[paper](https://arxiv.org/pdf/1904.00696.pdf)]

* **STEP:** Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz.<br />
  "STEP: Spatio-Temporal Progressive Learning for Video Action Detection." CVPR (2019 **oral**). 
  [[paper](https://arxiv.org/pdf/1904.09288.pdf)]

* **LFB:** Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick.<br />
  "Long-Term Feature Banks for Detailed Video Understanding." CVPR (2019). 
  [[paper](https://arxiv.org/pdf/1812.05038.pdf)]
  [[project](https://github.com/facebookresearch/video-long-term-feature-banks)]

* **VATN*:** Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman. <br />
  "Video Action Transformer Network." CVPR (2019 **oral**). 
  [[paper](https://arxiv.org/pdf/1812.02707.pdf)]
  [[project](https://rohitgirdhar.github.io/ActionTransformer/)]

* **LAEO:** Manuel J Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman. <br />
  "LAEO-Net: revisiting people Looking At Each Other in videos." CVPR (2019). 
  [[paper](http://www.robots.ox.ac.uk/~vgg/research/laeonet/cvpr2019LAEO.pdf)]
  [[code](https://github.com/AVAuco/laeonet/)]
  [[project](http://www.robots.ox.ac.uk/~vgg/research/laeonet/)]

* **SMAD*:** Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid. <br />
  "A Structured Model For Action Detection." CVPR (2019). 
  [[paper](https://arxiv.org/pdf/1812.03544.pdf)]

* **TACNet:** Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun. <br />
  "TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection." CVPR (2019). 
  [[paper](http://www.skicyyu.org/Paper/CVPR2019_TACNET.pdf)]

* **AVA:** Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. <br />
  "AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions." CVPR (2018). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gu_AVA_A_Video_CVPR_2018_paper.pdf)]
  [[project](https://research.google.com/ava/)]

* **ACRN:** Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid. <br />
  "Actor-Centric Relation Network." ECCV (2018). 
  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Chen_Sun_Actor-centric_Relation_Network_ECCV_2018_paper.pdf)]

* **T-CNN:** Rui Hou, Chen Chen, Mubarak Shah. <br />
  "Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos." ICCV (2017). 
  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hou_Tube_Convolutional_Neural_ICCV_2017_paper.pdf)]
  [[code](https://www.crcv.ucf.edu/projects/TCNN/#Code)]
  [[project](https://www.crcv.ucf.edu/projects/TCNN/)]


## Dataset
* **YouTube-8M-Segments:** Ke Chen, Julia Elliott, Nisarg Kothari, Hanhan Li, et al.<br />
  "YouTube-8M Segments Dataset" 
  [[project](https://research.google.com/youtube8m/)]
  

## Distinguished Researchers & Teams
* [WILLOW](https://www.di.ens.fr/willow/publications/YearOnly/publications.html)
* [Ivan Laptev](https://www.di.ens.fr/~laptev/#Publications)
* [Christoph Feichtenhofer](https://feichtenhofer.github.io/)


================================================
FILE: Supervised.md
================================================
# Action Detection Benchmarks
Papers and Results of Temporal Action Localization

## Temporal Action Localization

**Performance on THUMOS'14 dataset.**

- The detectors are ordered by mAP at tIoU threshold 0.5.
- `Deep Learning`: whether the method is deep-learning based, yes (Y) or no (N).

|  Detector   |   0.1   |    0.2  |    0.3  |    0.4  |   0.5   |    0.6  |    0.7   |Deep Learning|Comment      |
| :---------: | :-----: |:-------:|:-------:|:-------:|:-------:|:-------:| :------: | :--------:  | :--:      |
|     TGM     |    -  	|    -  	|    -  	|    -  	|   53.5 	|    -  	|    -     |      Y      |     -     |
|   C-TCN     |   72.2 	|   71.4 	|   68.0 	|   62.3 	|   52.1 	|    -   	|    -   	 |      Y      |     -     |
|   RTD-Net     |    -   	|    -  	|   68.3 	|   62.3  	|   51.9 	|   38.8   	|    23.7 	 |      Y      |  based on P-GCN    |
|   PGC-TAL     |   71.2  	|   68.9 	|   65.1 	|   59.5 	|   51.2 	|    -   	|    -   	 |      Y      |  based on P-GCN    |
|    DPP      |   69.5 	|   67.8 	|   63.6 	|   57.8 	|   49.1 	|    -   	|    -   	 |      Y      |     -     |
|    PGCN     |   69.5 	|   67.8 	|   63.6 	|   57.8 	|   49.1 	|    -   	|    -   	 |      Y      |     -     |
|     BMN     |    -  	|    -  	|   56.0 	|   47.4 	|   38.8 	|   29.7 	|   20.5   |      Y      |     -     |
|   D-SSAD    |    -  	|    -  	|   60.2 	|   54.1 	|   44.2 	|   32.3 	|   19.1   |      Y      |     -     |
|   TAL-Net   |   59.8 	|   57.1 	|   53.2 	|   48.5 	|   42.8 	|   33.8 	|   20.8   |      Y      |     -     |
|    FRTS     |    -  	|    -  	|   53.5 	|   50.2 	|   44.2 	|   33.9 	|   22.7   |      Y      |     -     |
|    GTAN     |   69.1 	|   63.7 	|   57.8 	|   47.2 	|   38.8 	|    -   	|    -     |      Y      |     -     |
|    AGCN     |   59.3 	|   59.6 	|   57.1 	|   51.6 	|   38.6 	|    28.9 	|   17.0   |      Y      |     -     |
|    MGG      |    -  	|    -  	|   53.9 	|   46.8 	|   37.4 	|   29.5 	|   21.3   |      Y      |     -     |
|    BSN      |    -  	|    -  	|   53.5 	|   45.0 	|   36.9 	|   28.4 	|   20.0   |      Y      |     -     |
|    FSN      |    -  	|    -  	|   51.8 	|   41.5 	|   32.1 	|   22.9 	|   14.7   |      Y      |     -     |
|    CBR      |   60.1 	|   56.7 	|   50.1 	|   41.3 	|   31.0 	|   19.1 	|   9.9    |      Y      |     -     |
|   ASSA*     |    -  	|    -  	|   51.8 	|   42.4 	|   30.8 	|   20.2 	|   11.1   |      Y      |     -     |
|   CTAP      |    -  	|    -  	|    -  	|    -  	|   29.9 	|    -  	|    -     |      Y      |     -     |
|  SS-TAD     |    -    |    -    |   45.7  |    -    |   29.2  |    -    |   9.6    |      Y      |701 fps (GTX Titan X (Maxwell))|
|    SSN      |   60.3 	|   56.2 	|   50.6 	|   40.8 	|   29.1 	|    -  	|    -     |      Y      |     -     |
|   R-C3D     |   54.5 	|   51.5 	|   44.8 	|   35.6 	|   28.9 	|    -  	|    -     |      Y      |569 fps|
|    TAG      |   64.1 	|   57.7 	|   48.7 	|   39.8 	|   28.2 	|    -  	|    -     |      Y      |     -     |
|    TPC      |    -  	|    -  	|   44.1 	|   37.1 	|   28.2 	|   20.6 	|   12.7   |      Y      |250 fps (GTX Titan X)|
|   TURN      |   54.0 	|   50.9 	|   44.1 	|   34.9 	|   25.6 	|    -  	|    -     |      Y      |129.4 fps (GTX Titan X)|
|   SSAD      |   50.1 	|   47.8 	|   43.0 	|   35.0 	|   24.6 	|    -  	|    -     |      Y      |     -     |
|    CDC      |    -  	|    -  	|   40.1 	|   29.4 	|   23.3 	|   13.1 	|   7.9    |      Y      |500 fps (GTX Titan X)|
|    SST      |    -  	|    -  	|   37.8 	|    -  	|   23.0 	|    -  	|    -     |      Y      |308 fps Titan-X|
|   S-CNN     |   47.7  |   43.5  |   36.3  |   28.7  |   19.0  |   10.3  |   5.3    |      Y      |60 fps (GeForce GTX 980)|
|   PSDF*     |   51.4 	|   42.6 	|   33.6 	|   26.1 	|   18.8 	|    -  	|    -     |      Y      |     -     |
|   SMS*      |   51.0 	|   45.2 	|   36.5 	|   27.8 	|   17.8 	|    -  	|    -     |      N      |     -     |
|   ADFG*     |   48.9 	|   44.0 	|   36.0 	|   26.4 	|   17.1 	|    -  	|    -     |      Y      |     -     |
|    TCN      |    -  	|    -  	|   33.3 	|   25.6 	|   15.9 	|   9.0 	|    -     |      Y      |     -     |
|    DAP      |    -  	|    -  	|    -  	|    -  	|   13.9 	|    -  	|    -     |      Y      |134.1 fps Titan-X|
|  G-TAD      |   -   	|    -   	|   54.5	|    47.6	|   40.2 	|   30.8 	|   23.4   |      Y      |     -     |
|  PTAL-ETP   |   -   	|    -   	|   48.2	|    42.4	|   34.2 	|   23.4 	|   13.9   |      Y      |     -     |
|  CTR_AL     |   -   	|    -   	|   53.9	|    50.7	|   45.4 	|   38.0 	|   28.5   |      Y      |     -     |
|   LS-TD     |   63.5  |   61.0  |   56.7  |   50.6  |   42.6  |   32.5  |   21.4   |      -      |   arXiv   |
|    SRG      |    -    |    -    |   54.5  |   46.9  |   39.1  |   31.4  |   22.2   |      -      |   arXiv   |
|    IDU      |    -    |    -    |    -    |    -    |    -    |    -    |    -     |      -      | arXiv; under a different metric |


**Performance on ActivityNet v1.3 dataset.**
- The left half reports scores on the ActivityNet v1.3 validation set; the right half reports scores on the ActivityNet v1.3 testing set.
- `Deep Learning`: whether the method is deep learning based.

|  Detector   |   0.50  |    0.75   |    0.95   |    @Avg   |   0.50    |   0.75    |   0.95    |    @Avg   |Deep Learning|Speed  |
| :---------: | :-----: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :------:  | :--------:  | :--:  |
|     BMN     |	50.07 	|	   34.78 	|	8.29 	    |	33.85 	  |	 -      	|	 -  	    |	 -       	|	36.42	    |      Y      | -     |
|    GTAN     |	52.61 	|	   34.14 	|	8.91 	    |	34.31 	  |	 -      	|	 -  	    |	 -  	    |	35.54	    |      Y      | -     |
|    PGCN     |	48.26 	|	  33.16 	|	3.27 	    |	31.11   	|	 -      	|	 -  	    |	 -      	|	  -  	    |      Y      | -     |
|   RTD-Net   |	46.43 	|	  30.45  	|	8.64      |	30.46   	|	 -      	|	 -  	    |	 -      	|	  -  	    |      Y      | -     |
|   BSN_ori   |	46.45 	|	  29.96 	|	8.02    	|	30.03   	|	 -      	|	 -      	|	 -      	|	32.84    	|      Y      | -     |
|   BSN_new   |	52.50 	|	  33.53 	|	8.85 	    |	33.72   	|	 -      	|	 -      	|	 -      	|	34.42	    |      Y      | -     |
|    C-TCN    |	 47.6 	|  	31.9  	|	6.2      	|	 31.1   	|	 -      	|	 -  	    |	 -      	|	  -     	|      Y      | -     |
|   PCG-TAL   |	44.31 	|  	29.85  	|	5.47  	|	 28.85   	|	 -      	|	 -  	    |	 -      	|	  -     	|      Y      | based on P-GCN  |
|    SSN      |	 -    	|	    -   	|	 -  	    |	 -  	    |	43.26    	|	28.70    	|	5.63     	|	28.28	    |      Y      | -     |
|    TAG      |	39.12 	| 	23.48 	|	5.49    	|	23.98    	|	40.69 	  |	26.02   	|	6.67 	    |	26.05	    |      Y      | -     |
|    CDC      |	45.30 	|	  26.00 	|	0.20 	    |	23.80   	|	 -      	|	 -      	|	 -      	|	 -  	    |      Y      |500 fps (GTX Titan X)|
|    TCN      |	36.44 	|	  21.15 	|	3.90 	    |	 -  	    |	37.49   	|	23.47   	|	4.47    	|	23.58	    |      Y      | -     |
|   TAL-Net   |	38.23 	|	  18.30 	|	1.30 	    |	20.22   	|	 -  	    |	 -  	    |	 -  	    |	 -  	    |      Y      | -     |
|   AGCN   |	30.4 	|	   -  		|	 -  	    |	 -  	   	|	 -  	    |	 -  	    |	 -  	    |	 -  	    |      Y      | -     |
|    SCC      |	   -  	|	 -      	|	 -  	    |	 -  	    |	39.90    	|	18.70   	|	4.70    	|	19.30	    |      Y      |  35.9 fps  |
|   R-C3D     |	26.80 	|	 -      	|	 -  	    |	12.70   	|	 -  	    |	 -  	    |	 -      	|	13.10    	|      Y      |569 fps (GTX Titan X (Maxwell))  <br>1030 fps (Titan X Pascal)|
| G-TAD       |  50.36  |   34.6    |  9.02     |  34.09    |  -        |     -     |	 -      	|	-         |      Y      | -     |
| CTR_AL      |  43.47  |   33.91   |  9.21     |  30.12    |  -        |     -     |	 -      	|	-         |      Y      | -     |
| SRG         |  46.53  |   29.98   |  4.83     |  29.72    |  -        |     -     |	 -      	|	-         |      -      | -     |

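The threshold columns in the tables above (for example 0.50, 0.75, 0.95) are temporal IoU (tIoU) thresholds: a predicted segment counts as correct at a given threshold only if its tIoU with a ground-truth segment reaches that value. On ActivityNet, the average is conventionally taken over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below is only an illustration of the overlap test, assuming segments are `(start, end)` pairs in seconds; the function names are hypothetical and are not taken from any of the listed codebases or from the official evaluation toolkit.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds."""
    # Length of the overlap; zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred, gt, threshold):
    """True if the prediction overlaps the ground truth enough at this tIoU threshold."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical example: prediction covers 12s-20s, ground truth covers 10s-20s.
pred, gt = (12.0, 20.0), (10.0, 20.0)
for t in (0.50, 0.75, 0.95):
    print(t, matches_at(pred, gt, t))
```

Here the overlap is 8 seconds and the union is 10 seconds, so the same prediction is accepted at the 0.50 and 0.75 thresholds but rejected at 0.95, which is why scores drop sharply in the stricter columns.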

**Performance on ActivityNet v1.2 dataset.**

|  Detector   |   0.50  |   0.75  |   0.95  |   @Avg  |
| :---------: | :-----: | :-----: | :-----: | :-----: |
|    LS-TD    |   50.4  |   34.9  |   8.0   |   33.6  |

### Temporal Action Localization
* **TGM:** AJ Piergiovanni, Michael S. Ryoo.<br />
  "Temporal Gaussian Mixture Layer for Videos." ICML (2019). 
  [[paper](https://arxiv.org/pdf/1803.06316.pdf)]
  [[code](https://github.com/piergiaj/tgm-icml19)]
  
* **RTD-Net:** Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu.<br />
  "Relaxed Transformer Decoders for Direct Action Prop." arXiv 2102.01894. 
  [[paper](https://arxiv.org/pdf/2102.01894.pdf)]

* **DPP:** Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu.<br />
  "Deep Point-wise Prediction for Action Temporal Proposal." arXiv 1909.07725. 
  [[paper](https://arxiv.org/pdf/1909.07725.pdf)]
  [[code](https://github.com/liluxuan1997/DPP)]
  
* **C-TCN:** Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li.<br />
  "Deep Concept-wise Temporal Convolutional Networks for Action Localization." arXiv 1908.09442.
  [[paper](https://arxiv.org/pdf/1908.09442.pdf)]
  [[code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleVideo/models/ctcn/README.md)]

* **PCG-TAL:** Rui Su, Dong Xu, Lu Sheng, Wanli Ouyang.<br />
  "PCG-TAL: Progressive Cross-granularity Cooperation for Temporal Action Localization." TIP 2020.
  [[paper](https://ieeexplore.ieee.org/document/9298475)]

* **BMN:** Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen.<br />
  "BMN : Boundary-Matching Network for Temporal Action Proposal Generation." ICCV (2019). 
  [[paper](https://arxiv.org/pdf/1907.09702.pdf)]

* **PGCN:** Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan.<br />
  "Graph Convolutional Networks for Temporal Action Localization." ICCV (2019). 
  [[paper](https://arxiv.org/pdf/1909.03252.pdf)]
  [[code](https://github.com/Alvin-Zeng/PGCN)]

* **GTAN:** Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei.<br />
  "Gaussian Temporal Awareness Networks for Action Localization." CVPR (2019 **oral**). 
  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Long_Gaussian_Temporal_Awareness_Networks_for_Action_Localization_CVPR_2019_paper.pdf)]

* **AGCN:** Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song.<br />
  "Graph Attention based Proposal 3D ConvNets for Action Detection." AAAI (2020).
  [[paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf)]

* **MGG:** Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang.<br />
  "Multi-granularity Generator for Temporal Action Proposal." CVPR (2019). 
  [[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Multi-Granularity_Generator_for_Temporal_Action_Proposal_CVPR_2019_paper.pdf)]

* **D-SSAD:** Yupan Huang, Qi Dai, Yutong Lu.<br />
  "Decoupling Localization and Classification in Single Shot Temporal Action Detection." ICME (2019). 
  [[paper](https://arxiv.org/pdf/1904.07442.pdf)]
  [[code](https://github.com/HYPJUDY/Decouple-SSAD)]

* **TAL-Net:** Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar.<br />
  "Rethinking the Faster R-CNN Architecture for Temporal Action Localization." CVPR (2018). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Rethinking_the_Faster_CVPR_2018_paper.pdf)]
  
* **FRTS:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras.<br />
  "Exploring Feature Representation and Training Strategies in Temporal Action Localization." ICIP (2019). 
  [[paper](https://arxiv.org/pdf/1905.10608.pdf)]

* **BSN:** Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang.<br />
  "BSN: Boundary Sensitive Network for Temporal Action Proposal Generation." ECCV (2018). 
  [[paper](https://arxiv.org/pdf/1806.02964.pdf)]
  [[code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)]

* **FSN:** Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou.<br />
  "Exploring frame segmentation networks for temporal action localization." ECCV (2018). 
  [[paper](https://arxiv.org/pdf/1902.05488.pdf)]

* **CBR:** Jiyang Gao, Zhenheng Yang, Ram Nevatia.<br />
  "Cascaded Boundary Regression for Temporal Action Detection." BMVC (2017). 
  [[paper](https://arxiv.org/pdf/1705.01180.pdf)]
  [[code](https://github.com/jiyanggao/CBR)]

* **ASSA*:** Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem.<br />
  "Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization." ECCV (2018). 
  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Humam_Alwassel_Action_Search_Spotting_ECCV_2018_paper.pdf)]

* **CTAP:** Jiyang Gao, Kan Chen, Ram Nevatia.<br />
  "CTAP: Complementary Temporal Action Proposal Generation." ECCV (2018). 
  [[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiyang_Gao_CTAP_Complementary_Temporal_ECCV_2018_paper.pdf)]
  [[code](https://github.com/jiyanggao/CTAP)]

* **SS-TAD:** Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles.<br />
  "End-to-end, single-stream temporal action detection in untrimmed videos." BMVC (2017). 
  [[paper](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)]
  [[code](https://github.com/shyamal-b/ss-tad)]

* **SSN:** Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin.<br />
  "Temporal Action Detection with Structured Segment Networks." ICCV (2017). 
  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhao_Temporal_Action_Detection_ICCV_2017_paper.pdf)]
  [[code](https://github.com/yjxiong/action-detection)]

* **R-C3D:** Huijuan Xu, Abir Das, Kate Saenko.<br />
  "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection." ICCV (2017). 
  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Xu_R-C3D_Region_Convolutional_ICCV_2017_paper.pdf)]
  [[code](https://github.com/VisionLearningGroup/R-C3D)]

* **TAG:** Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang.<br />
  "A Pursuit of Temporal Accuracy in General Activity Detection." arXiv (1703). 
  [[paper](https://arxiv.org/pdf/1703.02716.pdf)]

* **TPC:** Ke Yang, Peng Qiao, Dongsheng Li, Shaohe Lv, Yong Dou.<br />
  "Exploring Temporal Preservation Networks for Precise Temporal Action Localization." AAAI (2018). 
  [[paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16164/16347)]

* **TURN:** Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia.<br />
  "TURN TAP : Temporal Unit Regression Network for Temporal Action Proposals." ICCV (2017). 
  [[paper](https://arxiv.org/pdf/1703.06189.pdf)]
  [[code](https://github.com/jiyanggao/TURN-TAP)]

* **SSAD:** Tianwei Lin, Xu Zhao, Zheng Shou.<br />
  "Single shot temporal action detection." ACM MM (2017). 
  [[paper](https://arxiv.org/pdf/1710.06236.pdf)]

* **CDC:** Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.<br />
  "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos." CVPR (2017). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shou_CDC_Convolutional-De-Convolutional_Networks_CVPR_2017_paper.pdf)]
  [[code](https://github.com/ColumbiaDVMM/CDC)]
  [[project](http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/)]

* **SST:** Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles.<br />
  "Single-stream temporal action proposals." CVPR (2017). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Buch_SST_Single-Stream_Temporal_CVPR_2017_paper.pdf)]
  [[code](https://github.com/shyamal-b/sst)]

* **SCC:** Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem.<br />
  "SCC: Semantic context cascade for efficient action detection." CVPR (2017). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)]
  [[project](https://ivul.kaust.edu.sa/Pages/pub-scc-efficient-action-detection.aspx)]

* **S-CNN:** Zheng Shou, Dongang Wang, Shih-Fu Chang.<br />
  "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR (2016). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shou_Temporal_Action_Localization_CVPR_2016_paper.pdf)]
  [[code](https://github.com/zhengshou/scnn)]

* **PSDF*:** Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim.<br />
  "Temporal Action Localization with Pyramid of Score Distribution Features." CVPR (2016). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)]

* **SMS*:** Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng.<br />
  "Temporal Action Localization by Structured Maximal Sums." CVPR (2017). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yuan_Temporal_Action_Localization_CVPR_2017_paper.pdf)]

* **ADFG*:** Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei.<br />
  "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR (2016). 
  [[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yeung_End-To-End_Learning_of_CVPR_2016_paper.pdf)]
  [[code](https://github.com/syyeung/frameglimpses)]

* **TCN:** Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen.<br />
  "Temporal Context Network for Activity Localization in Videos." ICCV (2017). 
  [[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Temporal_Context_Network_ICCV_2017_paper.pdf)]
  [[code](https://github.com/vdavid70619/TCN)]
  
* **DAP:** Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem.<br />
  "DAPs: Deep Action Proposals for Action Understanding." ECCV (2016). 
  [[paper](https://ivul.kaust.edu.sa/Documents/Publications/2016/DAPs%20Deep%20Action%20Proposals%20for%20Action%20Understanding.pdf)]
  [[code](https://github.com/escorciav/daps)]

* **G-TAD:** Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem.<br />
  "G-TAD: Sub-Graph Localization for Temporal Action Detection." arXiv (2019).
  [[paper](https://arxiv.org/pdf/1911.11462.pdf)]

* **PTAL-ETP:** Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He.<br />
  "Precise Temporal Action Localization by Evolving Temporal Proposals." arXiv (2019).
  [[paper](https://arxiv.org/pdf/1804.04803.pdf)]

* **CTR_AL:** Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian.<br />
  "Constraining Temporal Relationship for Action Localization." arXiv (2020).
  [[paper](https://arxiv.org/pdf/2002.07358.pdf)]

* **SRG:** Hyunjun Eun, Sumin Lee, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />
  "SRG: Snippet Relatedness-based Temporal Action Proposal Generator." arXiv (2019).
  [[paper](https://arxiv.org/pdf/1911.11306.pdf)]

* **LS-TD:** Yuan Zhou, Hongru Li, Sun-Yuan Kung.<br />
  "Temporal Action Localization using Long Short-Term Dependency." arXiv (2019).
  [[paper](https://arxiv.org/pdf/1911.01060.pdf)]

* **IDU:** Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim.<br />
  "Learning to Discriminate Information for Online Action Detection." CVPR (2020).
  [[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Eun_Learning_to_Discriminate_Information_for_Online_Action_Detection_CVPR_2020_paper.pdf)]