Repository: SilenceEagle/paper_downloader
Branch: master
Commit: 7a76ffa26612
Files: 30
Total size: 345.0 KB

Directory structure:
gitextract_691ya0bm/

├── .gitignore
├── LICENSE
├── README.md
├── code/
│   ├── paper_downloader_AAAI.py
│   ├── paper_downloader_AAMAS.py
│   ├── paper_downloader_AISTATS.py
│   ├── paper_downloader_COLT.py
│   ├── paper_downloader_CORL.py
│   ├── paper_downloader_CVF.py
│   ├── paper_downloader_ECCV.py
│   ├── paper_downloader_ICLR.py
│   ├── paper_downloader_ICML.py
│   ├── paper_downloader_IJCAI.py
│   ├── paper_downloader_JMLR.py
│   ├── paper_downloader_NIPS.py
│   └── paper_downloader_RSS.py
├── lib/
│   ├── IDM.py
│   ├── __init__.py
│   ├── arxiv.py
│   ├── csv_process.py
│   ├── cvf.py
│   ├── downloader.py
│   ├── my_request.py
│   ├── openreview.py
│   ├── pmlr.py
│   ├── proxy.py
│   ├── springer.py
│   ├── supplement_porcess.py
│   └── user_agents.py
└── sharelinks.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# ---> Python
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
# mylib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

csv/
data/
log/
temp_zip
urls/
*.txt

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2020 silenceagle

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# paper_downloader

Download papers and supplemental materials from **OPEN ACCESS** paper
websites only, such as **AAAI**, **AAMAS**, **AISTATS**, **COLT**, **CORL**,
**CVPR**, **ECCV**, **ICCV**, **ICLR**, **ICML**, **IJCAI**, **JMLR**,
**NIPS**, **RSS**, and **WACV**.

---

The number of papers that can be downloaded with this repo (**Aliyundrive** or **123Pan** share links with an `access code` are also provided):

<sub>
<sup>

|  year\conf   | [AAAI](https://aaai.org/aaai-publications/aaai-conference-proceedings/#aaai) | [AAMAS](https://www.ifaamas.org/Proceedings/aamas2024/) |                                  [ACCV](https://openaccess.thecvf.com/menu)                                  |          [AISTATS](https://www.aistats.org/)           |           [COLT](http://learningtheory.org/)           | [CORL](https://www.corl.org/) |                                   [CVPR](http://openaccess.thecvf.com/menu)                                    |         [ECCV](https://www.ecva.net/papers.php)         |                                   [ICCV](http://openaccess.thecvf.com/menu)                                    |                    [ICLR](https://iclr.cc/)                    |                [ICML](https://icml.cc/)                 |            [IJCAI](https://www.ijcai.org/)             | [JMLR](http://www.jmlr.org/) |                [NIPS ](https://nips.cc/)                | [RSS](https://www.roboticsproceedings.org/index.html) |                                  [WACV](https://openaccess.thecvf.com/menu)                                  |
|:------------:|:----------------------------------------------------------------------------:|:-------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------:|:------------------------------------------------------:|:-----------------------------:|:--------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:------------------------------------------------------:|:----------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------:|:------------------------------------------------------------------------------------------------------------:|
|   **1969**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           64                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1971**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           66                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1973**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           85                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1975**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          146                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1977**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          251                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1979**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           12                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1980**   |            [95](https://www.aliyundrive.com/s/ucngMrKSTmi)`96eg`             |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1981**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          108                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1982**   |                                     104                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1983**   |            [92](https://www.aliyundrive.com/s/L3GfxhEqyWg)`09jo`             |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          237                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1984**   |                                      69                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1985**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          259                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1986**   |                                     194                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1987**   |                                     149                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          246                           |              --              |                           90                            |                          --                           |                                                      --                                                      |
|   **1988**   |                                     159                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           94                            |                          --                           |                                                      --                                                      |
|   **1989**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          269                           |              --              |                           101                           |                          --                           |                                                      --                                                      |
|   **1990**   |                                     173                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           49                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           143                           |                          --                           |                                                      --                                                      |
|   **1991**   |                                     144                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          192                           |              --              |                           144                           |                          --                           |                                                      --                                                      |
|   **1992**   |                                     134                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           49                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           127                           |                          --                           |                                                      --                                                      |
|   **1993**   |                                     135                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          138                           |              --              |                           158                           |                          --                           |                                                      --                                                      |
|   **1994**   |                                     302                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           140                           |                          --                           |                                                      --                                                      |
|   **1995**   |                                      --                                      |                           --                            |                                                      --                                                      |                           64                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          282                           |              --              |                           152                           |                          --                           |                                                      --                                                      |
|   **1996**   |                                     275                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           152                           |                          --                           |                                                      --                                                      |
|   **1997**   |                                     186                                      |                           --                            |                                                      --                                                      |                           57                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          180                           |              --              |                           150                           |                          --                           |                                                      --                                                      |
|   **1998**   |                                     187                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           151                           |                          --                           |                                                      --                                                      |
|   **1999**   |                                     182                                      |                           --                            |                                                      --                                                      |                           17                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          204                           |              --              |                           150                           |                          --                           |                                                      --                                                      |
| **2000/v1**  |                                     221                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              11              |                           152                           |                          --                           |                                                      --                                                      |
| **2001/v2**  |                                      --                                      |                           --                            |                                                      --                                                      |                           46                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           17                           |              31              |                           197                           |                          --                           |                                                      --                                                      |
| **2002/v3**  |                                     187                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           196                           |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              59              |                           207                           |                          --                           |                                                      --                                                      |
| **2003/v4**  |                                      --                                      |                            /                            |                                                      --                                                      |                           44                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           121                           |                          297                           |              59              |                           198                           |                          --                           |                                                      --                                                      |
| **2004/v5**  |                                     177                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           190                           |                                                       --                                                       |                               --                               |                           118                           |                           --                           |              56              |                           207                           |                          --                           |                                                      --                                                      |
| **2005/v6**  |                                     328                                      |                            /                            |                                                      --                                                      |                           56                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           133                           |                          350                           |              73              |                           207                           |                          48                           |                                                      --                                                      |
| **2006/v7**  |                                     393                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                         192+11                          |                                                       --                                                       |                               --                               |                           --                            |                           --                           |             100              |                           204                           |                          39                           |                                                      --                                                      |
| **2007/v8**  |                                     375                                      |                            /                            |                                                      --                                                      |                           86                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           150                           |                          478                           |              91              |                           217                           |                          41                           |                                                      --                                                      |
| **2008/v9**  |                                     355                                      |                           254                           |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           196                           |                                                       --                                                       |                               --                               |                           158                           |                           --                           |              97              |                           250                           |                          40                           |                                                      --                                                      |
| **2009/v10** |                                      --                                      |                           130                           |                                                      --                                                      |                           84                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           160                           |                          342                           |             100              |                           262                           |                          39                           |                                                      --                                                      |
| **2010/v11** |                                     300                                      |                           163                           |                                                      --                                                      |                          126                           |                           --                           |              --               |                                                       --                                                       |                         286+63                          |                                                       --                                                       |                               --                               |                           159                           |                           --                           |             118              |                           292                           |                          40                           |                                                      --                                                      |
| **2011/v12** |                                     302                                      |                           125                           |                                                      --                                                      |                          108                           |                           43                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           153                           |                          490                           |             105              |                           306                           |                          45                           |                                                      --                                                      |
| **2012/v13** |                                     353                                      |                           136                           |                                                      --                                                      |                          160                           |                           46                           |              --               |                                                       --                                                       |                         329+147                         |                                                       --                                                       |                               --                               |                           243                           |                           --                           |             119              |                           368                           |                          60                           |                                                      --                                                      |
| **2013/v14** |                                     251                                      |                           321                           |                                                      --                                                      |                           72                           |                           50                           |              --               |                           [471](https://www.aliyundrive.com/s/ZFvga9JZ5aY)`5p0q`+156                           |                           --                            |                                                    455+142                                                     |                              14+9                              |                           283                           |                          496                           |              84              |                           360                           |                          55                           |                                                      --                                                      |
| **2014/v15** |                                     447                                      |                           378                           |                                                      --                                                      |                          124                           |                           61                           |              --               |                                                    545+125                                                     |                         334+158                         |                                                       --                                                       |                               35                               |                           310                           |                           --                           |             120              |                           411                           |                          57                           |                                                      --                                                      |
| **2015/v16** |                                     455                                      |                           363                           |                                                      --                                                      |                          134                           |                           77                           |              --               |                                                    602+133                                                     |                           --                            |                                                    526+133                                                     |                               42                               |                           270                           |                          656                           |             118              |                           403                           |                          49                           |                                                      --                                                      |
| **2016/v17** |                                     676                                      |                           280                           |                                                      --                                                      |                          168                           |                           70                           |              --               |                                                    643+194                                                     |                         372+132                         |                                                       --                                                       |                               80                               |                           322                           |                          658                           |             236              |                           568                           |                          47                           |                                                      --                                                      |
| **2017/v18** |                                     765                                      |                           318                           |                                                      --                                                      |                          175                           |                           75                           |              48               |                                                    783+281                                                     |                           --                            |                                                    621+353                                                     |                              198                               |                           434                           |                          781                           |             234              |                           679                           |                          75                           |                                                      --                                                      |
| **2018/v19** |                                     1102                                     |                           390                           |                                                      --                                                      |                          230                           |                           94                           |              75               |                                                    979+346                                                     |                         732+262                         |                                                       --                                                       |                              336                               |                           466                           |                          870                           |              84              |                          1009                           |                          71                           |                                                      --                                                      |
| **2019/v20** |                                     1343                                     |                           433                           |                                                      --                                                      |                          403                           |                          127                           |              110              |                                                    1294+612                                                    |                           --                            |                                                    1075+498                                                    |                              502                               |                           773                           |                          964                           |             184              |                          1428                           |                          84                           |                                                      --                                                      |
| **2020/v21** |           [1864](https://www.aliyundrive.com/s/kbWKUpHGR3k)`5ls6`            |                           369                           | [254](https://www.aliyundrive.com/s/Dt2ErKCmePQ)`dn93`+[13](https://www.aliyundrive.com/s/AhGvgotrMUv)`d9o6` | [796](https://www.aliyundrive.com/s/iQ4AWTHG4bk)`61yu` | [126](https://www.aliyundrive.com/s/apP8KUFLPe4)`3mv9` |              165              | [1467](https://www.aliyundrive.com/s/eJF4BTFzFJq)`y89b`+[517](https://www.aliyundrive.com/s/5wk7Mjo9XyU)`0fz9` | [1358](https://www.aliyundrive.com/s/EYyjxRmmg8d)`a5i0` |                                                       --                                                       |     [687](https://www.aliyundrive.com/s/cVRD5Bu2SgN)`4x1c`     | [1084](https://www.aliyundrive.com/s/BHqtEbi6Dix)`5yw0` | [776](https://www.aliyundrive.com/s/vMZpsjCbWMV)`4xq3` |             254              | [1899](https://www.aliyundrive.com/s/GEMFqxKeHWu)`3g3d` |                          103                          | [378](https://www.aliyundrive.com/s/gfFKwcKrCP1)`l1m8`+[24](https://www.aliyundrive.com/s/2uCW6cq9WHk)`me08` |
| **2021/v22** |           [1961](https://www.aliyundrive.com/s/cdeGciNZch8)`b69m`            |                           304                           |                                                      --                                                      | [845](https://www.aliyundrive.com/s/3hbAhxYFHER)`93ig` | [140](https://www.aliyundrive.com/s/gwhdNT1vGDD)`96ln` |              166              |                          1660+[517](https://www.aliyundrive.com/s/ziBfXVKPXSY)`le14`                           |                           --                            | [1612](https://www.aliyundrive.com/s/ME21PfkyAec)`99uu`+[465](https://www.aliyundrive.com/s/ZahPmXSn9an)`16es` |     [860](https://www.aliyundrive.com/s/wGos6n5R93v)`ef43`     | [1183](https://www.aliyundrive.com/s/SYTtH38GiVS)`g8b1` | [723](https://www.aliyundrive.com/s/io3sAjsN5pw)`40is` |             290              | [2334](https://www.aliyundrive.com/s/13sHmhuEdxA)`v6g1` |                          92                           | [406](https://www.aliyundrive.com/s/kTwfaX9tren)`1id9`+[23](https://www.aliyundrive.com/s/7Joy4svvUfy)`90rl` |
| **2022/v23** |           [1624](https://www.aliyundrive.com/s/ePXvUw4VFdQ)`fp76`            |                           306                           | [279](https://www.aliyundrive.com/s/zCCTJMPrfSr)`47jy`+[25](https://www.aliyundrive.com/s/f4kdMXixwJL)`s7a9` | [492](https://www.aliyundrive.com/s/xj2fRMwZxfC)`f16o` |                          155                           |              197              | [2077](https://www.aliyundrive.com/s/Q8DG9dKbx6S)`i16a`+[562](https://www.aliyundrive.com/s/f9Zx3hFFyq4)`11kj` | [1645](https://www.aliyundrive.com/s/dv4fhuueRHs)`6d7j` |                                                       --                                                       | [54+176+865](https://www.aliyundrive.com/s/gfANcdbM9TC)`b1l3`  | [1234](https://www.aliyundrive.com/s/eopQ5H8Hz2a)`81ov` | [862](https://www.aliyundrive.com/s/DBVKNsqN2UZ)`ea46` |             351              | [2673](https://www.aliyundrive.com/s/VFLmfnzSAsA)`eh49` |                          74                           | [406](https://www.aliyundrive.com/s/xRhdpencLQU)`ab53`+[80](https://www.aliyundrive.com/s/JCCcQXij7WX)`q6d2` |
| **2023/v24** |                                     2021                                     |                           527                           |                                                      --                                                      | [496](https://www.aliyundrive.com/s/CD3Kz9cxu1U)`l5m9` |                          170                           |              199              |                                          [2358+698](./sharelinks.md)                                           |                           --                            |                                                    2161+491                                                    | [90+284+1205](https://www.aliyundrive.com/s/PZ1Wann4B8A)`29sf` |                          1805                           |                          846                           |             397              |                       67+378+2773                       |                          112                          | [639](https://www.aliyundrive.com/s/fP52KxJEUE5)`mo78`+[74](https://www.aliyundrive.com/s/XZG992JqQfn)`nj80` |
| **2024/v25** |                                     2581                                     |                           460                           |                                                    268+46                                                    |                          547                           |                          170                           |              264              |                                                    2716+773                                                    |                          2387                           |                                                       --                                                       |                          86+369+1810                           |                      144+191+2275                       |                          1048                          |             419              |                       61+326+3650                       |                          131                          |                                                   846+120                                                    |
| **2025/v26** |                                     3028                                     |                           479                           |                                                     ---                                                      |                          583                           |                          182                           |              263              |                                                    2871+659                                                    |                           --                            |                                                    2701+765                                                    |                      208+373+3060+6+6+56                       |                      108+211+2967                       |                          1276                          |             308              |                       77+683+4515                       |                          163                          |                                                     929                                                      |
| **2026/v27** |                                     2375                                     |                          29 May                         |                                                    18 Dec.                                                   |                         2 May                          |                         3 July                         |             12 Nov.           |                                                     7 June                                                     |                         13 Sep.                         |                                                     29 Sep.                                                    |                            225+5131                            |                         11 July                         |                         21 Aug                         |             50               |                          13 Dec                         |                        17 July                        |                                                   831+191                                                    |

</sup>
</sub>


[Download from 123pan.com](https://www.123pan.com/s/PwXljv-QErwd.html)
(ACCESS CODE: `FdX2`)

(Some papers may be missing from the 123pan share because an older version of 123pan limited filename length.)

NOTE: all the shared paper PDF files were collected from the web; the original authors/publishers hold the copyrights.

---

## Usage

**For example: download AAAI-2022 papers**

1. Install [Internet Downloader Manager/IDM](https://www.internetdownloadmanager.com/) [*Windows*] [*OPTIONAL*]

   **Note:** If IDM is NOT installed at its default location, update the
   path in [lib/IDM.py](./lib/IDM.py) accordingly:

   ```python
   # should replace with your IDM path
   idm_path = '"your path to IDMan.exe"'  

   # default:
   # idm_path = '"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"'
   ```

   **Useful tip**: [Disabling IDM's download popup pages is recommended](https://github.com/SilenceEagle/paper_downloader/issues/17#issuecomment-773763300)

2. Install [Chrome](https://www.google.com/chrome) [Needed for `ICLR`, `ICML`, some of `NIPS` and `CORL` papers]
3. Change the code block at the end of
   [code/paper_downloader_AAAI.py](./code/paper_downloader_AAAI.py)

   ```python
   if __name__ == '__main__':
      year = 2022
      total_paper_number = save_csv(year)  # save papers urls to csv/AAAI_2022.csv
      download_from_csv(
         year, 
         save_dir=f'..\\AAAI_{year}', # change to your save location
         time_step_in_seconds=5,  # time step (seconds) between two downloading requests
         total_paper_number=total_paper_number,
         downloader=None # use python "requests" package to download papers, workable on Windows/MacOS/Linux
         # downloader='IDM'  # use Internet Download Manager software to 
                              # download papers, Windows only
      )
   ```

4. Then run the code:

   ```bash
   python code/paper_downloader_AAAI.py  # download AAAI papers
   ```

---

**This repo also provides functions to process supplemental materials:**

1. Merge the main paper and its supplemental material PDF into a single PDF file;
2. Move the supplemental PDF files (extracted from the downloaded zip files, if present) into the main papers' folder.
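The second step can be sketched with only the standard library. The directory layout and function name below are illustrative assumptions; the repo's actual logic lives in [lib/supplement_porcess.py](./lib/supplement_porcess.py).

```python
import os
import shutil
import zipfile


def move_supplements(supp_dir, paper_dir):
    """Extract every *.zip in supp_dir and move the PDFs inside,
    plus any loose PDFs, into paper_dir (illustrative sketch)."""
    os.makedirs(paper_dir, exist_ok=True)
    for name in os.listdir(supp_dir):
        src = os.path.join(supp_dir, name)
        if name.lower().endswith('.zip'):
            # pull only the PDF members out of the downloaded archive
            with zipfile.ZipFile(src) as zf:
                for member in zf.namelist():
                    if member.lower().endswith('.pdf'):
                        zf.extract(member, paper_dir)
        elif name.lower().endswith('.pdf'):
            # loose supplemental PDF: move it next to the main papers
            shutil.move(src, os.path.join(paper_dir, name))
```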

## Star history

[![Star History Chart](https://api.star-history.com/svg?repos=SilenceEagle/paper_downloader&type=Date)](https://star-history.com/#SilenceEagle/paper_downloader&Date)


================================================
FILE: code/paper_downloader_AAAI.py
================================================
"""paper_downloader_AAAI.py"""
import time
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
import random

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.user_agents import user_agents
from lib.my_request import urlopen_with_retry


def get_track_urls(year):
    """
    get all the technical tracks urls given AAAI proceeding year
    Args:
        year (int): AAAI proceeding year, such as 2023

    Returns:
        dict : All the urls of technical tracks included in
            the given AAAI proceeding. Keys are the track name-volume,
            and values are the corresponding urls.
    """
    # assert int(year) >= 2023, f"only support year >= 2023, but get {year}!!!"
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    dat_file_pathname = os.path.join(
        project_root_folder, 'urls', f'track_archive_url_AAAI_{year}.dat'
    )
    proceeding_th_dict = {
        1980: 1,
        1982: 2,
        1983: 3,
        1984: 4,
        1986: 5,
        1987: 6,
        1988: 7,
        1990: 8,
        1991: 9,
        1992: 10,
        1993: 11,
        1994: 12,
        1996: 13,
        1997: 14,
        1998: 15,
        1999: 16,
        2000: 17,
        2002: 18,
        2004: 19,
        2005: 20,
        2006: 21,
        2007: 22,
        2008: 23
    }
    if year >= 2023:
        base_url = r'https://ojs.aaai.org/index.php/AAAI/issue/archive'
        headers = {
            'User-Agent': user_agents[-1],
            'Host': 'ojs.aaai.org',
            'Referer': "https://ojs.aaai.org",
            'GET': base_url
        }
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            content = urlopen_with_retry(url=base_url, headers=headers)
            # req = urllib.request.Request(url=base_url, headers=headers)
            # content = urllib.request.urlopen(req).read()
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        tracks = soup.find('ul', {'class': 'issues_archive'}).find_all('li')
        track_urls = dict()
        for tr in tracks:
            h2 = tr.find('h2')
            this_track = slugify(h2.a.text)
            if this_track.startswith(f'aaai-{year-2000}'):
                this_track += '-' + slugify(h2.div.text)  # key: track name-volume
                this_url = h2.a.get('href')
                track_urls[this_track] = this_url
                print(f'find track: {this_track}({this_url})')
    else:
        if year >= 2010:
            proceeding_th = year - 1986
        elif year in proceeding_th_dict:
            proceeding_th = proceeding_th_dict[year]
        else:
            print(f'ERROR: AAAI proceeding was not held in year {year}!!!')
            return

        base_url = f'https://aaai.org/proceeding/aaai-{proceeding_th:02d}-{year}/'
        headers = {
            'User-Agent': user_agents[-1],
            'Host': 'aaai.org',
            'Referer': "https://aaai.org",
            'GET': base_url
        }
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            # req = urllib.request.Request(url=base_url, headers=headers)
            # content = urllib.request.urlopen(req).read()
            content = urlopen_with_retry(url=base_url, headers=headers)
            # content = open(f'..\\AAAI_{year}.html', 'rb').read()
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        tracks = soup.find('main', {'class': 'content'}).find_all('li')
        track_urls = dict()
        for tr in tracks:
            this_track = slugify(tr.a.text)
            this_url = tr.a.get('href')
            track_urls[this_track] = this_url
            print(f'find track: {this_track}({this_url})')
    return track_urls


def get_papers_of_track_ojs(track_url):
    """
    get each paper's title, its track group name and its download link.
    The links are hosted on https://ojs.aaai.org/
    Args:
        track_url (str): track url

    Returns:
        list[dict]: a list containing all the collected papers' information,
            each item in list is a dictionary, whose keys include
            ['title', 'main link', 'group']
            And the group is the specific track name.
    """
    debug = False
    paper_list = []
    headers = {
        'User-Agent': user_agents[-1],
        'Host': 'ojs.aaai.org',
        'Referer': "https://ojs.aaai.org",
        'GET': track_url
    }
    content = urlopen_with_retry(url=track_url, headers=headers)

    soup = BeautifulSoup(content, 'html5lib')
    tracks = soup.find('div', {'class': 'sections'}).find_all(
        'div', {'class': 'section'})
    for tr in tracks:
        this_group = slugify(tr.h2.text)
        this_paper_dict = {
            'group': this_group,
            'title': '',
            'main link': ''
        }
        papers = tr.find_all('li')
        for p in papers:
            this_paper_dict['title'] = ''
            this_paper_dict['main link'] = ''
            try:
                title = slugify(p.find('h3', {'class': 'title'}).text)
                link = p.find(
                    'a', {'class': 'obj_galley_link pdf'}
                ).get('href').replace('view', 'download')
                this_paper_dict['title'] = title
                this_paper_dict['main link'] = link
                paper_list.append(this_paper_dict.copy())
                if debug:
                    print(
                        f'paper: {title}\n\tlink:{link}\n\tgroup:{this_group}')
            except Exception as e:
                # skip unwanted target
                # print(f'ERROR: {str(e)}')
                pass
                # continue

    return paper_list


def get_papers_of_track(track_url):
    """
    get each paper's title, its track group name and its download link.
    The links are hosted on https://aaai.org/
    Args:
        track_url (str): track url

    Returns:
        list[dict]: a list containing all the collected papers' information,
            each item in list is a dictionary, whose keys include
            ['title', 'main link', 'group']
            And the group is the specific track name.
    """
    debug = False
    paper_list = []
    headers = {
        'User-Agent': user_agents[-1],
        'Host': 'aaai.org',
        'Referer': "https://aaai.org",
        'GET': track_url
    }
    content = urlopen_with_retry(url=track_url, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    tracks = soup.find('main', {'id': 'genesis-content'}).find_all(
        'div', {'class': 'track-wrap'})
    for tr in tracks:
        this_group = slugify(tr.h2.text)
        this_paper_dict = {
            'group': this_group,
            'title': '',
            'main link': ''
        }
        papers = tr.find_all('li')
        for p in papers:
            this_paper_dict['title'] = ''
            this_paper_dict['main link'] = ''
            try:
                title = slugify(p.find('h5').text)
                link = p.find(
                    'a', {'class': 'wp-block-button'}
                ).get('href')
                this_paper_dict['title'] = title
                this_paper_dict['main link'] = link
                paper_list.append(this_paper_dict.copy())
                if debug:
                    print(
                        f'paper: {title}\n\tlink:{link}\n\tgroup:{this_group}')
            except Exception as e:
                # skip unwanted target
                # print(f'ERROR: {str(e)}')
                pass
                # continue

    return paper_list


def save_csv(year):
    """
    write AAAI papers' urls in one csv file
    :param year: int, AAAI year, such as 2019
    :return: paper_index: int, the total number of papers
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'AAAI_{year}.csv'
    )
    error_log = []
    paper_index = 0
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'group']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        track_urls = get_track_urls(year)
        for tr_name in track_urls:
            tr_url = track_urls[tr_name]
            print(f'collecting paper from {tr_name}({tr_url})')
            if year >= 2023:
                papers_dict_list = get_papers_of_track_ojs(tr_url)
            else:
                papers_dict_list = get_papers_of_track(tr_url)
            print(f'\tfind {len(papers_dict_list)} papers')
            for p in papers_dict_list:
                paper_index += 1
                writer.writerow(p)
            csvfile.flush()
            s = random.randint(3, 7)
            print(f'random sleeping {s} seconds...')
            time.sleep(s)  # avoid requesting too frequently

    #  write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
    return paper_index


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None,
        csv_filename=None, downloader='IDM'):
    """
    download all AAAI paper given year
    :param year: int, AAAI year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param csv_filename: None or str, the csv file's name, None means to use
        default setting
    :param downloader: str, the downloader to use, could be 'IDM' or
        'Thunder'. Default: 'IDM'
    :return: True
    """
    postfix = f'AAAI_{year}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_path = os.path.join(
        project_root_folder, 'csv',
        f'AAAI_{year}.csv' if csv_filename is None else csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    # total_paper_number = 3028
    total_paper_number = save_csv(year)
    download_from_csv(
        year,
        save_dir=fr'D:\AAAI_{year}',
        time_step_in_seconds=15,
        total_paper_number=total_paper_number)
    # for year in range(2012, 2018, 2):
    #     print(year)
    #     total_paper_number = None
    #     # total_paper_number = save_csv(year)
    #     download_from_csv(year, save_dir=f'..\\AAAI_{year}',
    #                       time_step_in_seconds=10,
    #                       total_paper_number=total_paper_number)
    #     time.sleep(2)
    # for i in range(1, 12):
    #     print(f'issue {i}/{11}')
    #     year = 2022
    #     total_paper_number = save_csv_given_urls(
    #         urls=f'https://www.aaai.org/Library/AAAI/aaai{year - 2000}-issue{i:0>2}.php',
    #         csv_filename=f'.\AAAI_{year}_issue_{i}.csv'
    #     )
    #     # total_paper_number = 156
    #     download_from_csv(
    #         year=year,
    #         csv_filename=f'.\AAAI_{year}_issue_{i}.csv',
    #         save_dir=rf'D:\AAAI_{year}',
    #         time_step_in_seconds=1,
    #         total_paper_number=total_paper_number)

    # print(get_track_urls(1980))
    # get_papers_of_track(r'https://ojs.aaai.org/index.php/AAAI/issue/view/548')

    pass


================================================
FILE: code/paper_downloader_AAMAS.py
================================================
"""paper_downloader_AAMAS.py
"""

import time
import urllib
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write AAMAS papers' urls in one csv file
    :param year: int, AAMAS year, such as 2023
    :return: paper_index: int, the total number of papers
    """
    conference = "AAMAS"
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_{year}.csv'
    )

    init_url_dict = {
        2010: 'https://www.ifaamas.org/Proceedings/aamas2010/resources/_fullpapers.html',
        2009: 'https://www.ifaamas.org/Proceedings/aamas2009/TOC/01_FP/FP_Session.html',
        2008: 'https://www.ifaamas.org/Proceedings/aamas2008/proceedings/mainTrackPapers.htm',
    }

    error_log = []
    paper_index = 0
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'group', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        if year >= 2013:
            init_url = f'https://www.ifaamas.org/Proceedings/aamas{year}' \
                f'/forms/contents.htm'
        elif year >= 2011:
            init_url = f'https://www.ifaamas.org/Proceedings/aamas{year}'\
                f'/resources/fullpapers.html'
        elif year in init_url_dict:
            init_url = init_url_dict[year]
        else:   
            # TODO: support downloading 2002 ~ 2007 papers
            return
        url_file_pathname = os.path.join(
            project_root_folder, 'urls',
            f'init_url_{conference}_{year}.dat'
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                    'AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0'}
            content = urlopen_with_retry(url=init_url, headers=headers)
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)

        soup = BeautifulSoup(content, 'html5lib')
        # soup = BeautifulSoup(content, 'html.parser')
        if year >= 2013:
            group_list = soup.find('tbody').find_all('tr', recursive=False)[3:]
            # skip "conference title", "Table of Contents" and "Contents table"  
            
            group_list_bar = tqdm(group_list)
            paper_index = 0
            is_start = False
            for group in group_list_bar:
                if not is_start:
                    # if group.find('a', {'id': 'KT'}): # year 2019, 2023, 2024
                    #     is_start = True
                    if group.find('strong'):
                        group_text = slugify(group.find('strong').text)
                        if not group_text.startswith('table') and \
                            not group_text.startswith('aamas'):  
                            # skip Table of Contents, AAMAS 20xx
                            is_start = True
                        else:
                            continue
                    else:
                        continue
                
                try:
                    tds = group.find_all('td', recursive=False)
                    if len(tds) < 2:
                        continue
                    group = tds[1]
                    papers = group.find_all('p')

                    for p in papers:
                        # group title is in <strong>...</strong>
                        if p.find('strong', recursive=False):
                            group_title = slugify(p.text)
                            continue
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        if p.find('a') is None and p.find('b') is None:
                            # last empty <p>...</p> in some <tr>...</tr>
                            continue
                        a = p.find('a')
                        if a is None:
                            title = slugify(p.find('b').text)
                            main_link = ''
                            print(f'\nWarning: No link found for {title}!')
                        else:
                            title = slugify(a.text)
                            main_link = urllib.parse.urljoin(init_url, a.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        elif year >= 2010:
            class_name = {
                2010: 'plist',
                2011: 'plist',
                2012: 'pindex'
            }
            papers = soup.find('div', {'class': class_name[year]}).find_all(['h2', 'div'])
            papers_bar = tqdm(papers)
            paper_index = 0
            for p in papers_bar:
                if p.name == 'h2': # group title
                    group_title = slugify(p.text)
                else:  # div, paper
                    paper_dict = {'title': '',
                                'group': group_title,
                                'main link': '',
                                'supplemental link': ''}
                    a = p.find('span', {'class': 'title'}).find('a')
                    # title = slugify(a.find(string=True, recursive=False)) # drop abs
                    direct_text = ''.join(child for child in a.contents 
                                          if isinstance(child, str)).strip()
                    title = slugify(direct_text)
                    main_link = urllib.parse.urljoin(init_url, a.get('href'))
                    paper_dict['title'] = title
                    paper_dict['main link'] = main_link
                    paper_index += 1
                    papers_bar.set_description_str(
                        f'Collected paper {paper_index}: {title}')
                    writer.writerow(paper_dict)
                    csvfile.flush()  # write to file immediately
        elif year == 2009:
            group_list = soup.find('div', {'id': 'mainContent'}).find_all('p')
            group_list_bar = tqdm(group_list)
            paper_index = 0
            is_start = False
            for group in group_list_bar:
                if not is_start:
                    if group.find('strong'):
                        group_text = slugify(group.find('strong').text)
                        is_start = True
                    else:
                        continue
                if group.find('strong'):
                    group_title = slugify(group.text)
                    continue
                try:
                    papers = group.find_all('a')
                    for p in papers:
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        title = slugify(p.text)
                        main_link = urllib.parse.urljoin(init_url, p.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        elif year == 2008:
            # papers = soup.find_all(lambda tag: 
            #     (tag.name == 'p' and 'title' in tag.get('class', [])) or 
            #     tag.name == 'a'
            # )
            group_list = soup.find('div', {'id': 'mainbody'}).find(
                'table').find('tbody').find_all('tr', recursive=False)[2:]
            # skip "conference title", "Table of Contents" 
            
            group_list_bar = tqdm(group_list)
            paper_index = 0
            for group in group_list_bar:
                
                try:
                    p_class_title = group.find('p', {'class': 'title'})
                    h3 = group.find('h3')
                    if p_class_title:                       
                        group_title = slugify(p_class_title.text)
                    elif h3:  # find <h3></h3>
                        group_title = slugify(h3.text)
                    else:
                        raise ValueError('Parse group title failed!')

                    papers = group.find_all('a')

                    for p in papers:
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        
                        title = slugify(p.text)
                        if not p.get('href'):
                            continue # group title
                        main_link = urllib.parse.urljoin(init_url, p.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        else:
            # TODO: support downloading 2002 ~ 2007 papers
            return

    #  write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
    return paper_index


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None,
        csv_filename=None, downloader='IDM', is_random_step=True,
        proxy_ip_port=None):
    """
    download all AAMAS paper given year
    :param year: int, AAMAS year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param csv_filename: None or str, the csv file's name, None means to use
        default setting
    :param downloader: str, the downloader to use, could be 'IDM' or
        'Thunder'. Default: 'IDM'
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    conference = "AAMAS"
    postfix = f'{conference}_{year}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_path = os.path.join(
        project_root_folder, 'csv',
        f'{conference}_{year}.csv' if csv_filename is None else csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader,
        is_random_step=is_random_step,
        proxy_ip_port=proxy_ip_port
    )


if __name__ == '__main__':
    year = 2025
    # total_paper_number = 2021
    total_paper_number = save_csv(year)
    download_from_csv(
        year,
        save_dir=fr'D:\AAMAS_{year}',
        time_step_in_seconds=5,
        total_paper_number=total_paper_number)
    # for year in range(2008, 2025, 1):
    #     print(year)
    #     # total_paper_number = 134
    #     total_paper_number = save_csv(year)
    #     download_from_csv(year, save_dir=fr'E:\AAMAS\AAMAS_{year}',
    #                       time_step_in_seconds=10,
    #                       total_paper_number=total_paper_number)
    #     time.sleep(2)

    pass
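The `is_random_step` behaviour documented in `download_from_csv` above can be sketched as a small standalone function. This is an illustrative sketch only (the actual sampling lives in `lib/csv_process.py`); the function name `sample_time_step` is hypothetical:

```python
import random


def sample_time_step(time_step_in_seconds, is_random_step=True):
    """Return the pause between two adjacent download requests.

    When is_random_step is True, the step is drawn from
    Uniform(0.5 * t, 1.5 * t), matching the docstring above;
    otherwise the fixed step is returned unchanged.
    """
    if is_random_step:
        return random.uniform(0.5 * time_step_in_seconds,
                              1.5 * time_step_in_seconds)
    return float(time_step_in_seconds)
```

Randomizing the interval makes the request pattern look less mechanical to the server while keeping the average pause equal to the configured `time_step_in_seconds`.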

================================================
FILE: code/paper_downloader_AISTATS.py
================================================
"""paper_downloader_AISTATS.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr
from lib.supplement_porcess import merge_main_supplement, move_main_and_supplement_2_one_directory, \
    move_main_and_supplement_2_one_directory_with_group


def download_paper(year, save_dir, is_download_supplement=True, time_step_in_seconds=5, downloader='IDM'):
    """
    download all AISTATS papers and supplement files of a given year, stored
    in save_dir/main_paper and save_dir/supplement respectively
    :param year: int, AISTATS year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    AISTATS_year_dict = {
        2025: 258,
        2024: 238,
        2023: 206,
        2022: 151,
        2021: 130,
        2020: 108,
        2019: 89,
        2018: 84,
        2017: 54,
        2016: 51,
        2015: 38,
        2014: 33,
        2013: 31,
        2012: 22,
        2011: 15,
        2010: 9,
        2009: 5,
        2007: 2
    }
    AISTATS_year_dict_R = {
        1995: 0,
        1997: 1,
        1999: 2,
        2001: 3,
        2003: 4,
        2005: 5

    }
    if year in AISTATS_year_dict.keys():
        volume = f'v{AISTATS_year_dict[year]}'
    elif year in AISTATS_year_dict_R.keys():
        volume = f'r{AISTATS_year_dict_R[year]}'
    else:
        raise ValueError('''the given year's url is unknown !''')
    postfix = f'AISTATS_{year}'

    pmlr.download_paper_given_volume(
        volume=volume,
        save_dir=save_dir,
        postfix=postfix,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'D:\AISTATS_{year}',
        is_download_supplement=True,
        time_step_in_seconds=25,
        downloader='IDM'
    )
    # move_main_and_supplement_2_one_directory(
    #     main_path=rf'D:\AISTATS_{year}\main_paper',
    #     supplement_path=rf'D:\AISTATS_{year}\supplement',
    #     supp_pdf_save_path=rf'D:\AISTATS_{year}\supplement_pdf'
    # )
    pass
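The `AISTATS_year_dict` above maps a year to its PMLR volume number; assuming the standard PMLR site layout (`https://proceedings.mlr.press/v<volume>/`), the lookup can be sketched as a pure function. This is a sketch with an excerpt of the table, not the repo's actual code path (`pmlr.download_paper_given_volume` does the real work):

```python
# Excerpt of the year -> PMLR volume table from download_paper above.
AISTATS_YEAR_TO_VOLUME = {2025: 258, 2024: 238, 2023: 206}


def pmlr_index_url(year):
    """Map an AISTATS year to its PMLR proceedings index URL.

    Raises ValueError for years the (excerpted) table does not cover,
    mirroring the error raised in download_paper.
    """
    if year not in AISTATS_YEAR_TO_VOLUME:
        raise ValueError(f"the given year's url is unknown: {year}")
    return f'https://proceedings.mlr.press/v{AISTATS_YEAR_TO_VOLUME[year]}/'
```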


================================================
FILE: code/paper_downloader_COLT.py
================================================
"""paper_downloader_COLT.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr


def download_paper(year, save_dir, is_download_supplement=False, time_step_in_seconds=5, downloader='IDM'):
    """
    download all COLT papers and supplement files of a given year, stored in
    save_dir/main_paper and save_dir/supplement respectively
    :param year: int, COLT year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    COLT_year_dict = {
        2025: 291,
        2024: 247,
        2023: 195,
        2022: 178,
        2021: 134,
        2020: 125,
        2019: 99,
        2018: 75,
        2017: 65,
        2016: 49,
        2015: 40,
        2014: 35,
        2013: 30,
        2012: 23,
        2011: 19
                      }
    if year in COLT_year_dict.keys():
        volume = f'v{COLT_year_dict[year]}'
    else:
        raise ValueError('''the given year's url is unknown !''')
    postfix = f'COLT_{year}'

    pmlr.download_paper_given_volume(
        volume=volume,
        save_dir=save_dir,
        postfix=postfix,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'D:\COLT_{year}',
        is_download_supplement=False,
        time_step_in_seconds=3,
        downloader='IDM'
    )
    pass


================================================
FILE: code/paper_downloader_CORL.py
================================================
"""paper_downloader_CORL.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr
import lib.openreview as openreview


def download_paper(year, save_dir, is_download_supplement=False, 
                   time_step_in_seconds=5, downloader='IDM',
                   source=None, proxy_ip_port=None):
    """
    download all CORL papers and supplement files of a given year, stored in
    save_dir/main_paper and save_dir/supplement respectively
    :param year: int, CORL year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param source: str or None, download source, supports "pmlr" and
        "openreview". Defaults to None, which means first trying to download
        from pmlr and, if that fails, falling back to openreview.
    :param proxy_ip_port: str or None, proxy ip address and port,
        eg: "127.0.0.1:7890". Only useful for webdriver and request
        downloader (downloader=None). Default: None.
    :type proxy_ip_port: str | None
    :return: True
    """
    CORL_year_dict = {
        2025: 305,
        2024: 270,
        2023: 229,
        2022: 205,
        2021: 164,
        2020: 155,
        2019: 100,
        2018: 87,
        2017: 78
    }
    postfix = f'CORL_{year}'

    if source != 'openreview':
        if year in CORL_year_dict.keys():  # download from pmlr
            volume = f'v{CORL_year_dict[year]}'
            pmlr.download_paper_given_volume(
                volume=volume,
                save_dir=save_dir,
                postfix=postfix,
                is_download_supplement=is_download_supplement,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader
            )
            return True
        elif source == 'pmlr':
            raise ValueError(f'Not found CoRL {year} in pmlr!')
        
    # try to download from openreview
    base_url = f'https://openreview.net/group?id=robot-learning.org/'\
               f'CoRL/{year}/Conference'
    group_id_dict = {
        2023: ['accept--oral-', 'accept--poster-'],
        2024: ['accept']
    }
    if year not in group_id_dict:
        raise ValueError(f'Not found CoRL {year} in openreview!')
    for gid in group_id_dict[year]:
        openreview.download_papers_given_url_and_group_id(
            save_dir=save_dir,
            year=year,
            base_url=f'{base_url}#{gid}',
            group_id=gid,
            conference='CORL',
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )
    return True 


if __name__ == '__main__':
    year=2025
    download_paper(
        year,
        rf'D:\CORL\CORL_{year}',
        is_download_supplement=False,
        time_step_in_seconds=30,
        downloader='IDM'
        # downloader = None
    )
    pass
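The pmlr-first, openreview-fallback control flow in `download_paper` above can be isolated into a small pure function for clarity. This is a sketch mirroring the branching logic, not code from the repo; `choose_source` and its `pmlr_years` parameter are hypothetical names:

```python
def choose_source(year, source, pmlr_years):
    """Decide where to fetch CoRL papers from.

    Mirrors download_paper's fallback: prefer pmlr when the year has a
    known volume (and the caller did not force openreview), raise if the
    caller forced pmlr for an unknown year, otherwise fall back to
    openreview.
    """
    if source != 'openreview' and year in pmlr_years:
        return 'pmlr'
    if source == 'pmlr':
        raise ValueError(f'Not found CoRL {year} in pmlr!')
    return 'openreview'
```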


================================================
FILE: code/paper_downloader_CVF.py
================================================
"""paper_downloader_CVF.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from slugify import slugify
import csv
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.supplement_porcess import merge_main_supplement, move_main_and_supplement_2_one_directory, \
    move_main_and_supplement_2_one_directory_with_group, \
    rename_2_short_name, rename_2_short_name_within_group
from lib.cvf import get_paper_dict_list
from lib import csv_process
import time
from lib.my_request import urlopen_with_retry


def save_csv(year, conference, proxy_ip_port=None):
    """
    write CVF conference papers' and supplemental materials' URLs into one CSV file
    :param year: int
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV', 'ACCV']
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if conference not in ['CVPR', 'ICCV', 'WACV', 'ACCV']:
        raise ValueError(f'{conference} is not found in '
                         f'https://openaccess.thecvf.com/menu, '
                         f'maybe a spelling mistake!')
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_{year}.csv'
    )
    print(f'saving {conference}-{year} paper urls into {csv_file_pathname}')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link', 'arxiv']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        init_url = f'http://openaccess.thecvf.com/{conference}{year}'
        if conference == 'ICCV' and year == 2021:
            init_url = 'https://openaccess.thecvf.com/ICCV2021?day=all'
        elif conference == 'CVPR' and year >= 2022:
            init_url = f'https://openaccess.thecvf.com/CVPR{year}?day=all'
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_{conference}_{year}.dat'
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            content = urlopen_with_retry(
                url=init_url, headers=headers, proxy_ip_port=proxy_ip_port)
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)

        soup = BeautifulSoup(content, 'html5lib')
        tmp_list = soup.find('div', {'id': 'content'}).find_all('dt')
        if len(tmp_list) <= 1:
            paper_different_days_list_bar = soup.find(
                'div', {'id': 'content'}).find_all('dd')
            paper_index = 0
            for group in paper_different_days_list_bar:
                # get group name
                a = group.find('a')
                print(a.text)
                group_link = urllib.parse.urljoin(init_url, a.get('href'))
                group_paper_dict_list, _ = get_paper_dict_list(
                    url=group_link
                )
                paper_index += len(group_paper_dict_list)
                for paper_dict in group_paper_dict_list:
                    writer.writerow(paper_dict)
            return paper_index
        else:
            paper_dict_list, content = get_paper_dict_list(
                url=init_url,
                content=content)
            for paper_dict in paper_dict_list:
                writer.writerow(paper_dict)
            return len(paper_dict_list)


def save_csv_workshops(year, conference, proxy_ip_port=None):
    """
    write CVF workshop papers' and supplemental materials' URLs into one CSV file
    :param year: int
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV', 'ACCV']
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if conference not in ['CVPR', 'ICCV', 'WACV', 'ACCV']:
        raise ValueError(f'{conference} is not found in '
                         f'https://openaccess.thecvf.com/menu, '
                         f'maybe a spelling mistake!')
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_WS_{year}.csv'
    )
    print(f'saving {conference}-WS-{year} paper urls into {csv_file_pathname}')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['group', 'title', 'main link', 'supplemental link',
                      'arxiv']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}

        init_url = f'https://openaccess.thecvf.com/' \
                   f'{conference}{year}_workshops/menu'
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_{conference}_WS_{year}.dat'
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            content = urlopen_with_retry(
                url=init_url, headers=headers, proxy_ip_port=proxy_ip_port)
            # content = open(f'..\\{conference}_WS_{year}.html', 'rb').read()
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        paper_group_list_bar = soup.find('div', {'id': 'content'}).find_all('dd')
        paper_index = 0
        for group in paper_group_list_bar:
            # get group name
            a = group.find('a')
            group_name = slugify(a.text)
            print(f'GROUP: {group_name}')

            group_link = urllib.parse.urljoin(init_url, a.get('href'))

            repeat_time = 3
            group_paper_dict_list = []
            for r in range(repeat_time):
                try:
                    group_paper_dict_list, _ = get_paper_dict_list(
                        url=group_link,
                        group_name=group_name,
                        timeout=20,
                    )
                    time.sleep(1)
                    break
                except Exception as e:
                    # on the last attempt, log the error and fall through
                    # with an empty list instead of failing with NameError
                    if r + 1 == repeat_time:
                        print(f'ERROR: {str(e)}')

            paper_index += len(group_paper_dict_list)
            for paper_dict in group_paper_dict_list:
                writer.writerow(paper_dict)
    return paper_index


def download_from_csv(
        year, conference, save_dir, is_download_main_paper=True,
        is_download_supplement=True, time_step_in_seconds=5,
        total_paper_number=None, is_workshops=False, downloader='IDM',
        proxy_ip_port=None):
    """
    download all CVF papers and supplement files of a given year, stored in
    save_dir/main_paper and save_dir/supplement respectively
    :param year: int, CVF year, such as 2019
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV', 'ACCV']
    :param save_dir: str, paper and supplement material's save path
    :param is_download_main_paper: bool, True for downloading main paper
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param total_paper_number: int, the total number of papers to download
    :param is_workshops: bool, is to download workshops from csv file.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    postfix = f'{conference}_{year}'
    if is_workshops:
        postfix = f'{conference}_WS_{year}'
    csv_file_path = os.path.join(
        project_root_folder,
        'csv',
        f'{conference}_{year}.csv' if not is_workshops else
        f'{conference}_WS_{year}.csv'
    )
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_main_paper=is_download_main_paper,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port
    )
    return True


def download_paper(
        year, conference, save_dir, is_download_main_paper=True,
        is_download_supplement=True, time_step_in_seconds=5,
        is_download_main_conference=True, is_download_workshops=True,
        downloader='IDM', proxy_ip_port=None):
    """
    download all CVF papers of a given year, supporting both the main
    conference and the workshops.
    :param year: int, CVF year, such as 2019.
    :param conference: str, one of {'CVPR', 'ICCV', 'WACV', 'ACCV'}.
    :param save_dir: str, paper and supplement material's save path.
    :param is_download_main_paper: bool, True for downloading main paper.
    :param is_download_supplement: bool, True for downloading supplemental
        material.
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds.
    :param is_download_main_conference: bool, whether to download main
        conference papers. It is an upper-level control flag over
        is_download_main_paper and is_download_supplement. e.g. with
        is_download_main_conference=True, is_download_main_paper=False and
        is_download_supplement=True, only the supplemental materials of the
        main conference (vs. workshops) will be downloaded.
    :param is_download_workshops: bool, True for downloading workshop papers;
        behaves analogously to is_download_main_conference.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    # main conference
    if is_download_main_conference:
        csv_file_path = os.path.join(
            project_root_folder, 'csv', f'{conference}_{year}.csv')
        if not os.path.exists(csv_file_path):
            total_paper_number = save_csv(
                year=year, conference=conference, proxy_ip_port=proxy_ip_port)
        else:
            with open(csv_file_path, newline='') as csvfile:
                myreader = csv.DictReader(csvfile, delimiter=',')
                total_paper_number = sum(1 for row in myreader)

        download_from_csv(
            year=year,
            conference=conference,
            save_dir=os.path.join(save_dir, f'{conference}_{year}'),
            is_download_main_paper=is_download_main_paper,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            total_paper_number=total_paper_number,
            is_workshops=False,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )

    # workshops
    if is_download_workshops:
        csv_file_path = os.path.join(
            project_root_folder, 'csv', f'{conference}_WS_{year}.csv')
        if not os.path.exists(csv_file_path):
            total_paper_number = save_csv_workshops(
                year=year, conference=conference, proxy_ip_port=proxy_ip_port)
        else:
            with open(csv_file_path, newline='') as csvfile:
                myreader = csv.DictReader(csvfile, delimiter=',')
                total_paper_number = sum(1 for row in myreader)
        download_from_csv(
            year=year,
            conference=conference,
            save_dir=os.path.join(save_dir, f'{conference}_WS_{year}'),
            is_download_main_paper=is_download_main_paper,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            total_paper_number=total_paper_number,
            is_workshops=True,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )


if __name__ == '__main__':
    year = 2025
    conference = 'CVPR'
    download_paper(
        year,
        conference=conference,
        save_dir=fr'D:\{conference}',
        is_download_main_paper=True,
        is_download_supplement=True,
        time_step_in_seconds=10,
        is_download_main_conference=True,
        is_download_workshops=True,
        # proxy_ip_port='127.0.0.1:7897'
    )
    #
    # move_main_and_supplement_2_one_directory(
    #     main_path=rf'E:\{conference}\{conference}_{year}\main_paper',
    #     supplement_path=rf'E:\{conference}\{conference}_{year}\supplement',
    #     supp_pdf_save_path=rf'E:\{conference}\{conference}_{year}\main_paper'
    # )
    # move_main_and_supplement_2_one_directory_with_group(
    #     main_path=rf'E:\{conference}\{conference}_WS_{year}\main_paper',
    #     supplement_path=rf'E:\{conference}\{conference}_WS_{year}\supplement',
    #     supp_pdf_save_path=rf'E:\{conference}\{conference}_WS_{year}\main_paper'
    # )

    # rename to short filename for uploading to 123pan
    # rename_2_short_name(
    #     src_path=r'E:\CVPR\CVPR_2024\main_paper',
    #     save_path=r'E:\short_name_cvpr2024',
    #     target_max_length=128
    # )
    # rename_2_short_name_within_group(
    #     src_path=r'E:\CVPR\CVPR_WS_2024\main_paper',
    #     save_path=r'E:\short_name_cvpr2024_ws',
    #     target_max_length=128
    # )
    pass
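When the CSV already exists, `download_paper` above recovers `total_paper_number` by counting data rows with `csv.DictReader`. The same idea can be shown as a self-contained sketch operating on an in-memory CSV (the helper name `count_csv_rows` is hypothetical):

```python
import csv
import io


def count_csv_rows(csv_text):
    """Count data rows (excluding the header line), the same way
    download_paper does for an existing csv file."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=',')
    return sum(1 for _ in reader)
```

`DictReader` consumes the first line as the header, so the count it yields is the number of papers listed, not the number of physical lines in the file.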


================================================
FILE: code/paper_downloader_ECCV.py
================================================
"""paper_downloader_ECCV.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.supplement_porcess import move_main_and_supplement_2_one_directory
import lib.springer as springer
from lib import csv_process
from lib.downloader import Downloader
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write ECCV papers' and supplemental materials' URLs into one CSV file
    :param year: int
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'ECCV_{year}.csv')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_ECCV_{year}.dat')
        if year >= 2018:
            init_url = f'https://www.ecva.net/papers.php'
            if os.path.exists(dat_file_pathname):
                with open(dat_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_url, headers=headers)
                with open(dat_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            soup = BeautifulSoup(content, 'html5lib')
            paper_list_bar = tqdm(soup.find_all(['dt', 'dd']))
            paper_index = 0
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            for paper in paper_list_bar:
                is_new_paper = False

                # get title
                try:
                    if 'dt' == paper.name and \
                            'ptitle' == paper.get('class')[0] and \
                            year == int(paper.a.get('href').split('_')[1][:4]):  # title:
                        # this_year = int(paper.a.get('href').split('_')[1][:4])
                        title = slugify(paper.text.strip())
                        paper_dict['title'] = title
                        paper_index += 1
                        paper_list_bar.set_description_str(
                            f'Collecting paper {paper_index}: {title}')
                    elif '' != paper_dict['title'] and 'dd' == paper.name:
                        all_as = paper.find_all('a')
                        for a in all_as:
                            if 'pdf' == slugify(a.text.strip()):
                                main_link = urllib.parse.urljoin(init_url,
                                                                 a.get('href'))
                                paper_dict['main link'] = main_link
                                is_new_paper = True
                            elif 'supp' in slugify(a.text.strip()):
                                supp_link = urllib.parse.urljoin(init_url,
                                                                 a.get('href'))
                                paper_dict['supplemental link'] = supp_link
                                break
                except Exception:
                    pass
                if is_new_paper:
                    writer.writerow(paper_dict)
                    paper_dict = {'title': '',
                                  'main link': '',
                                  'supplemental link': ''}
        else:
            init_url = f'http://www.eccv{year}.org/main-conference/'
            if os.path.exists(dat_file_pathname):
                with open(dat_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_url, headers=headers)
                with open(dat_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            soup = BeautifulSoup(content, 'html5lib')
            paper_list_bar = tqdm(
                soup.find('div', {'class': 'entry-content'}).find_all(['p']))
            paper_index = 0
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            for paper in paper_list_bar:
                try:
                    if len(paper.find_all(['strong'])) and len(
                            paper.find_all(['a'])) and len(
                            paper.find_all(['img'])):
                        paper_index += 1
                        title = slugify(paper.find('strong').text)
                        paper_dict['title'] = title
                        paper_list_bar.set_description_str(
                            f'Collecting paper {paper_index}: {title}')
                        main_link = paper.find('a').get('href')
                        paper_dict['main link'] = main_link
                        writer.writerow(paper_dict)
                        paper_dict = {'title': '',
                                      'main link': '',
                                      'supplemental link': ''}
                except Exception as e:
                    print(f'ERROR: {str(e)}')
    return paper_index


def download_from_csv(
        year, save_dir, is_download_supplement=True, time_step_in_seconds=5,
        total_paper_number=None,
        is_workshops=False, downloader='IDM'):
    """
    download all ECCV papers and supplement files of a given year, stored in
    save_dir/main_paper and save_dir/supplement respectively
    :param year: int, ECCV year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param total_paper_number: int, the total number of papers to download
    :param is_workshops: bool, is to download workshops from csv file.
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    postfix = f'ECCV_{year}'
    if is_workshops:
        postfix = f'ECCV_WS_{year}'
    csv_file_name = f'ECCV_{year}.csv' if not is_workshops else \
        f'ECCV_WS_{year}.csv'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_name = os.path.join(project_root_folder, 'csv', csv_file_name)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_name,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


def download_from_springer(
        year, save_dir, is_workshops=False, time_sleep_in_seconds=5,
        downloader='IDM'):
    """download ECCV papers of the given year from the Springer book volumes"""
    os.makedirs(save_dir, exist_ok=True)
    if 2018 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-030-01246-5',
                'https://link.springer.com/book/10.1007/978-3-030-01216-8',
                'https://link.springer.com/book/10.1007/978-3-030-01219-9',
                'https://link.springer.com/book/10.1007/978-3-030-01225-0',
                'https://link.springer.com/book/10.1007/978-3-030-01228-1',
                'https://link.springer.com/book/10.1007/978-3-030-01231-1',
                'https://link.springer.com/book/10.1007/978-3-030-01234-2',
                'https://link.springer.com/book/10.1007/978-3-030-01237-3',
                'https://link.springer.com/book/10.1007/978-3-030-01240-3',
                'https://link.springer.com/book/10.1007/978-3-030-01249-6',
                'https://link.springer.com/book/10.1007/978-3-030-01252-6',
                'https://link.springer.com/book/10.1007/978-3-030-01258-8',
                'https://link.springer.com/book/10.1007/978-3-030-01261-8',
                'https://link.springer.com/book/10.1007/978-3-030-01264-9',
                'https://link.springer.com/book/10.1007/978-3-030-01267-0',
                'https://link.springer.com/book/10.1007/978-3-030-01270-0'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-030-11009-3',
                'https://link.springer.com/book/10.1007/978-3-030-11012-3',
                'https://link.springer.com/book/10.1007/978-3-030-11015-4',
                'https://link.springer.com/book/10.1007/978-3-030-11018-5',
                'https://link.springer.com/book/10.1007/978-3-030-11021-5',
                'https://link.springer.com/book/10.1007/978-3-030-11024-6'
            ]
    elif 2016 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007%2F978-3-319-46448-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-46475-6',
                'https://link.springer.com/book/10.1007%2F978-3-319-46487-9',
                'https://link.springer.com/book/10.1007%2F978-3-319-46493-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-46454-1',
                'https://link.springer.com/book/10.1007%2F978-3-319-46466-4',
                'https://link.springer.com/book/10.1007%2F978-3-319-46478-7',
                'https://link.springer.com/book/10.1007%2F978-3-319-46484-8'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007%2F978-3-319-46604-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-48881-3',
                'https://link.springer.com/book/10.1007%2F978-3-319-49409-8'
            ]
    elif 2014 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-319-10590-1',
                'https://link.springer.com/book/10.1007/978-3-319-10605-2',
                'https://link.springer.com/book/10.1007/978-3-319-10578-9',
                'https://link.springer.com/book/10.1007/978-3-319-10593-2',
                'https://link.springer.com/book/10.1007/978-3-319-10602-1',
                'https://link.springer.com/book/10.1007/978-3-319-10599-4',
                'https://link.springer.com/book/10.1007/978-3-319-10584-0'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-319-16178-5',
                'https://link.springer.com/book/10.1007/978-3-319-16181-5',
                'https://link.springer.com/book/10.1007/978-3-319-16199-0',
                'https://link.springer.com/book/10.1007/978-3-319-16220-1'
            ]
    elif 2012 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-33718-5',
                'https://link.springer.com/book/10.1007/978-3-642-33709-3',
                'https://link.springer.com/book/10.1007/978-3-642-33712-3',
                'https://link.springer.com/book/10.1007/978-3-642-33765-9',
                'https://link.springer.com/book/10.1007/978-3-642-33715-4',
                'https://link.springer.com/book/10.1007/978-3-642-33783-3',
                'https://link.springer.com/book/10.1007/978-3-642-33786-4'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-33863-2',
                'https://link.springer.com/book/10.1007/978-3-642-33868-7',
                'https://link.springer.com/book/10.1007/978-3-642-33885-4'
            ]
    elif 2010 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-15549-9',
                'https://link.springer.com/book/10.1007/978-3-642-15552-9',
                'https://link.springer.com/book/10.1007/978-3-642-15558-1',
                'https://link.springer.com/book/10.1007/978-3-642-15561-1',
                'https://link.springer.com/book/10.1007/978-3-642-15555-0',
                'https://link.springer.com/book/10.1007/978-3-642-15567-3'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-35749-7',
                'https://link.springer.com/book/10.1007/978-3-642-35740-4'
            ]
    elif 2008 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-540-88682-2',
                'https://link.springer.com/book/10.1007/978-3-540-88688-4',
                'https://link.springer.com/book/10.1007/978-3-540-88690-7',
                'https://link.springer.com/book/10.1007/978-3-540-88693-8'
            ]
        else:
            urls_list = []
    elif 2006 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/11744023',
                'https://link.springer.com/book/10.1007/11744047',
                'https://link.springer.com/book/10.1007/11744078',
                'https://link.springer.com/book/10.1007/11744085'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/11754336'
            ]
    elif 2004 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/b97865',
                'https://link.springer.com/book/10.1007/b97866',
                'https://link.springer.com/book/10.1007/b97871',
                'https://link.springer.com/book/10.1007/b97873'
            ]
        else:
            urls_list = [

            ]
    elif 2002 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-47969-4',
                'https://link.springer.com/book/10.1007/3-540-47967-8',
                'https://link.springer.com/book/10.1007/3-540-47977-5',
                'https://link.springer.com/book/10.1007/3-540-47979-1'
            ]
        else:
            urls_list = [

            ]
    elif 2000 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-45054-8',
                'https://link.springer.com/book/10.1007/3-540-45053-X'
            ]
        else:
            urls_list = [

            ]
    elif 1998 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0055655',
                'https://link.springer.com/book/10.1007/BFb0054729'
            ]
        else:
            urls_list = [

            ]
    elif 1996 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0015518',
                'https://link.springer.com/book/10.1007/3-540-61123-1'
            ]
        else:
            urls_list = [

            ]
    elif 1994 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-57956-7',
                'https://link.springer.com/book/10.1007/BFb0028329'
            ]
        else:
            urls_list = [

            ]
    elif 1992 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-55426-2'
            ]
        else:
            urls_list = [

            ]
    elif 1990 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0014843'
            ]
        else:
            urls_list = [

            ]
    else:
        raise ValueError(f'ECCV {year} is current not available!')
    for url in urls_list:
        __download_from_springer(
            url, save_dir, year, is_workshops=is_workshops,
            time_sleep_in_seconds=time_sleep_in_seconds,
            downloader=downloader)


def __download_from_springer(
        url, save_dir, year, is_workshops=False, time_sleep_in_seconds=5,
        downloader='IDM'):
    downloader = Downloader(downloader)
    papers_dict = None
    for i in range(3):  # retry up to 3 times on network errors
        try:
            papers_dict = springer.get_paper_name_link_from_url(url)
            break
        except Exception as e:
            print(str(e))
    if papers_dict is None:  # all retries failed
        print(f'ERROR: could not get the paper list from {url}, skipping it!')
        return
    # total_paper_number = len(papers_dict)
    pbar = tqdm(papers_dict.keys())
    postfix = f'ECCV_{year}'
    if is_workshops:
        postfix = f'ECCV_WS_{year}'

    for name in pbar:
        pbar.set_description(f'Downloading paper {name}')
        if not os.path.exists(os.path.join(save_dir, f'{name}_{postfix}.pdf')):
            downloader.download(
                papers_dict[name],
                os.path.join(save_dir, f'{name}_{postfix}.pdf'),
                time_sleep_in_seconds)


if __name__ == '__main__':
    year = 2024
    # total_paper_number = 2387
    total_paper_number = save_csv(year)
    download_from_csv(year,
                      save_dir=fr'Z:\all_papers\ECCV\ECCV_{year}',
                      is_download_supplement=True,
                      time_step_in_seconds=5,
                      total_paper_number=total_paper_number,
                      is_workshops=False)
    # move_main_and_supplement_2_one_directory(
    #     main_path=f'E:\\ECCV_{year}\\main_paper',
    #     supplement_path=f'E:\\ECCV_{year}\\supplement',
    #     supp_pdf_save_path=f'E:\\ECCV_{year}\\main_paper'
    # )
    # for year in range(2018, 2017, -2):
    #     # download_from_springer(
    #     #     save_dir=f'F:\\ECCV_{year}',
    #     #     year=year,
    #     #     is_workshops=False, time_sleep_in_seconds=30)
    #     download_from_springer(
    #         save_dir=f'F:\\ECCV_WS_{year}',
    #         year=year,
    #         is_workshops=True, time_sleep_in_seconds=30)
    # pass
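The csv-based ECCV workflow above hinges on one naming convention: papers are tagged with an `ECCV_{year}` (or `ECCV_WS_{year}`) postfix, and the paper list is read from `csv/<postfix>.csv` under the project root. A minimal standalone sketch of that convention (the helper name `eccv_postfix_and_csv` is hypothetical, introduced here only for illustration; the real code inlines this logic in `download_from_csv`):

```python
import os


def eccv_postfix_and_csv(year, is_workshops=False, project_root='.'):
    """Derive the filename postfix and csv path used by the ECCV downloader."""
    postfix = f'ECCV_WS_{year}' if is_workshops else f'ECCV_{year}'
    csv_path = os.path.join(project_root, 'csv', f'{postfix}.csv')
    return postfix, csv_path


print(eccv_postfix_and_csv(2024))
print(eccv_postfix_and_csv(2018, is_workshops=True))
```

Downloaded files then get names of the form `{title}_{postfix}.pdf`, which is also how the existence checks decide whether a paper was already fetched.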


================================================
FILE: code/paper_downloader_ICLR.py
================================================
"""paper_downloader_ICLR.py"""

from tqdm import tqdm
import os
# https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename
from slugify import slugify
from bs4 import BeautifulSoup
import pickle
from urllib.request import urlopen
import urllib
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.downloader import Downloader
from lib.openreview import download_iclr_papers_given_url_and_group_id
# get_pdf_link_from_openreview is called for the 2016 workshop links below;
# it is assumed to live in lib.openreview alongside the helper above.
from lib.openreview import get_pdf_link_from_openreview
from lib.arxiv import get_pdf_link_from_arxiv


def download_iclr_oral_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr oral papers for year 2017 ~ 2022, 2024~2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2026: "tab-accept-oral",
        2025: "tab-accept-oral",
        2024: "tab-accept-oral",
        2022: "oral-submissions",
        2021: "oral-presentations",
        2020: "oral-presentations",
        2019: "oral-presentations",
        2018: "accepted-oral-papers",
        2017: "oral-presentations",
        2013: "conferenceoral-iclr2013-conference"
    }
    
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
        
    print(f'Downloading ICLR-{year} oral papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year > 2021)
    )


def download_iclr_conditional_oral_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional oral papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, currently only 2025 is supported
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-oral"
    }
    no_pages_year = [2025]
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional oral papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_top5_papers(save_dir, year, base_url=None, start_page=1,
                              time_step_in_seconds=10, downloader='IDM',
                              proxy_ip_port=None):
    """
    Download iclr notable-top-5% papers for year 2023.
    :param save_dir: str, paper save path
    :param year: int, iclr year
    :type year: int
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds. Default: 10.
    :type time_step_in_seconds: int
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    if base_url is None:
        if year == 2023:
            base_url = "https://openreview.net/group?id=ICLR.cc/" \
                       "2023/Conference#notable-top-5-"
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} top5 papers...')
    group_id = "notable-top-5-"
    return download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port
    )


def download_iclr_poster_papers(save_dir, year, base_url=None, start_page=1,
                                time_step_in_seconds=10, downloader='IDM',
                                proxy_ip_port=None):
    """
    Download iclr poster papers from year 2013, 2017 ~ 2024.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2026: "tab-accept-poster",
        2025: "tab-accept-poster",
        2024: "tab-accept-poster",
        2023: "poster",
        2022: "poster-submissions",
        2021: "poster-presentations",
        2020: "poster-presentations",
        2019: "poster-presentations",
        2018: "accepted-poster-papers",
        2017: "poster-presentations",
        2013: "conferenceposter-iclr2013-conference"
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} poster papers...')
    no_pages_year = [2013, 2018, 2019, 2020, 2021]
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id_dict[year].replace('tab-', ''),
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year),
        is_need_click_group_button=(year == 2018)
    )


def download_iclr_conditional_poster_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional poster papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, currently only 2025 is supported
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-poster"
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional poster papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year > 2021)
    )


def download_iclr_spotlight_papers(save_dir, year, base_url=None,
                                   time_step_in_seconds=10, downloader='IDM',
                                   start_page=1, proxy_ip_port=None):
    """
    Download iclr spotlight papers between year 2020 and 2022, 2024~2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-spotlight",
        2024: "tab-accept-spotlight",
        2022: "spotlight-submissions",
        2021: "spotlight-presentations",
        2020: "spotlight-presentations",
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} spotlight papers...')
    no_pages_year = [2020, 2021]
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id_dict[year].replace('tab-', ''),
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_conditional_spotlight_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional spotlight papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, currently only 2025 is supported
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-spotlight"
    }
    no_pages_year = [2025]
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional spotlight papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_top25_papers(save_dir, year, base_url=None, start_page=1,
                               time_step_in_seconds=10, downloader='IDM',
                               proxy_ip_port=None):
    """
    Download iclr notable-top-25% papers for year 2023.
    :param save_dir: str, paper save path
    :param year: int, iclr year
    :type year: int
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds. Default: 10.
    :type time_step_in_seconds: int
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    if base_url is None:
        if year == 2023:
            base_url = "https://openreview.net/group?id=ICLR.cc/" \
                       "2023/Conference#notable-top-25-"
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} top25 papers...')
    group_id = "notable-top-25-"
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port
    )


def download_iclr_paper(save_dir, year, base_url=None,
                        time_step_in_seconds=10, downloader='IDM',
                        start_page=1, proxy_ip_port=None):
    """
    Download iclr papers between year 2013 and 2026.
    :param save_dir: str, paper save path
    :param year: int, iclr year
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

    year_no_group = [2014]
    year_no_group_iclrcc = [2015, 2016]
    year_oral_poster = [2013, 2017, 2018, 2019, 2026]
    year_oral_spotlight_poster = [2020, 2021, 2022, 2024, 2025]
    year_top5_top25_poster = [2023]
    year_oral_spotlight_poster_conditional = [2025]

    # no group, openreview website
    if year in year_no_group:
        if base_url is None:
            if year == 2014:
                base_url = 'https://openreview.net/group?id=ICLR.cc/2014/conference'
            else:
                raise ValueError('the website url is not given for this year!')
        print(f'Downloading ICLR-{year} papers...')
        group_id_dict = {
            2014: "submitted-papers"
        }
        group_id = group_id_dict[year]
        no_pages_year = [2014]
        return download_iclr_papers_given_url_and_group_id(
            save_dir=save_dir,
            year=year,
            base_url=base_url,
            group_id=group_id,
            start_page=start_page,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port,
            is_have_pages=(year not in no_pages_year)
        )
    # no group, iclr.cc website
    if year in year_no_group_iclrcc:
        downloader = Downloader(downloader=downloader)
        paper_postfix = f'ICLR_{year}'
        if base_url is None:
            if year == 2016:
                base_url = 'https://iclr.cc/archive/www/doku.php%3Fid=iclr2016:main.html'
            elif year == 2015:
                base_url = 'https://iclr.cc/archive/www/doku.php%3Fid=iclr2015:main.html'
            elif year == 2014:
                base_url = 'https://iclr.cc/archive/2014/conference-proceedings/'
            else:
                raise ValueError('the website url is not given for this year!')
        os.makedirs(save_dir, exist_ok=True)
        if year == 2015:  # oral and poster separated
            oral_save_path = os.path.join(save_dir, 'oral')
            poster_save_path = os.path.join(save_dir, 'poster')
            workshop_save_path = os.path.join(save_dir, 'ws')
            os.makedirs(oral_save_path, exist_ok=True)
            os.makedirs(poster_save_path, exist_ok=True)
            os.makedirs(workshop_save_path, exist_ok=True)
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_iclr_{year}.dat'
        )
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            req = urllib.request.Request(url=base_url, headers=headers)
            content = urllib.request.urlopen(req).read()
            os.makedirs(os.path.dirname(dat_file_pathname), exist_ok=True)
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        error_log = []
        soup = BeautifulSoup(content, 'html.parser')
        print('open url successfully!')
        if year == 2016:
            papers = soup.find('h3',
                               {
                                   'id': 'accepted_papers_conference_track'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            # workshops
            papers = soup.find('h3',
                               {
                                   'id': 'workshop_track_posters_may_2nd'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://beta.openreview'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir, 'ws', pdf_name)):
                            pdf_link = get_pdf_link_from_openreview(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, 'ws',
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            papers = soup.find('h3',
                               {
                                   'id': 'workshop_track_posters_may_3rd'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://beta.openreview'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir, 'ws', pdf_name)):
                            pdf_link = get_pdf_link_from_openreview(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, 'ws',
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
        elif year == 2015:
            # oral papers
            oral_papers = soup.find('h3', {
                'id': 'conference_oral_presentations'}).findNext(
                'div').find_all(
                'a')
            for paper in tqdm(oral_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(oral_save_path, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(oral_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

            # workshops papers
            workshop_papers = soup.find('h3', {
                'id': 'may_7_workshop_poster_session'}).findNext(
                'div').find_all(
                'a')
            # use extend(), not append(): find_all() returns a list of tags,
            # and append() would nest that whole list as a single element
            workshop_papers.extend(
                soup.find('h3',
                          {'id': 'may_8_workshop_poster_session'}).findNext(
                    'div').find_all('a'))
            for paper in tqdm(workshop_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(workshop_save_path, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(workshop_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            # poster papers
            poster_papers = soup.find('h3', {
                'id': 'may_9_conference_poster_session'}).findNext(
                'div').find_all(
                'a')
            for paper in tqdm(poster_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(poster_save_path, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(poster_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
        elif year == 2014:
            papers = soup.find('div',
                               {'id': 'sites-canvas-main-content'}).find_all(
                'a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(os.path.join(save_dir, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

            # workshops
            paper_postfix = f'ICLR_WS_{year}'
            base_url = 'https://sites.google.com/site/representationlearning2014/' \
                       'workshop-proceedings'
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            req = urllib.request.Request(url=base_url, headers=headers)
            content = urllib.request.urlopen(req).read()
            soup = BeautifulSoup(content, 'html.parser')
            workshop_save_path = os.path.join(save_dir, 'WS')
            os.makedirs(workshop_save_path, exist_ok=True)
            papers = soup.find(
                'div', {'id': 'sites-canvas-main-content'}).find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(workshop_save_path, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(workshop_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

        # write error log
        print('write error log')
        log_file_pathname = os.path.join(
            project_root_folder, 'log', 'download_err_log.txt')
        with open(log_file_pathname, 'w') as f:
            for log in tqdm(error_log):
                for e in log:
                    if e is not None:
                        f.write(e)
                    else:
                        f.write('None')
                    f.write('\n')

                f.write('\n')
        return True

    # oral openreview
    if year in (year_oral_poster + year_oral_spotlight_poster):
        save_dir_oral = os.path.join(save_dir, 'oral')
        download_iclr_oral_papers(
            save_dir_oral,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )
    
    # conditional oral openreview
    if year in year_oral_spotlight_poster_conditional:
        save_dir_cond_oral = os.path.join(save_dir, 'conditional-oral')
        download_iclr_conditional_oral_papers(
            save_dir_cond_oral,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # poster openreview
    if year in (year_oral_poster + year_oral_spotlight_poster +
                year_top5_top25_poster):
        save_dir_poster = os.path.join(save_dir, 'poster')
        download_iclr_poster_papers(
            save_dir_poster,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )
    
    # conditional poster openreview
    if year in year_oral_spotlight_poster_conditional:
        save_dir_cond_poster = os.path.join(save_dir, 'conditional-poster')
        download_iclr_conditional_poster_papers(
            save_dir_cond_poster,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # spotlight openreview
    if year in year_oral_spotlight_poster:
        save_dir_spotlight = os.path.join(save_dir, 'spotlight')
        download_iclr_spotlight_papers(
            save_dir_spotlight,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # conditional spotlight openreview
    if year in year_oral_spotlight_poster_conditional:
        save_dir_cond_spotlight = os.path.join(save_dir, 'conditional-spotlight')
        download_iclr_conditional_spotlight_papers(
            save_dir_cond_spotlight,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # top5 openreview
    if year in year_top5_top25_poster:
        save_dir_top5 = os.path.join(save_dir, 'top5')
        download_iclr_top5_papers(
            save_dir_top5,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # top25 openreview
    if year in year_top5_top25_poster:
        save_dir_top25 = os.path.join(save_dir, 'top25')
        download_iclr_top25_papers(
            save_dir_top25,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )


def get_pdf_link_from_openreview(abs_link):
    return abs_link.replace('beta.', '').replace('forum', 'pdf')
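

# A quick illustration of the URL rewrite performed by
# get_pdf_link_from_openreview() above; the paper id 'abc123' is a made-up
# placeholder, real links carry an actual OpenReview paper id.
_demo_abs_link = 'http://beta.openreview.net/forum?id=abc123'
_demo_pdf_link = _demo_abs_link.replace('beta.', '').replace('forum', 'pdf')
# _demo_pdf_link == 'http://openreview.net/pdf?id=abc123'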


if __name__ == '__main__':
    year = 2025
    save_dir_iclr = rf'E:\ICLR_{year}'
    # save_dir_iclr_oral = os.path.join(save_dir_iclr, 'oral')
    # save_dir_iclr_top5 = os.path.join(save_dir_iclr, 'top5')
    # save_dir_iclr_spotlight = os.path.join(save_dir_iclr, 'spotlight')
    # save_dir_iclr_top25 = os.path.join(save_dir_iclr, 'top25')
    # save_dir_iclr_poster = os.path.join(save_dir_iclr, 'poster')
    proxy_ip_port = None
    # proxy_ip_port = "http://127.0.0.1:7890"
    # download_iclr_oral_papers(save_dir_iclr_oral, year,
    #                           time_step_in_seconds=5)
    # download_iclr_top5_papers(save_dir_iclr_top5, year, start_page=1,
    #                           time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    # download_iclr_top25_papers(save_dir_iclr_top25, year, start_page=1,
    #                           time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    # download_iclr_spotlight_papers(save_dir_iclr_spotlight, year,
    #                                time_step_in_seconds=5)
    # download_iclr_poster_papers(save_dir_iclr_poster, year, start_page=1,
    #                             time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    download_iclr_paper(save_dir_iclr, year, time_step_in_seconds=5,
                        proxy_ip_port=proxy_ip_port)


================================================
FILE: code/paper_downloader_ICML.py
================================================
"""paper_downloader_ICML.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.downloader import Downloader
import lib.pmlr as pmlr
from lib.supplement_porcess import merge_main_supplement
from lib.openreview import download_icml_papers_given_url_and_group_id
from lib.my_request import urlopen_with_retry


def download_paper(year, save_dir, is_download_supplement=True,
                   time_step_in_seconds=5, downloader='IDM', source='pmlr',
                   proxy_ip_port=None):
    """
    download all ICML papers and supplement files for the given year, saving
    them in save_dir/main_paper and save_dir/supplement respectively
    :param year: int, ICML year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        requests in seconds
    :param downloader: str, the downloader to use, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param source: str, source website, 'pmlr' or 'openreview'
    :param proxy_ip_port: str or None, proxy ip address and port,
        e.g. "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return: True
    """
    assert source in ['pmlr', 'openreview'], \
        f'only support source pmlr or openreview, but get {source}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    downloader = Downloader(downloader=downloader, proxy_ip_port=proxy_ip_port)
    ICML_year_dict = {
        2024: 235,
        2023: 202,
        2022: 162,
        2021: 139,
        2020: 119,
        2019: 97,
        2018: 80,
        2017: 70,
        2016: 48,
        2015: 37,
        2014: 32,
        2013: 28
    }
    if source == 'openreview':
        init_url = f'https://openreview.net/group?id=ICML.cc/{year}/Conference'
    else:  # pmlr
        if year >= 2013:
            init_url = f'http://proceedings.mlr.press/v{ICML_year_dict[year]}/'
        elif year == 2012:
            init_url = 'https://icml.cc/2012/papers.1.html'
        elif year == 2011:
            init_url = 'http://www.icml-2011.org/papers.php'
        elif 2009 == year:
            init_url = 'https://icml.cc/Conferences/2009/abstracts.html'
        elif 2008 == year:
            init_url = 'http://www.machinelearning.org/archive/icml2008/' \
                       'abstracts.shtml'
        elif 2007 == year:
            init_url = 'https://icml.cc/Conferences/2007/paperlist.html'
        elif year in [2006, 2004, 2005]:
            init_url = f'https://icml.cc/Conferences/{year}/proceedings.html'
        elif 2003 == year:
            init_url = 'https://aaai.org/Library/ICML/icml03contents.php'
        else:
            raise ValueError("the given year's url is unknown!")

    postfix = f'ICML_{year}'
    if source == 'openreview':  # download from openreview website:
        # oral paper
        group_id = 'oral'
        save_dir_oral = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_oral, exist_ok=True)
        download_icml_papers_given_url_and_group_id(
            save_dir=save_dir_oral,
            year=year,
            base_url=init_url,
            group_id=group_id,
            start_page=1,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader,
            proxy_ip_port=proxy_ip_port
        )
        # poster paper
        group_id = 'poster'
        save_dir_poster = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_poster, exist_ok=True)
        download_icml_papers_given_url_and_group_id(
            save_dir=save_dir_poster,
            year=year,
            base_url=init_url,
            group_id=group_id,
            start_page=1,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader,
            proxy_ip_port=proxy_ip_port
        )
        # spotlight paper
        group_id = 'spotlight'
        save_dir_spotlight = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_spotlight, exist_ok=True)
        try:
            download_icml_papers_given_url_and_group_id(
                save_dir=save_dir_spotlight,
                year=year,
                base_url=init_url,
                group_id=group_id,
                start_page=1,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader.downloader,
                proxy_ip_port=proxy_ip_port
            )
        except ValueError as e:  # no spotlight paper
            print(f"WARNING: {str(e)}")
        return

    dat_file_pathname = os.path.join(
        project_root_folder, 'urls', f'init_url_icml_{year}.dat')
    if os.path.exists(dat_file_pathname):
        with open(dat_file_pathname, 'rb') as f:
            content = pickle.load(f)
    else:
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}
        content = urlopen_with_retry(url=init_url, headers=headers)
        # content = open(f'..\\ICML_{year}.html', 'rb').read()
        with open(dat_file_pathname, 'wb') as f:
            pickle.dump(content, f)
    # soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(content, 'html5lib')
    # soup = BeautifulSoup(open(r'..\ICML_2011.html', 'rb'), 'html.parser')
    error_log = []
    if year >= 2013:
        if year in ICML_year_dict.keys():
            volume = f'v{ICML_year_dict[year]}'
        else:
            raise ValueError("the given year's url is unknown!")

        pmlr.download_paper_given_volume(
            volume=volume,
            save_dir=save_dir,
            postfix=postfix,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader
        )
    elif 2012 == year:
        # base_url = f'https://icml.cc/{year}/'
        paper_list_bar = tqdm(soup.find_all('div', {'class': 'paper'}))
        paper_index = 0
        for paper in paper_list_bar:
            paper_index += 1
            title = slugify(paper.find('h2').text)
            link = None
            for a in paper.find_all('a'):
                if 'ICML version (pdf)' == a.text:
                    link = urllib.parse.urljoin(init_url, a.get('href'))
                    break
            if link is not None:
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path):
                    paper_list_bar.set_description(
                        f'downloading paper {paper_index}:{title}')
                    downloader.download(
                        urls=link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
            else:
                error_log.append((title, 'no main link error'))
    elif 2011 == year:
        paper_list_bar = tqdm(soup.find_all('a'))
        paper_index = 0
        for paper in paper_list_bar:
            h3 = paper.find('h3')
            if h3 is not None:
                title = slugify(h3.text)
                paper_index += 1
            if 'download' == slugify(paper.text.strip()):
                link = paper.get('href')
                # check for a missing href before joining: urljoin raises
                # TypeError on None
                if link is not None:
                    link = urllib.parse.urljoin(init_url, link)
                    this_paper_main_path = os.path.join(
                        save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                    paper_list_bar.set_description(
                        f'find paper {paper_index}:{title}')
                    if not os.path.exists(this_paper_main_path):
                        paper_list_bar.set_description(
                            f'downloading paper {paper_index}:{title}')
                        downloader.download(
                            urls=link,
                            save_path=this_paper_main_path,
                            time_sleep_in_seconds=time_step_in_seconds
                        )
                else:
                    error_log.append((title, 'no main link error'))
    elif year in [2009, 2008]:
        if 2009 == year:
            paper_list_bar = tqdm(
                soup.find('div', {'id': 'right_column'}).find_all(['h3','a']))
        elif 2008 == year:
            paper_list_bar = tqdm(
                soup.find('div', {'class': 'content'}).find_all(['h3','a']))
        paper_index = 0
        title = None
        for paper in paper_list_bar:
            if 'h3' == paper.name:
                title = slugify(paper.text)
                paper_index += 1
            elif 'full-paper' == slugify(paper.text.strip()):  # a
                link = paper.get('href')
                if link is not None and title is not None:
                    link = urllib.parse.urljoin(init_url, link)
                    this_paper_main_path = os.path.join(
                        save_dir, f'{title}_{postfix}.pdf')
                    paper_list_bar.set_description(
                        f'find paper {paper_index}:{title}')
                    if not os.path.exists(this_paper_main_path):
                        paper_list_bar.set_description(
                            f'downloading paper {paper_index}:{title}')
                        downloader.download(
                            urls=link,
                            save_path=this_paper_main_path,
                            time_sleep_in_seconds=time_step_in_seconds
                        )
                    title = None
                else:
                    error_log.append((title, 'no main link error'))
    elif year in [2006, 2005]:
        paper_list_bar = tqdm(soup.find_all('a'))
        paper_index = 0
        for paper in paper_list_bar:
            title = slugify(paper.text.strip())
            link = paper.get('href')
            paper_index += 1
            if link is not None and title is not None and \
                    ('pdf' == link[-3:] or 'ps' == link[-2:]):
                link = urllib.parse.urljoin(init_url, link)
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path):
                    paper_list_bar.set_description(
                        f'downloading paper {paper_index}:{title}')
                    downloader.download(
                        urls=link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
    elif 2004 == year:
        paper_index = 0
        paper_list_bar = tqdm(
            soup.find('table', {'class': 'proceedings'}).find_all('tr'))
        title = None
        for paper in paper_list_bar:
            tr_class = None
            try:
                tr_class = paper.get('class')[0]
            except (TypeError, IndexError):  # row has no class attribute
                pass
            if 'proc_2004_title' == tr_class:  # title
                title = slugify(paper.text.strip())
                paper_index += 1
            else:
                for a in paper.find_all('a'):
                    if '[Paper]' == a.text:
                        link = a.get('href')
                        if link is not None and title is not None:
                            link = urllib.parse.urljoin(init_url, link)
                            this_paper_main_path = os.path.join(
                                save_dir,
                                f'{title}_{postfix}.pdf'.replace(' ', '_'))
                            paper_list_bar.set_description(
                                f'find paper {paper_index}:{title}')
                            if not os.path.exists(this_paper_main_path):
                                paper_list_bar.set_description(
                                    f'downloading paper {paper_index}:{title}')
                                downloader.download(
                                    urls=link,
                                    save_path=this_paper_main_path,
                                    time_sleep_in_seconds=time_step_in_seconds
                                )
                        break
    elif 2003 == year:
        paper_index = 0
        paper_list_bar = tqdm(
            soup.find('div', {'id': 'content'}).find_all(
                'p', {'class': 'left'}))
        for paper in paper_list_bar:
            abs_link = None
            title = None
            link = None
            for a in paper.find_all('a'):
                href = a.get('href')
                # skip anchors without an href: urljoin raises TypeError
                # on None
                if href is not None:
                    abs_link = urllib.parse.urljoin(init_url, href)
                    title = slugify(a.text.strip())
                    break
            if title is not None:
                paper_index += 1
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path):
                    if abs_link is not None:
                        headers = {'User-Agent':
                                       'Mozilla/5.0 (Windows NT 6.1; WOW64; '
                                       'rv:23.0) Gecko/20100101 Firefox/23.0'}
                        abs_content = urlopen_with_retry(
                            url=abs_link, headers=headers,
                            raise_error_if_failed=False)
                        if abs_content is None:
                            print('error'+title)
                            error_log.append(
                                (title, abs_link, 'download error'))
                            continue
                        abs_soup = BeautifulSoup(abs_content, 'html5lib')
                        for a in abs_soup.find_all('a'):
                            try:
                                if 'pdf' == a.get('href')[-3:]:
                                    link = urllib.parse.urljoin(
                                        abs_link, a.get('href'))
                                    if link is not None:
                                        paper_list_bar.set_description(
                                            f'downloading paper {paper_index}:'
                                            f'{title}')
                                        downloader.download(
                                            urls=link,
                                            save_path=this_paper_main_path,
                                            time_sleep_in_seconds=time_step_in_seconds
                                        )
                                    break
                            except TypeError:  # anchor has no href
                                pass

    # write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt')
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
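The abstract-page scan above grabs the first `<a>` whose `href` ends in `pdf` and resolves it against the abstract URL. That step can be factored into a small stdlib-only helper (the function names and the sample HTML below are illustrative, not part of the repository; it uses `html.parser` instead of BeautifulSoup so it is self-contained, and it guards against anchors without an `href`):

```python
import urllib.parse
from html.parser import HTMLParser


class _PdfLinkFinder(HTMLParser):
    """Collect href attributes of <a> tags whose href ends with 'pdf'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            # guard against anchors with no href at all
            if href and href[-3:] == 'pdf':
                self.links.append(href)


def first_pdf_link(abs_link, abs_content):
    """Return the first pdf link on the abstract page, resolved
    against the abstract page URL, or None if there is none."""
    finder = _PdfLinkFinder()
    finder.feed(abs_content)
    if finder.links:
        return urllib.parse.urljoin(abs_link, finder.links[0])
    return None
```

`urllib.parse.urljoin` handles both relative and absolute-path hrefs, which is the same reason the loop above joins `a.get('href')` against `abs_link` rather than using it directly.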


def rename_downloaded_paper(year, source_path):
    """
    rename the downloaded ICML paper to {title}_ICML_2010.pdf and save to
    source_path
    :param year: int, year
    :param source_path: str, whose structure should be
        source_path/papers/pdf files (2010)
                   /index.html       (2010)
        source_path/icml2007_proc.html (2007)
   :return:
    """
    if not os.path.exists(source_path):
        raise ValueError(f'can not find {source_path}')
    postfix = f'ICML_{year}'
    if 2010 == year:
        soup = BeautifulSoup(
            open(os.path.join(source_path, 'index.html'), 'rb'), 'html5lib')
        paper_list_bar = tqdm(soup.find_all('span', {'class': 'boxpopup3'}))

        for paper in paper_list_bar:
            a = paper.find('a')
            title = slugify(a.text)
            ori_name = os.path.join(
                source_path, 'papers', a.get('href').split('/')[-1])
            os.rename(ori_name, os.path.join(
                source_path, f'{title}_{postfix}.pdf'))
            paper_list_bar.set_description(f'processing {title}')
    elif 2007 == year:
        soup = BeautifulSoup(open(os.path.join(
            source_path, 'icml2007_proc.html'), 'rb'), 'html5lib')
        paper_list_bar = tqdm(soup.find_all('td', {'colspan': '2'}))
        title = None  # title rows and links rows alternate on this page
        for paper in paper_list_bar:
            all_as = paper.find_all('a')
            if len(all_as) <= 1:
                # a title row: remember it for the following links row
                title = slugify(paper.text.strip())
            else:
                # a links row: rename using the title remembered above
                for a in all_as:
                    if '[Paper]' == a.text:
                        sub_path = a.get('href')
                        os.rename(os.path.join(source_path, sub_path),
                                  os.path.join(
                                      source_path, f'{title}_{postfix}.pdf'))
                        paper_list_bar.set_description_str(
                            (f'processing {title}'))
                        break


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'E:\ICML_{year}',
        is_download_supplement=True,
        time_step_in_seconds=10,
        downloader='IDM',
        source='openreview'
    ) 
    # merge_main_supplement(main_path=f'..\\ICML_{year}\\main_paper',
    #                       supplement_path=f'..\\ICML_{year}\\supplement',
    #                       save_path=f'..\\ICML_{year}',
    #                       is_delete_ori_files=False)
    # rename_downloaded_paper(year, f'..\\ICML_{year}')
    pass
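For 2007 the proceedings page lays titles and `[Paper]` links out in alternating table rows, which is why `title` deliberately carries over from one loop iteration to the next in `rename_downloaded_paper`. That pairing can be sketched as a pure function (the row tuples and the function name are hypothetical, used only to illustrate the alternation assumption):

```python
def pair_titles_with_links(rows):
    """Pair each title row with the first href of the following links row.

    rows: list of (text, hrefs) tuples in page order. A row with at most
    one anchor is treated as a title row; a row with several anchors
    ([Paper], [BibTeX], ...) is a links row, mirroring the
    len(all_as) <= 1 test above.
    Returns a list of (title, paper_href) pairs.
    """
    pairs = []
    title = None
    for text, hrefs in rows:
        if len(hrefs) <= 1:
            title = text.strip()             # remember for the next row
        elif title is not None:
            pairs.append((title, hrefs[0]))  # first href ~ '[Paper]'
    return pairs
```

Keeping the carried-over state in one small function makes the alternating-row assumption explicit and easy to test against a saved proceedings page.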


================================================
FILE: code/paper_downloader_IJCAI.py
================================================
"""paper_downloader_IJCAI.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write IJCAI papers' urls in one csv file
    :param year: int, IJCAI year, such as 2019
    :return: paper_index: int, the total number of papers
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'IJCAI_{year}.csv'
    )
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'group']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        if year >= 2003:
            init_urls = [f'https://www.ijcai.org/proceedings/{year}/']
        elif year >= 1977:
            init_urls = [f'https://www.ijcai.org/Proceedings/{year}-1/',
                         f'https://www.ijcai.org/Proceedings/{year}-2/']
        elif year >= 1969:
            init_urls = [f'https://www.ijcai.org/Proceedings/{year}/']
        else:
            raise ValueError('invalid year!')
        error_log = []
        user_agents = [
            'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
            'Gecko/20071127 Firefox/2.0.0.11',

            'Opera/9.25 (Windows NT 5.1; U; en)',

            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; '
            '.NET CLR 1.1.4322; .NET CLR 2.0.50727)',

            'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) '
            'KHTML/3.5.5 (like Gecko) (Kubuntu)',

            'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) '
            'Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',

            'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',

            "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 "
            "(KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 "
            "Chrome/16.0.912.77 Safari/535.7",

            "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) "
            "Gecko/20100101 Firefox/10.0 ",

            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (K
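The year-to-URL dispatch in `save_csv` above can be isolated into a helper for reuse and testing (the function name is ours; the URL patterns and year ranges are exactly the ones in the code: a single lowercase `proceedings` path from 2003 on, two uppercase `Proceedings` volumes for 1977 onward, and a single volume back to 1969):

```python
def ijcai_proceedings_urls(year):
    """Return the proceedings index URL(s) for a given IJCAI year,
    mirroring the dispatch in save_csv."""
    if year >= 2003:
        return [f'https://www.ijcai.org/proceedings/{year}/']
    if year >= 1977:
        # proceedings of this era were split into two volumes
        return [f'https://www.ijcai.org/Proceedings/{year}-1/',
                f'https://www.ijcai.org/Proceedings/{year}-2/']
    if year >= 1969:
        return [f'https://www.ijcai.org/Proceedings/{year}/']
    raise ValueError('invalid year!')
```

A pure function like this also makes it obvious that any year before 1969 is rejected before a single request is issued.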
SYMBOL INDEX (70 symbols across 24 files)

FILE: code/paper_downloader_AAAI.py
  function get_track_urls (line 20) | def get_track_urls(year):
  function get_papers_of_track_ojs (line 127) | def get_papers_of_track_ojs(track_url):
  function get_papers_of_track (line 184) | def get_papers_of_track(track_url):
  function save_csv (line 240) | def save_csv(year):
  function download_from_csv (line 292) | def download_from_csv(

FILE: code/paper_downloader_AAMAS.py
  function save_csv (line 22) | def save_csv(year):
  function download_from_csv (line 270) | def download_from_csv(

FILE: code/paper_downloader_AISTATS.py
  function download_paper (line 12) | def download_paper(year, save_dir, is_download_supplement=True, time_ste...

FILE: code/paper_downloader_COLT.py
  function download_paper (line 10) | def download_paper(year, save_dir, is_download_supplement=False, time_st...

FILE: code/paper_downloader_CORL.py
  function download_paper (line 11) | def download_paper(year, save_dir, is_download_supplement=False,

FILE: code/paper_downloader_CVF.py
  function save_csv (line 22) | def save_csv(year, conference, proxy_ip_port=None):
  function save_csv_workshops (line 94) | def save_csv_workshops(year, conference, proxy_ip_port=None):
  function download_from_csv (line 171) | def download_from_csv(
  function download_paper (line 223) | def download_paper(

FILE: code/paper_downloader_ECCV.py
  function save_csv (line 22) | def save_csv(year):
  function download_from_csv (line 128) | def download_from_csv(
  function download_from_springer (line 167) | def download_from_springer(
  function __download_from_springer (line 381) | def __download_from_springer(

FILE: code/paper_downloader_ICLR.py
  function download_iclr_oral_papers (line 21) | def download_iclr_oral_papers(save_dir, year, base_url=None,
  function download_iclr_conditional_oral_papers (line 77) | def download_iclr_conditional_oral_papers(save_dir, year, base_url=None,
  function download_iclr_top5_papers (line 123) | def download_iclr_top5_papers(save_dir, year, base_url=None, start_page=1,
  function download_iclr_poster_papers (line 165) | def download_iclr_poster_papers(save_dir, year, base_url=None, start_pag...
  function download_iclr_conditional_poster_papers (line 220) | def download_iclr_conditional_poster_papers(save_dir, year, base_url=None,
  function download_iclr_spotlight_papers (line 265) | def download_iclr_spotlight_papers(save_dir, year, base_url=None,
  function download_iclr_conditional_spotlight_papers (line 313) | def download_iclr_conditional_spotlight_papers(save_dir, year, base_url=...
  function download_iclr_top25_papers (line 359) | def download_iclr_top25_papers(save_dir, year, base_url=None, start_page=1,
  function download_iclr_paper (line 401) | def download_iclr_paper(save_dir, year, base_url=None,
  function get_pdf_link_from_openreview (line 830) | def get_pdf_link_from_openreview(abs_link):

FILE: code/paper_downloader_ICML.py
  function download_paper (line 20) | def download_paper(year, save_dir, is_download_supplement=True,
  function rename_downloaded_paper (line 374) | def rename_downloaded_paper(year, source_path):

FILE: code/paper_downloader_IJCAI.py
  function save_csv (line 18) | def save_csv(year):
  function download_from_csv (line 459) | def download_from_csv(

FILE: code/paper_downloader_JMLR.py
  function download_paper (line 18) | def download_paper(
  function download_special_topics_and_issues_paper (line 137) | def download_special_topics_and_issues_paper(save_dir, time_step_in_seco...

FILE: code/paper_downloader_NIPS.py
  function save_csv (line 22) | def save_csv(year):
  function download_from_csv (line 104) | def download_from_csv(

FILE: code/paper_downloader_RSS.py
  function get_paper_pdf_link (line 22) | def get_paper_pdf_link(abs_url):
  function save_csv (line 41) | def save_csv(year):
  function download_from_csv (line 159) | def download_from_csv(

FILE: lib/IDM.py
  function download (line 7) | def download(urls, save_path, time_sleep_in_seconds=5, is_random_step=True,

FILE: lib/arxiv.py
  function get_pdf_link_from_arxiv (line 8) | def get_pdf_link_from_arxiv(abs_link, is_use_mirror=False):

FILE: lib/csv_process.py
  function download_from_csv (line 13) | def download_from_csv(
  function short_name (line 207) | def short_name(name, max_length, verbose=False):

FILE: lib/cvf.py
  function get_paper_dict_list (line 13) | def get_paper_dict_list(url=None, content=None, group_name=None, timeout...

FILE: lib/downloader.py
  function _download (line 15) | def _download(urls, save_path, time_sleep_in_seconds=5, is_random_step=T...
  class Downloader (line 69) | class Downloader(object):
    method __init__ (line 70) | def __init__(self, downloader=None, is_random_step=True,
    method download (line 99) | def download(self, urls, save_path, time_sleep_in_seconds=5):

FILE: lib/my_request.py
  function urlopen_with_retry (line 12) | def urlopen_with_retry(url, headers=dict(), retry_time=3, time_out=20,

FILE: lib/openreview.py
  function get_driver (line 28) | def get_driver(proxy_ip_port=None):
  function __download_papers_given_divs (line 45) | def __download_papers_given_divs(driver, divs, save_dir, paper_postfix,
  function __get_into_pages_given_number (line 134) | def __get_into_pages_given_number(driver, page_number, pages, wait_fn,
  function download_nips_papers_given_url (line 147) | def download_nips_papers_given_url(
  function download_iclr_papers_given_url_and_group_id (line 381) | def download_iclr_papers_given_url_and_group_id(
  function download_icml_papers_given_url_and_group_id (line 633) | def download_icml_papers_given_url_and_group_id(
  function get_pages_str (line 839) | def get_pages_str(pages):
  function get_max_page_number (line 845) | def get_max_page_number(page_str_list):
  function download_papers_given_url_and_group_id (line 855) | def download_papers_given_url_and_group_id(

FILE: lib/pmlr.py
  function download_paper_given_volume (line 13) | def download_paper_given_volume(

FILE: lib/proxy.py
  function get_proxy (line 9) | def get_proxy(ip_port: str):
  function set_proxy_4_urllib_request (line 29) | def set_proxy_4_urllib_request(ip_port: str):
  function get_proxy_4_requests (line 51) | def get_proxy_4_requests(ip_port: str):

FILE: lib/springer.py
  function get_paper_name_link_from_url (line 15) | def get_paper_name_link_from_url(url):

FILE: lib/supplement_porcess.py
  function unzipfile (line 10) | def unzipfile(zip_file, save_path):
  function get_potential_supp_pdf (line 22) | def get_potential_supp_pdf(path):
  function move_main_and_supplement_2_one_directory_with_group (line 48) | def move_main_and_supplement_2_one_directory_with_group(main_path, suppl...
  function move_main_and_supplement_2_one_directory (line 143) | def move_main_and_supplement_2_one_directory(main_path, supplement_path,...
  function merge_main_supplement (line 238) | def merge_main_supplement(main_path, supplement_path, save_path, is_dele...
  function rename_2_short_name (line 366) | def rename_2_short_name(src_path, save_path, target_max_length=128,
  function rename_2_short_name_within_group (line 420) | def rename_2_short_name_within_group(src_path, save_path, target_max_len...
