Repository: SilenceEagle/paper_downloader
Branch: master
Commit: 7a76ffa26612
Files: 30
Total size: 345.0 KB

Directory structure:
gitextract_691ya0bm/

├── .gitignore
├── LICENSE
├── README.md
├── code/
│   ├── paper_downloader_AAAI.py
│   ├── paper_downloader_AAMAS.py
│   ├── paper_downloader_AISTATS.py
│   ├── paper_downloader_COLT.py
│   ├── paper_downloader_CORL.py
│   ├── paper_downloader_CVF.py
│   ├── paper_downloader_ECCV.py
│   ├── paper_downloader_ICLR.py
│   ├── paper_downloader_ICML.py
│   ├── paper_downloader_IJCAI.py
│   ├── paper_downloader_JMLR.py
│   ├── paper_downloader_NIPS.py
│   └── paper_downloader_RSS.py
├── lib/
│   ├── IDM.py
│   ├── __init__.py
│   ├── arxiv.py
│   ├── csv_process.py
│   ├── cvf.py
│   ├── downloader.py
│   ├── my_request.py
│   ├── openreview.py
│   ├── pmlr.py
│   ├── proxy.py
│   ├── springer.py
│   ├── supplement_porcess.py
│   └── user_agents.py
└── sharelinks.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# ---> Python
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
# mylib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

csv/
data/
log/
temp_zip
urls/
*.txt

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2020 silenceagle

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# paper_downloader

Download papers and supplemental materials only from **OPEN ACCESS** paper
website, such as **AAAI**, **AAMAS**, **AISTATS**, **COLT**, **CORL**, **CVPR**, **ECCV**,
**ICCV**, **ICLR**, **ICML**, **IJCAI**, **JMLR**, **NIPS**,
**RSS**, **WACV**.

---

The number of papers that could be downloaded using this repo (also provide **Aliyundrive** or **123Pan** share link and `access code`):

<sub>
<sup>

|  year\conf   | [AAAI](https://aaai.org/aaai-publications/aaai-conference-proceedings/#aaai) | [AAMAS](https://www.ifaamas.org/Proceedings/aamas2024/) |                                  [ACCV](https://openaccess.thecvf.com/menu)                                  |          [AISTATS](https://www.aistats.org/)           |           [COLT](http://learningtheory.org/)           | [CORL](https://www.corl.org/) |                                   [CVPR](http://openaccess.thecvf.com/menu)                                    |         [ECCV](https://www.ecva.net/papers.php)         |                                   [ICCV](http://openaccess.thecvf.com/menu)                                    |                    [ICLR](https://iclr.cc/)                    |                [ICML](https://icml.cc/)                 |            [IJCAI](https://www.ijcai.org/)             | [JMLR](http://www.jmlr.org/) |                [NIPS ](https://nips.cc/)                | [RSS](https://www.roboticsproceedings.org/index.html) |                                  [WACV](https://openaccess.thecvf.com/menu)                                  |
|:------------:|:----------------------------------------------------------------------------:|:-------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------:|:------------------------------------------------------:|:-----------------------------:|:--------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:------------------------------------------------------:|:----------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------:|:------------------------------------------------------------------------------------------------------------:|
|   **1969**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           64                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1971**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           66                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1973**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           85                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1975**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          146                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1977**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          251                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1979**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           12                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1980**   |            [95](https://www.aliyundrive.com/s/ucngMrKSTmi)`96eg`             |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1981**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          108                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1982**   |                                     104                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1983**   |            [92](https://www.aliyundrive.com/s/L3GfxhEqyWg)`09jo`             |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          237                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1984**   |                                      69                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1985**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          259                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1986**   |                                     194                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           --                            |                          --                           |                                                      --                                                      |
|   **1987**   |                                     149                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          246                           |              --              |                           90                            |                          --                           |                                                      --                                                      |
|   **1988**   |                                     159                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           94                            |                          --                           |                                                      --                                                      |
|   **1989**   |                                      --                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          269                           |              --              |                           101                           |                          --                           |                                                      --                                                      |
|   **1990**   |                                     173                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           49                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           143                           |                          --                           |                                                      --                                                      |
|   **1991**   |                                     144                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          192                           |              --              |                           144                           |                          --                           |                                                      --                                                      |
|   **1992**   |                                     134                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           49                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           127                           |                          --                           |                                                      --                                                      |
|   **1993**   |                                     135                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          138                           |              --              |                           158                           |                          --                           |                                                      --                                                      |
|   **1994**   |                                     302                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           140                           |                          --                           |                                                      --                                                      |
|   **1995**   |                                      --                                      |                           --                            |                                                      --                                                      |                           64                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          282                           |              --              |                           152                           |                          --                           |                                                      --                                                      |
|   **1996**   |                                     275                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           152                           |                          --                           |                                                      --                                                      |
|   **1997**   |                                     186                                      |                           --                            |                                                      --                                                      |                           57                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          180                           |              --              |                           150                           |                          --                           |                                                      --                                                      |
|   **1998**   |                                     187                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              --              |                           151                           |                          --                           |                                                      --                                                      |
|   **1999**   |                                     182                                      |                           --                            |                                                      --                                                      |                           17                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                          204                           |              --              |                           150                           |                          --                           |                                                      --                                                      |
| **2000/v1**  |                                     221                                      |                           --                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           98                            |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              11              |                           152                           |                          --                           |                                                      --                                                      |
| **2001/v2**  |                                      --                                      |                           --                            |                                                      --                                                      |                           46                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           --                            |                           17                           |              31              |                           197                           |                          --                           |                                                      --                                                      |
| **2002/v3**  |                                     187                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           196                           |                                                       --                                                       |                               --                               |                           --                            |                           --                           |              59              |                           207                           |                          --                           |                                                      --                                                      |
| **2003/v4**  |                                     ---                                      |                            /                            |                                                      --                                                      |                           44                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           121                           |                          297                           |              59              |                           198                           |                          --                           |                                                      --                                                      |
| **2004/v5**  |                                     177                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           190                           |                                                       --                                                       |                               --                               |                           118                           |                           --                           |              56              |                           207                           |                          --                           |                                                      --                                                      |
| **2005/v6**  |                                     328                                      |                            /                            |                                                      --                                                      |                           56                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           133                           |                          350                           |              73              |                           207                           |                          48                           |                                                      --                                                      |
| **2006/v7**  |                                     393                                      |                            /                            |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                         192+11                          |                                                       --                                                       |                               --                               |                           --                            |                           --                           |             100              |                           204                           |                          39                           |                                                      --                                                      |
| **2007/v8**  |                                     375                                      |                            /                            |                                                      --                                                      |                           86                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           150                           |                          478                           |              91              |                           217                           |                          41                           |                                                      --                                                      |
| **2008/v9**  |                                     355                                      |                           254                           |                                                      --                                                      |                           --                           |                           --                           |              --               |                                                       --                                                       |                           196                           |                                                       --                                                       |                               --                               |                           158                           |                           --                           |              97              |                           250                           |                          40                           |                                                      --                                                      |
| **2009/v10** |                                      --                                      |                           130                           |                                                      --                                                      |                           84                           |                           --                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           160                           |                          342                           |             100              |                           262                           |                          39                           |                                                      --                                                      |
| **2010/v11** |                                     300                                      |                           163                           |                                                      --                                                      |                          126                           |                           --                           |              --               |                                                       --                                                       |                         286+63                          |                                                       --                                                       |                               --                               |                           159                           |                           --                           |             118              |                           292                           |                          40                           |                                                      --                                                      |
| **2011/v12** |                                     302                                      |                           125                           |                                                      --                                                      |                          108                           |                           43                           |              --               |                                                       --                                                       |                           --                            |                                                       --                                                       |                               --                               |                           153                           |                          490                           |             105              |                           306                           |                          45                           |                                                      --                                                      |
| **2012/v13** |                                     353                                      |                           136                           |                                                      --                                                      |                          160                           |                           46                           |              --               |                                                       --                                                       |                         329+147                         |                                                       --                                                       |                               --                               |                           243                           |                           --                           |             119              |                           368                           |                          60                           |                                                      --                                                      |
| **2013/v14** |                                     251                                      |                           321                           |                                                      --                                                      |                           72                           |                           50                           |              --               |                           [471](https://www.aliyundrive.com/s/ZFvga9JZ5aY)`5p0q`+156                           |                           --                            |                                                    455+142                                                     |                              14+9                              |                           283                           |                          496                           |              84              |                           360                           |                          55                           |                                                      --                                                      |
| **2014/v15** |                                     447                                      |                           378                           |                                                      --                                                      |                          124                           |                           61                           |              --               |                                                    545+125                                                     |                         334+158                         |                                                       --                                                       |                               35                               |                           310                           |                           --                           |             120              |                           411                           |                          57                           |                                                      --                                                      |
| **2015/v16** |                                     455                                      |                           363                           |                                                      --                                                      |                          134                           |                           77                           |              --               |                                                    602+133                                                     |                           --                            |                                                    526+133                                                     |                               42                               |                           270                           |                          656                           |             118              |                           403                           |                          49                           |                                                      --                                                      |
| **2016/v17** |                                     676                                      |                           280                           |                                                      --                                                      |                          168                           |                           70                           |              --               |                                                    643+194                                                     |                         372+132                         |                                                       --                                                       |                               80                               |                           322                           |                          658                           |             236              |                           568                           |                          47                           |                                                      --                                                      |
| **2017/v18** |                                     765                                      |                           318                           |                                                      --                                                      |                          175                           |                           75                           |              48               |                                                    783+281                                                     |                           --                            |                                                    621+353                                                     |                              198                               |                           434                           |                          781                           |             234              |                           679                           |                          75                           |                                                      --                                                      |
| **2018/v19** |                                     1102                                     |                           390                           |                                                      --                                                      |                          230                           |                           94                           |              75               |                                                    979+346                                                     |                         732+262                         |                                                       --                                                       |                              336                               |                           466                           |                          870                           |              84              |                          1009                           |                          71                           |                                                      --                                                      |
| **2019/v20** |                                     1343                                     |                           433                           |                                                      --                                                      |                          403                           |                          127                           |              110              |                                                    1294+612                                                    |                           --                            |                                                    1075+498                                                    |                              502                               |                           773                           |                          964                           |             184              |                          1428                           |                          84                           |                                                      --                                                      |
| **2020/v21** |           [1864](https://www.aliyundrive.com/s/kbWKUpHGR3k)`5ls6`            |                           369                           | [254](https://www.aliyundrive.com/s/Dt2ErKCmePQ)`dn93`+[13](https://www.aliyundrive.com/s/AhGvgotrMUv)`d9o6` | [796](https://www.aliyundrive.com/s/iQ4AWTHG4bk)`61yu` | [126](https://www.aliyundrive.com/s/apP8KUFLPe4)`3mv9` |              165              | [1467](https://www.aliyundrive.com/s/eJF4BTFzFJq)`y89b`+[517](https://www.aliyundrive.com/s/5wk7Mjo9XyU)`0fz9` | [1358](https://www.aliyundrive.com/s/EYyjxRmmg8d)`a5i0` |                                                       --                                                       |     [687](https://www.aliyundrive.com/s/cVRD5Bu2SgN)`4x1c`     | [1084](https://www.aliyundrive.com/s/BHqtEbi6Dix)`5yw0` | [776](https://www.aliyundrive.com/s/vMZpsjCbWMV)`4xq3` |             254              | [1899](https://www.aliyundrive.com/s/GEMFqxKeHWu)`3g3d` |                          103                          | [378](https://www.aliyundrive.com/s/gfFKwcKrCP1)`l1m8`+[24](https://www.aliyundrive.com/s/2uCW6cq9WHk)`me08` |
| **2021/v22** |           [1961](https://www.aliyundrive.com/s/cdeGciNZch8)`b69m`            |                           304                           |                                                      --                                                      | [845](https://www.aliyundrive.com/s/3hbAhxYFHER)`93ig` | [140](https://www.aliyundrive.com/s/gwhdNT1vGDD)`96ln` |              166              |                          1660+[517](https://www.aliyundrive.com/s/ziBfXVKPXSY)`le14`                           |                           --                            | [1612](https://www.aliyundrive.com/s/ME21PfkyAec)`99uu`+[465](https://www.aliyundrive.com/s/ZahPmXSn9an)`16es` |     [860](https://www.aliyundrive.com/s/wGos6n5R93v)`ef43`     | [1183](https://www.aliyundrive.com/s/SYTtH38GiVS)`g8b1` | [723](https://www.aliyundrive.com/s/io3sAjsN5pw)`40is` |             290              | [2334](https://www.aliyundrive.com/s/13sHmhuEdxA)`v6g1` |                          92                           | [406](https://www.aliyundrive.com/s/kTwfaX9tren)`1id9`+[23](https://www.aliyundrive.com/s/7Joy4svvUfy)`90rl` |
| **2022/v23** |           [1624](https://www.aliyundrive.com/s/ePXvUw4VFdQ)`fp76`            |                           306                           | [279](https://www.aliyundrive.com/s/zCCTJMPrfSr)`47jy`+[25](https://www.aliyundrive.com/s/f4kdMXixwJL)`s7a9` | [492](https://www.aliyundrive.com/s/xj2fRMwZxfC)`f16o` |                          155                           |              197              | [2077](https://www.aliyundrive.com/s/Q8DG9dKbx6S)`i16a`+[562](https://www.aliyundrive.com/s/f9Zx3hFFyq4)`11kj` | [1645](https://www.aliyundrive.com/s/dv4fhuueRHs)`6d7j` |                                                       --                                                       | [54+176+865](https://www.aliyundrive.com/s/gfANcdbM9TC)`b1l3`  | [1234](https://www.aliyundrive.com/s/eopQ5H8Hz2a)`81ov` | [862](https://www.aliyundrive.com/s/DBVKNsqN2UZ)`ea46` |             351              | [2673](https://www.aliyundrive.com/s/VFLmfnzSAsA)`eh49` |                          74                           | [406](https://www.aliyundrive.com/s/xRhdpencLQU)`ab53`+[80](https://www.aliyundrive.com/s/JCCcQXij7WX)`q6d2` |
| **2023/v24** |                                     2021                                     |                           527                           |                                                      --                                                      | [496](https://www.aliyundrive.com/s/CD3Kz9cxu1U)`l5m9` |                          170                           |              199              |                                          [2358+698](./sharelinks.md)                                           |                           --                            |                                                    2161+491                                                    | [90+284+1205](https://www.aliyundrive.com/s/PZ1Wann4B8A)`29sf` |                          1805                           |                          846                           |             397              |                       67+378+2773                       |                          112                          | [639](https://www.aliyundrive.com/s/fP52KxJEUE5)`mo78`+[74](https://www.aliyundrive.com/s/XZG992JqQfn)`nj80` |
| **2024/v25** |                                     2581                                     |                           460                           |                                                    268+46                                                    |                          547                           |                          170                           |              264              |                                                    2716+773                                                    |                          2387                           |                                                       --                                                       |                          86+369+1810                           |                      144+191+2275                       |                          1048                          |             419              |                       61+326+3650                       |                          131                          |                                                   846+120                                                    |
| **2025/v26** |                                     3028                                     |                           479                           |                                                     ---                                                      |                          583                           |                          182                           |              263              |                                                    2871+659                                                    |                           --                            |                                                    2701+765                                                    |                      208+373+3060+6+6+56                       |                      108+211+2967                       |                          1276                          |             308              |                       77+683+4515                       |                          163                          |                                                     929                                                      |
| **2026/v27** |                                     2375                                     |                          29 May                         |                                                    18 Dec.                                                   |                         2 May                          |                         3 July                         |             12 Nov.           |                                                     7 June                                                     |                         13 Sep.                         |                                                     29 Sep.                                                    |                            225+5131                            |                         11 July                         |                         21 Aug                         |             50               |                          13 Dec                         |                        17 July                        |                                                   831+191                                                    |

</sup>
</sub>

<!--| **2023/v24** |                                             
|                                                                                                                   |                                                             |                                                             |                                                                                                                     |                                                                                                                    |                                                                                                                     |                                                                    |                                                              |                                                             |                              |                                                              |      |                                                                                                             |-->

[Download from 123pan.com](https://www.123pan.com/s/PwXljv-QErwd.html)
(ACCESS CODE: `FdX2`)

 (May miss some papers due to the (older version of) 123pan's limitation on the length of filename)

NOTE: all the shared papers' pdf files are collected from network, and the original authors/providers hold the copyrights.

---

## Usage

**For example: download AAAI-2022 papers**

1. Install [Internet Downloader Manager/IDM](https://www.internetdownloadmanager.com/) [*Windows*] [*OPTIONAL*]

   **Note:** If the IDM is NOT installed at the DEFAULT location, then the
   code in [lib/IDM.py](./lib/IDM.py) should also be modified:

   ```python
   # should replace with your IDM path
   idm_path = '"your path to IDMan.exe"'  

   # default:
   # idm_path = '"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"'
   ```

   **Uesful tip**: [Disable the downloading popup pages of IDM would be better](https://github.com/SilenceEagle/paper_downloader/issues/17#issuecomment-773763300)

2. Install [Chrome](https://www.google.com/chrome) [Needed for `ICLR`, `ICML`, some of `NIPS` and `CORL` papers]
3. Change the code block at the end of
   [code/paper_downloader_AAAI.py](./code/paper_downloader_AAAI.py)

   ```python
   if __name__ == '__main__':
      year = 2022
      total_paper_number = save_csv(year)  # save papers urls to csv/AAAI_2022.csv
      download_from_csv(
         year, 
         save_dir=f'..\\AAAI_{year}', # change to your save location
         time_step_in_seconds=5,  # time step (seconds) between two downloading requests
         total_paper_number=total_paper_number,
         downloader=None # use python "requests" package to download papers, workable on Windows/MacOS/Linux
         # downloader='IDM'  # use Internet Download Manager software to 
                              # download papers, Windows only
      )
   ```

4. Then run the code:

   ```python
   python code/paper_downloader_AAAI.py  # download AAAI papers
   ```

---

**This repo also provides the function to process supplemental material:**

1. Merge the main supplemental material pdf file and the main paper into one single pdf file;
2. Move the supplemental material pdf files (extracted from the downloaded zip files if presented) into the main papers' folder.

## Star history

[![Star History Chart](https://api.star-history.com/svg?repos=SilenceEagle/paper_downloader&type=Date)](https://star-history.com/#SilenceEagle/paper_downloader&Date)


================================================
FILE: code/paper_downloader_AAAI.py
================================================
"""paper_downloader_AAAI.py"""
import time
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
import random

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.user_agents import user_agents
from lib.my_request import urlopen_with_retry


def get_track_urls(year):
    """
    get all the technical tracks urls given AAAI proceeding year
    Args:
        year (int): AAAI proceeding year, such 2023

    Returns:
        dict : All the urls of technical tracks included in
            the given AAAI proceeding. Keys are the tracks name-volume,
            and values are the corresponding urls.
    """
    # assert int(year) >= 2023, f"only support year >= 2023, but get {year}!!!"
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    dat_file_pathname = os.path.join(
        project_root_folder, 'urls', f'track_archive_url_AAAI_{year}.dat'
    )
    proceeding_th_dict = {
        1980: 1,
        1902: 2,
        1983: 3,
        1984: 4,
        1986: 5,
        1987: 6,
        1988: 7,
        1990: 8,
        1991: 9,
        1992: 10,
        1993: 11,
        1994: 12,
        1996: 13,
        1997: 14,
        1998: 15,
        1999: 16,
        2000: 17,
        2002: 18,
        2004: 19,
        2005: 20,
        2006: 21,
        2007: 22,
        2008: 23
    }
    if year >= 2023:
        base_url = r'https://ojs.aaai.org/index.php/AAAI/issue/archive'
        headers = {
            'User-Agent': user_agents[-1],
            'Host': 'ojs.aaai.org',
            'Referer': "https://ojs.aaai.org",
            'GET': base_url
        }
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            content = urlopen_with_retry(url=base_url, headers=headers)
            # req = urllib.request.Request(url=base_url, headers=headers)
            # content = urllib.request.urlopen(req).read()
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        tracks = soup.find('ul', {'class': 'issues_archive'}).find_all('li')
        track_urls = dict()
        for tr in tracks:
            h2 = tr.find('h2')
            this_track = slugify(h2.a.text)
            if this_track.startswith(f'aaai-{year-2000}'):
                this_track += slugify(h2.div.text) + '-' + this_track
                this_url = h2.a.get('href')
                track_urls[this_track] = this_url
                print(f'find track: {this_track}({this_url})')
    else:
        if year >= 2010:
            proceeding_th = year - 1986
        elif year in proceeding_th_dict:
            proceeding_th = proceeding_th_dict[year]
        else:
            print(f'ERROR: AAAI proceeding was not held in year {year}!!!')
            return

        base_url = f'https://aaai.org/proceeding/aaai-{proceeding_th:02d}-{year}/'
        headers = {
            'User-Agent': user_agents[-1],
            'Host': 'aaai.org',
            'Referer': "https://aaai.org",
            'GET': base_url
        }
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            # req = urllib.request.Request(url=base_url, headers=headers)
            # content = urllib.request.urlopen(req).read()
            content = urlopen_with_retry(url=base_url, headers=headers)
            # content = open(f'..\\AAAI_{year}.html', 'rb').read()
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        tracks = soup.find('main', {'class': 'content'}).find_all('li')
        track_urls = dict()
        for tr in tracks:
            this_track = slugify(tr.a.text)
            this_url = tr.a.get('href')
            track_urls[this_track] = this_url
            print(f'find track: {this_track}({this_url})')
    return track_urls


def get_papers_of_track_ojs(track_url):
    """
    get all the papers' title, belonging track group name and download link.
    the link should be hosted on https://ojs.aaai.org/
    Args:
        track_url (str): track url

    Returns:
        list[dict]: a list contains all the collected papers' information,
            each item in list is a dictionary, whose keys include
            ['title', 'main link', 'group']
            And the group is the specific track name.
    """
    debug = False
    paper_list = []
    headers = {
        'User-Agent': user_agents[-1],
        'Host': 'ojs.aaai.org',
        'Referer': "https://ojs.aaai.org",
        'GET': track_url
    }
    content = urlopen_with_retry(url=track_url, headers=headers)

    soup = BeautifulSoup(content, 'html5lib')
    tracks = soup.find('div', {'class': 'sections'}).find_all(
        'div', {'class': 'section'})
    for tr in tracks:
        this_group = slugify(tr.h2.text)
        this_paper_dict = {
            'group': this_group,
            'title': '',
            'main link': ''
        }
        papers = tr.find_all('li')
        for p in papers:
            this_paper_dict['title'] = ''
            this_paper_dict['main link'] = ''
            try:
                title = slugify(p.find('h3', {'class': 'title'}).text)
                link = p.find(
                    'a', {'class': 'obj_galley_link pdf'}
                ).get('href').replace('view', 'download')
                this_paper_dict['title'] = title
                this_paper_dict['main link'] = link
                paper_list.append(this_paper_dict.copy())
                if debug:
                    print(
                        f'paper: {title}\n\tlink:{link}\n\tgroup:{this_group}')
            except Exception as e:
                # skip unwanted target
                # print(f'ERROR: {str(e)}')
                pass
                # continue

    return paper_list


def get_papers_of_track(track_url):
    """
    get all the papers' title, belonging track group name and download link.
    the link should be hosted on https://aaai.org/
    Args:
        track_url (str): track url

    Returns:
        list[dict]: a list contains all the collected papers' information,
            each item in list is a dictionary, whose keys include
            ['title', 'main link', 'group']
            And the group is the specific track name.
    """
    debug = False
    paper_list = []
    headers = {
        'User-Agent': user_agents[-1],
        'Host': 'aaai.org',
        'Referer': "https://aaai.org",
        'GET': track_url
    }
    content = urlopen_with_retry(url=track_url, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    tracks = soup.find('main', {'id': 'genesis-content'}).find_all(
        'div', {'class': 'track-wrap'})
    for tr in tracks:
        this_group = slugify(tr.h2.text)
        this_paper_dict = {
            'group': this_group,
            'title': '',
            'main link': ''
        }
        papers = tr.find_all('li')
        for p in papers:
            this_paper_dict['title'] = ''
            this_paper_dict['main link'] = ''
            try:
                title = slugify(p.find('h5').text)
                link = p.find(
                    'a', {'class': 'wp-block-button'}
                ).get('href')
                this_paper_dict['title'] = title
                this_paper_dict['main link'] = link
                paper_list.append(this_paper_dict.copy())
                if debug:
                    print(
                        f'paper: {title}\n\tlink:{link}\n\tgroup:{this_group}')
            except Exception as e:
                # skip unwanted target
                # print(f'ERROR: {str(e)}')
                pass
                # continue

    return paper_list


def save_csv(year):
    """
    write AAAI papers' urls in one csv file
    :param year: int, AAAI year, such 2019
    :return: peper_index: int, the total number of papers
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'AAAI_{year}.csv'
    )
    error_log = []
    paper_index = 0
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'group']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        track_urls = get_track_urls(year)
        for tr_name in track_urls:
            tr_url = track_urls[tr_name]
            print(f'collecting paper from {tr_name}({tr_url})')
            if year >= 2023:
                papers_dict_list = get_papers_of_track_ojs(tr_url)
            else:
                papers_dict_list = get_papers_of_track(tr_url)
            print(f'\tfind {len(papers_dict_list)} papers')
            for p in papers_dict_list:
                paper_index += 1
                writer.writerow(p)
            csvfile.flush()
            s = random.randint(3, 7)
            print(f'random sleeping {s} seconds...')
            time.sleep(s)  # avoid requesting too frequently

    #  write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
    return paper_index


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None,
        csv_filename=None, downloader='IDM'):
    """
    download all AAAI paper given year
    :param year: int, AAAI year, such 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param csv_filename: None or str, the csv file's name, None means to use
        default setting
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    postfix = f'AAAI_{year}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_path = os.path.join(
        project_root_folder, 'csv',
        f'AAAI_{year}.csv' if csv_filename is None else csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    # total_paper_number = 3028
    total_paper_number = save_csv(year)
    download_from_csv(
        year,
        save_dir=fr'D:\AAAI_{year}',
        time_step_in_seconds=15,
        total_paper_number=total_paper_number)
    # for year in range(2012, 2018, 2):
    #     print(year)
    #     total_paper_number = None
    #     # total_paper_number = save_csv(year)
    #     download_from_csv(year, save_dir=f'..\\AAAI_{year}',
    #                       time_step_in_seconds=10,
    #                       total_paper_number=total_paper_number)
    #     time.sleep(2)
    # for i in range(1, 12):
    #     print(f'issue {i}/{11}')
    #     year = 2022
    #     total_paper_number = save_csv_given_urls(
    #         urls=f'https://www.aaai.org/Library/AAAI/aaai{year - 2000}-issue{i:0>2}.php',
    #         csv_filename=f'.\AAAI_{year}_issue_{i}.csv'
    #     )
    #     # total_paper_number = 156
    #     download_from_csv(
    #         year=year,
    #         csv_filename=f'.\AAAI_{year}_issue_{i}.csv',
    #         save_dir=rf'D:\AAAI_{year}',
    #         time_step_in_seconds=1,
    #         total_paper_number=total_paper_number)

    # print(get_track_urls(1980))
    # get_papers_of_track(r'https://ojs.aaai.org/index.php/AAAI/issue/view/548')

    pass


================================================
FILE: code/paper_downloader_AAMAS.py
================================================
"""paper_downloader_AAMAS.py
"""

import time
import urllib
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write AAMAS papers' urls in one csv file
    :param year: int, AAMAS year, such 2023
    :return: peper_index: int, the total number of papers
    """
    conference = "AAMAS"
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_{year}.csv'
    )

    init_url_dict = {
        2010: 'https://www.ifaamas.org/Proceedings/aamas2010/resources/_fullpapers.html',
        2009: 'https://www.ifaamas.org/Proceedings/aamas2009/TOC/01_FP/FP_Session.html',
        2008: 'https://www.ifaamas.org/Proceedings/aamas2008/proceedings/mainTrackPapers.htm',
    }

    error_log = []
    paper_index = 0
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'group', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        if year >=  2013:
            init_url = f'https://www.ifaamas.org/Proceedings/aamas{year}' \
                f'/forms/contents.htm'
        elif year >= 2011:
            init_url = f'https://www.ifaamas.org/Proceedings/aamas{year}'\
                f'/resources/fullpapers.html'
        elif year in init_url_dict:
            init_url = init_url_dict[year]
        else:   
            # TODO: support downloading 2002 ~ 2007 papers
            return
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', 
            f'init_url_{conference}_{year}.dat'''
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                    'AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0'}
            content = urlopen_with_retry(url=init_url, headers=headers)
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)

        soup = BeautifulSoup(content, 'html5lib')
        # soup = BeautifulSoup(content, 'html.parser')
        if year >=  2013:
            group_list = soup.find('tbody').find_all('tr', recursive=False)[3:]
            # skip "conference title", "Table of Contents" and "Contents table"  
            
            group_list_bar = tqdm(group_list)
            paper_index = 0
            is_start = False
            for group in group_list_bar:
                if not is_start:
                    # if group.find('a', {'id': 'KT'}): # year 2019, 2023, 2024
                    #     is_start = True
                    if group.find('strong'):
                        group_text = slugify(group.find('strong').text)
                        if not group_text.startswith('table') and \
                            not group_text.startswith('aamas'):  
                            # skip Table of Contents, AAMAS 20xx
                            is_start = True
                        else:
                            continue
                    else:
                        continue
                
                try:
                    tds = group.find_all('td', recursive=False)
                    if len(tds) < 2:
                        continue
                    group = tds[1]
                    papers = group.find_all('p')

                    for p in papers:
                        # group title is in <strong>...</strong>
                        if p.find('strong', recursive=False):
                            group_title = slugify(p.text)
                            continue
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        if p.find('a') is None and p.find('b') is None:
                            # last empty <p>...</p> in some <tr>...</tr>
                            continue
                        a = p.find('a')
                        if a is None:
                            title = slugify(p.find('b').text)
                            main_link = ''
                            print(f'\nWarning: No link found for {title}!')
                        else:
                            title = slugify(a.text)
                            main_link = urllib.parse.urljoin(init_url, a.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        elif year >= 2010:
            class_name = {
                2010: 'plist',
                2011: 'plist',
                2012: 'pindex'
            }
            papers = soup.find('div', {'class': class_name[year]}).find_all(['h2', 'div'])
            papers_bar = tqdm(papers)
            paper_index = 0
            for p in papers_bar:
                if p.name == 'h2': # group title
                    group_title = slugify(p.text)
                else:  # div, paper
                    paper_dict = {'title': '',
                                'group': group_title,
                                'main link': '',
                                'supplemental link': ''}
                    a = p.find('span', {'class': 'title'}).find('a')
                    # title = slugify(a.find(string=True, recursive=False)) # drop abs
                    direct_text = ''.join(child for child in a.contents 
                                          if isinstance(child, str)).strip()
                    title = slugify(direct_text)
                    main_link = urllib.parse.urljoin(init_url, a.get('href'))
                    paper_dict['title'] = title
                    paper_dict['main link'] = main_link
                    paper_index += 1
                    papers_bar.set_description_str(
                        f'Collected paper {paper_index}: {title}')
                    writer.writerow(paper_dict)
                    csvfile.flush()  # write to file immediately
        elif year == 2009:
            group_list = soup.find('div', {'id': 'mainContent'}).find_all('p')
            group_list_bar = tqdm(group_list)
            paper_index = 0
            is_start = False
            for group in group_list_bar:
                if not is_start:
                    if group.find('strong'):
                        group_text = slugify(group.find('strong').text)
                        is_start = True
                    else:
                        continue
                if group.find('strong'):
                    group_title = slugify(group.text)
                    continue
                try:
                    papers = group.find_all('a')
                    for p in papers:
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        title = slugify(p.text)
                        main_link = urllib.parse.urljoin(init_url, p.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        elif year == 2008:
            # papers = soup.find_all(lambda tag: 
            #     (tag.name == 'p' and 'title' in tag.get('class', [])) or 
            #     tag.name == 'a'
            # )
            group_list = soup.find('div', {'id': 'mainbody'}).find(
                'table').find('tbody').find_all('tr', recursive=False)[2:]
            # skip "conference title", "Table of Contents" 
            
            group_list_bar = tqdm(group_list)
            paper_index = 0
            for group in group_list_bar:
                
                try:
                    p_class_title = group.find('p', {'class': 'title'})
                    h3 = group.find('h3')
                    if p_class_title:                       
                        group_title = slugify(p_class_title.text)
                    elif h3:  # find <h3></h3>
                        group_title = slugify(h3.text)
                    else:
                        raise ValueError('Parse group title failed!')

                    papers = group.find_all('a')

                    for p in papers:
                        paper_dict = {'title': '',
                                    'group': group_title,
                                    'main link': '',
                                    'supplemental link': ''}
                        
                        title = slugify(p.text)
                        if not p.get('href'):
                            continue # group title
                        main_link = urllib.parse.urljoin(init_url, p.get('href'))
                        
                        paper_dict['title'] = title
                        paper_dict['main link'] = main_link
                        paper_index += 1
                        group_list_bar.set_description_str(
                            f'Collected paper {paper_index}: {title}')
                        writer.writerow(paper_dict)
                        csvfile.flush()  # write to file immediately
                except Exception as e:
                    print(f'Warning: {str(e)}\n'
                        f'Current group: {group_title}\nCurrent paper: {title}')
        else:
            # TODO: support downloading 2002 ~ 2008 papers
            return

    #  write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
    return paper_index


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None,
        csv_filename=None, downloader='IDM', is_random_step=True,
        proxy_ip_port=None):
    """
    download all AAMAS paper given year
    :param year: int, AAMAS year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param csv_filename: None or str, the csv file's name, None means to use
        default setting
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    conference = "AAMAS"
    postfix = f'{conference}_{year}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_path = os.path.join(
        project_root_folder, 'csv',
        f'{conference}_{year}.csv' if csv_filename is None else csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader,
        is_random_step=is_random_step,
        proxy_ip_port=proxy_ip_port
    )


if __name__ == '__main__':
    year = 2025
    # total_paper_number = 2021
    total_paper_number = save_csv(year)
    download_from_csv(
        year,
        save_dir=fr'D:\AAMAS_{year}',
        time_step_in_seconds=5,
        total_paper_number=total_paper_number)
    # for year in range(2008, 2025, 1):
    #     print(year)
    #     # total_paper_number = 134
    #     total_paper_number = save_csv(year)
    #     download_from_csv(year, save_dir=fr'E:\AAMAS\AAMAS_{year}',
    #                       time_step_in_seconds=10,
    #                       total_paper_number=total_paper_number)
    #     time.sleep(2)

    pass

================================================
FILE: code/paper_downloader_AISTATS.py
================================================
"""paper_downloader_AISTATS.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr
from lib.supplement_porcess import merge_main_supplement, move_main_and_supplement_2_one_directory, \
    move_main_and_supplement_2_one_directory_with_group


def download_paper(year, save_dir, is_download_supplement=True, time_step_in_seconds=5, downloader='IDM'):
    """
    download all AISTATS paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, AISTATS year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    AISTATS_year_dict = {
        2025: 258,
        2024: 238,
        2023: 206,
        2022: 151,
        2021: 130,
        2020: 108,
        2019: 89,
        2018: 84,
        2017: 54,
        2016: 51,
        2015: 38,
        2014: 33,
        2013: 31,
        2012: 22,
        2011: 15,
        2010: 9,
        2009: 5,
        2007: 2
    }
    AISTATS_year_dict_R = {
        1995: 0,
        1997: 1,
        1999: 2,
        2001: 3,
        2003: 4,
        2005: 5

    }
    if year in AISTATS_year_dict.keys():
        volume = f'v{AISTATS_year_dict[year]}'
    elif year in AISTATS_year_dict_R.keys():
        volume = f'r{AISTATS_year_dict_R[year]}'
    else:
        raise ValueError('''the given year's url is unknown !''')
    postfix = f'AISTATS_{year}'

    pmlr.download_paper_given_volume(
        volume=volume,
        save_dir=save_dir,
        postfix=postfix,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'D:\AISTATS_{year}',
        is_download_supplement=True,
        time_step_in_seconds=25,
        downloader='IDM'
    )
    # move_main_and_supplement_2_one_directory(
    #     main_path=rf'D:\AISTATS_{year}\main_paper',
    #     supplement_path=rf'D:\AISTATS_{year}\supplement',
    #     supp_pdf_save_path=rf'D:\AISTATS_{year}\supplement_pdf'
    # )
    pass


================================================
FILE: code/paper_downloader_COLT.py
================================================
"""paper_downloader_COLT.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr


def download_paper(year, save_dir, is_download_supplement=False, time_step_in_seconds=5, downloader='IDM'):
    """
    download all COLT paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, COLT year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    COLT_year_dict = {
        2025: 291,
        2024: 247,
        2023: 195,
        2022: 178,
        2021: 134,
        2020: 125,
        2019: 99,
        2018: 75,
        2017: 65,
        2016: 49,
        2015: 40,
        2014: 35,
        2013: 30,
        2012: 23,
        2011: 19
                      }
    if year in COLT_year_dict.keys():
        volume = f'v{COLT_year_dict[year]}'
    else:
        raise ValueError('''the given year's url is unknown !''')
    postfix = f'COLT_{year}'

    pmlr.download_paper_given_volume(
        volume=volume,
        save_dir=save_dir,
        postfix=postfix,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader
    )


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'D:\COLT_{year}',
        is_download_supplement=False,
        time_step_in_seconds=3,
        downloader='IDM'
    )
    pass


================================================
FILE: code/paper_downloader_CORL.py
================================================
"""paper_downloader_CORL.py"""
import os
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
import lib.pmlr as pmlr
import lib.openreview as openreview


def download_paper(year, save_dir, is_download_supplement=False, 
                   time_step_in_seconds=5, downloader='IDM',
                   source=None, proxy_ip_port=None):
    """
    download all CORL paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, CORL year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param source: str, download source, support "pmlr" and "openreview". 
        Defaults to None, means first try to download from pmlr. If failed, 
        then try to download from openreview.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890".  Only useful for webdriver and request
        downloader (downloader=None). Default: None.
    :type proxy_ip_port: str | None
    :return: True
    """
    CORL_year_dict = {
        2025: 305,
        2024: 270,
        2023: 229,
        2022: 205,
        2021: 164,
        2020: 155,
        2019: 100,
        2018: 87,
        2017: 78
    }
    postfix = f'CORL_{year}'

    if source != 'openreview':
        if year in CORL_year_dict.keys():  # download from pmlr
            volume = f'v{CORL_year_dict[year]}'
            pmlr.download_paper_given_volume(
                volume=volume,
                save_dir=save_dir,
                postfix=postfix,
                is_download_supplement=is_download_supplement,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader
            )
            return True
        elif source == 'pmlr':
            raise ValueError(f'Not found CoRL {year} in pmlr!')
        
    # try to download from openreview
    base_url = f'https://openreview.net/group?id=robot-learning.org/'\
               f'CoRL/{year}/Conference'
    group_id_dict = {
        2023: ['accept--oral-', 'accept--poster-'],
        2024: ['accept']
    }
    for gid in group_id_dict[year]:
        openreview.download_papers_given_url_and_group_id(
            save_dir=save_dir,
            year=year,
            base_url=f'{base_url}#{gid}',
            group_id=gid,
            conference='CORL',
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )
    return True 


if __name__ == '__main__':
    year=2025
    download_paper(
        year,
        rf'D:\CORL\CORL_{year}',
        is_download_supplement=False,
        time_step_in_seconds=30,
        downloader='IDM'
        # downloader = None
    )
    pass


================================================
FILE: code/paper_downloader_CVF.py
================================================
"""paper_downloader_CVF.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from slugify import slugify
import csv
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.supplement_porcess import merge_main_supplement, move_main_and_supplement_2_one_directory, \
    move_main_and_supplement_2_one_directory_with_group, \
    rename_2_short_name, rename_2_short_name_within_group
from lib.cvf import get_paper_dict_list
from lib import csv_process
import time
from lib.my_request import urlopen_with_retry


def save_csv(year, conference, proxy_ip_port=None):
    """
    write CVF conference papers' and supplemental material's urls in one csv file
    :param year: int
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV', 'ACCV']
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if conference not in ['CVPR', 'ICCV', 'WACV', 'ACCV']:
        raise ValueError(f'{conference} is not found in '
                         f'https://openaccess.thecvf.com/menu, '
                         f'maybe a spelling mistake!')
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_{year}.csv'
    )
    print(f'saving {conference}-{year} paper urls into {csv_file_pathname}')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link', 'arxiv']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        init_url = f'http://openaccess.thecvf.com/{conference}{year}'
        if conference == 'ICCV' and year == 2021:
            init_url = 'https://openaccess.thecvf.com/ICCV2021?day=all'
        elif conference == 'CVPR' and year >= 2022:
            init_url = f'https://openaccess.thecvf.com/CVPR{year}?day=all'
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_{conference}_{year}.dat'
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            content = urlopen_with_retry(
                url=init_url, headers=headers, proxy_ip_port=proxy_ip_port)
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)

        soup = BeautifulSoup(content, 'html5lib')
        tmp_list = soup.find('div', {'id': 'content'}).find_all('dt')
        if len(tmp_list) <= 1:
            paper_different_days_list_bar = soup.find(
                'div', {'id': 'content'}).find_all('dd')
            paper_index = 0
            for group in paper_different_days_list_bar:
                # get group name
                a = group.find('a')
                print(a.text)
                group_link = urllib.parse.urljoin(init_url, a.get('href'))
                group_paper_dict_list, _ = get_paper_dict_list(
                    url=group_link
                )
                paper_index += len(group_paper_dict_list)
                for paper_dict in group_paper_dict_list:
                    writer.writerow(paper_dict)
            return paper_index
        else:
            paper_dict_list, content = get_paper_dict_list(
                url=init_url,
                content=content)
            for paper_dict in paper_dict_list:
                writer.writerow(paper_dict)
            return len(paper_dict_list)


def save_csv_workshops(year, conference, proxy_ip_port=None):
    """
    write CVF workshops papers' and supplemental material's urls in one csv file
    :param year: int
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV', 'ACCV']
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if conference not in ['CVPR', 'ICCV', 'WACV', 'ACCV']:
        raise ValueError(f'{conference} is not found in '
                         f'https://openaccess.thecvf.com/menu, '
                         f'maybe a spelling mistake!')
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_WS_{year}.csv'
    )
    print(f'saving {conference}-WS-{year} paper urls into {csv_file_pathname}')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['group', 'title', 'main link', 'supplemental link',
                      'arxiv']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}

        init_url = f'https://openaccess.thecvf.com/' \
                   f'{conference}{year}_workshops/menu'
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_{conference}_WS_{year}.dat'
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            content = urlopen_with_retry(
                url=init_url, headers=headers, proxy_ip_port=proxy_ip_port)
            # content = open(f'..\\{conference}_WS_{year}.html', 'rb').read()
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html5lib')
        paper_group_list_bar = soup.find('div', {'id': 'content'}).find_all('dd')
        paper_index = 0
        for group in paper_group_list_bar:
            # get group name
            a = group.find('a')
            group_name = slugify(a.text)
            print(f'GROUP: {group_name}')

            group_link = urllib.parse.urljoin(init_url, a.get('href'))

            repeat_time = 3
            for r in range(repeat_time):
                try:
                    group_paper_dict_list, _ = get_paper_dict_list(
                        url=group_link,
                        group_name=group_name,
                        timeout=20,
                    )
                    time.sleep(1)
                    break
                except Exception as e:
                    if r + 1 == repeat_time:
                        print(f'ERROR: {str(e)}')
                        continue

            paper_index += len(group_paper_dict_list)
            for paper_dict in group_paper_dict_list:
                writer.writerow(paper_dict)
    return paper_index


def download_from_csv(
        year, conference, save_dir, is_download_main_paper=True,
        is_download_supplement=True, time_step_in_seconds=5,
        total_paper_number=None, is_workshops=False, downloader='IDM',
        proxy_ip_port=None):
    """
    download all CVF paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, CVF year, such 2019
    :param conference: str, one of ['CVPR', 'ICCV', 'WACV']
    :param save_dir: str, paper and supplement material's save path
    :param is_download_main_paper: bool, True for downloading main paper
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two downloading
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param is_workshops: bool, is to download workshops from csv file.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    postfix = f'{conference}_{year}'
    if is_workshops:
        postfix = f'{conference}_WS_{year}'
    csv_file_path = os.path.join(
        project_root_folder,
        'csv',
        f'{conference}_{year}.csv' if not is_workshops else
        f'{conference}_WS_{year}.csv'
    )
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_main_paper=is_download_main_paper,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader,

    )
    return True


def download_paper(
        year, conference, save_dir, is_download_main_paper=True,
        is_download_supplement=True, time_step_in_seconds=5,
        is_download_main_conference=True, is_download_workshops=True,
        downloader='IDM', proxy_ip_port=None):
    """
    download all CVF papers in given year, support downloading main conference
    and workshops.
    :param year: int, CVF year, such 2019.
    :param conference: str, one of {'CVPR', 'ICCV', 'WACV'}.
    :param save_dir: str, paper and supplement material's save path.
    :param is_download_main_paper: bool, True for downloading main paper.
    :param is_download_supplement: bool, True for downloading supplemental
        material.
    :param time_step_in_seconds: int, the interval time between two downloading
        request in seconds.
    :param is_download_main_conference: bool, this parameter controls whether to
        download main conference papers,
        it is a upper level control flag of parameters is_download_main_paper
        and is_download_supplement. eg. After setting
        is_download_main_conference=True, is_download_main_paper=False,
        is_download_supplement=True, the only the supplement materials of the
        conference (vs. workshops) will be downloaded.
    :param is_download_workshops: bool, True for downloading workshops paper
        and is similar with is_download_main_conference.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    # main conference
    if is_download_main_conference:
        csv_file_path = os.path.join(
            project_root_folder, 'csv', f'{conference}_{year}.csv')
        if not os.path.exists(csv_file_path):
            total_paper_number = save_csv(
                year=year, conference=conference, proxy_ip_port=proxy_ip_port)
        else:
            with open(csv_file_path, newline='') as csvfile:
                myreader = csv.DictReader(csvfile, delimiter=',')
                total_paper_number = sum(1 for row in myreader)

        download_from_csv(
            year=year,
            conference=conference,
            save_dir=os.path.join(save_dir, f'{conference}_{year}'),
            is_download_main_paper=is_download_main_paper,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            total_paper_number=total_paper_number,
            is_workshops=False,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )

    # workshops
    if is_download_workshops:
        csv_file_path = os.path.join(
            project_root_folder, 'csv', f'{conference}_WS_{year}.csv')
        if not os.path.exists(csv_file_path):
            total_paper_number = save_csv_workshops(
                year=year, conference=conference, proxy_ip_port=proxy_ip_port)
        else:
            with open(csv_file_path, newline='') as csvfile:
                myreader = csv.DictReader(csvfile, delimiter=',')
                total_paper_number = sum(1 for row in myreader)
        download_from_csv(
            year=year,
            conference=conference,
            save_dir=os.path.join(save_dir, f'{conference}_WS_{year}'),
            is_download_main_paper=is_download_main_paper,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            total_paper_number=total_paper_number,
            is_workshops=True,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )


if __name__ == '__main__':
    year = 2025
    conference = 'CVPR'
    download_paper(
        year,
        conference=conference,
        save_dir=fr'D:\{conference}',
        is_download_main_paper=True,
        is_download_supplement=True,
        time_step_in_seconds=10,
        is_download_main_conference=True,
        is_download_workshops=True,
        # proxy_ip_port='127.0.0.1:7897'
    )
    #
    # move_main_and_supplement_2_one_directory(
    #     main_path=rf'E:\{conference}\{conference}_{year}\main_paper',
    #     supplement_path=rf'E:\{conference}\{conference}_{year}\supplement',
    #     supp_pdf_save_path=rf'E:\{conference}\{conference}_{year}\main_paper'
    # )
    # move_main_and_supplement_2_one_directory_with_group(
    #     main_path=rf'E:\{conference}\{conference}_WS_{year}\main_paper',
    #     supplement_path=rf'E:\{conference}\{conference}_WS_{year}\supplement',
    #     supp_pdf_save_path=rf'E:\{conference}\{conference}_WS_{year}\main_paper'
    # )

    # rename to short filename for uploading to 123pan
    # rename_2_short_name(
    #     src_path=r'E:\CVPR\CVPR_2024\main_paper',
    #     save_path=r'E:\short_name_cvpr2024',
    #     target_max_length=128
    # )
    # rename_2_short_name_within_group(
    #     src_path=r'E:\CVPR\CVPR_WS_2024\main_paper',
    #     save_path=r'E:\short_name_cvpr2024_ws',
    #     target_max_length=128
    # )
    pass


================================================
FILE: code/paper_downloader_ECCV.py
================================================
"""paper_downloader_ECCV.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.supplement_porcess import move_main_and_supplement_2_one_directory
import lib.springer as springer
from lib import csv_process
from lib.downloader import Downloader
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write ECCV papers' and supplemental material's urls in one csv file
    :param year: int
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'ECCV_{year}.csv')
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_ECCV_{year}.dat')
        if year >= 2018:
            init_url = f'https://www.ecva.net/papers.php'
            if os.path.exists(dat_file_pathname):
                with open(dat_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_url, headers=headers)
                with open(dat_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            soup = BeautifulSoup(content, 'html5lib')
            paper_list_bar = tqdm(soup.find_all(['dt', 'dd']))
            paper_index = 0
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            for paper in paper_list_bar:
                is_new_paper = False

                # get title
                try:
                    if 'dt' == paper.name and \
                            'ptitle' == paper.get('class')[0] and \
                            year == int(paper.a.get('href').split('_')[1][:4]):  # title:
                        # this_year = int(paper.a.get('href').split('_')[1][:4])
                        title = slugify(paper.text.strip())
                        paper_dict['title'] = title
                        paper_index += 1
                        paper_list_bar.set_description_str(
                            f'Downloading paper {paper_index}: {title}')
                    elif '' != paper_dict['title'] and 'dd' == paper.name:
                        all_as = paper.find_all('a')
                        for a in all_as:
                            if 'pdf' == slugify(a.text.strip()):
                                main_link = urllib.parse.urljoin(init_url,
                                                                 a.get('href'))
                                paper_dict['main link'] = main_link
                                is_new_paper = True
                            elif 'supp' in slugify(a.text.strip()):
                                supp_link = urllib.parse.urljoin(init_url,
                                                                 a.get('href'))
                                paper_dict['supplemental link'] = supp_link
                                break
                except:
                    pass
                if is_new_paper:
                    writer.writerow(paper_dict)
                    paper_dict = {'title': '',
                                  'main link': '',
                                  'supplemental link': ''}
        else:
            init_url = f'http://www.eccv{year}.org/main-conference/'
            if os.path.exists(dat_file_pathname):
                with open(dat_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_url, headers=headers)
                with open(dat_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            soup = BeautifulSoup(content, 'html5lib')
            paper_list_bar = tqdm(
                soup.find('div', {'class': 'entry-content'}).find_all(['p']))
            paper_index = 0
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            for paper in paper_list_bar:
                try:
                    if len(paper.find_all(['strong'])) and len(
                            paper.find_all(['a'])) and len(
                            paper.find_all(['img'])):
                        paper_index += 1
                        title = slugify(paper.find('strong').text)
                        paper_dict['title'] = title
                        paper_list_bar.set_description_str(
                            f'Downloading paper {paper_index}: {title}')
                        main_link = paper.find('a').get('href')
                        paper_dict['main link'] = main_link
                        writer.writerow(paper_dict)
                        paper_dict = {'title': '',
                                      'main link': '',
                                      'supplemental link': ''}
                except Exception as e:
                    print(f'ERROR: {str(e)}')
    return paper_index


def download_from_csv(
        year, save_dir, is_download_supplement=True, time_step_in_seconds=5,
        total_paper_number=None,
        is_workshops=False, downloader='IDM'):
    """
    download all ECCV paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement respectively
    :param year: int, ECCV year, such 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two downlaod
        request in seconds
    :param total_paper_number: int, the total number of papers that is going
        to download
    :param is_workshops: bool, is to download workshops from csv file.
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    postfix = f'ECCV_{year}'
    if is_workshops:
        postfix = f'ECCV_WS_{year}'
    csv_file_name = f'ECCV_{year}.csv' if not is_workshops else \
        f'ECCV_WS_{year}.csv'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_name = os.path.join(project_root_folder, 'csv', csv_file_name)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_name,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


def download_from_springer(
        year, save_dir, is_workshops=False, time_sleep_in_seconds=5,
        downloader='IDM'):
    os.makedirs(save_dir, exist_ok=True)
    if 2018 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-030-01246-5',
                'https://link.springer.com/book/10.1007/978-3-030-01216-8',
                'https://link.springer.com/book/10.1007/978-3-030-01219-9',
                'https://link.springer.com/book/10.1007/978-3-030-01225-0',
                'https://link.springer.com/book/10.1007/978-3-030-01228-1',
                'https://link.springer.com/book/10.1007/978-3-030-01231-1',
                'https://link.springer.com/book/10.1007/978-3-030-01234-2',
                'https://link.springer.com/book/10.1007/978-3-030-01237-3',
                'https://link.springer.com/book/10.1007/978-3-030-01240-3',
                'https://link.springer.com/book/10.1007/978-3-030-01249-6',
                'https://link.springer.com/book/10.1007/978-3-030-01252-6',
                'https://link.springer.com/book/10.1007/978-3-030-01258-8',
                'https://link.springer.com/book/10.1007/978-3-030-01261-8',
                'https://link.springer.com/book/10.1007/978-3-030-01264-9',
                'https://link.springer.com/book/10.1007/978-3-030-01267-0',
                'https://link.springer.com/book/10.1007/978-3-030-01270-0'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-030-11009-3',
                'https://link.springer.com/book/10.1007/978-3-030-11012-3',
                'https://link.springer.com/book/10.1007/978-3-030-11015-4',
                'https://link.springer.com/book/10.1007/978-3-030-11018-5',
                'https://link.springer.com/book/10.1007/978-3-030-11021-5',
                'https://link.springer.com/book/10.1007/978-3-030-11024-6'
            ]
    elif 2016 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007%2F978-3-319-46448-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-46475-6',
                'https://link.springer.com/book/10.1007%2F978-3-319-46487-9',
                'https://link.springer.com/book/10.1007%2F978-3-319-46493-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-46454-1',
                'https://link.springer.com/book/10.1007%2F978-3-319-46466-4',
                'https://link.springer.com/book/10.1007%2F978-3-319-46478-7',
                'https://link.springer.com/book/10.1007%2F978-3-319-46484-8'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007%2F978-3-319-46604-0',
                'https://link.springer.com/book/10.1007%2F978-3-319-48881-3',
                'https://link.springer.com/book/10.1007%2F978-3-319-49409-8'
            ]
    elif 2014 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-319-10590-1',
                'https://link.springer.com/book/10.1007/978-3-319-10605-2',
                'https://link.springer.com/book/10.1007/978-3-319-10578-9',
                'https://link.springer.com/book/10.1007/978-3-319-10593-2',
                'https://link.springer.com/book/10.1007/978-3-319-10602-1',
                'https://link.springer.com/book/10.1007/978-3-319-10599-4',
                'https://link.springer.com/book/10.1007/978-3-319-10584-0'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-319-16178-5',
                'https://link.springer.com/book/10.1007/978-3-319-16181-5',
                'https://link.springer.com/book/10.1007/978-3-319-16199-0',
                'https://link.springer.com/book/10.1007/978-3-319-16220-1'
            ]
    elif 2012 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-33718-5',
                'https://link.springer.com/book/10.1007/978-3-642-33709-3',
                'https://link.springer.com/book/10.1007/978-3-642-33712-3',
                'https://link.springer.com/book/10.1007/978-3-642-33765-9',
                'https://link.springer.com/book/10.1007/978-3-642-33715-4',
                'https://link.springer.com/book/10.1007/978-3-642-33783-3',
                'https://link.springer.com/book/10.1007/978-3-642-33786-4'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-33863-2',
                'https://link.springer.com/book/10.1007/978-3-642-33868-7',
                'https://link.springer.com/book/10.1007/978-3-642-33885-4'
            ]
    elif 2010 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-15549-9',
                'https://link.springer.com/book/10.1007/978-3-642-15552-9',
                'https://link.springer.com/book/10.1007/978-3-642-15558-1',
                'https://link.springer.com/book/10.1007/978-3-642-15561-1',
                'https://link.springer.com/book/10.1007/978-3-642-15555-0',
                'https://link.springer.com/book/10.1007/978-3-642-15567-3'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-642-35749-7',
                'https://link.springer.com/book/10.1007/978-3-642-35740-4'
            ]
    elif 2008 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/978-3-540-88682-2',
                'https://link.springer.com/book/10.1007/978-3-540-88688-4',
                'https://link.springer.com/book/10.1007/978-3-540-88690-7',
                'https://link.springer.com/book/10.1007/978-3-540-88693-8'
            ]
        else:
            urls_list = []
    elif 2006 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/11744023',
                'https://link.springer.com/book/10.1007/11744047',
                'https://link.springer.com/book/10.1007/11744078',
                'https://link.springer.com/book/10.1007/11744085'
            ]
        else:
            urls_list = [
                'https://link.springer.com/book/10.1007/11754336'
            ]
    elif 2004 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/b97865',
                'https://link.springer.com/book/10.1007/b97866',
                'https://link.springer.com/book/10.1007/b97871',
                'https://link.springer.com/book/10.1007/b97873'
            ]
        else:
            urls_list = [

            ]
    elif 2002 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-47969-4',
                'https://link.springer.com/book/10.1007/3-540-47967-8',
                'https://link.springer.com/book/10.1007/3-540-47977-5',
                'https://link.springer.com/book/10.1007/3-540-47979-1'
            ]
        else:
            urls_list = [

            ]
    elif 2000 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-45054-8',
                'https://link.springer.com/book/10.1007/3-540-45053-X'
            ]
        else:
            urls_list = [

            ]
    elif 1998 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0055655',
                'https://link.springer.com/book/10.1007/BFb0054729'
            ]
        else:
            urls_list = [

            ]
    elif 1996 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0015518',
                'https://link.springer.com/book/10.1007/3-540-61123-1'
            ]
        else:
            urls_list = [

            ]
    elif 1994 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-57956-7',
                'https://link.springer.com/book/10.1007/BFb0028329'
            ]
        else:
            urls_list = [

            ]
    elif 1992 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/3-540-55426-2'
            ]
        else:
            urls_list = [

            ]
    elif 1990 == year:
        if not is_workshops:
            urls_list = [
                'https://link.springer.com/book/10.1007/BFb0014843'
            ]
        else:
            urls_list = [

            ]
    else:
        raise ValueError(f'ECCV {year} is current not available!')
    for url in urls_list:
        __download_from_springer(
            url, save_dir, year, is_workshops=is_workshops,
            time_sleep_in_seconds=time_sleep_in_seconds,
            downloader=downloader)


def __download_from_springer(
        url, save_dir, year, is_workshops=False, time_sleep_in_seconds=5,
        downloader='IDM'):
    downloader = Downloader(downloader)
    for i in range(3):
        try:
            papers_dict = springer.get_paper_name_link_from_url(url)
            break
        except Exception as e:
            print(str(e))
    # total_paper_number = len(papers_dict)
    pbar = tqdm(papers_dict.keys())
    postfix = f'ECCV_{year}'
    if is_workshops:
        postfix = f'ECCV_WS_{year}'

    for name in pbar:
        pbar.set_description(f'Downloading paper {name}')
        if not os.path.exists(os.path.join(save_dir, f'{name}_{postfix}.pdf')):
            downloader.download(
                papers_dict[name],
                os.path.join(save_dir, f'{name}_{postfix}.pdf'),
                time_sleep_in_seconds)


if __name__ == '__main__':
    year = 2024
    # total_paper_number = 2387
    total_paper_number = save_csv(year)
    download_from_csv(year,
                      save_dir=fr'Z:\all_papers\ECCV\ECCV_{year}',
                      is_download_supplement=True,
                      time_step_in_seconds=5,
                      total_paper_number=total_paper_number,
                      is_workshops=False)
    # move_main_and_supplement_2_one_directory(
    #     main_path=f'E:\\ECCV_{year}\\main_paper',
    #     supplement_path=f'E:\\ECCV_{year}\\supplement',
    #     supp_pdf_save_path=f'E:\\ECCV_{year}\\main_paper'
    # )
    # for year in range(2018, 2017, -2):
    #     # download_from_springer(
    #     #     save_dir=f'F:\\ECCV_{year}',
    #     #     year=year,
    #     #     is_workshops=False, time_sleep_in_seconds=30)
    #     download_from_springer(
    #         save_dir=f'F:\\ECCV_WS_{year}',
    #         year=year,
    #         is_workshops=True, time_sleep_in_seconds=30)
    # pass


================================================
FILE: code/paper_downloader_ICLR.py
================================================
"""paper_downloader_ICLR.py"""

from tqdm import tqdm
import os
# https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename
from slugify import slugify
from bs4 import BeautifulSoup
import pickle
from urllib.request import urlopen
import urllib
import sys

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.downloader import Downloader
from lib.openreview import download_iclr_papers_given_url_and_group_id
from lib.arxiv import get_pdf_link_from_arxiv


def download_iclr_oral_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr oral papers for year 2017 ~ 2022, 2024~2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2026: "tab-accept-oral",
        2025: "tab-accept-oral",
        2024: "tab-accept-oral",
        2022: "oral-submissions",
        2021: "oral-presentations",
        2020: "oral-presentations",
        2019: "oral-presentations",
        2018: "accepted-oral-papers",
        2017: "oral-presentations",
        2013: "conferenceoral-iclr2013-conference"
    }
    
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
        
    print(f'Downloading ICLR-{year} oral papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year > 2021)
    )


def download_iclr_conditional_oral_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional oral papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-oral"
    }
    no_pages_year = [2025]
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional oral papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_top5_papers(save_dir, year, base_url=None, start_page=1,
                              time_step_in_seconds=10, downloader='IDM',
                              proxy_ip_port=None):
    """
    Download iclr notable-top-5% papers for year 2023.
    :param save_dir: str, paper save path
    :param year: int, iclr year
    :type year: int
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two downlaod
        request in seconds. Default: 10.
    :type time_step_in_seconds: int
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    if base_url is None:
        if year == 2023:
            base_url = "https://openreview.net/group?id=ICLR.cc/" \
                       "2023/Conference#notable-top-5-"
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} top5 papers...')
    group_id = "notable-top-5-"
    return download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port
    )


def download_iclr_poster_papers(save_dir, year, base_url=None, start_page=1,
                                time_step_in_seconds=10, downloader='IDM',
                                proxy_ip_port=None):
    """
    Download iclr poster papers from year 2013, 2017 ~ 2024.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two downlaod
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2026: "tab-accept-poster",
        2025: "tab-accept-poster",
        2024: "tab-accept-poster",
        2023: "poster",
        2022: "poster-submissions",
        2021: "poster-presentations",
        2020: "poster-presentations",
        2019: "poster-presentations",
        2018: "accepted-poster-papers",
        2017: "poster-presentations",
        2013: "conferenceposter-iclr2013-conference"
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} poster papers...')
    no_pages_year = [2013, 2018, 2019, 2020, 2021]
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id_dict[year].replace('tab-', ''),
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year),
        is_need_click_group_button=(year == 2018)
    )


def download_iclr_conditional_poster_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional poster papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-poster"
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional poster papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year > 2021)
    )


def download_iclr_spotlight_papers(save_dir, year, base_url=None,
                                   time_step_in_seconds=10, downloader='IDM',
                                   start_page=1, proxy_ip_port=None):
    """
    Download iclr spotlight papers between year 2020 and 2022, 2024~2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-spotlight",
        2024: "tab-accept-spotlight",
        2022: "spotlight-submissions",
        2021: "spotlight-presentations",
        2020: "spotlight-presentations",
    }
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} spotlight papers...')
    no_pages_year = [2020, 2021]
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id_dict[year].replace('tab-', ''),
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_conditional_spotlight_papers(save_dir, year, base_url=None,
                              time_step_in_seconds=10, downloader='IDM',
                              start_page=1, proxy_ip_port=None):
    """
    Download iclr conditional spotlight papers for year 2025.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    group_id_dict = {
        2025: "tab-accept-conditional-spotlight"
    }
    no_pages_year = [2025]
    if base_url is None:
        if year in group_id_dict:
            base_url = 'https://openreview.net/group?id=ICLR.cc/' \
                f'{year}/Conference#{group_id_dict[year]}'
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} conditional spotlight papers...')
    group_id = group_id_dict[year].replace('tab-', '')
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port,
        is_have_pages=(year not in no_pages_year)
    )


def download_iclr_top25_papers(save_dir, year, base_url=None, start_page=1,
                               time_step_in_seconds=10, downloader='IDM',
                               proxy_ip_port=None):
    """
    Download iclr notable-top-25% papers for year 2023.
    :param save_dir: str, paper save path
    :param year: int, iclr year
    :type year: int
    :param base_url: str, paper website url
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two downlaod
        request in seconds. Default: 10.
    :type time_step_in_seconds: int
    :param downloader: str, the downloader to download, could be 'IDM' or
        None. Default: 'IDM'.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    if base_url is None:
        if year == 2023:
            base_url = "https://openreview.net/group?id=ICLR.cc/" \
                       "2023/Conference#notable-top-25-"
        else:
            raise ValueError('the website url is not given for this year!')
    print(f'Downloading ICLR-{year} top25 papers...')
    group_id = "notable-top-25-"
    download_iclr_papers_given_url_and_group_id(
        save_dir=save_dir,
        year=year,
        base_url=base_url,
        group_id=group_id,
        start_page=start_page,
        time_step_in_seconds=time_step_in_seconds,
        downloader=downloader,
        proxy_ip_port=proxy_ip_port
    )


def download_iclr_paper(save_dir, year, base_url=None,
                        time_step_in_seconds=10, downloader='IDM',
                        start_page=1, proxy_ip_port=None):
    """
    Download iclr papers between year 2013 and 2024.
    :param save_dir: str, paper save path
    :param year: int, iclr year, current only support year >= 2018
    :param base_url: str, paper website url
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds.
    :param downloader: str, the downloader to download, could be 'IDM' or
        None, default to 'IDM'.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Currently, this parameter is only used in year 2024.
        Default: 1.
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

    year_no_group = [2014]
    year_no_group_iclrcc = [2015, 2016]
    year_oral_poster = [2013, 2017, 2018, 2019, 2026]
    year_oral_spotlight_poster = [2020, 2021, 2022, 2024, 2025]
    year_top5_top25_poster = [2023]
    year_oral_spotlight_poster_conditional = [2025]

    # no group, openreview website
    if year in year_no_group:
        if base_url is None:
            if year == 2014:
                base_url = 'https://openreview.net/group?id=ICLR.cc/2014/conference'
            else:
                raise ValueError('the website url is not given for this year!')
        print(f'Downloading ICLR-{year} oral papers...')
        group_id_dict = {
            2014: "submitted-papers"
        }
        group_id = group_id_dict[year]
        no_pages_year = [2014]
        return download_iclr_papers_given_url_and_group_id(
            save_dir=save_dir,
            year=year,
            base_url=base_url,
            group_id=group_id,
            start_page=start_page,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port,
            is_have_pages=(year not in no_pages_year)
        )
    # no group, iclr.cc website
    if year in year_no_group_iclrcc:
        downloader = Downloader(downloader=downloader)
        paper_postfix = f'ICLR_{year}'
        if base_url is None:
            if year == 2016:
                base_url = 'https://iclr.cc/archive/www/doku.php%3Fid=iclr2016:main.html'
            elif year == 2015:
                base_url = 'https://iclr.cc/archive/www/doku.php%3Fid=iclr2015:main.html'
            elif year == 2014:
                base_url = 'https://iclr.cc/archive/2014/conference-proceedings/'
            else:
                raise ValueError('the website url is not given for this year!')
        os.makedirs(save_dir, exist_ok=True)
        if year == 2015:  # oral and poster seperated
            oral_save_path = os.path.join(save_dir, 'oral')
            poster_save_path = os.path.join(save_dir, 'poster')
            workshop_save_path = os.path.join(save_dir, 'ws')
            os.makedirs(oral_save_path, exist_ok=True)
            os.makedirs(poster_save_path, exist_ok=True)
            os.makedirs(workshop_save_path, exist_ok=True)
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_iclr_{year}.dat'
        )
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            req = urllib.request.Request(url=base_url, headers=headers)
            content = urllib.request.urlopen(req).read()
            with open(f'..\\urls\\init_url_iclr_{year}.dat', 'wb') as f:
                pickle.dump(content, f)
        error_log = []
        soup = BeautifulSoup(content, 'html.parser')
        print('open url successfully!')
        if year == 2016:
            papers = soup.find('h3',
                               {
                                   'id': 'accepted_papers_conference_track'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir,
                                             title + f'_{paper_postfix}.pdf')):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            # workshops
            papers = soup.find('h3',
                               {
                                   'id': 'workshop_track_posters_may_2nd'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://beta.openreview'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir, 'ws', pdf_name)):
                            pdf_link = get_pdf_link_from_openreview(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, 'ws',
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            papers = soup.find('h3',
                               {
                                   'id': 'workshop_track_posters_may_3rd'}).findNext(
                'div').find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://beta.openreview'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(save_dir, 'ws', pdf_name)):
                            pdf_link = get_pdf_link_from_openreview(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, 'ws',
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
        elif year == 2015:
            # oral papers
            oral_papers = soup.find('h3', {
                'id': 'conference_oral_presentations'}).findNext(
                'div').find_all(
                'a')
            for paper in tqdm(oral_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(oral_save_path,
                                             title + f'_{paper_postfix}.pdf')):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(oral_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

            # workshops papers
            workshop_papers = soup.find('h3', {
                'id': 'may_7_workshop_poster_session'}).findNext(
                'div').find_all(
                'a')
            workshop_papers.append(
                soup.find('h3',
                          {'id': 'may_8_workshop_poster_session'}).findNext(
                    'div').find_all('a'))
            for paper in tqdm(workshop_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_ICLR_WS_{year}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(workshop_save_path,
                                             title + f'_{paper_postfix}.pdf')):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(workshop_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
            # poster papers
            poster_papers = soup.find('h3', {
                'id': 'may_9_conference_poster_session'}).findNext(
                'div').find_all(
                'a')
            for paper in tqdm(poster_papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(poster_save_path,
                                             title + f'_{paper_postfix}.pdf')):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(poster_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))
        elif year == 2014:
            papers = soup.find('div',
                               {'id': 'sites-canvas-main-content'}).find_all(
                'a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(os.path.join(save_dir, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(save_dir, pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

            # workshops
            paper_postfix = f'ICLR_WS_{year}'
            base_url = 'https://sites.google.com/site/representationlearning2014/' \
                       'workshop-proceedings'
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            req = urllib.request.Request(url=base_url, headers=headers)
            content = urllib.request.urlopen(req).read()
            soup = BeautifulSoup(content, 'html.parser')
            workshop_save_path = os.path.join(save_dir, 'WS')
            os.makedirs(workshop_save_path, exist_ok=True)
            papers = soup.find(
                'div', {'id': 'sites-canvas-main-content'}).find_all('a')
            for paper in tqdm(papers):
                link = paper.get('href')
                if link.startswith('http://arxiv'):
                    title = slugify(paper.text)
                    pdf_name = f'{title}_{paper_postfix}.pdf'
                    try:
                        if not os.path.exists(
                                os.path.join(workshop_save_path, pdf_name)):
                            pdf_link = get_pdf_link_from_arxiv(link)
                            print(f'downloading {title}')
                            downloader.download(
                                urls=pdf_link,
                                save_path=os.path.join(workshop_save_path,
                                                       pdf_name),
                                time_sleep_in_seconds=time_step_in_seconds)
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append(
                            (title, link, 'paper download error', str(e)))

        # write error log
        print('write error log')
        log_file_pathname = os.path.join(
            project_root_folder, 'log', 'download_err_log.txt')
        with open(log_file_pathname, 'w') as f:
            for log in tqdm(error_log):
                for e in log:
                    if e is not None:
                        f.write(e)
                    else:
                        f.write('None')
                    f.write('\n')

                f.write('\n')
        return True

    # oral openreview
    if year in (year_oral_poster + year_oral_spotlight_poster):
        save_dir_oral = os.path.join(save_dir, 'oral')
        download_iclr_oral_papers(
            save_dir_oral,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )
    
    # conditional oral openreview
    if year in (year_oral_spotlight_poster_conditional):
        save_dir_cond_oral = os.path.join(save_dir, 'conditional-oral')
        download_iclr_conditional_oral_papers(
            save_dir_cond_oral,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # poster openreview
    if year in (year_oral_poster + year_oral_spotlight_poster +
                year_top5_top25_poster):
        save_dir_poster = os.path.join(save_dir, 'poster')
        download_iclr_poster_papers(
            save_dir_poster,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )
    
    # conditional poster openreview
    if year in (year_oral_spotlight_poster_conditional):
        save_dir_cond_poster = os.path.join(save_dir, 'conditional-poster')
        download_iclr_conditional_poster_papers(
            save_dir_cond_poster,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # spotlight openreview
    if year in year_oral_spotlight_poster:
        save_dir_spotlight = os.path.join(save_dir, 'spotlight')
        download_iclr_spotlight_papers(
            save_dir_spotlight,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # conditional spotlight openreview
    if year in (year_oral_spotlight_poster_conditional):
        save_dir_cond_spotlight = os.path.join(save_dir, 'conditional-spotlight')
        download_iclr_conditional_spotlight_papers(
            save_dir_cond_spotlight,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # top5 openreview
    if year in year_top5_top25_poster:
        save_dir_top5 = os.path.join(save_dir, 'top5')
        download_iclr_top5_papers(
            save_dir_top5,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )

    # top25 openreview
    if year in year_top5_top25_poster:
        save_dir_top25 = os.path.join(save_dir, 'top25')
        download_iclr_top25_papers(
            save_dir_top25,
            year,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            start_page=start_page,
            proxy_ip_port=proxy_ip_port
        )


def get_pdf_link_from_openreview(abs_link):
    return abs_link.replace('beta.', '').replace('forum', 'pdf')


if __name__ == '__main__':
    year = 2025
    save_dir_iclr = rf'E:\ICLR_{year}'
    # save_dir_iclr_oral = os.path.join(save_dir_iclr, 'oral')
    # save_dir_iclr_top5 = os.path.join(save_dir_iclr, 'top5')
    # save_dir_iclr_spotlight = os.path.join(save_dir_iclr, 'spotlight')
    # save_dir_iclr_top25 = os.path.join(save_dir_iclr, 'top25')
    # save_dir_iclr_poster = os.path.join(save_dir_iclr, 'poster')
    proxy_ip_port = None
    # proxy_ip_port = "http://127.0.0.1:7890"
    # download_iclr_oral_papers(save_dir_iclr_oral, year,
    #                           time_step_in_seconds=5)
    # download_iclr_top5_papers(save_dir_iclr_top5, year, start_page=1,
    #                           time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    # download_iclr_top25_papers(save_dir_iclr_top25, year, start_page=1,
    #                           time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    # download_iclr_spotlight_papers(save_dir_iclr_spotlight, year,
    #                                time_step_in_seconds=5)
    # download_iclr_poster_papers(save_dir_iclr_poster, year, start_page=1,
    #                             time_step_in_seconds=5,
    #                           proxy_ip_port=proxy_ip_port)
    download_iclr_paper(save_dir_iclr, year, time_step_in_seconds=5,
                        proxy_ip_port=proxy_ip_port)


================================================
FILE: code/paper_downloader_ICML.py
================================================
"""paper_downloader_ICML.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.downloader import Downloader
import lib.pmlr as pmlr
from lib.supplement_porcess import merge_main_supplement
from lib.openreview import download_icml_papers_given_url_and_group_id
from lib.my_request import urlopen_with_retry


def download_paper(year, save_dir, is_download_supplement=True,
                   time_step_in_seconds=5, downloader='IDM', source='pmlr',
                   proxy_ip_port=None):
    """
    download all ICML paper and supplement files given year, restore in
        save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, ICML year, such 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param source: str, source website, 'pmlr' or 'openreview'
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Default: None.
    :type proxy_ip_port: str | None
    :return: True
    """
    assert source in ['pmlr', 'openreview'], \
        f'only support source pmlr or openreview, but get {source}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    downloader = Downloader(downloader=downloader, proxy_ip_port=proxy_ip_port)
    ICML_year_dict = {
        2024: 235,
        2023: 202,
        2022: 162,
        2021: 139,
        2020: 119,
        2019: 97,
        2018: 80,
        2017: 70,
        2016: 48,
        2015: 37,
        2014: 32,
        2013: 28
    }
    if source == 'openreview':
        init_url = f'https://openreview.net/group?id=ICML.cc/{year}/Conference'
    else:  # pmlr
        if year >= 2013:
            init_url = f'http://proceedings.mlr.press/v{ICML_year_dict[year]}/'
        elif year == 2012:
            init_url = 'https://icml.cc/2012/papers.1.html'
        elif year == 2011:
            init_url = 'http://www.icml-2011.org/papers.php'
        elif 2009 == year:
            init_url = 'https://icml.cc/Conferences/2009/abstracts.html'
        elif 2008 == year:
            init_url = 'http://www.machinelearning.org/archive/icml2008/' \
                       'abstracts.shtml'
        elif 2007 == year:
            init_url = 'https://icml.cc/Conferences/2007/paperlist.html'
        elif year in [2006, 2004, 2005]:
            init_url = f'https://icml.cc/Conferences/{year}/proceedings.html'
        elif 2003 == year:
            init_url = 'https://aaai.org/Library/ICML/icml03contents.php'
        else:
            raise ValueError('''the given year's url is unknown !''')

    postfix = f'ICML_{year}'
    if source == 'openreview':  # download from openreview website:
        # oral paper
        group_id = 'oral'
        save_dir_oral = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_oral, exist_ok=True)
        download_icml_papers_given_url_and_group_id(
            save_dir=save_dir_oral,
            year=year,
            base_url=init_url,
            group_id=group_id,
            start_page=1,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader,
            proxy_ip_port=proxy_ip_port
        )
        # poster paper
        group_id = 'poster'
        save_dir_poster = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_poster, exist_ok=True)
        download_icml_papers_given_url_and_group_id(
            save_dir=os.path.join(save_dir, 'poster'),
            year=year,
            base_url=init_url,
            group_id=group_id,
            start_page=1,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader,
            proxy_ip_port=proxy_ip_port
        )
        # spotlight paper
        group_id = 'spotlight'
        save_dir_poster = os.path.join(save_dir, group_id)
        os.makedirs(save_dir_poster, exist_ok=True)
        try:
            download_icml_papers_given_url_and_group_id(
                save_dir=os.path.join(save_dir, 'spotlight'),
                year=year,
                base_url=init_url,
                group_id=group_id,
                start_page=1,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader.downloader,
                proxy_ip_port=proxy_ip_port
            )
        except ValueError as e:  # no spotlight paper
            print(f"WARNING: {str(e)}")
        return

    dat_file_pathname = os.path.join(
        project_root_folder, 'urls', f'init_url_icml_{year}.dat')
    if os.path.exists(dat_file_pathname):
        with open(dat_file_pathname, 'rb') as f:
            content = pickle.load(f)
    else:
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}
        content = urlopen_with_retry(url=init_url, headers=headers)
        # content = open(f'..\\ICML_{year}.html', 'rb').read()
        with open(dat_file_pathname, 'wb') as f:
            pickle.dump(content, f)
    # soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(content, 'html5lib')
    # soup = BeautifulSoup(open(r'..\ICML_2011.html', 'rb'), 'html.parser')
    error_log = []
    if year >= 2013:
        if year in ICML_year_dict.keys():
            volume = f'v{ICML_year_dict[year]}'
        else:
            raise ValueError('''the given year's url is unknown !''')

        pmlr.download_paper_given_volume(
            volume=volume,
            save_dir=save_dir,
            postfix=postfix,
            is_download_supplement=is_download_supplement,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader.downloader
        )
    elif 2012 == year:  # 2012
        # base_url = f'https://icml.cc/{year}/'
        paper_list_bar = tqdm(soup.find_all('div', {'class': 'paper'}))
        paper_index = 0
        for paper in paper_list_bar:
            paper_index += 1
            title = ''
            title = slugify(paper.find('h2').text)
            link = None
            for a in paper.find_all('a'):
                if 'ICML version (pdf)' == a.text:
                    link = urllib.parse.urljoin(init_url, a.get('href'))
                    break
            if link is not None:
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path) :
                    paper_list_bar.set_description(
                        f'downloading paper {paper_index}:{title}')
                    downloader.download(
                        urls=link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
            else:
                error_log.append((title, 'no main link error'))
    elif 2011 == year:
        paper_list_bar = tqdm(soup.find_all('a'))
        paper_index = 0
        for paper in paper_list_bar:
            h3 = paper.find('h3')
            if h3 is not None:
                title = slugify(h3.text)
                paper_index += 1
            if 'download' == slugify(paper.text.strip()):
                link = paper.get('href')
                link = urllib.parse.urljoin(init_url, link)
                if link is not None:
                    this_paper_main_path = os.path.join(
                        save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                    paper_list_bar.set_description(
                        f'find paper {paper_index}:{title}')
                    if not os.path.exists(this_paper_main_path) :
                        paper_list_bar.set_description(
                            f'downloading paper {paper_index}:{title}')
                        downloader.download(
                            urls=link,
                            save_path=this_paper_main_path,
                            time_sleep_in_seconds=time_step_in_seconds
                        )
                else:
                    error_log.append((title, 'no main link error'))
    elif year in [2009, 2008]:
        if 2009 == year:
            paper_list_bar = tqdm(
                soup.find('div', {'id': 'right_column'}).find_all(['h3','a']))
        elif 2008 == year:
            paper_list_bar = tqdm(
                soup.find('div', {'class': 'content'}).find_all(['h3','a']))
        paper_index = 0
        title = None
        for paper in paper_list_bar:
            if 'h3' == paper.name:
                title = slugify(paper.text)
                paper_index += 1
            elif 'full-paper' == slugify(paper.text.strip()):  # a
                link = paper.get('href')
                if link is not None and title is not None:
                    link = urllib.parse.urljoin(init_url, link)
                    this_paper_main_path = os.path.join(
                        save_dir, f'{title}_{postfix}.pdf')
                    paper_list_bar.set_description(
                        f'find paper {paper_index}:{title}')
                    if not os.path.exists(this_paper_main_path):
                        paper_list_bar.set_description(
                            f'downloading paper {paper_index}:{title}')
                        downloader.download(
                            urls=link,
                            save_path=this_paper_main_path,
                            time_sleep_in_seconds=time_step_in_seconds
                        )
                    title = None
                else:
                    error_log.append((title, 'no main link error'))
    elif year in [2006, 2005]:
        paper_list_bar = tqdm(soup.find_all('a'))
        paper_index = 0
        for paper in paper_list_bar:
            title = slugify(paper.text.strip())
            link = paper.get('href')
            paper_index += 1
            if link is not None and title is not None and \
                    ('pdf' == link[-3:] or 'ps' == link[-2:]):
                link = urllib.parse.urljoin(init_url, link)
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path):
                    paper_list_bar.set_description(
                        f'downloading paper {paper_index}:{title}')
                    downloader.download(
                        urls=link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
    elif 2004 == year:
        paper_index = 0
        paper_list_bar = tqdm(
            soup.find('table', {'class': 'proceedings'}).find_all('tr'))
        title = None
        for paper in paper_list_bar:
            tr_class = None
            try:
                tr_class = paper.get('class')[0]
            except:
                pass
            if 'proc_2004_title' == tr_class:  # title
                title = slugify(paper.text.strip())
                paper_index += 1
            else:
                for a in paper.find_all('a'):
                    if '[Paper]' == a.text:
                        link = a.get('href')
                        if link is not None and title is not None:
                            link = urllib.parse.urljoin(init_url, link)
                            this_paper_main_path = os.path.join(
                                save_dir,
                                f'{title}_{postfix}.pdf'.replace(' ', '_'))
                            paper_list_bar.set_description(
                                f'find paper {paper_index}:{title}')
                            if not os.path.exists(this_paper_main_path):
                                paper_list_bar.set_description(
                                    f'downloading paper {paper_index}:{title}')
                                downloader.download(
                                    urls=link,
                                    save_path=this_paper_main_path,
                                    time_sleep_in_seconds=time_step_in_seconds
                                )
                        break
    elif 2003 == year:
        paper_index = 0
        paper_list_bar = tqdm(
            soup.find('div', {'id': 'content'}).find_all(
                'p', {'class': 'left'}))
        for paper in paper_list_bar:
            abs_link = None
            title = None
            link = None
            for a in paper.find_all('a'):
                abs_link = urllib.parse.urljoin(init_url, a.get('href'))
                if abs_link is not None:
                    title = slugify(a.text.strip())
                    break
            if title is not None:
                paper_index += 1
                this_paper_main_path = os.path.join(
                    save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
                paper_list_bar.set_description(
                    f'find paper {paper_index}:{title}')
                if not os.path.exists(this_paper_main_path):
                    if abs_link is not None:
                        headers = {'User-Agent':
                                       'Mozilla/5.0 (Windows NT 6.1; WOW64; '
                                       'rv:23.0) Gecko/20100101 Firefox/23.0'}
                        abs_content = urlopen_with_retry(
                            url=abs_link, headers=headers,
                            raise_error_if_failed=False)
                        if abs_content is None:
                            print('error'+title)
                            error_log.append(
                                (title, abs_link, 'download error'))
                            continue
                        abs_soup = BeautifulSoup(abs_content, 'html5lib')
                        for a in abs_soup.find_all('a'):
                            try:
                                if 'pdf' == a.get('href')[-3:]:
                                    link = urllib.parse.urljoin(
                                        abs_link, a.get('href'))
                                    if link is not None:
                                        paper_list_bar.set_description(
                                            f'downloading paper {paper_index}:'
                                            f'{title}')
                                        downloader.download(
                                            urls=link,
                                            save_path=this_paper_main_path,
                                            time_sleep_in_seconds=time_step_in_seconds
                                        )
                                    break
                            except:
                                pass

    # write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt')
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')


def rename_downloaded_paper(year, source_path):
    """
    rename the downloaded ICML paper to {title}_ICML_2010.pdf and save to
    source_path
    :param year: int, year
    :param source_path: str, whose structure should be
        source_path/papers/pdf files (2010)
                   /index.html       (2010)
        source_path/icml2007_proc.html (2007)
   :return:
    """
    if not os.path.exists(source_path):
        raise ValueError(f'can not find {source_path}')
    postfix = f'ICML_{year}'
    if 2010 == year:
        soup = BeautifulSoup(
            open(os.path.join(source_path, 'index.html'), 'rb'), 'html5lib')
        paper_list_bar = tqdm(soup.find_all('span', {'class': 'boxpopup3'}))

        for paper in paper_list_bar:
            a = paper.find('a')
            title = slugify(a.text)
            ori_name = os.path.join(
                source_path, 'papers', a.get('href').split('/')[-1])
            os.rename(ori_name, os.path.join(
                source_path, f'{title}_{postfix}.pdf'))
            paper_list_bar.set_description(f'processing {title}')
    elif 2007 == year:
        soup = BeautifulSoup(open(os.path.join(
            source_path, 'icml2007_proc.html'), 'rb'), 'html5lib')
        paper_list_bar = tqdm(soup.find_all('td', {'colspan': '2'}))
        for paper in paper_list_bar:
            all_as = paper.find_all('a')
            if len(all_as) <= 1:
                title = slugify(paper.text.strip())
            else:
                for a in all_as:
                    if '[Paper]' == a.text:
                        sub_path = a.get('href')
                        os.rename(os.path.join(source_path, sub_path),
                                  os.path.join(
                                      source_path, f'{title}_{postfix}.pdf'))
                        paper_list_bar.set_description_str(
                            (f'processing {title}'))
                        break


if __name__ == '__main__':
    year = 2025
    download_paper(
        year,
        rf'E:\ICML_{year}',
        is_download_supplement=True,
        time_step_in_seconds=10,
        downloader='IDM',
        source='openreview'
    ) 
    # merge_main_supplement(main_path=f'..\\ICML_{year}\\main_paper',
    #                       supplement_path=f'..\\ICML_{year}\\supplement',
    #                       save_path=f'..\\ICML_{year}',
    #                       is_delete_ori_files=False)
    # rename_downloaded_paper(year, f'..\\ICML_{year}')
    pass


================================================
FILE: code/paper_downloader_IJCAI.py
================================================
"""paper_downloader_IJCAI.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write IJCAI papers' urls in one csv file
    :param year: int, IJCAI year, such 2019
    :return: peper_index: int, the total number of papers
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'IJCAI_{year}.csv'
    )
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'group']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        if year >= 2003:
            init_urls = [f'https://www.ijcai.org/proceedings/{year}/']
        elif year >= 1977:
            init_urls = [f'https://www.ijcai.org/Proceedings/{year}-1/',
                         f'https://www.ijcai.org/Proceedings/{year}-2/']
        elif year >= 1969:
            init_urls = [f'https://www.ijcai.org/Proceedings/{year}/']
        else:
            raise ValueError('invalid year!')
        error_log = []
        user_agents = [
            'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
            'Gecko/20071127 Firefox/2.0.0.11',

            'Opera/9.25 (Windows NT 5.1; U; en)',

            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; '
            '.NET CLR 1.1.4322; .NET CLR 2.0.50727)',

            'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) '
            'KHTML/3.5.5 (like Gecko) (Kubuntu)',

            'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) '
            'Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',

            'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',

            "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 "
            "(KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 "
            "Chrome/16.0.912.77 Safari/535.7",

            "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) "
            "Gecko/20100101 Firefox/10.0 ",

            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/105.0.0.0 Safari/537.36'

        ]
        headers = {
            'User-Agent': user_agents[-1],
            'Host': 'www.ijcai.org',
            'Referer': "https://www.ijcai.org",
            'GET': init_urls[0]
        }
        if len(init_urls) == 1:
            data_file_pathname = os.path.join(
                project_root_folder, 'urls', f'init_url_IJCAI_{year}.dat'
            )
            if os.path.exists(data_file_pathname):
                with open(data_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_urls[0], headers=headers)
                with open(data_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            contents = [content]
        else:
            contents = []
            data_file_pathname = os.path.join(
                project_root_folder, 'urls', f'init_url_IJCAI_0_{year}.dat'
            )
            if os.path.exists(data_file_pathname):
                with open(data_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_urls[0], headers=headers)
                with open(data_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            contents.append(content)
            data_file_pathname = os.path.join(
                project_root_folder, 'urls', f'init_url_IJCAI_1_{year}.dat'
            )
            if os.path.exists(data_file_pathname):
                with open(data_file_pathname, 'rb') as f:
                    content = pickle.load(f)
            else:
                content = urlopen_with_retry(url=init_urls[1], headers=headers)
                with open(data_file_pathname, 'wb') as f:
                    pickle.dump(content, f)
            contents.append(content)
        paper_index = 0
        for content in contents:
            soup = BeautifulSoup(content, 'html5lib')
            if year >= 2017:
                pbar = tqdm(soup.find_all('div', {'class': 'section_title'}))
                for section in pbar:
                    this_group = slugify(section.text)
                    papers = section.parent.find_all(
                        'div', {'class': ['paper_wrapper', 'subsection_title']})
                    sub_group = ''
                    for paper in papers:
                        if 'subsection_title' == paper.get('class')[0]:
                            sub_group = slugify(paper.text)
                            continue
                        paper_index += 1
                        is_get_link = False
                        title = slugify(
                            paper.find('div', {'class': 'title'}).text)
                        pbar.set_description(
                            f'downloading paper {paper_index}: {title}')
                        for a in paper.find(
                                'div', {'class': 'details'}).find_all('a'):
                            if 'PDF' == a.text:
                                link = urllib.parse.urljoin(
                                    init_urls[0], a.get('href'))
                                is_get_link = True
                                break
                        if is_get_link:
                            paper_dict = {'title': title,
                                          'main link': link,
                                          'group': this_group + '--' +
                                                   sub_group if
                                          sub_group != '' else this_group}
                        else:
                            paper_dict = {'title': title,
                                          'main link': 'error',
                                          'group': this_group + '--' +
                                                   sub_group if
                                          sub_group != '' else this_group}
                            print(f'get link for {title}_{year} failed!')
                            error_log.apend(title, 'no link')
                        writer.writerow(paper_dict)
            elif year in [2016]:  # no group
                papers_bar = tqdm(soup.find_all('p'))
                for paper in papers_bar:
                    all_as = paper.find_all('a')
                    if len(all_as) >= 2:  # paper pdf and abstract
                        paper_index += 1
                        title = slugify(paper.text.split('\n')[0])
                        papers_bar.set_description(
                            f'downloading paper {paper_index}: {title}')
                        is_get_link = False
                        for a in all_as:
                            if 'PDF' == a.text:
                                link = 'https://www.ijcai.org' + a.get('href')
                                is_get_link = True
                                break
                        if is_get_link:
                            paper_dict = {'title': title,
                                          'main link': link,
                                          'group': ''}
                        else:
                            paper_dict = {'title': title,
                                          'main link': 'error',
                                          'group': ''}
                            print(f'get link for {title}_{year} failed!')
                            error_log.apend(title, 'no link')
                        writer.writerow(paper_dict)
            elif year in [2015]:  # p group 'PDF'
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['h2', 'p', 'h3']))
                is_start = False
                this_group = ''
                for paper in papers_bar:
                    if not is_start:
                        if 'h2' == paper.name:  # find 'content'
                            if 'Contents' == paper.text:
                                is_start = True
                    else:
                        if 'h3' == paper.name: # group
                            this_group = slugify(paper.text)
                        elif 'p' == paper.name:  # paper
                            all_as = paper.find_all('a')
                            if len(all_as) >= 2:  # paper pdf and abstract
                                paper_index += 1
                                title = slugify(paper.text.split('\n')[0])
                                papers_bar.set_description(
                                    f'downloading paper {paper_index}: {title}')
                                is_get_link = False
                                for a in all_as:
                                    if 'PDF' == a.text:
                                        link = 'https://www.ijcai.org' + \
                                               a.get('href')
                                        is_get_link = True
                                        break
                                if is_get_link:
                                    paper_dict = {'title': title,
                                                  'main link': link,
                                                  'group': this_group}
                                else:
                                    paper_dict = {'title': title,
                                                  'main link': 'error',
                                                  'group': this_group}
                                    print(f'get link for {title}_{year} failed!')
                                    error_log.apend(title, 'no link')
                                writer.writerow(paper_dict)
            elif year in [2013, 2011, 2009, 2007]:  # p group
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['h2', 'p', 'h3', 'h4']))
                # papers_bar = div_content.find_all(['h2', 'p', 'h3', 'h4'])
                is_start = False
                this_group = ''
                this_group_v3 = ''
                this_group_v4 = ''
                for paper in papers_bar:
                    if not is_start:
                        if 'h2' == paper.name:  # find 'content'
                            if 'Contents' == paper.text or \
                                    'IJCAI-09 Contents' == paper.text or \
                                    'IJCAI-07 Contents' == paper.text:
                                is_start = True
                    else:
                        if 'h3' == paper.name: # group
                            this_group_v3 = slugify(paper.text)
                            this_group = this_group_v3
                        elif 'h4' == paper.name: # group
                            this_group_v4 = slugify(paper.text)
                            this_group = this_group_v3 + '--' + this_group_v4
                        elif 'p' == paper.name:  # paper
                            try:
                                all_as = paper.find_all('a')
                            except:
                                continue
                            if len(all_as) >= 1:  # paper
                                paper_index += 1
                                is_get_link = False
                                for a in all_as:
                                    if 'abstract' != slugify(a.text.strip()):
                                        title = slugify(a.text)
                                        link = a.get('href')
                                        is_get_link = True
                                        papers_bar.set_description(
                                            f'downloading paper {paper_index}: '
                                            f'{title}')
                                        break
                                if is_get_link:
                                    paper_dict = {'title': title,
                                                  'main link': link,
                                                  'group': this_group}
                                else:
                                    paper_dict = {'title': title,
                                                  'main link': 'error',
                                                  'group': this_group}
                                    print(f'get link for {title}_{year} failed!')
                                    error_log.append((title, 'no link'))
                                # papers_bar.set_description(f'downloading
                                # paper {paper_index}: {title}')
                                writer.writerow(paper_dict)
            elif year in [2005]:
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        paper_class = paper.get('class')[0]
                    except:
                        continue
                    if 'docsection' == paper_class:  # group
                        this_group = slugify(paper.text)
                    elif 'doctitle' == paper_class:  # paper
                        paper_index += 1
                        title = slugify(paper.a.text)
                        link = paper.a.get('href')
                        papers_bar.set_description(
                            f'downloading paper {paper_index}: {title}')
                        paper_dict = {'title': title,
                                      'main link': link,
                                      'group': this_group}
                        writer.writerow(paper_dict)
            elif year in [2003]:
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                base_url = 'https://www.ijcai.org'
                for paper in papers_bar:
                    try:
                        this_group = slugify(paper.b.text)
                    except:
                        pass
                    try:
                        title = slugify(paper.a.text)
                        link = base_url + paper.a.get('href')
                        paper_index += 1
                        papers_bar.set_description(
                            f'downloading paper {paper_index}: {title}')
                        paper_dict = {'title': title,
                                      'main link': link,
                                      'group': this_group}
                        writer.writerow(paper_dict)
                    except:
                        continue
            elif year in [2001]:
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        title = slugify(paper.a.text)
                        link = paper.a.get('href')
                        paper_index += 1
                        papers_bar.set_description(
                            f'downloading paper {paper_index}: {title}')
                        paper_dict = {'title': title,
                                      'main link': link,
                                      'group': this_group}
                        writer.writerow(paper_dict)
                    except:
                        continue
            elif year in [1999, 1997, 1995, 1993, 1991, 1989, 1987, 1981, 1979,
                          1977, 1969]:  # goup in capital in p.b.text
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        if paper.b.text.isupper():
                            # print(paper.b.text)
                            this_group = slugify(paper.b.text)
                    except:
                        pass
                    try:
                        for a in paper.find_all('a'):
                            title = slugify(a.text.strip())
                            link = a.get('href')
                            if link[-3:] == 'pdf' and '' != title:
                                paper_index += 1
                                papers_bar.set_description(
                                    f'downloading paper {paper_index}: {title}')
                                paper_dict = {'title': title,
                                              'main link': link,
                                              'group': this_group}
                                writer.writerow(paper_dict)
                                break
                            else:
                                continue

                    except:
                        continue
            elif year in [1985, 1975, 1971]:  # no group, paper in 'p'
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        for a in paper.find_all('a'):
                            title = slugify(a.text.strip())
                            link = a.get('href')
                            if link[-3:] == 'pdf' and '' != title:
                                paper_index += 1
                                papers_bar.set_description(
                                    f'downloading paper {paper_index}: {title}')
                                paper_dict = {'title': title,
                                              'main link': link,
                                              'group': this_group}
                                writer.writerow(paper_dict)
                                break
                            else:
                                continue

                    except:
                        continue
            elif year in [1983]:  # goup in capital p.text
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        if paper.text.isupper():
                            this_group = slugify(paper.text)
                    except:
                        pass
                    try:
                        for a in paper.find_all('a'):
                            title = slugify(a.text.strip())
                            link = a.get('href')
                            if link[-3:] == 'pdf' and '' != title:
                                paper_index += 1
                                papers_bar.set_description(
                                    f'downloading paper {paper_index}: {title}')
                                paper_dict = {'title': title,
                                              'main link': link,
                                              'group': this_group}
                                writer.writerow(paper_dict)
                                break
                            else:
                                continue

                    except:
                        continue
            elif year in [1973]:  # goup in p.b
                div_content = soup.find('div', {'id': 'content'})
                papers_bar = tqdm(div_content.find_all(['p']))
                this_group = ''
                for paper in papers_bar:
                    try:
                        if '' != paper.b.text.strip():
                            this_group = slugify(paper.b.text.strip())
                    except:
                        pass
                    try:
                        for a in paper.find_all('a'):
                            title = slugify(a.text.strip())
                            link = a.get('href')
                            if link[-3:] == 'pdf' and '' != title:
                                paper_index += 1
                                papers_bar.set_description(
                                    f'downloading paper {paper_index}: {title}')
                                paper_dict = {'title': title,
                                              'main link': link,
                                              'group': this_group}
                                writer.writerow(paper_dict)
                                break
                            else:
                                continue

                    except:
                        continue
        #  write error log
        print('write error log')
        log_file_pathname = os.path.join(
            project_root_folder, 'log', 'download_err_log.txt')
        with open(log_file_pathname, 'w') as f:
            for log in tqdm(error_log):
                for e in log:
                    if e is not None:
                        f.write(e)
                    else:
                        f.write('None')
                    f.write('\n')

                f.write('\n')

    return paper_index if paper_index is not None else None


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None, downloader='IDM'):
    """
    download all IJCAI paper given year
    :param year: int, IJCAI year, such 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two downlaod request in seconds
    :param total_paper_number: int, the total number of papers that is going to download
    :param downloader: str, the downloader to download, could be 'IDM' or 'Thunder', default to 'IDM'
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    postfix = f'IJCAI_{year}'
    csv_filename = f'IJCAI_{year}.csv'
    csv_filename = os.path.join(project_root_folder, 'csv', csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_filename,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


if __name__ == '__main__':
    # for year in  range(1993, 1968, -2):
    #     print(year)
    #     # save_csv(year)
    #     # time.sleep(2)
    #     download_from_csv(year, save_dir=f'..\\IJCAI_{year}',
    #     time_step_in_seconds=1)
    year = 2024
    # total_paper_number = 723
    total_paper_number = save_csv(year)
    download_from_csv(
        year,
        save_dir=fr'E:\IJCAI_{year}',
        time_step_in_seconds=5,
        total_paper_number=total_paper_number,
        downloader=None)

    pass


================================================
FILE: code/paper_downloader_JMLR.py
================================================
"""paper_downloader_JMLR.py"""

import urllib
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import time
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.downloader import Downloader
from lib.my_request import urlopen_with_retry


def download_paper(
        volumn, save_dir, time_step_in_seconds=5, downloader='IDM', url=None,
        is_use_url=False, refresh_paper_list=True):
    """
    download all JMLR paper files given volumn and restore in save_dir
    respectively
    :param volumn: int, JMLR volumn, such as 2019
    :param save_dir: str, paper and supplement material's saving path
    :param time_step_in_seconds: int, the interval time between two downlaod request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or 'Thunder', default to 'IDM'
    :param url: None or str, None means to download volumn papers.
    :param is_use_url: bool, if to download papers from 'url'. url couldn't be None when is_use_url is True.
    :param refresh_paper_list: bool, if to refresh the saved paper list, default
        true, which means the "dat" file that contains the papers' information
        will be re-downloaded.
    :return: True
    """
    downloader = Downloader(downloader=downloader)
    # create current dict
    title_list = []
    # paper_dict = dict()
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    if not is_use_url:
        init_url = f'http://jmlr.org/papers/v{volumn}/'
        postfix = f'JMLR_v{volumn}'
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_JMLR_v{volumn}.dat')
        if not refresh_paper_list and \
                os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            print('collecting papers from website...')
            content = urlopen_with_retry(url=init_url, headers=headers)
            # content = open(f'..\\JMLR_{volumn}.html', 'rb').read()
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
    elif url is not None:
        content = urlopen_with_retry(url=url, headers=headers)
        postfix = f'JMLR'
    else:
        raise ValueError(''''url' could not be None when 'is_use_url'=True!!!''')
    # soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(content, 'html5lib')
    # soup = BeautifulSoup(open(r'..\JMLR_2011.html', 'rb'), 'html.parser')
    error_log = []
    os.makedirs(save_dir, exist_ok=True)

    if (not is_use_url) and volumn <= 4:
        paper_list = soup.find('div', {'id': 'content'}).find_all('tr')
    else:
        paper_list = soup.find('div', {'id': 'content'}).find_all('dl')
    # num_download = 5 # number of papers to download
    num_download = len(paper_list)
    print(f'total papers counting: {num_download}, start downloading...')
    for paper in tqdm(zip(paper_list, range(num_download))):
        # get title
        this_paper = paper[0]
        title = slugify(this_paper.find('dt').text)
        title_list.append(title)

        this_paper_main_path = os.path.join(save_dir, f'{title}_{postfix}.pdf'.replace(' ', '_'))
        if os.path.exists(this_paper_main_path):
            continue

        # get abstract page url
        links = this_paper.find_all('a')
        main_link = None
        for link in links:
            if '[pdf]' == link.text or 'pdf' == link.text:
                main_link = urllib.parse.urljoin('http://jmlr.org', link.get('href'))
                break

        # try 1 time
        # error_flag = False
        for d_iter in range(1):
            try:
                # download paper with IDM
                if not os.path.exists(this_paper_main_path) and main_link is not None:
                    try:
                        print('Downloading paper {}/{}: {}'.format(paper[1] + 1, num_download, title))
                    except:
                        print(title.encode('utf8'))
                    downloader.download(
                        urls=main_link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
            except Exception as e:
                # error_flag = True
                print('Error: ' + title + ' - ' + str(e))
                error_log.append((title, main_link, 'main paper download error', str(e)))

    # store the results
    # 1. store in the pickle file
    # with open(f'{postfix}_pre.dat', 'wb') as f:
    #     pickle.dump(paper_dict, f)

    # 2. write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt')
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')


def download_special_topics_and_issues_paper(save_dir, time_step_in_seconds=5, downloader='IDM'):
    """
    download all JMLR special topics and issues paper files given volumn and restore in save_dir
    respectively
    :param save_dir: str, paper and supplement material's saving path
    :param time_step_in_seconds: int, the interval time between two downlaod request in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or 'Thunder', default to 'IDM'
    :return: True
    """
    homepage = 'https://www.jmlr.org/papers/'
    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    # postfix = f'JMLR_v{volumn}'

    content = urlopen_with_retry(url=homepage, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    # soup = BeautifulSoup(open(r'..\JMLR_2011.html', 'rb'), 'html.parser')

    all_topics = soup.find('div', {'id': 'content'}).find_all(['h2', 'p'])
    is_topic = False
    is_issue = False
    for topic in all_topics:
        if 'h2' == topic.name and slugify(topic.text.strip()) == 'special-topics':
            is_topic = True
        elif 'h2' == topic.name:
            is_topic = False
            if 'special-issues' == slugify(topic.text.strip()):
                is_issue = True
        if is_topic and 'p' == topic.name:
            topic_name = slugify(topic.text.strip())
            topic_url = urllib.parse.urljoin(homepage, topic.a.get('href'))
            # print(f'T: {topic_name} url:{topic_url}')
            print(f'processing special topic: {topic_name}')
            download_paper(
                volumn=1000,
                save_dir=os.path.join(save_dir, 'special-topics', topic_name),
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                url=topic_url,
                is_use_url=True
            )
            time.sleep(time_step_in_seconds)
        if is_issue and 'p' == topic.name:
            issue_name = slugify(topic.text.strip())
            issue_url = urllib.parse.urljoin(homepage, topic.a.get('href'))
            # print(f'T: {issue_name} url:{issue_url}')
            print(f'processing special issue: {issue_name}')
            download_paper(
                volumn=1000,
                save_dir=os.path.join(save_dir, 'special-issues', issue_name),
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                url=issue_url,
                is_use_url=True
            )
            time.sleep(time_step_in_seconds)


if __name__ == '__main__':
    volumn = 25
    download_paper(volumn, rf'W:\all_papers\JMLR\JMLR_v{volumn}',
                   time_step_in_seconds=3)
    # download_special_topics_and_issues_paper(
    #     rf'Z:\all_papers\JMLR', time_step_in_seconds=3, downloader='IDM')
    pass


================================================
FILE: code/paper_downloader_NIPS.py
================================================
"""paper_downloader_NIPS.py"""

import urllib
import time
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib.supplement_porcess import move_main_and_supplement_2_one_directory
from lib.downloader import Downloader
from lib import csv_process
from lib.openreview import download_nips_papers_given_url
from lib.my_request import urlopen_with_retry


def save_csv(year):
    """
    write nips papers' and supplemental material's urls in one csv file
    :param year: int
    :return: num_download: int, the total number of papers.
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'NIPS_{year}.csv'
    )
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                'Gecko/20100101 Firefox/23.0'}
        init_url = f'https://proceedings.neurips.cc/paper/{year}'
        dat_file_pathname = os.path.join(
            project_root_folder, 'urls', f'init_url_nips_{year}.dat')
        if os.path.exists(dat_file_pathname):
            with open(dat_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            content = urlopen_with_retry(url=init_url, headers=headers)
            with open(dat_file_pathname, 'wb') as f:
                pickle.dump(content, f)
        soup = BeautifulSoup(content, 'html.parser')
        paper_list = soup.find(
            'div', {'class': 'container-fluid'}).find_all('li')
        # num_download = 5 # number of papers to download
        num_download = len(paper_list)
        paper_list_bar = tqdm(zip(paper_list, range(num_download)))
        for paper in tqdm(zip(paper_list, range(num_download))):
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            # get title
            # print('\n')
            this_paper = paper[0]
            title = slugify(this_paper.a.text)
            paper_dict['title'] = title
            # print('Downloading paper {}/{}: {}'.format(
            # paper[1] + 1, num_download, title))
            paper_list_bar.set_description(
                'Tracing paper {}/{}: {}'.format(
                    paper[1] + 1, num_download, title))

            # get abstract page url
            url2 = this_paper.a.get('href')
            abs_url = urllib.parse.urljoin(init_url, url2)
            abs_content = urlopen_with_retry(url=abs_url, headers=headers,
                                             raise_error_if_failed=False)
            if abs_content is not None:
                soup_temp = BeautifulSoup(abs_content, 'html.parser')
                # abstract = soup_temp.find(
                # 'p', {'class': 'abstract'}).text.strip()
                # paper_dict[title] = abstract
                all_a = soup_temp.findAll('a')
                for a in all_a:
                    # print(a.text[:-2])
                    # print(a.text[:-2].strip().lower())
                    if 'paper' == a.text[:-2].strip().lower():
                        paper_dict['main link'] = urllib.parse.urljoin(
                            abs_url, a.get('href'))
                    elif 'supplemental' == a.text[:-2].strip().lower():
                        paper_dict['supplemental link'] = \
                            urllib.parse.urljoin(abs_url, a.get('href'))
                        break
            else:
                print('Error: ' + title)
                if paper_dict['main link'] == '':
                    paper_dict['main link'] = 'error'
                if paper_dict['supplemental link'] == '':
                    paper_dict['supplemental link'] = 'error'
            writer.writerow(paper_dict)
            time.sleep(1)
    return num_download


def download_from_csv(
        year, save_dir, is_download_mainpaper=True, is_download_supplement=True,
        time_step_in_seconds=5, total_paper_number=None, downloader='IDM'):
    """
    download all NIPS paper and supplement files given year, restore in
    save_dir/main_paper and save_dir/supplement
    respectively
    :param year: int, NIPS year, such 2019
    :param save_dir: str, paper and supplement material's save path
    :param is_download_mainpaper: boot, True for downloading main papers
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    postfix = f'NIPS_{year}'
    csv_file_path = os.path.join(project_root_folder, 'csv', f'NIPS_{year}.csv')
    return csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=is_download_supplement,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader
    )


# def rename_supp( year, supp_dir):
#     """
#     rename supplemental material
#     :param year: int, NIPS year, such 2019
#     :param supp_dir: str, supplement material's save path
#     :return: True
#     """
#     if not os.path.exists(supp_dir):
#         raise ValueError(f'''can't find path {supp_dir}''')
#
#     postfix = f'NIPS_{year}'
#     with open(f'..\\csv\\NIPS_{year}.csv', newline='') as csvfile:
#         myreader = csv.DictReader(csvfile, delimiter=',')
#         pbar = tqdm(myreader)
#         for this_paper in pbar:
#             title = slugify(this_paper['title'])
#             this_paper_supp_path_no_ext = os.path.join(
#             supp_dir, f'{title}_{postfix}_supp.')
#
#             if '' != this_paper['supplemental link']:
#                 supp_ori_name = this_paper['supplemental link'].split('/')[-1]
#                 supp_type = supp_ori_name.split('.')[-1]
#                 if os.path.exists(os.path.join(supp_dir, supp_ori_name)) and \
#                 not os.path.exists(
#                         this_paper_supp_path_no_ext + supp_type):
#                     os.rename(
#                         os.path.join(supp_dir, supp_ori_name),
#                         this_paper_supp_path_no_ext + supp_type
#                     )
#                 pbar.set_description(f'Renaming paper: {title}...')


if __name__ == '__main__':
    year = 2024
    # total_paper_number = 1899
    # total_paper_number = save_csv(year)
    # download_from_csv(
    #     year, f'..\\NIPS_{year}',
    #     is_download_mainpaper=False,
    #     is_download_supplement=True,
    #     time_step_in_seconds=20,
    #     total_paper_number=total_paper_number,
    #     downloader='IDM')
    download_nips_papers_given_url(
        save_dir=rf'E:\NIPS_{year}',
        year=year,
        base_url=f'https://openreview.net/group?id=NeurIPS.cc/'
                 f'{year}/Conference',
        time_step_in_seconds=10,
        # download_groups=['poster'],
        downloader='IDM')
    # move_main_and_supplement_2_one_directory(
    #     main_path=rf'F:\workspace\python3_ws\paper_downloader-master\NIPS_{year}\main_paper',
    #     supplement_path=rf'F:\workspace\python3_ws\paper_downloader-master\NIPS_{year}\supplement',
    #     supp_pdf_save_path=rf'F:\workspace\python3_ws\paper_downloader-master\NIPS_{year}\supplement_pdf'
    # )


================================================
FILE: code/paper_downloader_RSS.py
================================================
"""paper_downloader_RSS.py
20240322"""
import time
import urllib
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import pickle
import os
from tqdm import tqdm
from slugify import slugify
import csv
import sys
from datetime import datetime

root_folder = os.path.abspath(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(root_folder)
from lib import csv_process
from lib.my_request import urlopen_with_retry


def get_paper_pdf_link(abs_url):
    """get paper pdf link in the abstract url.
       For newest papers that have not been added to 
       "https://www.roboticsproceedings.org/rss19/index.html"

    Args:
        abs_url (str): paper abstract page url.
    """
    headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
    content = urlopen_with_retry(url=abs_url, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    paper_pdf_div = soup.find('div', {'class': 'paper-pdf'})
    paper_pdf_div = paper_pdf_div.find('a').get('href')
    return paper_pdf_div


def save_csv(year):
    """
    write RSS papers' urls in one csv file
    :param year: int, RSS year, such 2023
    :return: peper_index: int, the total number of papers
    """
    conference = "RSS"
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_pathname = os.path.join(
        project_root_folder, 'csv', f'{conference}_{year}.csv'
    )
    error_log = []
    paper_index = 0
    with open(csv_file_pathname, 'w', newline='') as csvfile:
        fieldnames = ['title', 'main link', 'supplemental link']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        is_from_proceed = True  
        # True to get papaers from "https://www.roboticsproceedings.org"
        # False to get papers from "https://roboticsconference.org/"
        init_url = f'https://www.roboticsproceedings.org/rss' \
                   f'{year-2004 :0>2d}/index.html'
        # determine whether this year's papers had been added to 
        # "https://www.roboticsproceedings.org"
        # If not, get papers from "https://roboticsconference.org/"
        try:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            req = urllib.request.Request(url=init_url, headers=headers)
            urllib.request.urlopen(req, timeout=20)
        except HTTPError as e:
            if e.code == 404:  # not added
                current_year = datetime.now().year
                if year == current_year:
                    init_url = f'https://roboticsconference.org/program/papers/'
                else:
                    init_url = f'https://roboticsconference.org/{year}/program/papers/'
                is_from_proceed = False
        url_file_pathname = os.path.join(
            project_root_folder, 'urls', 
            f'init_url_{conference}_{year}_'
            f'''{'proc' if is_from_proceed else 'conf'}.dat'''
        )
        if os.path.exists(url_file_pathname):
            with open(url_file_pathname, 'rb') as f:
                content = pickle.load(f)
        else:
            headers = {
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
                    'Gecko/20100101 Firefox/23.0'}
            content = urlopen_with_retry(url=init_url, headers=headers)
            with open(url_file_pathname, 'wb') as f:
                pickle.dump(content, f)

        soup = BeautifulSoup(content, 'html5lib')
        if is_from_proceed:
            paper_list = soup.find('div', {'class': 'content'}).find_all('tr')
        else:
            paper_list = soup.find('table', {'id': 'myTable'}).find_all('tr')
        paper_list_bar = tqdm(paper_list)
        paper_index = 0
        title_index = 0
        for i, paper in enumerate(paper_list_bar):
            paper_dict = {'title': '',
                          'main link': '',
                          'supplemental link': ''}
            # get title
            try:
                if not is_from_proceed and i == 0:
                    # header
                    fields = paper.find_all('th')
                    fields = [f.text.lower() for f in fields]
                    title_index = fields.index('title')
                tds = paper.find_all('td')
                if len(tds) < 2:  # seperator
                    continue
                if is_from_proceed:
                    title = slugify(tds[0].a.text)
                    main_link = tds[1].a.get('href')
                    main_link = urllib.parse.urljoin(init_url, main_link)
                else:
                    title = slugify(tds[title_index].a.text)
                    abs_link = tds[title_index].a.get('href')
                    abs_link = urllib.parse.urljoin(init_url, abs_link)
                    main_link = get_paper_pdf_link(abs_link)
                
                paper_dict['title'] = title
                paper_dict['main link'] = main_link
                paper_index += 1
                paper_list_bar.set_description_str(
                    f'Collected paper {paper_index}: {title}')
                writer.writerow(paper_dict)
                csvfile.flush()  # write to file immediately
            except Exception as e:
                print(f'Warning: {str(e)}')

    #  write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')

            f.write('\n')
    return paper_index


def download_from_csv(
        year, save_dir, time_step_in_seconds=5, total_paper_number=None,
        csv_filename=None, downloader='IDM', is_random_step=True,
        proxy_ip_port=None):
    """
    download all RSS paper given year
    :param year: int, RSS year, such as 2019
    :param save_dir: str, paper and supplement material's save path
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param csv_filename: None or str, the csv file's name, None means to use
        default setting
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder', default to 'IDM'
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :return: True
    """
    conference = "RSS"
    postfix = f'{conference}_{year}'
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    csv_file_path = os.path.join(
        project_root_folder, 'csv',
        f'{conference}_{year}.csv' if csv_filename is None else csv_filename)
    csv_process.download_from_csv(
        postfix=postfix,
        save_dir=save_dir,
        csv_file_path=csv_file_path,
        is_download_supplement=False,
        time_step_in_seconds=time_step_in_seconds,
        total_paper_number=total_paper_number,
        downloader=downloader,
        is_random_step=is_random_step,
        proxy_ip_port=proxy_ip_port
    )


if __name__ == '__main__':
    year = 2025
    total_paper_number = save_csv(year)
    # total_paper_number = 134
    download_from_csv(year, save_dir=fr'E:\RSS\RSS_{year}',
                        time_step_in_seconds=15,
                        total_paper_number=total_paper_number)
    time.sleep(2)

    pass


================================================
FILE: lib/IDM.py
================================================
import subprocess
import os
import time
import random


def download(urls, save_path, time_sleep_in_seconds=5, is_random_step=True,
             verbose=False):
    """
    download file from given urls and save it to given path
    :param urls: str, urls
    :param save_path: str, full path
    :param time_sleep_in_seconds: int, sleep seconds after call
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param verbose: bool, whether to display time step information.
        Default: False
    :return: None
    """
    idm_path = '"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"'  # should replace by the local IDM path
    basic_command = [idm_path, '/d', 'xxxx', '/p', 'xxx', '/f', 'xxxx', '/n']
    head, tail = os.path.split(save_path)
    if '' != head:
        os.makedirs(head, exist_ok=True)
    basic_command[2] = urls
    basic_command[4] = head
    basic_command[6] = tail
    p = subprocess.Popen(' '.join(basic_command))
    # p.wait()
    if is_random_step:
        time_sleep_in_seconds = random.uniform(
            0.5 * time_sleep_in_seconds,
            1.5 * time_sleep_in_seconds,
        )
    if verbose:
        print(f'\t random sleep {time_sleep_in_seconds: .2f} seconds')
    time.sleep(time_sleep_in_seconds)


================================================
FILE: lib/__init__.py
================================================


================================================
FILE: lib/arxiv.py
================================================
"""
arxiv.py
20240218
"""
from bs4 import BeautifulSoup
from .my_request import urlopen_with_retry

def get_pdf_link_from_arxiv(abs_link, is_use_mirror=False):
    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
            'Gecko/20100101 Firefox/23.0'}
    mirror = 'cn.arxiv.org'
    if is_use_mirror:
        abs_link = abs_link.replace('arxiv.org', mirror)

    abs_content = urlopen_with_retry(
        url=abs_link, headers=headers, raise_error_if_failed=False)
    if abs_content is None:
        return None
    abs_soup = BeautifulSoup(abs_content, 'html.parser')
    pdf_link = 'http://arxiv.org' + abs_soup.find('div', {
        'class': 'full-text'}).find('ul').find('a').get('href')
    if pdf_link[-3:] != 'pdf':
        pdf_link += '.pdf'
    if is_use_mirror:
        pdf_link = pdf_link.replace('arxiv.org', mirror)
    return pdf_link


================================================
FILE: lib/csv_process.py
================================================
"""
csv_process.py
20210617
"""

import os
from tqdm import tqdm
from slugify import slugify
import csv
from lib.downloader import Downloader


def download_from_csv(
        postfix, save_dir, csv_file_path, is_download_main_paper=True,
        is_download_bib=True, is_download_supplement=True,
        time_step_in_seconds=5, total_paper_number=None,
        downloader='IDM', is_random_step=True, proxy_ip_port=None,
        max_length_filename=128
):
    """
    download paper, bibtex and supplement files and save them to
        save_dir/main_paper and save_dir/supplement respectively
    :param postfix: str, postfix that will be added at the end of papers' title
    :param save_dir: str, paper and supplement material's save path
    :param csv_file_path: str, the full path to csv file
    :param is_download_main_paper: bool, True for downloading main paper
    :param is_download_supplement: bool, True for downloading supplemental
        material
    :param time_step_in_seconds: int, the interval time between two downloading
        request in seconds
    :param total_paper_number: int, the total number of papers that is going to
        download
    :param downloader: str, the downloader to download, could be 'IDM' or None,
        default to 'IDM'.
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        Default: None
    :param max_length_filename: int or None, max filen name length. All the
            files whose name length is not less than this will be renamed
            before saving, the others will stay unchanged. None means
            no limitation. Default: 128.
    :return: True
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    downloader = Downloader(
        downloader=downloader, is_random_step=is_random_step,
        proxy_ip_port=proxy_ip_port)
    if not os.path.exists(csv_file_path):
        raise ValueError(f'ERROR: file not found in {csv_file_path}!!!')

    main_save_path = os.path.join(save_dir, 'main_paper')
    if is_download_main_paper:
        os.makedirs(main_save_path, exist_ok=True)
    if is_download_supplement:
        supplement_save_path = os.path.join(save_dir, 'supplement')
        os.makedirs(supplement_save_path, exist_ok=True)

    error_log = []
    with open(csv_file_path, newline='') as csvfile:
        myreader = csv.DictReader(csvfile, delimiter=',')
        pbar = tqdm(myreader, total=total_paper_number)
        i = 0
        for this_paper in pbar:
            is_download_bib &= ('bib' in this_paper)
            is_grouped = ('group' in this_paper)
            i += 1
            # get title
            if is_grouped:
                group = slugify(this_paper['group'])
            title = slugify(this_paper['title'])
            title_main_pdf = short_name(
                name=f'{title}_{postfix}.pdf',
                max_length=max_length_filename
            )
            if total_paper_number is not None:
                pbar.set_description(
                    f'Downloading {postfix} paper {i} /{total_paper_number}')
            else:
                pbar.set_description(f'Downloading {postfix} paper {i}')
            this_paper_main_path = os.path.join(
                main_save_path, title_main_pdf)
            if is_grouped:
                this_paper_main_path = os.path.join(
                    main_save_path, group, title_main_pdf)
            if is_download_supplement:
                this_paper_supp_title_no_ext = short_name(
                    name=f'{title}_{postfix}_supp.',
                    max_length=max_length_filename-3  # zip or pdf, so 3
                )
                this_paper_supp_path_no_ext = os.path.join(
                    supplement_save_path, this_paper_supp_title_no_ext)
                if is_grouped:
                    this_paper_supp_path_no_ext = os.path.join(
                        supplement_save_path, group,
                        this_paper_supp_title_no_ext
                    )
                if '' != this_paper['supplemental link'] and os.path.exists(
                        this_paper_main_path) and \
                        (os.path.exists(
                            this_paper_supp_path_no_ext + 'zip') or
                         os.path.exists(
                            this_paper_supp_path_no_ext + 'pdf')):
                    continue
                elif '' == this_paper['supplemental link'] and \
                        os.path.exists(this_paper_main_path):
                    continue
            elif os.path.exists(this_paper_main_path):
                continue
            if 'error' == this_paper['main link']:
                error_log.append((title, 'no MAIN link'))
            elif '' != this_paper['main link']:
                if is_grouped:
                    if is_download_main_paper:
                        os.makedirs(os.path.join(main_save_path, group),
                                    exist_ok=True)
                    if is_download_supplement:
                        os.makedirs(os.path.join(supplement_save_path, group),
                                    exist_ok=True)
                if is_download_main_paper:
                    try:
                        # download paper with IDM
                        if not os.path.exists(this_paper_main_path):
                            downloader.download(
                                urls=this_paper['main link'].replace(
                                    ' ', '%20'),
                                save_path=os.path.join(
                                    os.getcwd(), this_paper_main_path),
                                time_sleep_in_seconds=time_step_in_seconds
                            )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append((title, this_paper['main link'],
                                          'main paper download error', str(e)))
                # download supp
                if is_download_supplement:
                    # check whether the supp can be downloaded
                    if not (os.path.exists(
                            this_paper_supp_path_no_ext + 'zip') or
                            os.path.exists(
                                this_paper_supp_path_no_ext + 'pdf')):
                        if 'error' == this_paper['supplemental link']:
                            error_log.append((title, 'no SUPPLEMENTAL link'))
                        elif '' != this_paper['supplemental link']:
                            supp_type = \
                            this_paper['supplemental link'].split('.')[-1]
                            try:
                                downloader.download(
                                    urls=this_paper['supplemental link'],
                                    save_path=os.path.join(
                                        os.getcwd(),
                                        this_paper_supp_path_no_ext + supp_type),
                                    time_sleep_in_seconds=time_step_in_seconds
                                )
                            except Exception as e:
                                # error_flag = True
                                print('Error: ' + title + ' - ' + str(e))
                                error_log.append((title, this_paper[
                                    'supplemental link'],
                                                  'supplement download error',
                                                  str(e)))
                # download bibtex file
                if is_download_bib:
                    bib_path = this_paper_main_path[:-3] + 'bib'
                    if not os.path.exists(bib_path):
                        if 'error' == this_paper['bib']:
                            error_log.append((title, 'no bibtex link'))
                        elif '' != this_paper['bib']:
                            try:
                                downloader.download(
                                    urls=this_paper['bib'],
                                    save_path=os.path.join(os.getcwd(),
                                                           bib_path),
                                    time_sleep_in_seconds=time_step_in_seconds
                                )
                            except Exception as e:
                                # error_flag = True
                                print('Error: ' + title + ' - ' + str(e))
                                error_log.append((title, this_paper['bib'],
                                                  'bibtex download error',
                                                  str(e)))

        # 2. write error log
        print('write error log')
        log_file_pathname = os.path.join(
            project_root_folder, 'log', 'download_err_log.txt'
        )
        with open(log_file_pathname, 'w') as f:
            for log in tqdm(error_log):
                for e in log:
                    if e is not None:
                        f.write(e)
                    else:
                        f.write('None')
                    f.write('\n')

                f.write('\n')

    return True


def short_name(name, max_length, verbose=False):
    """
    rename to shorter name
    Args:
        name (str): original name
        max_length (int): max filen name length. All the
            files whose name length is not less than this will be renamed
            before saving, the others will stay unchanged. None means
            no limitation.
        verbose (bool): whether to print debug information. Default: False.
    Returns:
        new_name (str): short name.
    """
    if len(name) < max_length:
        new_name = name
    else:
        # rename
        try:
            [title, postfix] = name.split('_', 1)  # only split to 2 parts
            new_title = title[:max_length - len(postfix) - 2]
            new_name = f'{new_title}_{postfix}'
            if verbose:
                print(f'\nrenaming {name} \n\t-> {new_name}')
        except ValueError:
            # ValueError: not enough values to unpack (expected 2, got 1)
            if verbose:
                print(f'\nWARNING!!!:\n\tunable to parse postfix from {name}')
                print('\tSo, it will be just rename to short name')
            ext = os.path.splitext(name)[1]
            new_title = name[:max_length - len(ext) - 1]
            new_name = f'{new_title}{ext}'
            if verbose:
                print(f'\nrenaming {name} \n\t-> {new_name}')
    return new_name


================================================
FILE: lib/cvf.py
================================================
"""
cvf.py
20210617
"""

import urllib
from bs4 import BeautifulSoup
from tqdm import tqdm
from slugify import slugify
from .my_request import urlopen_with_retry


def get_paper_dict_list(url=None, content=None, group_name=None, timeout=10):
    """
    parse papers' title, link, supp link from content, and save in a list contains dictionaries with key "title",
        "main link", "supplemental link" and "group"(optional, if group_name is not None),
    :param url: str or None, url
    :param content: None of object return by urlopen
    :param group_name: str or None, the group name of the papers in given content
    :param timeout: int, the timeout value for open url, default to 10
    :return: paper_dict_list, list of dictionaries, that contains the dictionaries of papers with key "title",
        "main link",  "supplemental link" and "group"(optional, if group_name is not None)
        content, object return by urlopen
    """
    if url is None and content is None:
        raise ValueError('''one of "url" and "content" should be provide!!!''')
    paper_dict_list = []
    paper_dict = {'title': '', 'main link': '', 'supplemental link': '', 'arxiv': ''} if group_name is None else \
        {'group': group_name, 'title': '', 'main link': '', 'supplemental link': '', 'arxiv': ''}

    if content is None:
        headers = {
            'User-Agent':
                'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
        content = urlopen_with_retry(url=url, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    paper_list_bar = tqdm(soup.find('div', {'id': 'content'}).find_all(['dd', 'dt']))
    paper_index = 0
    for paper in paper_list_bar:
        is_new_paper = False

        # get title
        try:
            if 'dt' == paper.name and 'ptitle' == paper.get('class')[0]:  # title:
                title = slugify(paper.text.strip())
                paper_dict['title'] = title
                paper_index += 1
                paper_list_bar.set_description_str(f'Collecting paper {paper_index}: {title}')
            elif 'dd' == paper.name:
                all_as = paper.find_all('a')
                for a in all_as:
                    if 'pdf' == slugify(a.text.strip()):
                        main_link = urllib.parse.urljoin(url, a.get('href'))
                        paper_dict['main link'] = main_link
                        is_new_paper = True
                    elif 'supp' == slugify(a.text.strip()):
                        supp_link = urllib.parse.urljoin(url, a.get('href'))
                        paper_dict['supplemental link'] = supp_link
                    elif 'arxiv' == slugify(a.text.strip()):
                        arxiv = urllib.parse.urljoin(url, a.get('href'))
                        paper_dict['arxiv'] = arxiv
                        break
        except Exception as e:
            print(f'Warning: {str(e)}')

        if is_new_paper:
            paper_dict_list.append(paper_dict.copy())
            paper_dict['title'] = ''
            paper_dict['main link'] = ''
            paper_dict['supplemental link'] = ''
            paper_dict['arxiv'] = ''

    return paper_dict_list, content


================================================
FILE: lib/downloader.py
================================================
"""
downloader.py
20210624
"""
import time
from lib import IDM
import requests
import os
import random
from tqdm import tqdm
from threading import Thread
from lib.proxy import get_proxy_4_requests


def _download(urls, save_path, time_sleep_in_seconds=5, is_random_step=True,
              verbose=False, proxy_ip_port=None):
    """
    download file from given urls and save it to given path
    :param urls: str, urls
    :param save_path: str, full path
    :param time_sleep_in_seconds: int, sleep seconds after call
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :param verbose: bool, whether to display time step information.
        Default: False
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
    :return: None
    """

    def __download(urls, save_path, proxy_ip_port):
        head, tail = os.path.split(save_path)
        # debug
        # print(f'downloading {tail}')
        proxies = get_proxy_4_requests(proxy_ip_port)
        r = requests.get(urls, stream=True, proxies=proxies)
        # file size in MB
        length = round(int(r.headers['content-length']) / 1024**2, 2)
        process_bar = tqdm(
            colour='blue', total=length, unit='MB',desc=tail, initial=0)

        if '' != head:
            os.makedirs(head, exist_ok=True)

        for part in r.iter_content(1024 ** 2):
            process_bar.update(1)
            with open(save_path, 'ab') as file:
                file.write(part)
        r.close()

    # set daemon as False to continue downloading even if the main threading
    # has been killed due to KeyboardInterrupt
    t = Thread(
        target=__download, args=(urls, save_path, proxy_ip_port), daemon=False)
    t.start()

    if is_random_step:
        time_sleep_in_seconds = random.uniform(
            0.5 * time_sleep_in_seconds,
            1.5 * time_sleep_in_seconds,
        )
    if verbose:
        print(f'\t random sleep {time_sleep_in_seconds: .2f} seconds')
    time.sleep(time_sleep_in_seconds)


class Downloader(object):
    def __init__(self, downloader=None, is_random_step=True,
                 proxy_ip_port=None):
        """
        :param downloader: None or str, the downloader's name.
            if downloader is None, 'request' will be used to
            download files; if downloader is 'IDM', the
            "Internet Downloader Manager" will be used to download
            files; or a ValueError will be raised.
        :param is_random_step: bool, whether random sample the time step between
            two adjacent download requests. If True, the time step will be
            sampled from Uniform(0.5t, 1.5t), where t is the given
            time_step_in_seconds. Default: True.
        :param proxy_ip_port: str or None, proxy server ip address with or without
            protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
            (only useful for None|"request" downloader)
            Default: None
        """
        super(Downloader, self).__init__()
        if downloader is not None and downloader.lower() not in ['idm']:
            raise ValueError(
                f'''ERROR: Unsupported downloader: {downloader}, '''
                f'''we currently only support'''
                f''' None (means python's requests) or "IDM" '''
            )

        self.downloader = downloader
        self.is_random_step = is_random_step
        self.proxy_ip_port = proxy_ip_port

    def download(self, urls, save_path, time_sleep_in_seconds=5):
        """
        download file from given urls and save it to given path
        :param urls: str, urls
        :param save_path: str, full path
        :param time_sleep_in_seconds: int, sleep seconds after call
        :return: None
        """
        if self.downloader is None:
            _download(
                urls=urls,
                save_path=save_path,
                time_sleep_in_seconds=time_sleep_in_seconds,
                is_random_step=self.is_random_step,
                proxy_ip_port=self.proxy_ip_port
            )
        elif self.downloader.lower() == 'idm':
            IDM.download(
                urls=urls,
                save_path=save_path,
                time_sleep_in_seconds=time_sleep_in_seconds,
                is_random_step=self.is_random_step
            )


================================================
FILE: lib/my_request.py
================================================
"""
my_request.py
20240412
"""

import urllib
import random
from urllib.error import URLError, HTTPError
from lib.proxy import set_proxy_4_urllib_request


def urlopen_with_retry(url, headers=dict(), retry_time=3, time_out=20,
                       raise_error_if_failed=True, proxy_ip_port=None):
    """
    load content from url with given headers. Retry if error occurs.
    Args:
        url (str): url.
        headers (dict): request headers. Default: {}.
        retry_time (int): max retry time. Default: 3.
        time_out (int): time out in seconds. Default: 10.
        raise_error_if_failed (bool): whether to raise error if failed.
            Default: True.
        proxy_ip_port(str|None): proxy server ip address with or without
            protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
            Default: None

    Returns:
        content(str|None): url content. None will be returned if failed.

    """
    set_proxy_4_urllib_request(proxy_ip_port)
    req = urllib.request.Request(url=url, headers=headers)
    for r in range(retry_time):
        try:
            content = urllib.request.urlopen(req, timeout=time_out).read()
            return content
        except HTTPError as e:
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
            s = random.randint(3, 7)
            print(f'random sleeping {s} seconds and doing {r + 1}/{retry_time}'
                  f'-th retrying...')
        except URLError as e:
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
            s = random.randint(3, 7)
            print(f'random sleeping {s} seconds and doing {r + 1}/{retry_time}'
                  f'-th retrying...')
    if raise_error_if_failed:
        raise ValueError(f'Failed to open {url} after trying {retry_time} '
                         f'times!')
    else:
        return None


================================================
FILE: lib/openreview.py
================================================
"""
openreview.py
20230104
"""

import time
from tqdm import tqdm
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
import os
# https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename
from slugify import slugify
from lib.downloader import Downloader
from lib.proxy import get_proxy
import urllib
from lib.arxiv import get_pdf_link_from_arxiv


def get_driver(proxy_ip_port=None):
    # driver = webdriver.Chrome(driver_path)
    capabilities = webdriver.DesiredCapabilities.CHROME
    if proxy_ip_port is not None:
        proxy = get_proxy(proxy_ip_port)
        proxy.add_to_capabilities(capabilities)
    
    # https://stackoverflow.com/a/78797164
    chrome_install = ChromeDriverManager().install()
    folder = os.path.dirname(chrome_install)
    chromedriver_path = os.path.join(folder, "chromedriver.exe")
    driver = webdriver.Chrome(
        service=Service(executable_path=chromedriver_path),
        desired_capabilities=capabilities)
    return driver


def __download_papers_given_divs(driver, divs, save_dir, paper_postfix,
                                 time_step_in_seconds=10, downloader='IDM',
                                 proxy_ip_port=None):
    error_log = []
    downloader = Downloader(downloader=downloader, proxy_ip_port=proxy_ip_port)
    
    # scroll to top of page
    # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
    driver.find_element(By.TAG_NAME, 'body').send_keys(
        Keys.CONTROL + Keys.HOME)
    time.sleep(0.3)

    # titles = [d.text for d in divs]
    titles = []
    for d in divs:
        for i in range(3):  # temp workaround
            try:
                titles.append(d.text)    
                break
            except Exception as e:
                if i == 2:
                    print(f'\tget Exception: {str(e.msg)}')
                time.sleep(0.3)
                       
    valid_divs = []
    for i, t in enumerate(titles):
        if len(t):
            valid_divs.append(divs[i])
    num_papers = len(valid_divs)
    print('found number of papers:', num_papers)
    name = None
    for index, paper in enumerate(valid_divs):
        is_get_paper = False
        try:
            a_hrefs = paper.find_elements(By.TAG_NAME, "a")
            name = slugify(a_hrefs[0].text.strip())
            if a_hrefs[1].get_attribute('class') == 'pdf-link':
                # has pdf button
                link = a_hrefs[1].get_attribute('href')
                link = urllib.parse.urljoin('https://openreview.net', link)
            else:
                # raise ValueError('pdf link not found!')
                print('\tWarning: pdf link not found, skip this download...')
                if name is not None:
                    error_log.append((name, str(index)))
                else:
                    error_log.append((str(index), str(index)))
                continue
                # TODO: find pdf link in paper abstract page
            if name == '':
                continue
            is_get_paper = True
        except Exception as e:
            print(f'\tget Exception: {str(e.msg)}')
            print('\tskip this download...')
            if name is not None:
                error_log.append((name, str(index)))
            else:
                error_log.append((str(index), str(index)))
        if not is_get_paper:
            continue

        # name = slugify(paper.find_element_by_class_name('note_content_title').text)
        # link = paper.find_element_by_class_name('note_content_pdf').get_attribute('href')
        pdf_name = name + '_' + paper_postfix + '.pdf'
        if not os.path.exists(os.path.join(save_dir, pdf_name)):
            print('Downloading paper {}/{}: {}'.format(index + 1, num_papers,
                                                       name))
            # get pdf link of arxiv if the original link is on arxiv.org
            if "arxiv.org/abs" in link:
                link = get_pdf_link_from_arxiv(abs_link=link)
            # try 1 times
            success_flag = False
            for d_iter in range(1):
                try:
                    downloader.download(
                        urls=link,
                        save_path=os.path.join(save_dir, pdf_name),
                        time_sleep_in_seconds=time_step_in_seconds
                    )
                    success_flag = True
                    break
                except Exception as e:
                    print('Error: ' + name + ' - ' + str(e))
            if not success_flag:
                error_log.append((name, link))
    return error_log, num_papers


def __get_into_pages_given_number(driver, page_number, pages, wait_fn,
                                  condition=None):
    wait_fn(driver, condition)
    for page in pages:
        if page.text.isnumeric() and int(page.text) == page_number:
            page_link = page.find_element(By.TAG_NAME, "a")
            page_link.click()
            wait_fn(driver, condition)
            return page

    return None


def download_nips_papers_given_url(
        save_dir, year, base_url, conference='NIPS', start_page=1,
        time_step_in_seconds=10, download_groups='all', downloader='IDM',
        proxy_ip_port=None):
    """
    download NeurIPS papers from the given web url.
    :param save_dir: str, paper save path
    :type save_dir: str
    :param year: int, iclr year, current only support year >= 2018
    :type year: int
    :param base_url: str, paper website url
    :type base_url: str
    :param conference: str, conference name, such as NIPS.
    :param start_page: int, the initial downloading webpage number, only the pages whose number is
                            equal to or greater than this number will be processed.
    :param time_step_in_seconds: int, the interval time between two downlaod request in seconds
    :param groups: group name, such as 'oral', 'spotlight', 'poster'.
        Default: 'all'.
    :type download_groups: str | list[str]
    :param downloader: str, the downloader to download, could be 'IDM' or None,
        default to 'IDM'
    :param proxy_ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
        (only useful for None|"request" downloader and webdriver)
        Default: None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if year < 2023:
        sub_xpath = '''id="accepted-papers"'''
    else:
        sub_xpath = '''class="submissions-list"'''
    def mywait(driver, condition=None):
        # wait for the select element to become visible
        # print('Starting web driver wait...')
        # ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
        # wait = WebDriverWait(driver, 20, ignored_exceptions=ignored_exceptions)
        wait = WebDriverWait(driver, 20)
        # print('Starting web driver wait... finished')
        # res = wait.until(EC.presence_of_element_located((By.ID, "notes")))
        # print("Successful load the website!->", res)
        # res = wait.until(
        #     EC.presence_of_element_located((By.CLASS_NAME, "note")))
        res = wait.until(
            EC.presence_of_element_located((By.ID, "notes")))
        # print("Successful load the website notes!->", res)
        res = wait.until(EC.presence_of_element_located(
            (By.XPATH, f'''//*[@{sub_xpath}]/nav''')))
        # print("Successful load the website pagination!->", res)
        time.sleep(2)  # seconds, workaround for bugs

    def find_divs_of_papers():
        if year < 2023:
            divs = driver.find_element(By.ID, group_id). \
                find_elements(By.CLASS_NAME, 'note ')
        else:
            # divs = driver.find_element(By.ID, group_id). \
            #     find_elements(By.XPATH, '//*[@class="note  undefined"]')
            divs = driver.find_element(By.ID, group_id).find_elements(
                By.XPATH, 
                '//*[contains(@class, "note") and contains(@class, "undefined")]'
            )
        return divs

    paper_postfix = f'{conference}_{year}'
    error_log = []
    driver = get_driver(proxy_ip_port=proxy_ip_port)
    driver.get(base_url)

    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    mywait(driver)
    # pages = driver.find_elements_by_xpath('//*[@id="accepted-papers"]/nav/ul/li')

    # download grouped papers, such as "Accepted Papaers" for year before 2023
    # "Accept (oral)", "Accept (spotlight)", "Accept (poster)" for year 2023
    groups = driver.find_elements(
        By.XPATH, f'//*[@id="notes"]/div/div[1]/ul/li')
    accept_groups = []
    for g in groups:
        if 'accept' in g.text.lower():
            # whether download this group
            is_download_group = True
            if not 'all' == download_groups:
                is_download_group = False
                for dg in download_groups:
                    if dg.lower() in g.text.lower():
                        is_download_group = True
                        break
            if is_download_group:
                accept_groups.append(g)
    group_name = None
    group_save_dir = save_dir
    for ag in accept_groups:
        group_name = slugify(ag.text)
        group_save_dir = os.path.join(save_dir, group_name)
        print(f'Downloading {group_name}...')
        os.makedirs(group_save_dir, exist_ok=True)
        number_paper_group = 0
        accept_group_link = ag.find_element(By.TAG_NAME, "a")
        # group_id = accept_group_link.get_attribute('aria-controls')
        group_id = accept_group_link.get_attribute('href').split('#')[-1]
        # scroll to top of page, if not at top, the click action not work
        # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
        driver.find_element(By.TAG_NAME, 'body').send_keys(
            Keys.CONTROL + Keys.HOME)
        time.sleep(0.2)
        accept_group_link.click()
        mywait(driver)
        pages = driver.find_elements(
            By.XPATH, f'//*[@{sub_xpath}]/nav[1]/ul/li')
        page_str_list = get_pages_str(pages)
        # print(f'Current page navigation bar:\n{page_str_list}')
        current_page = 1
        ind_page = 2  # 0 << ; 1 <
        # << | < | 1, 2, 3, ... | > | >>
        total_pages_number = get_max_page_number(page_str_list)
        last_total_pages = total_pages_number
        # get into start pages
        while current_page < start_page:
            if total_pages_number < start_page:  # flip pages until seeing the start page
                current_page = total_pages_number
                __get_into_pages_given_number(
                    driver=driver, page_number=current_page, pages=pages,
                    wait_fn=mywait)
                print(f'getting into web page {current_page}...')
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, '//*[@id="accepted-papers"]/ul/li/h4/a')))
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, '''//*[@id="accepted-papers"]/nav''')))
                mywait(driver)

                # print("Successful load the website pagination!->", res)
                # pages = driver.find_elements_by_xpath('//*[@id="accepted-papers"]/nav/ul/li')
                pages = pages = driver.find_elements(
                    By.XPATH, f'//*[@{sub_xpath}]/nav[1]/ul/li')
                page_str_list = get_pages_str(pages)
                total_pages_number = get_max_page_number(page_str_list)
                # # print(f'Current page navigation bar:\n{page_str_list}')
                if total_pages_number == last_total_pages:  # total page remain unchanged after reload
                    print(f'reached last({total_pages_number}-th) webpage')
                    # when get the last page, but the page number is till less than start page, so
                    # the start page doesn't exist. PRINT ERROR and return
                    print(f'ERROR: THE {start_page}-th webpage not found!')
                    return
            else:
                current_page = start_page

        page = __get_into_pages_given_number(
            driver=driver, page_number=current_page, pages=pages,
            wait_fn=mywait)

        while current_page <= total_pages_number:
            if page is None:
                break
            print(f'downloading papers in page: {current_page}')
            mywait(driver)

            # divs = driver.find_elements_by_xpath('//*[@id="accepted-papers"]/ul/li')
            # divs = driver.find_elements(By.XPATH, '//*[@id="accepted-papers"]/ul/li')
            divs = find_divs_of_papers()

            # temp workaround
            repeat_times = 3
            is_find_paper = False
            for r in range(repeat_times):
                try:
                    a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    is_find_paper = True
                    break
                except Exception as e:
                    if (r + 1) < repeat_times:
                        print(f'\terror occurre: {str(e)}')
                        print(f'\tsleep {(r + 1) * 5} seconds...')
                        time.sleep((r + 1) * 5)
                        print(f'{r + 1}-th reloading page')
                        divs = find_divs_of_papers()
                    else:
                        print('\tskip this page.')
            if not is_find_paper:
                continue

            # time.sleep(time_step_in_seconds)
            this_error_log, this_number_paper = __download_papers_given_divs(
                driver=driver,
                divs=divs,
                save_dir=group_save_dir,
                paper_postfix=paper_postfix,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                proxy_ip_port=proxy_ip_port
            )
            for e in this_error_log:
                error_log.append(e)
            number_paper_group += this_number_paper
            # get into next page
            current_page += 1
            # pages = driver.find_elements_by_xpath('//*[@id="accepted-papers"]/nav/ul/li')
            pages = driver.find_elements(
                By.XPATH, f'//*[@{sub_xpath}]/nav[1]/ul/li')
            page_str_list = get_pages_str(pages)
            total_pages_number = get_max_page_number(page_str_list)
            # print(f'Current page navigation bar:\n{page_str_list}')
            # if we do not reread the pages, all the pages will be not available with an exception:
            # selenium.common.exceptions.StaleElementReferenceException:
            # Message: stale element reference: element is not attached to the page document
            page = __get_into_pages_given_number(driver=driver,
                                                 page_number=current_page,
                                                 pages=pages,
                                                 wait_fn=mywait)
        # display total number of papers
        print(f'number of papers in {group_name}: {number_paper_group}')

    driver.quit()
    # 2. write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                f.write(e)
                f.write('\n')
            f.write('\n')


def download_iclr_papers_given_url_and_group_id(
        save_dir, year, base_url, group_id, conference='ICLR', start_page=1,
        time_step_in_seconds=10, downloader='IDM', proxy_ip_port=None,
        is_have_pages=True, is_need_click_group_button=False):
    """
    downlaod ICLR papers for the given web url and the paper group id
    :param save_dir: str, paper save path
    :type save_dir: str
    :param year: int, iclr year, current only support year >= 2018
    :type year: int
    :param base_url: str, paper website url
    :type base_url: str
    :param group_id: str, paper group id, such as "notable-top-5-",
        "notable-top-25-", "poster", "oral-submissions",
        "spotlight-submissions", "poster-submissions", etc.
    :type group_id: str
    :param conference: str, conference name, such as ICLR. Default: ICLR
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds. Default: 10
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder'. Default: 'IDM'
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890".  Only useful for webdriver and request
        downloader (downloader=None). Default: None.
    :type proxy_ip_port: str | None
    :param is_have_pages: bool, is there pages in webpage. Default:
        True.
    :type is_have_pages: bool
    :param is_need_click_group_button: bool, is there need to click the
        group button in webpage. For some years, for example 2018, the
        navigation part "#xxxxx" in base url will not work. And it should
        be clicked before reading content from webpage. Default: False.
    :type is_need_click_group_button: bool
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    def _get_pages_xpath(year):
        if year <= 2023:
            xpath = f'''//*[@id="{group_id}"]/nav/ul/li'''
        else:
            xpath = f'''//*[@id="{group_id}"]/div/div/nav/ul/li'''
        return xpath

    def mywait(driver, condition=None):
        # wait for the select element to become visible
        # print('Starting web driver wait...')
        # ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
        # wait = WebDriverWait(driver, 20, ignored_exceptions=ignored_exceptions)
        wait = WebDriverWait(driver, 20)
        # print('Starting web driver wait... finished')
        # res = wait.until(EC.presence_of_element_located((By.ID, "notes")))
        # print("Successful load the website!->", res)
        if year <= 2023:
            res = wait.until(
                EC.presence_of_element_located((By.CLASS_NAME, "note")))
        # print("Successful load the website notes!->", res)
        # res = wait.until(EC.presence_of_element_located(
        #     (By.XPATH, f'''//*[@id="{group_id}"]/nav''')))
        if is_have_pages:
            # scroll to bottom of page
            # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
            driver.find_element(By.TAG_NAME, 'body').send_keys(
                Keys.CONTROL + Keys.END)
            if year <= 2023:
                wait.until(EC.element_to_be_clickable(
                    (By.XPATH, f'{_get_pages_xpath(year)}[3]/a')))
            else:
                wait.until(EC.element_to_be_clickable(
                    (By.XPATH, f'{_get_pages_xpath(year)}[3]/a')))
            # print("Successful load the website pagination!->", res)
        time.sleep(2)  # seconds, workaround for bugs

    paper_postfix = f'{conference}_{year}'
    error_log = []
    driver = get_driver(proxy_ip_port=proxy_ip_port)
    driver.get(base_url)

    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    if is_need_click_group_button:
        archive_is_have_pages = is_have_pages
        is_have_pages = False
        mywait(driver)
        aria_controls = base_url.split('#')[-1]
        # scroll to home of page
        driver.find_element(By.TAG_NAME, 'body').send_keys(
            Keys.CONTROL + Keys.HOME)
        group_button = driver.find_element(
            By.XPATH, f"""//a[@aria-controls="{aria_controls}"]"""
        )
        group_button.click()
        is_have_pages = archive_is_have_pages
    mywait(driver)
    if is_have_pages:
        pages = driver.find_elements(By.XPATH, _get_pages_xpath(year))
        current_page = 1
        ind_page = 2  # 0 << ; 1 <
        total_pages_number = int(pages[-3].text)
        # << | < | 1, 2, 3, ... | > | >>
        last_total_pages = total_pages_number
        # get into start pages
        while current_page < start_page:
            # flip pages until seeing the start page
            if total_pages_number < start_page:
                current_page = total_pages_number
                __get_into_pages_given_number(
                    driver=driver, page_number=current_page, pages=pages,
                    wait_fn=mywait)
                print(f'getting into web page {current_page}...')
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, f'//*[@id="{group_id}"]/ul/li/h4/a')))
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, f'''//*[@id="{group_id}"]/nav''')))
                mywait(driver)

                # print("Successful load the website pagination!->", res)
                pages = driver.find_elements(
                    By.XPATH, _get_pages_xpath(year))
                total_pages_number = int(pages[-3].text)
                # total page remain unchanged after reload
                if total_pages_number == last_total_pages:
                    print(f'reached last({total_pages_number}-th) webpage')
                    # when get the last page, but the page number is till
                    # less than start page, so the start page doesn't exist.
                    # PRINT ERROR and return
                    print(f'ERROR: THE {start_page}-th webpage not found!')
                    return
            else:
                current_page = start_page

        page = __get_into_pages_given_number(
            driver=driver, page_number=current_page, pages=pages, wait_fn=mywait)

        while current_page <= total_pages_number:
            if page is None:
                break
            print(f'downloading {group_id} papers in page: {current_page}')
            mywait(driver)

            divs = driver.find_element(By.ID, group_id). \
                find_elements(By.CLASS_NAME, 'note ')

            # temp workaround
            repeat_times = 3
            is_find_paper = False
            for r in range(repeat_times):
                try:
                    a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    is_find_paper = True
                    break
                except Exception as e:
                    if (r + 1) < repeat_times:
                        print(f'\terror occurre: {str(e.msg)}')
                        print(f'\tsleep {(r + 1) * 5} seconds...')
                        time.sleep((r + 1) * 5)
                        print(f'{r + 1}-th reloading page')
                        divs = driver.find_element(By.ID, group_id). \
                            find_elements(By.CLASS_NAME, 'note ')
                    else:
                        print('\tskip this page.')
            if not is_find_paper:
                continue

            # time.sleep(time_step_in_seconds)
            this_error_log, this_number_paper = __download_papers_given_divs(
                driver=driver,
                divs=divs,
                save_dir=save_dir,
                paper_postfix=paper_postfix,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                proxy_ip_port=proxy_ip_port
            )
            for e in this_error_log:
                error_log.append(e)
            # get into next page
            current_page += 1
            pages = driver.find_elements(
                By.XPATH, _get_pages_xpath(year))
            total_pages_number = int(pages[-3].text)
            # if we do not reread the pages, all the pages will be not available
            # with an exception:
            # selenium.common.exceptions.StaleElementReferenceException:
            # Message: stale element reference: element is not attached to the
            # page document
            page = __get_into_pages_given_number(
                driver=driver, page_number=current_page, pages=pages,
                wait_fn=mywait)
    else:  # no pages
        divs = driver.find_element(By.ID, group_id). \
            find_elements(By.CLASS_NAME, 'note ')
        # temp workaround
        repeat_times = 3
        is_find_paper = False
        for r in range(repeat_times):
            try:
                a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                is_find_paper = True
                break
            except Exception as e:
                if (r + 1) < repeat_times:
                    print(f'\terror occurre: {str(e.msg)}')
                    print(f'\tsleep {(r + 1) * 5} seconds...')
                    time.sleep((r + 1) * 5)
                    print(f'{r + 1}-th reloading page')
                    divs = driver.find_element(By.ID, group_id). \
                        find_elements(By.CLASS_NAME, 'note ')
                else:
                    print('\tskipped!!!')
        if is_find_paper:
            # time.sleep(time_step_in_seconds)
            this_error_log, this_number_paper = __download_papers_given_divs(
                driver=driver,
                divs=divs,
                save_dir=save_dir,
                paper_postfix=paper_postfix,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                proxy_ip_port=proxy_ip_port
            )
            for e in this_error_log:
                error_log.append(e)

    driver.quit()
    # 2. write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                f.write(e)
                f.write('\n')
            f.write('\n')


def download_icml_papers_given_url_and_group_id(
        save_dir, year, base_url, group_id, conference='ICML', start_page=1,
        time_step_in_seconds=10, downloader='IDM', proxy_ip_port=None):
    """
    downlaod ICLR papers for the given web url and the paper group id
    :param save_dir: str, paper save path
    :type save_dir: str
    :param year: int, iclr year, current only support year >= 2018
    :type year: int
    :param base_url: str, paper website url
    :type base_url: str
    :param group_id: str, paper group id, such as "poster" and "oral".
    :type group_id: str
    :param conference: str, conference name, such as ICLR. Default: ICLR
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds. Default: 10
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder'. Default: 'IDM'
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890". Only useful for webdriver and request
        downloader (downloader=None). Default: None.
    :type proxy_ip_port: str | None
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    def mywait(driver, aria_controls=None):
        # wait for the select element to become visible
        # print('Starting web driver wait...')
        wait = WebDriverWait(driver, 20)
        # ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
        # wait = WebDriverWait(driver, 20, ignored_exceptions=ignored_exceptions)
        # print('Starting web driver wait... finished')
        # res = wait.until(EC.presence_of_element_located((By.ID, "notes")))
        # print("Successful load the website!->", res)
        res = wait.until(EC.presence_of_element_located((By.ID, "notes")))
        res = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "submissions-list")))
        # print("Successful load the website notes!->", res)
        # res = wait.until(EC.presence_of_element_located(
        #     (By.XPATH, f'''//*[@id="{group_id}"]/nav''')))
        # scroll to bottom of page
        # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
        driver.find_element(By.TAG_NAME, 'body').send_keys(
            Keys.CONTROL + Keys.END)
        time.sleep(0.3)
        if aria_controls is None:
            wait.until(EC.element_to_be_clickable(
                (By.XPATH, f'//*[@class="submissions-list"]/nav/ul/li[3]/a''')))
        else:
            wait.until(EC.element_to_be_clickable(
                (By.XPATH,
                 f'''//*[@id='{aria_controls}']/div/div/nav/ul/li[3]/a''')))
            wait.until(EC.presence_of_element_located(
                (By.XPATH,
                 f'''//*[@id='{aria_controls}']/div/div/ul/li[1]/div/h4/a[1]''')))
        # print("Successful load the website pagination!->", res)
        time.sleep(2)  # seconds, workaround for bugs

    paper_postfix = f'{conference}_{year}'
    error_log = []
    driver = get_driver(proxy_ip_port=proxy_ip_port)
    driver.get(base_url)

    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    # wait = WebDriverWait(driver, 20)
    mywait(driver)

    # get into poster or oral page
    nav_tap = driver.find_elements(
        By.XPATH, f'//ul[@class="nav nav-tabs"]/li')
    is_found_group = False
    for li in nav_tap:
        if group_id in li.text.lower():
            if 'poster' in group_id and 'spotlight' in li.text.lower():
                # spotlight-poster should be recognized as spotlight rather
                # than poster
                continue
            page_link = li.find_element(By.TAG_NAME, "a")
            # scroll to top of page, if not at top, the click action not work
            # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
            driver.find_element(By.TAG_NAME, 'body').send_keys(
                Keys.CONTROL + Keys.HOME)
            aria_controls = page_link.get_attribute('aria-controls')
            page_link.click()
            mywait(driver, aria_controls)  # there is no request in here
            is_found_group = True
            break
    if not is_found_group:
        raise ValueError(f'not found {group_id} papers at {base_url}!!!')

    # pages = driver.find_elements(
    #     By.XPATH, f'//nav[@aria-label="page navigation"]/ul/li')
    pages = driver.find_elements(
        By.XPATH, f'''//*[@id='{aria_controls}']/div/div/nav/ul/li''')
    current_page = 1
    # ind_page = 2  # 0 << ; 1 <
    total_pages_number = int(pages[-3].text)  # << | < | 1, 2, 3, ... | > | >>
    last_total_pages = total_pages_number
    # get into start pages
    while current_page < start_page:
        # flip pages until seeing the start page
        if total_pages_number < start_page:
            current_page = total_pages_number
            __get_into_pages_given_number(
                driver=driver, page_number=current_page, pages=pages,
                wait_fn=mywait, condition=aria_controls)
            print(f'getting into web page {current_page}...')

            # print("Successful load the website pagination!->", res)
            pages = driver.find_elements(
                By.XPATH, f'''//*[@id='{aria_controls}']/div/div/nav/ul/li''')
            total_pages_number = int(pages[-3].text)
            # total page remain unchanged after reload
            if total_pages_number == last_total_pages:
                print(f'reached last({total_pages_number}-th) webpage')
                # when get the last page, but the page number is till less than
                # start page, so the start page doesn't exist. PRINT ERROR and
                # return
                print(f'ERROR: THE {start_page}-th webpage not found!')
                return
        else:
            current_page = start_page

    page = __get_into_pages_given_number(
        driver=driver, page_number=current_page, pages=pages, wait_fn=mywait,
        condition=aria_controls)

    while current_page <= total_pages_number:
        if page is None:
            break
        print(f'downloading {group_id} papers in page: {current_page}')

        divs = driver.find_elements(
            By.XPATH, f'''//*[@id='{aria_controls}']/div/div/ul/li''')

        # temp workaround
        repeat_times = 3
        is_find_paper = False
        for r in range(repeat_times):
            try:
                a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                is_find_paper = True
                break
            except Exception as e:
                if (r+1) < repeat_times:
                    print(f'\terror occurre: {str(e.msg)}')
                    print(f'\tsleep {(r+1)*5} seconds...')
                    time.sleep((r+1)*5)
                    print(f'{r+1}-th reloading page')
                    divs = driver.find_elements(
                        By.XPATH,
                        f'''//*[@id='{aria_controls}']/div/div/ul/li''')
                else:
                    print('\tskip this page.')
        if not is_find_paper:
            continue
        # time.sleep(time_step_in_seconds)
        this_error_log, this_number_paper = __download_papers_given_divs(
            driver=driver,
            divs=divs,
            save_dir=save_dir,
            paper_postfix=paper_postfix,
            time_step_in_seconds=time_step_in_seconds,
            downloader=downloader,
            proxy_ip_port=proxy_ip_port
        )
        for e in this_error_log:
            error_log.append(e)
        # get into next page
        current_page += 1
        pages = driver.find_elements(
            By.XPATH, f'''//*[@id='{aria_controls}']/div/div/nav/ul/li''')
        total_pages_number = int(pages[-3].text)
        # if we do not reread the pages, all the pages will be not available
        # with an exception:
        # selenium.common.exceptions.StaleElementReferenceException:
        # Message: stale element reference: element is not attached to the
        # page document
        page = __get_into_pages_given_number(
            driver=driver, page_number=current_page, pages=pages,
            wait_fn=mywait, condition=aria_controls)

    driver.quit()
    # 2. write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                f.write(e)
                f.write('\n')
            f.write('\n')


def get_pages_str(pages):
    page_str_list = [p.text for p in pages]
    # print(f'Current page navigation bar:\n{page_str_list}')
    return page_str_list


def get_max_page_number(page_str_list):
    is_find_number = False
    for i, page_str in enumerate(page_str_list):
        if not page_str.isnumeric() and is_find_number:
            return int(page_str_list[i-1])
        if page_str.isnumeric():
            is_find_number = True
    return int(page_str_list[-1])


def download_papers_given_url_and_group_id(
        save_dir, year, base_url, group_id, conference, start_page=1,
        time_step_in_seconds=10, downloader='IDM', proxy_ip_port=None,
        is_have_pages=True, is_need_click_group_button=False):
    """
    downlaod papers for the given web url and the paper group id
    :param save_dir: str, paper save path
    :type save_dir: str
    :param year: int, iclr year, current only support year >= 2018
    :type year: int
    :param base_url: str, paper website url
    :type base_url: str
    :param group_id: str, paper group id, such as "notable-top-5-",
        "notable-top-25-", "poster", "oral-submissions",
        "spotlight-submissions", "poster-submissions", etc.
    :type group_id: str
    :param conference: str, conference name, such as CORL.
    :param start_page: int, the initial downloading webpage number, only the
        pages whose number is equal to or greater than this number will be
        processed. Default: 1
    :param time_step_in_seconds: int, the interval time between two download
        request in seconds. Default: 10
    :param downloader: str, the downloader to download, could be 'IDM' or
        'Thunder'. Default: 'IDM'
    :param proxy_ip_port: str or None, proxy ip address and port, eg.
        eg: "127.0.0.1:7890".  Only useful for webdriver and request
        downloader (downloader=None). Default: None.
    :type proxy_ip_port: str | None
    :param is_have_pages: bool, is there pages in webpage. Default:
        True.
    :type is_have_pages: bool
    :param is_need_click_group_button: bool, is there need to click the
        group button in webpage. For some years, for example 2018, the
        navigation part "#xxxxx" in base url will not work. And it should
        be clicked before reading content from webpage. Default: False.
    :type is_need_click_group_button: bool
    :return:
    """
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    def _get_pages_xpath(year):
        if year <= 2023:
            xpath = f'''//*[@id="{group_id}"]/nav/ul/li'''
        else:
            xpath = f'''//*[@id="{group_id}"]/div/div/nav/ul/li'''
        return xpath

    def mywait(driver, condition=None):
        # wait for the select element to become visible
        # print('Starting web driver wait...')
        # ignored_exceptions = (NoSuchElementException, 
        # StaleElementReferenceException,)
        # wait = WebDriverWait(driver, 20, ignored_exceptions=ignored_exceptions)
        wait = WebDriverWait(driver, 20)
        # print('Starting web driver wait... finished')
        # res = wait.until(EC.presence_of_element_located((By.ID, "notes")))
        # print("Successful load the website!->", res)
        # if year <= 2023:
        #     res = wait.until(
        #         EC.presence_of_element_located((By.CLASS_NAME, "note")))
        # print("Successful load the website notes!->", res)
        # res = wait.until(EC.presence_of_element_located(
        #     (By.XPATH, f'''//*[@id="{group_id}"]/nav''')))
        if is_have_pages:
            # scroll to bottom of page
            # https://stackoverflow.com/questions/45576958/scrolling-to-top-of-the-page-in-python-using-selenium
            driver.find_element(By.TAG_NAME, 'body').send_keys(
                Keys.CONTROL + Keys.END)
            if year <= 2023:
                wait.until(EC.element_to_be_clickable(
                    (By.XPATH, f'{_get_pages_xpath(year)}[3]/a')))
            else:
                wait.until(EC.element_to_be_clickable(
                    (By.XPATH, f'{_get_pages_xpath(year)}[3]/a')))
            # print("Successful load the website pagination!->", res)
        time.sleep(2)  # seconds, workaround for bugs

    paper_postfix = f'{conference}_{year}'
    error_log = []
    
    driver = get_driver(proxy_ip_port=proxy_ip_port)
    driver.get(base_url)

    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    if is_need_click_group_button:
        archive_is_have_pages = is_have_pages
        is_have_pages = False
        mywait(driver)
        aria_controls = base_url.split('#')[-1]
        # scroll to home of page
        driver.find_element(By.TAG_NAME, 'body').send_keys(
            Keys.CONTROL + Keys.HOME)
        group_button = driver.find_element(
            By.XPATH, f"""//a[@aria-controls="{aria_controls}"]"""
        )
        group_button.click()
        is_have_pages = archive_is_have_pages
    mywait(driver)
    if is_have_pages:
        pages = driver.find_elements(By.XPATH, _get_pages_xpath(year))
        current_page = 1
        ind_page = 2  # 0 << ; 1 <
        total_pages_number = int(pages[-3].text)
        # << | < | 1, 2, 3, ... | > | >>
        last_total_pages = total_pages_number
        # get into start pages
        while current_page < start_page:
            # flip pages until seeing the start page
            if total_pages_number < start_page:
                current_page = total_pages_number
                __get_into_pages_given_number(
                    driver=driver, page_number=current_page, pages=pages,
                    wait_fn=mywait)
                print(f'getting into web page {current_page}...')
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, f'//*[@id="{group_id}"]/ul/li/h4/a')))
                # res = wait.until(EC.presence_of_element_located(
                #     (By.XPATH, f'''//*[@id="{group_id}"]/nav''')))
                mywait(driver)

                # print("Successful load the website pagination!->", res)
                pages = driver.find_elements(
                    By.XPATH, _get_pages_xpath(year))
                total_pages_number = int(pages[-3].text)
                # total page remain unchanged after reload
                if total_pages_number == last_total_pages:
                    print(f'reached last({total_pages_number}-th) webpage')
                    # when get the last page, but the page number is till
                    # less than start page, so the start page doesn't exist.
                    # PRINT ERROR and return
                    print(f'ERROR: THE {start_page}-th webpage not found!')
                    return
            else:
                current_page = start_page

        page = __get_into_pages_given_number(
            driver=driver, page_number=current_page, pages=pages, wait_fn=mywait)

        while current_page <= total_pages_number:
            if page is None:
                break
            print(f'downloading {group_id} papers in page: {current_page}')
            mywait(driver)

            divs = driver.find_element(By.ID, group_id). \
                find_elements(By.CLASS_NAME, 'note ')

            # temp workaround
            repeat_times = 3
            is_find_paper = False
            for r in range(repeat_times):
                try:
                    a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                    name = slugify(a_hrefs[0].text.strip())
                    link = a_hrefs[1].get_attribute('href')
                    is_find_paper = True
                    break
                except Exception as e:
                    if (r + 1) < repeat_times:
                        print(f'\terror occurre: {str(e.msg)}')
                        print(f'\tsleep {(r + 1) * 5} seconds...')
                        time.sleep((r + 1) * 5)
                        print(f'{r + 1}-th reloading page')
                        divs = driver.find_element(By.ID, group_id). \
                            find_elements(By.CLASS_NAME, 'note ')
                    else:
                        print('\tskip this page.')
            if not is_find_paper:
                continue

            # time.sleep(time_step_in_seconds)
            this_error_log, this_number_paper = __download_papers_given_divs(
                driver=driver,
                divs=divs,
                save_dir=save_dir,
                paper_postfix=paper_postfix,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                proxy_ip_port=proxy_ip_port
            )
            for e in this_error_log:
                error_log.append(e)
            # get into next page
            current_page += 1
            pages = driver.find_elements(
                By.XPATH, _get_pages_xpath(year))
            total_pages_number = int(pages[-3].text)
            # if we do not reread the pages, all the pages will be not available
            # with an exception:
            # selenium.common.exceptions.StaleElementReferenceException:
            # Message: stale element reference: element is not attached to the
            # page document
            page = __get_into_pages_given_number(
                driver=driver, page_number=current_page, pages=pages,
                wait_fn=mywait)
    else:  # no pages
        divs = driver.find_element(By.ID, group_id). \
            find_elements(By.CLASS_NAME, 'note ')
        # temp workaround
        repeat_times = 3
        is_find_paper = False
        for r in range(repeat_times):
            try:
                a_hrefs = divs[0].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                a_hrefs = divs[-1].find_elements(By.TAG_NAME, "a")
                name = slugify(a_hrefs[0].text.strip())
                link = a_hrefs[1].get_attribute('href')
                is_find_paper = True
                break
            except Exception as e:
                if (r + 1) < repeat_times:
                    print(f'\terror occurre: {str(e.msg)}')
                    print(f'\tsleep {(r + 1) * 5} seconds...')
                    time.sleep((r + 1) * 5)
                    print(f'{r + 1}-th reloading page')
                    divs = driver.find_element(By.ID, group_id). \
                        find_elements(By.CLASS_NAME, 'note ')
                else:
                    print('\tskipped!!!')
        if is_find_paper:
            # time.sleep(time_step_in_seconds)
            this_error_log, this_number_paper = __download_papers_given_divs(
                driver=driver,
                divs=divs,
                save_dir=save_dir,
                paper_postfix=paper_postfix,
                time_step_in_seconds=time_step_in_seconds,
                downloader=downloader,
                proxy_ip_port=proxy_ip_port
            )
            for e in this_error_log:
                error_log.append(e)

    driver.quit()
    # 2. write error log
    print('write error log')
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                f.write(e)
                f.write('\n')
            f.write('\n')


if __name__ == "__main__":
    year = 2023
    save_dir = rf'E:\ICML_{year}'
    base_url = 'https://openreview.net/group?id=ICML.cc/2023/Conference'
    # download_nips_papers_given_url(
    #     save_dir, year, base_url,
    #     start_page=1,
    #     time_step_in_seconds=10,
    #     downloader='IDM')
    # download_icml_papers_given_url_and_group_id(
    #     save_dir, year, base_url, group_id='oral', start_page=1,
    #     time_step_in_seconds=10, )


================================================
FILE: lib/pmlr.py
================================================
"""
pmlr.py
20210618
"""
from bs4 import BeautifulSoup
import os
from tqdm import tqdm
from slugify import slugify
from lib.downloader import Downloader
from .my_request import urlopen_with_retry


def download_paper_given_volume(
        volume, save_dir, postfix, is_download_supplement=True,
        time_step_in_seconds=5, downloader='IDM', is_random_step=True):
    """
    download main and supplement papers from PMLR.
    :param volume: str, such as 'v1', 'r1'
    :param save_dir: str, paper and supplement material's save path
    :param postfix: str, the postfix will be appended to the end of papers' titles
    :param is_download_supplement: bool, True for downloading supplemental material
    :param time_step_in_seconds: int, the interval time between two downloading
        requests in seconds
    :param downloader: str, the downloader to download, could be 'IDM' or None,
        Default: 'IDM'
    :param is_random_step: bool, whether random sample the time step between two
        adjacent download requests. If True, the time step will be sampled
        from Uniform(0.5t, 1.5t), where t is the given time_step_in_seconds.
        Default: True.
    :return: True
    """
    downloader = Downloader(
        downloader=downloader, is_random_step=is_random_step)
    init_url = f'http://proceedings.mlr.press/{volume}/'

    if is_download_supplement:
        main_save_path = os.path.join(save_dir, 'main_paper')
        supplement_save_path = os.path.join(save_dir, 'supplement')
        os.makedirs(main_save_path, exist_ok=True)
        os.makedirs(supplement_save_path, exist_ok=True)
    else:
        main_save_path = save_dir
        os.makedirs(main_save_path, exist_ok=True)
    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
            'Gecko/20100101 Firefox/23.0'}
    content = urlopen_with_retry(url=init_url, headers=headers)
    soup = BeautifulSoup(content, 'html.parser')
    paper_list = soup.find_all('div', {'class': 'paper'})
    error_log = []
    title_list = []
    num_download = len(paper_list)
    pbar = tqdm(zip(paper_list, range(num_download)), total=num_download)
    for paper in pbar:
        # get title
        this_paper = paper[0]
        title = slugify(this_paper.find_all('p', {'class': 'title'})[0].text)
        try:
            pbar.set_description(
                f'Downloading {postfix} paper {paper[1] + 1}/{num_download}:'
                f' {title}')
        except:
            pbar.set_description(
                f'''Downloading {postfix} paper {paper[1] + 1}/{num_download}: '''
                f'''{title.encode('utf8')}''')
        title_list.append(title)

        this_paper_main_path = os.path.join(main_save_path,
                                            f'{title}_{postfix}.pdf')
        if is_download_supplement:
            this_paper_supp_path = os.path.join(
                supplement_save_path, f'{title}_{postfix}_supp.pdf')
            this_paper_supp_path_no_ext = os.path.join(
                supplement_save_path, f'{title}_{postfix}_supp.')

            if os.path.exists(this_paper_main_path) and os.path.exists(
                    this_paper_supp_path):
                continue
        else:
            if os.path.exists(this_paper_main_path):
                continue

        # get abstract page url
        links = this_paper.find_all('p', {'class': 'links'})[0].find_all('a')
        supp_link = None
        main_link = None
        for link in links:
            if 'Download PDF' == link.text or 'pdf' == link.text:
                main_link = link.get('href')
            elif is_download_supplement and \
                    ('Supplementary PDF' == link.text or
                     'Supplementary Material' == link.text or
                     'supplementary' == link.text or
                     'Supplementary ZIP' == link.text or
                     'Other Files' == link.text):
                supp_link = link.get('href')
                if supp_link[-3:] != 'pdf':
                    this_paper_supp_path = this_paper_supp_path_no_ext + \
                                           supp_link[-3:]

        # try 1 time
        # error_flag = False
        for d_iter in range(1):
            try:
                # download paper with IDM
                if not os.path.exists(
                        this_paper_main_path) and main_link is not None:
                    downloader.download(
                        urls=main_link,
                        save_path=this_paper_main_path,
                        time_sleep_in_seconds=time_step_in_seconds
                    )
            except Exception as e:
                # error_flag = True
                print('Error: ' + title + ' - ' + str(e))
                error_log.append(
                    (title, main_link, 'main paper download error', str(e)))
            # download supp
            if is_download_supplement:
                # check whether the supp can be downloaded
                if not os.path.exists(
                        this_paper_supp_path) and supp_link is not None:
                    try:
                        downloader.download(
                            urls=supp_link,
                            save_path=this_paper_supp_path,
                            time_sleep_in_seconds=time_step_in_seconds
                        )
                    except Exception as e:
                        # error_flag = True
                        print('Error: ' + title + ' - ' + str(e))
                        error_log.append((title, supp_link,
                                          'supplement download error', str(e)))

    # write error log
    print('writing error log...')
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'download_err_log.txt')
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is not None:
                    f.write(e)
                else:
                    f.write('None')
                f.write('\n')
            f.write('\n')

    return True


if __name__ == '__main__':
    download_paper_given_volume(
        volume=150,
        save_dir=r'D:\The_KDD21_Workshop_on_Causal_Discovery',
        postfix=f'',
        is_download_supplement=False,
        time_step_in_seconds=5,
        downloader='IDM'
    )


================================================
FILE: lib/proxy.py
================================================
"""
proxy.py
20230228
"""
from selenium.webdriver.common.proxy import Proxy, ProxyType
import urllib


def get_proxy(ip_port: str):
    """
    setup proxy
    :param ip_port: str, proxy server ip address without protocol prefix,
        eg: "127.0.0.1:7890"
    :return: proxy (instance of selenium.webdriver.common.proxy.Proxy)
    Then the proxy could be to webdriver.Chrome:
        capabilities = webdriver.DesiredCapabilities.CHROME
        proxy.add_to_capabilities(capabilities)
        driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),
            desired_capabilities=capabilities)
    """
    proxy = Proxy()
    proxy.proxy_type = ProxyType.MANUAL
    proxy.http_proxy = ip_port
    proxy.ssl_proxy = ip_port
    return proxy


def set_proxy_4_urllib_request(ip_port: str):
    """
    setup proxy
    :param ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
    :return: proxies, dict with keys "http" and "https" or None.
    """
    if ip_port is None:
        proxies = None
    else:
        if not ip_port.startswith('http'):
            ip_port = 'http://' + ip_port
        proxies = {
            'http': ip_port,
            'https': ip_port
        }
        proxy_support = urllib.request.ProxyHandler(proxies)
        opener = urllib.request.build_opener(proxy_support)
        urllib.request.install_opener(opener)
    return proxies


def get_proxy_4_requests(ip_port: str):
    """
    setup proxy
    :param ip_port: str or None, proxy server ip address with or without
        protocol prefix, eg: "127.0.0.1:7890", "http://127.0.0.1:7890".
    :return: proxies, dict with keys "http" and "https" or None.
    """
    if ip_port is None:
        proxies = None
    else:
        if not ip_port.startswith('http'):
            ip_port = 'http://' + ip_port
        proxies = {
            'http': ip_port,
            'https': ip_port
        }
    return proxies


if __name__ == "__main__":
    # get my ip
    import json
    set_proxy_4_urllib_request('127.0.0.1:7897')
    url = "http://ip-api.com/json"  # ipv4
    response = urllib.request.urlopen(url)
    data = json.load(response)
    if data['status'] == 'success':
        ip = data['query']
        print(f'ip: {ip}')
        print(f'details: {data}')
    else:
        print(f'failed, try agin: {data}')


================================================
FILE: lib/springer.py
================================================
"""
springer.py
some function for springer
20201106
"""

import urllib
from bs4 import BeautifulSoup
from tqdm import tqdm
from slugify import slugify
from .my_request import urlopen_with_retry
import re


def get_paper_name_link_from_url(url):
    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    paper_dict = dict()
    content = urlopen_with_retry(url=url, headers=headers)
    soup = BeautifulSoup(content, 'html5lib')
    paper_list_bar = tqdm(
        soup.find('section', {'data-title': 'Table of contents'}).find(
            'div', {'class': 'c-book-section'}).find_all(
            ['li'], {'data-test': 'chapter'}))
    for paper in paper_list_bar:
        try:
            title = slugify(
                paper.find(['h3', 'h4'], {'class': 'app-card-open__heading'}).text)
            link = urllib.parse.urljoin(
                url, 
                paper.find(
                    ['h3', 'h4'], {'class': 'app-card-open__heading'}
                    ).a.get('href'))
            # 'https://link.springer.com/chapter/10.1007/978-3-642-33718-5_2' 
            # >>
            # 'https://link.springer.com/content/pdf/10.1007/978-3-642-33718-5_2.pdf'
            link = f'''{link.replace('/chapter/', '/content/pdf/')}.pdf'''
            paper_dict[title] = link
        except Exception as e:
            print(f'ERROR: {str(e)}')
    return paper_dict


if __name__ == '__main__':
    papers = get_paper_name_link_from_url('https://link.springer.com/book/10.1007%2F978-3-319-46448-0')

================================================
FILE: lib/supplement_porcess.py
================================================
"""
    supplement_process.py
"""
from PyPDF3 import PdfFileMerger
import zipfile
import os
import shutil
from tqdm import tqdm

def unzipfile(zip_file, save_path):
    """
    unzip zip file to save_path
    :param zipfile: str, zip file's full pathname.
    :param save_path: str, the path store unzipped files.
    :return: None
    """
    zip_ref = zipfile.ZipFile(zip_file, 'r')
    zip_ref.extractall(save_path)
    zip_ref.close()


def get_potential_supp_pdf(path):
    """
    get all the potential supplemental pdf file pathname
    :param path: str, the path of unzipped files
    :return: supp_pdf_list, List of str, pdf files' full pathnames
    """
    supp_pdf_list = [f for f in os.scandir(path) if f.name.endswith('.pdf')]
    if len(supp_pdf_list) == 0:
        supp_pdf_list = []
        for dir in os.scandir(path):
            if dir.is_dir() and not dir.name.startswith('__'):
                for pdf in os.scandir(dir.path):
                    if pdf.name.endswith('.pdf'):
                        supp_pdf_list.append(pdf.path)
    if len(supp_pdf_list) == 0:
        supp_pdf_list = []
        for dir in os.scandir(path):
            if dir.is_dir() and not dir.name.startswith('__'):
                for sub_dir in os.scandir(dir):
                    if sub_dir.is_dir() and not sub_dir.name.startswith('__'):
                        for pdf in os.scandir(sub_dir.path):
                            if pdf.name.endswith('.pdf'):
                                supp_pdf_list.append(pdf.path)
    return supp_pdf_list


def move_main_and_supplement_2_one_directory_with_group(main_path, supplement_path, supp_pdf_save_path):
    """
    unzip supplemental zip files to get the pdf files, copy and
        rename them into given path(supp_pdf_save_path/group_name)
    :param main_path: str, the main papers' path
    :param supplement_path: str, the supplemental material 's path
    :param supp_pdf_save_path: str, the supplemental pdf files' save path
    """
    if not os.path.exists(main_path):
        raise ValueError(f'''can not open '{main_path}' !''')
    if not os.path.exists(supplement_path):
        raise ValueError(f'''can not open '{supplement_path}' !''')
    error_log = []
    # make temp dir to unzip zip file
    temp_zip_dir = '.\\temp_zip'
    if not os.path.exists(temp_zip_dir):
        os.mkdir(temp_zip_dir)
    else:
        # remove all files
        for unzip_file in os.listdir(temp_zip_dir):
            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                os.remove(os.path.join(temp_zip_dir, unzip_file))
            if os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
            else:
                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
    for group in os.scandir(main_path):
        if group.is_dir():
            paper_bar = tqdm(os.scandir(group.path))
            for paper in paper_bar:
                if paper.is_file():
                    name, extension = os.path.splitext(paper.name)
                    if '.pdf' == extension:
                        paper_bar.set_description(f'''processing {name}''')
                        supp_pdf_path = None
                        # error_flag = False
                        if os.path.exists(os.path.join(supplement_path, group.name, f'{name}_supp.pdf')):
                            supp_pdf_path = os.path.join(supplement_path, group.name,  f'{name}_supp.pdf')
                            shutil.copyfile(
                                supp_pdf_path, os.path.join(supp_pdf_save_path, group.name,  f'{name}_supp.pdf'))
                        elif os.path.exists(os.path.join(supplement_path, group.name, f'{name}_supp.zip')):
                            try:
                                unzipfile(
                                    zip_file=os.path.join(supplement_path, group.name, f'{name}_supp.zip'),
                                    save_path=temp_zip_dir
                                )
                            except Exception as e:
                                print('Error: ' + name + ' - ' + str(e))
                                error_log.append((paper.path, supp_pdf_path, str(e)))
                            try:
                                # find if there is a pdf file (by listing all files in the dir)
                                supp_pdf_list = get_potential_supp_pdf(temp_zip_dir)
                                # rename the first pdf file
                                if len(supp_pdf_list) >= 1:
                                    # by default, we only deal with the first pdf
                                    supp_pdf_path = os.path.join(supp_pdf_save_path, group.name, name+'_supp.pdf')
                                    if not os.path.exists(supp_pdf_path):
                                        shutil.move(supp_pdf_list[0], supp_pdf_path)
                                    if len(supp_pdf_list) > 1:
                                        for i in range(1, len(supp_pdf_list)):
                                            supp_pdf_path = os.path.join(
                                                supp_pdf_save_path, group.name, name + f'_supp_{i}.pdf')
                                            if not os.path.exists(supp_pdf_path):
                                                shutil.move(supp_pdf_list[i], supp_pdf_path)
                                # empty the temp_folder (both the dirs and files)
                                for unzip_file in os.listdir(temp_zip_dir):
                                    if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                                        os.remove(os.path.join(temp_zip_dir, unzip_file))
                                    elif os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                                        shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
                                    else:
                                        print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
                            except Exception as e:
                                print('Error: ' + name + ' - ' + str(e))
                                error_log.append((paper.path, supp_pdf_path, str(e)))

    # 2. write error log
    print('write error log')
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'merge_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is None:
                    f.write('None')
                else:
                    f.write(e)
                f.write('\n')

            f.write('\n')


def move_main_and_supplement_2_one_directory(main_path, supplement_path, supp_pdf_save_path):
    """
    unzip supplemental zip files to get the pdf files, copy and
    rename them into given path(supp_pdf_save_path)
    :param main_path: str, the main papers' path
    :param supplement_path: str, the supplemental material's path
    :param supp_pdf_save_path: str, the supplemental pdf files' save path
    """
    if not os.path.exists(main_path):
        raise ValueError(f'''can not open '{main_path}' !''')
    if not os.path.exists(supplement_path):
        raise ValueError(f'''can not open '{supplement_path}' !''')
    os.makedirs(supp_pdf_save_path, exist_ok=True)
    error_log = []
    # make temp dir to unzip zip file
    temp_zip_dir = '..\\temp_zip'
    if not os.path.exists(temp_zip_dir):
        os.mkdir(temp_zip_dir)
    else:
        # remove all files
        for unzip_file in os.listdir(temp_zip_dir):
            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                os.remove(os.path.join(temp_zip_dir, unzip_file))
            if os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
            else:
                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))

    paper_bar = tqdm(os.scandir(main_path))
    for paper in paper_bar:
        if paper.is_file():
            name, extension = os.path.splitext(paper.name)
            if '.pdf' == extension:
                paper_bar.set_description(f'''processing {name}''')
                supp_pdf_path = None
                # error_flag = False
                if os.path.exists(os.path.join(supp_pdf_save_path, f'{name}_supp.pdf')):
                    continue
                elif os.path.exists(os.path.join(supplement_path, f'{name}_supp.pdf')):
                    supp_pdf_path = os.path.join(supplement_path, f'{name}_supp.pdf')
                    shutil.copyfile(supp_pdf_path, os.path.join(supp_pdf_save_path, f'{name}_supp.pdf'))
                elif os.path.exists(os.path.join(supplement_path, f'{name}_supp.zip')):
                    try:
                        unzipfile(
                            zip_file=os.path.join(supplement_path, f'{name}_supp.zip'),
                            save_path=temp_zip_dir)
                    except Exception as e:
                        print('Error: ' + name + ' - ' + str(e))
                        error_log.append((paper.path, supp_pdf_path, str(e)))
                    try:
                        # find if there is a pdf file (by listing all files in the dir)
                        supp_pdf_list = get_potential_supp_pdf(temp_zip_dir)

                        # rename the first pdf file
                        if len(supp_pdf_list) >= 1:
                            # by default, we only deal with the first pdf
                            supp_pdf_path = os.path.join(supp_pdf_save_path, name+'_supp.pdf')
                            if not os.path.exists(supp_pdf_path):
                                shutil.move(supp_pdf_list[0], supp_pdf_path)
                            if len(supp_pdf_list) > 1:
                                for i in range(1, len(supp_pdf_list)):
                                    supp_pdf_path = os.path.join(supp_pdf_save_path, name + f'_supp_{i}.pdf')
                                    if not os.path.exists(supp_pdf_path):
                                        shutil.move(supp_pdf_list[i], supp_pdf_path)
                        # empty the temp_folder (both the dirs and files)
                        for unzip_file in os.listdir(temp_zip_dir):
                            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                                os.remove(os.path.join(temp_zip_dir, unzip_file))
                            elif os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
                            else:
                                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
                    except Exception as e:
                        print('Error: ' + name + ' - ' + str(e))
                        error_log.append((paper.path, supp_pdf_path, str(e)))

    # 2. write error log
    print('write error log')
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'merge_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is None:
                    f.write('None')
                else:
                    f.write(e)
                f.write('\n')

            f.write('\n')


def merge_main_supplement(main_path, supplement_path, save_path, is_delete_ori_files=False):
    """
    merge the main paper and supplemental material into one single pdf file
    :param main_path: str, the main papers' path
    :param supplement_path: str, the supplemental material 's path
    :param save_path: str, merged pdf files's save path
    :param is_delete_ori_files: Bool, True for deleting the original main and supplemental material after merging
    """
    if not os.path.exists(main_path):
        raise ValueError(f'''can not open '{main_path}' !''')
    if not os.path.exists(supplement_path):
        raise ValueError(f'''can not open '{supplement_path}' !''')
    os.makedirs(save_path, exist_ok=True)
    error_log = []
    # make temp dir to unzip zip file
    temp_zip_dir = '.\\temp_zip'
    if not os.path.exists(temp_zip_dir):
        os.mkdir(temp_zip_dir)
    else:
        # remove all files
        for unzip_file in os.listdir(temp_zip_dir):
            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                os.remove(os.path.join(temp_zip_dir, unzip_file))
            if os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
            else:
                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
    paper_bar = tqdm(os.scandir(main_path))
    for paper in paper_bar:
        if paper.is_file():
            name, extension = os.path.splitext(paper.name)
            if '.pdf' == extension:
                paper_bar.set_description(f'''processing {name}''')
                if os.path.exists(os.path.join(save_path, paper.name)):
                    continue
                supp_pdf_path = None
                error_floa = False
                if os.path.exists(os.path.join(supplement_path, f'{name}_supp.pdf')):
                    supp_pdf_path = os.path.join(supplement_path, f'{name}_supp.pdf')
                elif os.path.exists(os.path.join(supplement_path, f'{name}_supp.zip')):
                    try:
                        unzipfile(
                            zip_file=os.path.join(supplement_path, f'{name}_supp.zip'),
                            save_path=temp_zip_dir
                        )
                    except Exception as e:
                        print('Error: ' + name + ' - ' + str(e))
                        error_log.append((paper.path, supp_pdf_path, str(e)))
                    try:
                        # find if there is a pdf file (by listing all files in the dir)
                        supp_pdf_list = get_potential_supp_pdf(temp_zip_dir)
                        # rename the first pdf file
                        if len(supp_pdf_list) >= 1:
                            # by default, we only deal with the first pdf
                            supp_pdf_path = os.path.join(supplement_path, name+'_supp.pdf')
                            if not os.path.exists(supp_pdf_path):
                                shutil.move(supp_pdf_list[0], supp_pdf_path)
                        # empty the temp_folder (both the dirs and files)
                        for unzip_file in os.listdir(temp_zip_dir):
                            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                                os.remove(os.path.join(temp_zip_dir, unzip_file))
                            elif os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
                            else:
                                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
                    except Exception as e:
                        error_floa = True
                        print('Error: ' + name + ' - ' + str(e))
                        error_log.append((paper.path, supp_pdf_path, str(e)))
                        # empty the temp_folder (both the dirs and files)
                        for unzip_file in os.listdir(temp_zip_dir):
                            if os.path.isfile(os.path.join(temp_zip_dir, unzip_file)):
                                os.remove(os.path.join(temp_zip_dir, unzip_file))
                            elif os.path.isdir(os.path.join(temp_zip_dir, unzip_file)):
                                shutil.rmtree(os.path.join(temp_zip_dir, unzip_file))
                            else:
                                print('Cannot Remove - ' + os.path.join(temp_zip_dir, unzip_file))
                        continue
                if supp_pdf_path is not None:
                    try:
                        merger = PdfFileMerger()
                        f_handle1 = open(paper.path, 'rb')
                        merger.append(f_handle1)
                        f_handle2 = open(supp_pdf_path, 'rb')
                        merger.append(f_handle2)
                        with open(os.path.join(save_path, paper.name), 'wb') as fout:
                            merger.write(fout)
                            print('\tmerged!')
                        f_handle1.close()
                        f_handle2.close()
                        merger.close()
                        if is_delete_ori_files:
                            os.remove(paper.path)
                            if os.path.exists(os.path.join(supplement_path, f'{name}_supp.zip')):
                                os.remove(os.path.join(supplement_path, f'{name}_supp.zip'))
                            if os.path.exists(os.path.join(supplement_path, f'{name}_supp.pdf')):
                                os.remove(os.path.join(supplement_path, f'{name}_supp.pdf'))
                    except Exception as e:
                        print('Error: ' + name + ' - ' + str(e))
                        error_log.append((paper.path, supp_pdf_path, str(e)))
                        if os.path.exists(os.path.join(save_path, paper.name)):
                            os.remove(os.path.join(save_path, paper.name))

                else:
                    if is_delete_ori_files:
                        shutil.move(paper.path, os.path.join(save_path, paper.name))
                    else:
                        shutil.copyfile(paper.path, os.path.join(save_path, paper.name))

    # 2. write error log
    print('write error log')
    project_root_folder = os.path.abspath(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    log_file_pathname = os.path.join(
        project_root_folder, 'log', 'merge_err_log.txt'
    )
    with open(log_file_pathname, 'w') as f:
        for log in tqdm(error_log):
            for e in log:
                if e is None:
                    f.write('None')
                else:
                    f.write(e)
                f.write('\n')

            f.write('\n')


def rename_2_short_name(src_path, save_path, target_max_length=128,
                        extension='pdf'):
    """
    rename file to short filename while remain the conference postfix
    Args:
        src_path (str): path that contains files directly.
        save_path (str): path to save the renamed files.
        target_max_length (int): max filen name length after renaming. All the
            files whose name length is not less than this will be renamed, the
            others will stay unchanged and copy into the save path. Default:
            128.
        extension (str | None): only the files with this extension will be
            processed. None means all file will be processed. Default: 'pdf'.
    Returns:
        None
    """
    if not os.path.exists(src_path):
        raise ValueError(f'Path not found: {src_path}!')

    os.makedirs(save_path, exist_ok=True)

    for f in tqdm(os.scandir(src_path)):
        f_name = f.name

        # compare extension
        ext = os.path.splitext(f_name)[1]
        if extension is not None and ext[1:] != extension:
            continue
        # compare file name length
        l = len(f_name)
        if l < target_max_length:
            if not os.path.exists(os.path.join(save_path, f_name)):
                print(f'\ncopying {f_name}')
                shutil.copyfile(f.path, os.path.join(save_path, f_name))
        else:
            # rename
            try:
                [title, postfix] = f_name.split('_', 1)  # only split to 2 parts
                new_title = title[:target_max_length-len(postfix)-2]
                new_name = f'{new_title}_{postfix}'
                if not os.path.exists(os.path.join(save_path, new_name)):
                    print(f'\nrenaming {f_name} \n\t-> {new_name}')
                    shutil.copyfile(f.path, os.path.join(save_path, new_name))
            except ValueError:
                # ValueError: not enough values to unpack (expected 2, got 1)
                print(f'\nWARNING!!!:\n\tunable to parse postfix from {f.path}')
                print('\tSo, it will be just copy/rename to short name')
                new_title = f_name[:target_max_length - len(ext) - 1]
                new_name = f'{new_title}{ext}'
                if not os.path.exists(os.path.join(save_path, new_name)):
                    print(f'\nrenaming {f_name} \n\t-> {new_name}')
                    shutil.copyfile(f.path, os.path.join(save_path, new_name))


def rename_2_short_name_within_group(src_path, save_path, target_max_length=128,
                        extension='pdf'):
    """
    rename file to short filename while remain the conference postfix
    Args:
        src_path (str): path that contains files:
            src_path/group_name/files
        save_path (str): path to save the renamed files.
        target_max_length (int): max filen name length after renaming. All the
            files whose name length is not less than this will be renamed, the
            others will stay unchanged and copy into the save path. Default:
            128.
        extension (str | None): only the files with this extension will be
            processed. None means all file will be processed. Default: 'pdf'.
    Returns:
        None
    """
    if not os.path.exists(src_path):
        raise ValueError(f'Path not found: {src_path}!')

    os.makedirs(save_path, exist_ok=True)

    for d in tqdm(os.scandir(src_path)):
        if not d.is_dir():
            continue
        print(f'\nprocessing {d.name}')
        d_name = d.name
        d_name = d_name[:min(len(d_name), target_max_length-1)]
        rename_2_short_name(
            src_path=d.path,
            save_path=os.path.join(save_path, d_name),
            target_max_length=target_max_length,
            extension=extension
        )


================================================
FILE: lib/user_agents.py
================================================
"""
user_agents.py

user agents
20230702
"""


user_agents = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
    'Gecko/20071127 Firefox/2.0.0.11',

    'Opera/9.25 (Windows NT 5.1; U; en)',

    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; '
    '.NET CLR 1.1.4322; .NET CLR 2.0.50727)',

    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) '
    'KHTML/3.5.5 (like Gecko) (Kubuntu)',

    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) '
    'Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',

    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',

    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 "
    "(KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 "
    "Chrome/16.0.912.77 Safari/535.7",

    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) "
    "Gecko/20100101 Firefox/10.0 ",

    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/105.0.0.0 Safari/537.36'

]

================================================
FILE: sharelinks.md
================================================
# SHARE LINKS
Aliyun share links

注：阿里云盘更新了协议，**一个分享链接最多只能分享不超过500个文件**，所以我进行了拆分，一个链接放499个文件，直至分享完。

## CVPR

### main conference

| year | index |                       share link                       | access code |   
|:----:|:-----:|:------------------------------------------------------:|:-----------:|
| 2023 |   1   |   [1-499](https://www.aliyundrive.com/s/SGMUABYNoRM)   |   `63un`    |  
| 2023 |   2   |  [500-998](https://www.aliyundrive.com/s/XeXJz53AVKn)  |   `7ws5`    |  
| 2023 |   3   | [999-1497](https://www.aliyundrive.com/s/9wjv8gaE95i)  |   `1er4`    |  
| 2023 |   4   | [1498-1996](https://www.aliyundrive.com/s/kqt4GNYmSYR) |   `lf58`    | 
| 2023 |   5   | [1997-2358](https://www.aliyundrive.com/s/GyyyD4XnqhZ) |   `f47s`    | 


### workshops

| year | index |                      share link                      | access code |   
|:----:|:-----:|:----------------------------------------------------:|:-----------:|
| 2023 |   1   |  [1-485](https://www.aliyundrive.com/s/gPtPRYcyttz)  |   `4n5t`    |  
| 2023 |   2   | [486-698](https://www.aliyundrive.com/s/x18A9AxPJGp) |   `x40h`    |