Repository: lmbringas/packtpub-downloader
Branch: master
Commit: 457395e3ae6d
Files: 9
Total size: 13.0 KB
Directory structure:
gitextract_28rn_m5h/
├── .gitignore
├── README.md
├── config.py
├── data.env-sample
├── docker-compose.yml
├── entrypoint.sh
├── main.py
├── requirements.txt
└── user.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
*.env
================================================
FILE: README.md
================================================
# PacktPub Downloader
A script to download all your PacktPub books, inspired by https://github.com/ozzieperez/packtpub-library-downloader
After PacktPub restructured its website, [packtpub-library-downloader](https://github.com/ozzieperez/packtpub-library-downloader) became obsolete because it relied on web scraping. PacktPub now uses a REST API, so I found the endpoints needed to download books and wrote a simple script around them. Feel free to fork and open a PR to improve it. PacktPub's API isn't documented :'(
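As a rough illustration of the REST calls mentioned above, the sketch below builds request URLs from the templates defined in `config.py`. The helper names `products_url`, `book_file_url`, and `page_offsets`, and the book ID in the example, are hypothetical and not part of the script. The code only formats strings and makes no network requests.

```python
# Endpoint templates copied from config.py in this repository.
BASE_URL = "https://services.packtpub.com/"
PRODUCTS_ENDPOINT = "entitlements-v1/users/me/products?sort=createdAt:DESC&offset={offset}&limit={limit}"
URL_BOOK_ENDPOINT = "products-v1/products/{book_id}/files/{format}"


def products_url(offset=0, limit=10):
    """URL for one page of the user's products (hypothetical helper)."""
    return BASE_URL + PRODUCTS_ENDPOINT.format(offset=offset, limit=limit)


def book_file_url(book_id, file_format="pdf"):
    """URL that returns the download link for one book file (hypothetical helper)."""
    return BASE_URL + URL_BOOK_ENDPOINT.format(book_id=book_id, format=file_format)


def page_offsets(count, limit=10):
    """Offsets needed to list `count` books, `limit` per request (hypothetical helper)."""
    return list(range(0, count, limit))


if __name__ == "__main__":
    # "12345" is a made-up book ID used only for illustration.
    print(book_file_url("12345", "epub"))
    for offset in page_offsets(25):
        print(products_url(offset))
```

Each request also needs the `Authorization: Bearer <token>` header that `user.py` obtains from the auth endpoint.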
## Usage:
    pip install -r requirements.txt
    python main.py -e <email> -p <password> [-d <directory> -b <book file types> -s -v -q]
##### Example: Download books in all formats, plus source code
    python main.py -e hello@world.com -p p@ssw0rd -d ~/Desktop/packt -b pdf,epub,mobi,code
## Docker integration
Put your login credentials in the `data.env` file. Start by renaming the sample file:
```
mv data.env-sample data.env
```
Then replace the sample values with your login credentials and start the container:
```
docker-compose up
```
When the run finishes, the downloaded files are in the `book` directory.
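The container passes `EMAIL` and `PASSWORD` from `data.env` to `main.py` as command-line flags (see `entrypoint.sh`). A minimal sketch of reading the same variables directly in Python follows; `credentials_from_env` is a hypothetical helper and `main.py` does not currently work this way.

```python
import os


def credentials_from_env(environ=None):
    """Return (email, password) from EMAIL and PASSWORD, or raise ValueError.

    Hypothetical helper: the variable names match data.env-sample,
    but this function is not part of the repository.
    """
    if environ is None:
        environ = os.environ
    email = environ.get("EMAIL")
    password = environ.get("PASSWORD")
    if not email or not password:
        raise ValueError("EMAIL and PASSWORD must both be set")
    return email, password


if __name__ == "__main__":
    try:
        email, _password = credentials_from_env()
        print("Using account " + email)
    except ValueError as err:
        print(err)
```

Reading the values inside Python avoids shell quoting problems with passwords that contain spaces or other special characters.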
## Commandline Options
- *-e*, *--email* = Your login email
- *-p*, *--pass* = Your login password
- *-d*, *--directory* = Directory to download into. Default is "media/" in the current directory
- *-b*, *--books* = Assets to download. Options are: *pdf,mobi,epub,code*
- *-s*, *--separate* = Create a separate directory for each book
- *-v*, *--verbose* = Show more detailed information
- *-q*, *--quiet* = Don't show information or progress bars
**Book File Types**
- *pdf*: PDF format
- *mobi*: MOBI format
- *epub*: EPUB format
- *code*: Accompanying source code, saved as .zip files
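The way the `-b` value and the file types above map to output files can be sketched as follows. This is an illustration under assumptions, not the script's exact code: the helper names are hypothetical, whitespace stripping is an addition, and `main.py` actually downloads `code` assets with a `.code` suffix and renames them to `.zip` afterwards, whereas the sketch computes the final name directly.

```python
def parse_book_types(arg):
    """Split a comma-separated -b value; stripping whitespace is an addition of this sketch."""
    return [t.strip() for t in arg.split(',') if t.strip()]


def select_types(requested, available):
    """Keep only the available file types that the user asked for."""
    return [t for t in available if t in requested]


def target_filename(root, product_name, file_type, separate=False):
    """Build the final path; 'code' assets end up as .zip files."""
    # Same character replacements that main.py applies to the product name.
    name = (product_name.replace(' ', '_').replace('.', '_')
            .replace(':', '_').replace('/', ''))
    ext = 'zip' if file_type == 'code' else file_type
    if separate:
        # With -s, each book gets its own sub-directory.
        return f'{root}/{name}/{name}.{ext}'
    return f'{root}/{name}.{ext}'


if __name__ == '__main__':
    requested = parse_book_types('pdf, code')
    # The available types would come from the API; this list is made up.
    for file_type in select_types(requested, ['pdf', 'epub', 'code']):
        print(target_filename('media', 'Learning Python: 2.0', file_type))
```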
Developed on Python 3.6.0. The f-strings in `main.py` require Python 3.6 or later.
================================================
FILE: config.py
================================================
# -*- coding: utf-8 -*-
'''
This file contains all the URL endpoints.
'''
# TODO: consider replacing these variables with a single dict (or JSON) of URLs
# Base URL for every request
BASE_URL = "https://services.packtpub.com/"
# Endpoint to request a JWT; POST the username and password, returns the token
AUTH_ENDPOINT = "auth-v1/users/tokens"
# Endpoint to list all your books; the two parameters that change are offset and limit, method GET
PRODUCTS_ENDPOINT = "entitlements-v1/users/me/products?sort=createdAt:DESC&offset={offset}&limit={limit}"
# Endpoint to get the available file types; the parameter is the book id, method GET
URL_BOOK_TYPES_ENDPOINT = "products-v1/products/{book_id}/types"
# Endpoint to get the download URL of a file; parameters are the book id and the file format (pdf, epub, etc.), method GET
URL_BOOK_ENDPOINT = "products-v1/products/{book_id}/files/{format}"
================================================
FILE: data.env-sample
================================================
EMAIL=email@example.com
PASSWORD=example$password
================================================
FILE: docker-compose.yml
================================================
version: '3.3'
services:
  packtpub-downloader:
    image: python:3.6.0
    container_name: "packtpub-downloader"
    env_file:
      - data.env
    volumes:
      - "./:/app"
    command: "/bin/bash /app/entrypoint.sh"
================================================
FILE: entrypoint.sh
================================================
pip install -r /app/requirements.txt
python /app/main.py -e "$EMAIL" -p "$PASSWORD" -d /app/book -b pdf,mobi,epub,code
================================================
FILE: main.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import print_function
import os
import sys
import glob
import math
import getopt
import requests
from tqdm import tqdm, trange
from config import BASE_URL, PRODUCTS_ENDPOINT, URL_BOOK_TYPES_ENDPOINT, URL_BOOK_ENDPOINT
from user import User
# TODO: add a function whose only purpose is to make a request and return the data
def book_request(user, offset=0, limit=10, verbose=False):
    # request one page of products; return the URL, the response and the page data
    data = []
    url = BASE_URL + PRODUCTS_ENDPOINT.format(offset=offset, limit=limit)
    if verbose:
        print(url)
    r = requests.get(url, headers=user.get_header())
    data += r.json().get('data', [])
    return url, r, data
def get_books(user, offset=0, limit=10, is_verbose=False, is_quiet=False):
    '''
    Request all your books and return a list with the info of each book
    Params
    ...
    user : User
        provides the request header, including the JWT
    offset : int
        index of the first book to request
    limit : int
        how many books to get per request
    '''
    # TODO: the JWT expires after some time; call user.refresh_header() when it does
    url, r, data = book_request(user, offset, limit, is_verbose)
    count = r.json()['count']
    if not is_quiet:
        # honor -q: no informational output or progress bars
        print(f'You have {count} books')
        print("Getting list of books...")
        pages_list = trange(count // limit, unit='Pages')
    else:
        pages_list = range(count // limit)
    for i in pages_list:
        offset += limit
        data += book_request(user, offset, limit, is_verbose)[2]
    return data
def get_url_book(user, book_id, format='pdf'):
    '''
    Return the URL of the book file to download
    '''
    url = BASE_URL + URL_BOOK_ENDPOINT.format(book_id=book_id, format=format)
    r = requests.get(url, headers=user.get_header())
    if r.status_code == 200:  # success
        return r.json().get('data', '')
    elif r.status_code == 401:  # jwt expired
        user.refresh_header()  # refresh token
        return get_url_book(user, book_id, format)  # retry and return the result
    print('ERROR (please copy and paste in the issue)')
    print(r.json())
    print(r.status_code)
    return ''
def get_book_file_types(user, book_id):
    '''
    Return a list with the file types of a book
    '''
    url = BASE_URL + URL_BOOK_TYPES_ENDPOINT.format(book_id=book_id)
    r = requests.get(url, headers=user.get_header())
    if r.status_code == 200:  # success
        return r.json()['data'][0].get('fileTypes', [])
    elif r.status_code == 401:  # jwt expired
        user.refresh_header()  # refresh token
        return get_book_file_types(user, book_id)  # retry and return the result
    print('ERROR (please copy and paste in the issue)')
    print(r.json())
    print(r.status_code)
    return []
# TODO: make this function async so downloads are faster
def download_book(filename, url):
    '''
    Download your book
    '''
    print('Starting to download ' + filename)
    with open(filename, 'wb') as f:
        r = requests.get(url, stream=True)
        total = r.headers.get('content-length')
        if total is None:
            # no content-length header: write the whole body at once
            f.write(r.content)
        else:
            total = int(total)
            # TODO: read more about tqdm
            # true division so that a partial last chunk is counted
            for chunk in tqdm(r.iter_content(chunk_size=1024), total=math.ceil(total / 1024), unit='KB', unit_scale=True):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    f.flush()
    print('Finished ' + filename)
def make_zip(filename):
    # source code assets are downloaded as '.code' files; rename them to '.zip'
    if filename[-4:] == 'code':
        os.replace(filename, filename[:-4] + 'zip')
def move_current_files(root, book):
    # move files already downloaded for this book into its own sub-directory
    sub_dir = f'{root}/{book}'
    does_dir_exist(sub_dir)
    for f in glob.iglob(sub_dir + '.*'):
        try:
            os.rename(f, f'{sub_dir}/{book}' + f[f.index('.'):])
        except OSError:
            os.rename(f, f'{sub_dir}/{book}' + '_1' + f[f.index('.'):])
        except ValueError as e:
            print(e)
            print('Skipping')
def does_dir_exist(directory):
    # create the directory if it does not exist; exit on failure
    if not os.path.exists(directory):
        try:
            os.makedirs(directory)
        except Exception as e:
            print(e)
            sys.exit(2)
def main(argv):
    # thanks to https://github.com/ozzieperez/packtpub-library-downloader/blob/master/downloader.py
    email = None
    password = None
    root_directory = 'media'
    book_file_types = ['pdf', 'mobi', 'epub', 'code']
    separate = None
    verbose = None
    quiet = None
    errorMessage = 'Usage: main.py -e <email> -p <password> [-d <directory> -b <book file types> -s -v -q]'
    # get the command line arguments/options
    try:
        opts, args = getopt.getopt(
            argv, 'e:p:d:b:svq', ['email=', 'pass=', 'directory=', 'books=', 'separate', 'verbose', 'quiet'])
    except getopt.GetoptError:
        print(errorMessage)
        sys.exit(2)
    # hold the values of the command line options
    for opt, arg in opts:
        if opt in ('-e', '--email'):
            email = arg
        elif opt in ('-p', '--pass'):
            password = arg
        elif opt in ('-d', '--directory'):
            root_directory = os.path.expanduser(
                arg) if '~' in arg else os.path.abspath(arg)
        elif opt in ('-b', '--books'):
            book_file_types = arg.split(',')
        elif opt in ('-s', '--separate'):
            separate = True
        elif opt in ('-v', '--verbose'):
            verbose = True
        elif opt in ('-q', '--quiet'):
            quiet = True
    if verbose and quiet:
        print("Verbose and quiet cannot be used together.")
        sys.exit(2)
    # do we have the minimum required info?
    if not email or not password:
        print(errorMessage)
        sys.exit(2)
    # create the download directory if it does not exist
    does_dir_exist(root_directory)
    # create the user with the proper header
    user = User(email, password)
    # get all your books
    books = get_books(user, is_verbose=verbose, is_quiet=quiet)
    print('Downloading books...')
    if not quiet:
        books_iter = tqdm(books, unit='Book')
    else:
        books_iter = books
    for book in books_iter:
        # get the different file types of the current book
        file_types = get_book_file_types(user, book['productId'])
        for file_type in file_types:
            if file_type in book_file_types:  # check that the requested file type is available for the current book
                book_name = book['productName'].replace(' ', '_').replace('.', '_').replace(':', '_').replace('/','')
                if separate:
                    filename = f'{root_directory}/{book_name}/{book_name}.{file_type}'
                    move_current_files(root_directory, book_name)
                else:
                    filename = f'{root_directory}/{book_name}.{file_type}'
                # get url of the book to download
                url = get_url_book(user, book['productId'], file_type)
                if not os.path.exists(filename) and not os.path.exists(filename.replace('.code', '.zip')):
                    download_book(filename, url)
                    make_zip(filename)
                else:
                    if verbose:
                        tqdm.write(f'{filename} already exists, skipping.')
if __name__ == '__main__':
    main(sys.argv[1:])
================================================
FILE: requirements.txt
================================================
aiofiles==0.4.0
aiohttp==3.5.4
async-timeout==3.0.1
attrs==18.2.0
certifi==2018.11.29
chardet==3.0.4
idna==2.8
idna-ssl==1.1.0
multidict==4.5.2
requests==2.21.0
tqdm==4.30.0
typing-extensions==3.7.2
urllib3==1.24.1
yarl==1.3.0
================================================
FILE: user.py
================================================
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys
import requests
from config import BASE_URL, AUTH_ENDPOINT
class User:
    """
    User object that holds the credentials and the request header
    """
    username = ""
    password = ""
    # the Authorization field is filled with the current token provided by the API
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
        "Authorization":""
    }
    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.header["Authorization"] = self.get_token()
    def get_token(self):
        """
        Request the auth endpoint and return the user token
        """
        url = BASE_URL+AUTH_ENDPOINT
        # use the json parameter because, for some reason, the API expects the username and password in plain text :'(
        r = requests.post(url, json={'username':self.username, 'password':self.password})
        if r.status_code == 200:
            print("You are in!")
            return 'Bearer ' + r.json()['data']['access']
        # reaching this point usually means the username or password is incorrect
        print("Error login, check user and password")
        print("Error {}".format(r.status_code))
        sys.exit(2)
    def get_header(self):
        return self.header
    def refresh_header(self):
        """
        Request a new JWT because the current one expired, and return the updated header
        """
        self.header["Authorization"] = self.get_token()
        return self.header
SYMBOL INDEX (14 symbols across 2 files)
FILE: main.py
function book_request (line 17) | def book_request(user, offset=0, limit=10, verbose=False):
function get_books (line 27) | def get_books(user, offset=0, limit=10, is_verbose=False, is_quiet=False):
function get_url_book (line 54) | def get_url_book(user, book_id, format='pdf'):
function get_book_file_types (line 75) | def get_book_file_types(user, book_id):
function download_book (line 97) | def download_book(filename, url):
function make_zip (line 118) | def make_zip(filename):
function move_current_files (line 123) | def move_current_files(root, book):
function does_dir_exist (line 136) | def does_dir_exist(directory):
function main (line 145) | def main(argv):
FILE: user.py
class User (line 8) | class User:
method __init__ (line 21) | def __init__(self, username, password):
method get_token (line 26) | def get_token(self):
method get_header (line 42) | def get_header(self):
method refresh_header (line 45) | def refresh_header(self):