Introduction
=============
This project was started to bring together useful Python code snippets that make
coding faster, easier, and more enjoyable. You can explore all the cheat sheets at
`Pysheeet `_. Contributions are always welcome—feel
free to fork the repo and submit a pull request to help it grow!
Plugin
======
**pysheeet** is available as a Claude Code plugin. Once installed, Claude
automatically uses the cheat sheets to answer Python questions — just ask
naturally and the skill triggers based on context.
Installation
------------
**As a Claude Code plugin (recommended):**

.. code-block:: bash

    # Step 1: Add the marketplace
    claude plugin marketplace add crazyguitar/pysheeet

    # Step 2: Install the plugin
    claude plugin install pysheeet@pysheeet

**Local testing (single session only):**

.. code-block:: bash

    claude --plugin-dir /path/to/pysheeet

**Manual installation (requires cloning the repo):**

.. code-block:: bash

    git clone https://github.com/crazyguitar/pysheeet.git
    mkdir -p ~/.claude/skills
    cp -r pysheeet/skills/py ~/.claude/skills/py

What's New In Python 3
======================
This section offers only a quick look at some important Python 3 features.
For the complete list, see the official documentation,
`What’s New in Python `_.
- `New in Python3 `_
Cheat Sheet
===========
Core Python fundamentals including data types, functions, classes, and commonly
used patterns for everyday programming tasks.
- `From Scratch `_
- `Future `_
- `Typing `_
- `Class `_
- `Function `_
- `Unicode `_
- `List `_
- `Set `_
- `Dictionary `_
- `Heap `_
- `Generator `_
- `Regular expression `_
System
======
Date/time handling, file I/O, and operating system interfaces.
- `Datetime `_ - Timestamps, formatting, parsing, timezones, timedelta
- `Files and I/O `_ - Reading, writing, pathlib, shutil, tempfile
- `Operating System `_ - Processes, environment, system calls
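
As a small taste of the file I/O topics listed above, the sketch below writes and reads a file with ``pathlib`` inside a temporary directory. The ``write_and_read`` helper and the ``note.txt`` file name are illustrative assumptions, not code taken from the cheat sheets.

```python
import tempfile
from pathlib import Path


def write_and_read(text: str) -> str:
    """Hypothetical helper: round-trip a string through a temporary file."""
    # TemporaryDirectory removes the directory and its contents on exit
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "note.txt"
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(write_and_read("hello"))
```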
Concurrency
===========
Threading, multiprocessing, and concurrent.futures for parallel execution.
Covers synchronization primitives, process pools, and bypassing the GIL.
- `Threading `_ - Threads, locks, semaphores, events, conditions
- `Multiprocessing `_ - Processes, pools, shared memory, IPC
- `concurrent.futures `_ - Executors, futures, callbacks
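
The executor pattern covered by these sheets can be sketched in a few lines. This is a minimal example using only the standard library; the ``square`` workload and the worker count of 4 are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    """Hypothetical workload; stands in for any blocking call."""
    return n * n


def run_squares(values):
    # executor.map yields results in input order, even if tasks finish out of order
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    print(run_squares([1, 2, 3]))
```

Threads suit I/O-bound work; for CPU-bound work, ``ProcessPoolExecutor`` offers the same interface while running tasks in separate processes.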
Asyncio
=======
Asynchronous programming with Python's ``asyncio`` module. Covers coroutines,
event loops, tasks, networking, and advanced patterns.
- `A Hitchhiker's Guide to Asynchronous Programming `_ - Design philosophy and evolution
- `Asyncio Basics `_ - Coroutines, tasks, gather, timeouts
- `Asyncio Networking `_ - TCP/UDP servers, HTTP, SSL/TLS
- `Asyncio Advanced `_ - Synchronization, queues, subprocesses
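
The basics named above (coroutines, ``gather``, timeouts) fit together roughly as follows. This is a sketch under stated assumptions: the ``fetch`` coroutine, its delays, and the one-second timeout are hypothetical and only simulate non-blocking I/O.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Hypothetical coroutine; sleeps to simulate non-blocking I/O."""
    await asyncio.sleep(delay)
    return name


async def main() -> list:
    # gather runs both coroutines concurrently and returns results
    # in argument order; wait_for bounds the total time spent waiting
    return await asyncio.wait_for(
        asyncio.gather(fetch("a", 0.02), fetch("b", 0.01)),
        timeout=1.0,
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```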
C/C++ Extensions
================
Native extensions for performance-critical code. Covers modern pybind11 (used by
PyTorch, TensorFlow), ctypes, cffi, Cython, and the traditional Python C API.
Also includes a guide for Python developers learning modern C++ syntax.
- `ctypes `_ - Load shared libraries without compilation
- `Python C API `_ - Traditional C extension reference
- `Modern C/C++ Extensions `_ - pybind11, Cython
- `Learn C++ from Python `_ - Modern C++ for Python developers
Security
========
Modern cryptographic practices and common security vulnerabilities. Covers
encryption, TLS/SSL, and why legacy patterns are dangerous.
- `Modern Cryptography `_ - AES-GCM, RSA-OAEP, Ed25519, Argon2
- `TLS/SSL and Certificates `_ - HTTPS servers, certificate generation
- `Common Vulnerabilities `_ - Padding oracle, injection, timing attacks
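
One of the vulnerabilities listed above, the timing attack, can be illustrated with the standard library alone. The sketch below compares secrets with ``hmac.compare_digest`` instead of ``==``; the ``tokens_match`` helper is a hypothetical name used only for this example.

```python
import hmac


def tokens_match(expected: str, provided: str) -> bool:
    """Hypothetical helper: compare two secrets in constant time."""
    # A plain == can return as soon as the first differing byte is found,
    # which may leak how much of the secret matched; compare_digest is
    # designed to take time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode(), provided.encode())


if __name__ == "__main__":
    print(tokens_match("s3cret", "s3cret"))
    print(tokens_match("s3cret", "s3creT"))
```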
Network
=======
Low-level network programming with Python sockets. Covers TCP/UDP communication,
server implementations, asynchronous I/O, SSL/TLS encryption, and packet analysis.
- `Socket Basics `_
- `Socket Servers `_
- `Async Socket I/O `_
- `SSL/TLS Sockets `_
- `Packet Sniffing `_
- `SSH and Tunnels `_
Database
========
Database access with SQLAlchemy, Python's most popular ORM. Covers connection
management, raw SQL, object-relational mapping, and common query patterns.
- `SQLAlchemy Basics `_
- `SQLAlchemy ORM `_
- `SQLAlchemy Query Recipes `_
LLM
===
Large language model (LLM) training, inference, and optimization. Covers PyTorch
for model development, distributed training across GPUs, and vLLM/SGLang for
high-performance LLM inference and serving.
- `PyTorch `_ - Tensors, autograd, neural networks, training loops
- `Megatron `_ - NVIDIA Megatron training/fine-tuning framework with enroot/pyxis
- `LLM Serving `_ - vLLM and SGLang for production inference with TP/PP/DP/EP
- `LLM Benchmark `_ - Benchmark suite for measuring serving performance
HPC
===
High-Performance Computing tools for cluster management and job scheduling.
Covers Slurm workload manager for distributed computing and GPU clusters.
- `Slurm `_
Blog
====
Supplementary topics covering Python internals, debugging techniques, and
language features that don't fit elsewhere.
- `Is Disaggregated Prefill/Decode a Silver Bullet for LLM Serving? `_
- `Monitoring EFA with NCCL GIN and Nsys `_
- `GPU-Initiated Networking for NCCL on AWS `_
- `PEP 572 and the walrus operator `_
- `Python Interpreter in GNU Debugger `_
PDF Version
============
`pdf`_
.. _pdf: https://media.readthedocs.org/pdf/pysheeet/latest/pysheeet.pdf
How to run the server
=======================
.. code-block:: bash

    $ virtualenv venv
    $ . venv/bin/activate
    $ pip install -r requirements.txt
    $ make
    $ python app.py
    # URL: localhost:5000

================================================
FILE: app.py
================================================
# -*- coding: utf-8 -*-
"""This is a simple cheatsheet webapp."""
import os
from flask import Flask, abort, send_from_directory, render_template
from flask_sslify import SSLify
from flask_seasurf import SeaSurf
from flask_talisman import Talisman
from werkzeug.exceptions import NotFound
from werkzeug.utils import safe_join
DIR = os.path.dirname(os.path.realpath(__file__))
ROOT = os.path.join(DIR, "docs", "_build", "html")
def find_key(token):
    """Find the key from the environment variable."""
    if token == os.environ.get("ACME_TOKEN"):
        return os.environ.get("ACME_KEY")
    for k, v in os.environ.items():
        if v == token and k.startswith("ACME_TOKEN_"):
            n = k.replace("ACME_TOKEN_", "")
            return os.environ.get("ACME_KEY_{}".format(n))
csp = {
    "default-src": "'none'",
    "style-src": ["'self'", "'unsafe-inline'"],
    "script-src": [
        "'self'",
        "*.cloudflare.com",
        "*.cloudflareinsights.com",
        "*.googletagmanager.com",
        "*.google-analytics.com",
        "*.carbonads.com",
        "*.carbonads.net",
        "cdn.carbonads.com",
        "srv.carbonads.net",
        "'unsafe-inline'",
        "'unsafe-eval'",
    ],
    "connect-src": [
        "'self'",
        "*.google-analytics.com",
        "*.analytics.google.com",
        "analytics.google.com",
        "*.googletagmanager.com",
        "*.carbonads.com",
        "*.carbonads.net",
        "*.doubleclick.net",
    ],
    "font-src": "'self'",
    "form-action": "'self'",
    "base-uri": "'self'",
    "img-src": "*",
    "frame-src": ["ghbtns.com", "*.carbonads.com", "*.carbonads.net"],
    "frame-ancestors": "'none'",
    "object-src": "'none'",
}
feature_policy = {"geolocation": "'none'"}
app = Flask(__name__, template_folder=ROOT)
app.config["SECRET_KEY"] = os.urandom(16)
app.config["SESSION_COOKIE_NAME"] = "__Secure-session"
app.config["SESSION_COOKIE_SAMESITE"] = "Strict"
app.config["CSRF_COOKIE_NAME"] = "__Secure-csrf-token"
app.config["CSRF_COOKIE_HTTPONLY"] = True
app.config["CSRF_COOKIE_SECURE"] = True
csrf = SeaSurf(app)
talisman = Talisman(
    app,
    force_https=False,
    content_security_policy=csp,
    feature_policy=feature_policy,
)
if "DYNO" in os.environ:
    sslify = SSLify(app, permanent=True, skips=[".well-known"])
@app.errorhandler(404)
def page_not_found(e):
    """Render 404.html with a 404 status code."""
    return render_template("404.html"), 404
@app.route("/<path:path>")
def static_proxy(path):
    """Find static files safely."""
    try:
        return send_from_directory(ROOT, path)
    except NotFound:
        # Handle file not found or directory errors
        return render_template("404.html"), 404
@app.route("/")
def index_redirection():
    """Serve the index file."""
    return send_from_directory(ROOT, "index.html")
@csrf.exempt
@app.route("/.well-known/acme-challenge/<token>")
def acme(token):
    """Find the ACME key from the environment variables."""
    key = find_key(token)
    if key is None:
        abort(404)
    return key
if __name__ == "__main__":
    # Only run the app in debug mode during development
    app.run(debug=os.environ.get("FLASK_ENV") == "development")
================================================
FILE: app_test.py
================================================
"""Test app.py."""
import multiprocessing
import platform
import unittest
import requests
import os
from pathlib import Path
from werkzeug.exceptions import NotFound
from flask_testing import LiveServerTestCase
from app import acme, find_key, static_proxy, index_redirection, page_not_found
from app import ROOT
from app import app
if platform.system() == "Darwin":
    multiprocessing.set_start_method("fork")
class PysheeetTest(LiveServerTestCase):
    """Test app."""

    def create_app(self):
        """Create an app for testing."""
        # Remove ACME_TOKEN* variables; iterate over a snapshot of the keys
        # so that deletions cannot disturb the loop.
        for k in list(os.environ):
            if k.startswith("ACME_TOKEN"):
                del os.environ[k]
        self.token = "token"
        self.key = "key"
        os.environ["ACME_TOKEN"] = self.token
        os.environ["ACME_KEY"] = self.key
        os.environ["FLASK_ENV"] = "development"
        os.environ["FLASK_DEBUG"] = "1"
        app.config["TESTING"] = True
        app.config["LIVESERVER_PORT"] = 0
        return app

    def check_security_headers(self, resp):
        """Check security headers."""
        headers = resp.headers
        self.assertIn("Content-Security-Policy", headers)
        self.assertIn("X-Content-Type-Options", headers)
        self.assertIn("Feature-Policy", headers)
        self.assertEqual(headers["Feature-Policy"], "geolocation 'none'")
        self.assertEqual(headers["X-Frame-Options"], "SAMEORIGIN")

    def check_csrf_cookies(self, resp):
        """Check cookies for csrf."""
        cookies = resp.cookies
        self.assertTrue(cookies.get("__Secure-session"))
        self.assertTrue(cookies.get("__Secure-csrf-token"))

    def test_index_redirection_req(self):
        """Test sending a request for the index page."""
        url = self.get_server_url()
        resp = requests.get(url)
        self.check_security_headers(resp)
        self.check_csrf_cookies(resp)
        self.assertEqual(resp.status_code, 200)

    def test_static_proxy_req(self):
        """Test sending requests for the notes pages."""
        url = self.get_server_url()
        notes = Path(ROOT) / "notes"
        for html in notes.rglob("*.html"):
            page = html.relative_to(ROOT)
            u = f"{url}/{page}"
            resp = requests.get(u)
            self.check_security_headers(resp)
            self.check_csrf_cookies(resp)
            self.assertEqual(resp.status_code, 200)

    def test_acme_req(self):
        """Test sending requests for an ACME key."""
        url = self.get_server_url()
        u = url + "/.well-known/acme-challenge/token"
        resp = requests.get(u)
        self.check_security_headers(resp)
        self.assertEqual(resp.status_code, 200)
        u = url + "/.well-known/acme-challenge/foo"
        resp = requests.get(u)
        self.check_security_headers(resp)
        self.assertEqual(resp.status_code, 404)

    def test_find_key(self):
        """Test finding an ACME key in the environment."""
        token = self.token
        key = self.key
        self.assertEqual(find_key(token), key)
        del os.environ["ACME_TOKEN"]
        del os.environ["ACME_KEY"]
        os.environ["ACME_TOKEN_ENV"] = token
        os.environ["ACME_KEY_ENV"] = key
        self.assertEqual(find_key(token), key)
        del os.environ["ACME_TOKEN_ENV"]
        del os.environ["ACME_KEY_ENV"]

    def test_acme(self):
        """Test looking up an ACME key by calling the view directly."""
        token = self.token
        key = self.key
        self.assertEqual(acme(token), key)
        token = token + "_env"
        key = key + "_env"
        os.environ["ACME_TOKEN_ENV"] = token
        os.environ["ACME_KEY_ENV"] = key
        self.assertEqual(find_key(token), key)
        del os.environ["ACME_TOKEN_ENV"]
        del os.environ["ACME_KEY_ENV"]
        self.assertRaises(NotFound, acme, token)

    def test_index_redirection(self):
        """Test index page redirection."""
        resp = index_redirection()
        self.assertEqual(resp.status_code, 200)
        resp.close()

    def test_static_proxy(self):
        """Test requesting static pages."""
        notes = Path(ROOT) / "notes"
        for html in notes.rglob("*.html"):
            u = html.relative_to(ROOT)
            resp = static_proxy(u)
            self.assertEqual(resp.status_code, 200)
            resp.close()
        u = "notes/../conf.py"
        _, code = static_proxy(u)
        self.assertEqual(code, 404)

    def test_page_not_found(self):
        """Test page not found."""
        html, status_code = page_not_found(None)
        self.assertEqual(status_code, 404)


if __name__ == "__main__":
    unittest.main()
================================================
FILE: docs/404.rst
================================================
:orphan:
404 Page Not Found
==================
What you were looking for is just not there.
`Click here to go back to homepage. >`_
================================================
FILE: docs/Makefile
================================================
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " applehelp to make an Apple Help Book"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
@echo " coverage to run coverage check of the documentation (if enabled)"
.PHONY: clean
clean:
rm -rf $(BUILDDIR)/*
.PHONY: html
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
.PHONY: dirhtml
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
.PHONY: singlehtml
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
.PHONY: pickle
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
.PHONY: json
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
.PHONY: htmlhelp
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
.PHONY: qthelp
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pysheeet.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pysheeet.qhc"
.PHONY: applehelp
applehelp:
$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
@echo
@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
@echo "N.B. You won't be able to view it unless you put it in" \
"~/Library/Documentation/Help or install it in your application" \
"bundle."
.PHONY: devhelp
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/pysheeet"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pysheeet"
@echo "# devhelp"
.PHONY: epub
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
.PHONY: latex
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
.PHONY: latexpdf
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
.PHONY: latexpdfja
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
.PHONY: text
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
.PHONY: man
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
.PHONY: texinfo
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
.PHONY: info
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
.PHONY: gettext
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
.PHONY: changes
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
.PHONY: linkcheck
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
.PHONY: doctest
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
.PHONY: coverage
coverage:
$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
@echo "Testing of coverage in the sources finished, look at the " \
"results in $(BUILDDIR)/coverage/python.txt."
.PHONY: xml
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
.PHONY: pseudoxml
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
================================================
FILE: docs/_extra/robots.txt
================================================
User-agent: *
Allow: /
Sitemap: https://www.pythonsheets.com/sitemap.xml
================================================
FILE: docs/_static/.gitignore
================================================
# Ignore everything in this directory
*
# Except this file
!.gitignore
!guido.png
!logo.svg
!style.css
!carbonad.css
!favicon.ico
================================================
FILE: docs/_static/carbonad.css
================================================
#carbonads {
display: block;
overflow: hidden;
padding: 1em;
padding-bottom: 0.3em;
line-height: 1.5;
margin-top: 10px;
margin-bottom: 10px;
}
#carbonads a {
text-decoration: none !important;
border-bottom: none;
}
#carbonads a:hover {
color: inherit;
}
#carbonads span {
display: block;
overflow: hidden;
}
.carbon-img {
display: block;
margin: 0 auto 8px;
}
.carbon-text {
display: block;
text-align: left;
margin-bottom: .1em;
color: #666;
}
.carbon-poweredby {
display: block;
text-align: left;
font-size: .9em;
color: #888 !important;
}
@media only screen and (min-width: 320px) and (max-width: 875px) {
#carbonads {
float: none;
max-width: 330px;
border: 0;
display: block;
overflow: hidden;
margin-top: 20px;
margin-bottom: 20px;
border-radius: 4px;
text-align: center;
box-shadow: 0 0 0 1px hsla(0, 0%, 0%, .1);
font-size: var(--font-size);
background-color: #eee;
line-height: 1.5;
}
#carbonads span {
position: relative;
}
#carbonads > span {
max-width: none;
}
.carbon-img {
float: left;
margin: 0;
}
.carbon-img img {
max-width: 130px !important;
}
.carbon-text {
float: left;
margin-bottom: 0;
padding: 8px 20px;
text-align: left;
color: #333 !important;
max-width: calc(100% - 130px - 3em);
}
.carbon-poweredby {
left: 130px;
bottom: 0;
display: block;
color: #555 !important;
text-align: right;
width: 100%;
}
}
================================================
FILE: docs/_static/style.css
================================================
nav#table-of-contents {
display: none;
}
div.highlight > pre {
font-size: 14px;
border-radius: 3px;
background: #f6f8fa !important;
border: 1px solid #000000 !important;
}
:root {
--cu-boulder-gold: #CFB87C;
}
.bd-container {
max-width: 99%;
}
.bd-container .bd-container__inner {
max-width: 99%;
}
.bd-main .bd-content .bd-article-container {
max-width: 100em;
}
.code-block-caption {
color: black;
}
.bd-sidebar-primary li.has-children>details>summary .toctree-toggle {
justify-content: left;
}
html[data-theme=light] {
--pst-font-size-base: none;
--pst-color-secondary: #176de8;
--pst-color-primary: #176de8;
}
.graph#doc-flowchart .node text {
font-weight: bold;
}
.bd-content .sd-tab-set .sd-tab-content {
padding: 1.5rem;
}
a {
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
button.theme-switch-button {
display: none !important;
}
blockquote {
background-color: transparent;
border: none;
}
================================================
FILE: docs/_templates/carbonad.html
================================================
================================================
FILE: docs/_templates/cheatsheets.html
================================================
This project tries to provide many snippets of Python code that make life easier.
================================================
FILE: docs/conf.py
================================================
# -*- coding: utf-8 -*-
#
# python-cheatsheet documentation build configuration file, created by
# sphinx-quickstart on Sun Feb 28 09:26:04 2016.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
from datetime import datetime
import os
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.viewcode',
'myst_parser',
'sphinx_copybutton',
'sphinx.ext.graphviz',
'sphinx_design',
'sphinx.ext.extlinks'
]
myst_enable_extensions = [
"colon_fence",
"attrs_inline",
"attrs_block",
"tasklist",
"substitution",
]
myst_enable_checkboxes = True
myst_heading_anchors = 6
copybutton_prompt_text = r'^\$ '
copybutton_prompt_is_regexp = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
year = datetime.now().year
project = u'pysheeet'
copyright = u'2016-{}, crazyguitar'.format(year)
author = u'crazyguitar'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'0.1.0'
# The full version, including alpha/beta/rc tags.
release = u'0.1.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = []
# The reST default role (used for this markup: `text`) to use for all
# documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
#keep_warnings = False
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_book_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
"repository_url": "https://github.com/crazyguitar/pysheeet",
"use_repository_button": True,
}
# Custom sidebar templates
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
html_title = "Python Cheat Sheet"
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
html_logo = "_static/logo.svg"
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
html_favicon = '_static/favicon.ico'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_css_files = ['style.css']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
html_extra_path = ['_extra']
html_context = {
"tracking_id": os.environ.get("TRACKING_ID"),
}
has_carbonad = os.environ.get("CARBONAD_SERVE") and os.environ.get("CARBONAD_PLACEMENT")
if has_carbonad:
    html_context["carbonad_serve"] = os.environ.get("CARBONAD_SERVE")
    html_context["carbonad_placement"] = os.environ.get("CARBONAD_PLACEMENT")
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Language to be used for generating the HTML full-text search index.
# Sphinx supports the following languages:
# 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
# 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr'
#html_search_language = 'en'
# A dictionary with options for the search language support, empty by default.
# Now only 'ja' uses this config value
#html_search_options = {'type': 'default'}
# The name of a javascript file (relative to the configuration directory) that
# implements a search results scorer. If empty, the default will be used.
#html_search_scorer = 'scorer.js'
# Output file base name for HTML help builder.
htmlhelp_basename = 'python-cheatsheetdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
# Latex figure (float) alignment
#'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'python-cheatsheet.tex', u'python-cheatsheet Documentation',
u'crazyguitar', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'python-cheatsheet', u'python-cheatsheet Documentation',
[author], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'python-cheatsheet', u'python-cheatsheet Documentation',
author, 'python-cheatsheet', 'One line description of project.',
'Miscellaneous'),
]
html_sidebars = {
"**": [
"navbar-logo.html",
"search-button-field.html",
"sbt-sidebar-nav.html",
"carbonad.html",
]
}
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False
def add_html_link(app, pagename, templatename, context, doctree):
    """Record each rendered HTML page so it can be listed in the sitemap."""
    if pagename in ['404', 'search', 'genindex']:
        return
    app.sitemaps.append({
        'pagename': pagename + ".html",
        'priority': '1.0' if pagename == 'index' else '0.8',
        'changefreq': 'weekly' if pagename == 'index' else 'monthly'
    })

def create_sitemap(app, exception):
    """Generate a sitemap.xml"""
    from xml.etree.ElementTree import ElementTree, Element, SubElement
    from datetime import datetime
    r = Element("urlset")
    r.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    r.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance")
    r.set("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9" +
          " http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd")
    for link_info in app.sitemaps:
        url = SubElement(r, "url")
        now = datetime.now()
        SubElement(url, "loc").text = app.pysheeet + link_info['pagename']
        SubElement(url, "lastmod").text = now.date().isoformat()
        SubElement(url, "changefreq").text = link_info['changefreq']
        SubElement(url, "priority").text = link_info['priority']
    # os.path.join accepts both str and path-like values for app.outdir
    f = os.path.join(app.outdir, "sitemap.xml")
    t = ElementTree(r)
    t.write(f, xml_declaration=True, encoding='utf-8', method="xml")

def setup(app):
    """Customize setup."""
    site = os.environ.get("PYSHEEET")
    if not site:
        return
    if site[-1] != '/':
        site += '/'
    # create a sitemap
    app.pysheeet = site
    app.sitemaps = []
    app.connect('html-page-context', add_html_link)
    app.connect('build-finished', create_sitemap)
================================================
FILE: docs/index.rst
================================================
.. python-cheatsheet documentation master file, created by
sphinx-quickstart on Sun Feb 28 09:26:04 2016.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. meta::
:description lang=en: Comprehensive Python cheat sheet with practical code snippets, examples, and tutorials for Python developers. Learn Python basics, advanced topics, databases, networking, and more.
:keywords: Python, Python Cheat Sheet, Python Tutorial, Python Examples, Python Code Snippets, Programming, Development, Python Reference, Python Guide
Python Cheat Sheet - Complete Guide with Code Examples
======================================================
Welcome to **pysheeet** - your ultimate Python cheat sheet! This comprehensive resource contains practical Python
code snippets, examples, and tutorials to make coding easier and more efficient for Python developers of all levels.
From basic Python syntax to advanced topics like databases, networking, and multitasking, this cheat sheet serves as your
complete Python reference guide. Ideal for beginners learning Python fundamentals and experienced developers seeking
quick code examples. Whether you're learning Python for web development, data science, automation, or general programming,
you'll find practical examples that save you time and improve your coding efficiency.
Contributions are always welcome—feel free to share ideas for new snippets, improvements, or clearer explanations!
If you'd like to contribute, `fork pysheeet on GitHub`_.
If you have any questions or suggestions, please open an issue on `GitHub Issues`_.
.. _fork pysheeet on GitHub: https://github.com/crazyguitar/pysheeet
.. _GitHub Issues: https://github.com/crazyguitar/pysheeet/issues
Plugin
------
**pysheeet** is available as a `Claude Code `_ plugin. Once installed,
Claude automatically uses the cheat sheets to answer Python questions.
.. code-block:: bash
# Step 1: Add the marketplace
claude plugin marketplace add crazyguitar/pysheeet
# Step 2: Install the plugin
claude plugin install pysheeet@pysheeet
For local testing and manual installation, see the main `README `_.
What's New In Python 3
----------------------
The official document, `What's New In Python`_, lists all of the most
important changes. If you're too busy to read the full list, this part
provides a brief overview of the new features in Python 3.
.. _What's New In Python: https://docs.python.org/3/whatsnew/index.html
.. toctree::
:maxdepth: 1
notes/python-new-py3
Python Cheat Sheet
------------------
This section focuses on commonly used Python code snippets. The cheat sheet
covers not only core Python features but also essential data structures,
algorithms, and frequently used modules to help programmers efficiently tackle
everyday tasks.
.. toctree::
:maxdepth: 1
notes/basic/index
notes/os/index
notes/concurrency/index
notes/asyncio/index
notes/network/index
notes/database/index
notes/security/index
notes/extension/index
notes/llm/index
notes/hpc/index
notes/appendix/index
================================================
FILE: docs/notes/appendix/disaggregated-prefill-decode.rst
================================================
.. meta::
:description lang=en: Evaluating disaggregated prefill/decode for LLM serving with vLLM, NIXL, and EFA on AWS
:keywords: LLM, vLLM, NIXL, disaggregated prefill decode, KV cache, EFA, inference serving
Is Disaggregated Prefill/Decode a Silver Bullet for LLM Serving?
================================================================
:Date: 2026-03-10
Abstract
--------
Disaggregated prefill/decode has gained traction as a promising architecture for
LLM serving, separating the compute-intensive prefill phase from the
memory-bound decode phase onto dedicated node groups. Proponents argue that this
separation enables independent scaling and eliminates interference between the
two phases. But is it truly a silver bullet? This article puts the claim to the
test by evaluating disaggregated prefill/decode using vLLM with NIXL over the
AWS Elastic Fabric Adapter (EFA) on a 4-node cluster. We compare data
parallelism and simple load-balanced routing as baselines against disaggregated
configurations. Our results show that while disaggregation dramatically reduces
inter-token latency (ITL), it comes at a significant cost to throughput and
time-to-first-token (TTFT), revealing that the architecture is far from a
universal solution.
Introduction
------------
In standard LLM serving, each node handles both prefill and decode for incoming
requests. The prefill phase is compute-bound and processes the entire input
prompt in parallel, while the decode phase is memory-bandwidth-bound and
generates tokens autoregressively. When both phases share the same GPU pool,
long prefill requests can block decode iterations, increasing inter-token
latency for concurrent requests.
Disaggregated prefill/decode addresses this interference by assigning prefill
and decode to separate node groups. After a prefill node completes prompt
processing, the KV cache is transferred to a decode node via a high-bandwidth
interconnect. NIXL [1]_ (NVIDIA Inference Xfer Library) provides the KV cache
transfer mechanism, and on AWS, this transfer occurs over EFA using the
``LIBFABRIC`` backend.
The appeal is intuitive: by isolating decode nodes from prefill interference,
token generation should proceed at a steady, low-latency pace. However, this
separation introduces new costs — KV cache transfer overhead, prefill node
saturation at long input lengths, and reduced effective cluster capacity for
each phase. The question is whether these trade-offs are worthwhile compared to
simpler alternatives like data parallelism or stateless load-balanced routing.
This experiment uses vLLM [2]_ with the
``NixlConnector`` to orchestrate disaggregated serving, and ``vllm-router`` [3]_ as
a reverse proxy to load-balance requests across node groups. The experiment
code is available under `src/nixl `_ in the companion repository.
Container Image
---------------
The experiment uses a custom Docker image that bundles all required components.
The ``Dockerfile`` builds on ``nvidia/cuda:12.8.1-devel-ubuntu24.04`` and
installs the following stack:
- **GDRCopy** v2.5.1 for GPU-direct memory registration
- **EFA installer** v1.47.0 for AWS Elastic Fabric Adapter support
- **UCX** v1.20.0 built with verbs, rdmacm, and EFA transport
- **NIXL** v0.10.1 with ``LIBFABRIC`` backend for KV cache transfer
- **nixlbench** for standalone NIXL bandwidth/latency microbenchmarks
- **PyTorch** 2.9.1, **flash-attn** 2.8.1, and **DeepGEMM** v2.1.1.post3
- **vLLM** 0.15.1 with ``NixlConnector`` support
- **vllm-router** for load-balancing across disaggregated node groups
The image is built and saved as a portable tarball via the ``Makefile``:
.. code-block:: bash
make docker && make save
This produces ``nixl-latest.tar.gz``, which each Slurm node decompresses with
``pigz`` and imports with ``docker load`` at launch time.
Serving Script
--------------
The ``vllm.sbatch`` script orchestrates multi-node vLLM serving on Slurm. It
accepts two key flags that control the serving topology:
- ``--route R``: splits the allocated nodes into ``R`` identical groups, each
running an independent vLLM instance. A ``vllm-router`` process on the head
node round-robins requests across groups.
- ``--prefill P``: within each group, assigns ``P`` nodes as prefill-only
(``kv_producer``) and the remaining nodes as decode-only (``kv_consumer``).
KV cache transfer between prefill and decode nodes uses ``NixlConnector``
with the ``LIBFABRIC`` backend over EFA.
When ``--prefill`` is ``0`` (the default), all nodes in a group run standard
data-parallel serving. The script computes ``DP = nodes_per_group * (8 / TP)``,
where 8 is the number of GPUs per node, and launches vLLM with
``--data-parallel-size`` accordingly.
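The group and data-parallel arithmetic can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the logic of ``vllm.sbatch`` itself: the names ``GroupPlan`` and ``plan_topology`` are hypothetical, and the default of 8 GPUs per node is taken from the experimental setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroupPlan:
    """Per-group layout; a hypothetical helper type, not part of vllm.sbatch."""
    nodes_per_group: int
    prefill_nodes: int       # 0 when every node serves both phases
    decode_nodes: int        # 0 when every node serves both phases
    dp_size: Optional[int]   # None when the group runs disaggregated


def plan_topology(total_nodes: int, route: int = 1, prefill: int = 0,
                  tp: int = 8, gpus_per_node: int = 8) -> GroupPlan:
    """Mirror the --route / --prefill arithmetic described in the text."""
    if route < 1 or total_nodes % route != 0:
        raise ValueError("route must evenly divide the number of nodes")
    if tp < 1 or gpus_per_node % tp != 0:
        raise ValueError("tp must evenly divide gpus_per_node")
    nodes_per_group = total_nodes // route
    if prefill == 0:
        # Pure data parallelism: DP = nodes_per_group * (gpus_per_node / TP)
        dp = nodes_per_group * (gpus_per_node // tp)
        return GroupPlan(nodes_per_group, 0, 0, dp)
    if not 0 < prefill < nodes_per_group:
        raise ValueError("need at least one prefill and one decode node")
    return GroupPlan(nodes_per_group, prefill, nodes_per_group - prefill, None)


for name, kwargs in [("baseline", {}), ("route 2", {"route": 2}),
                     ("route 4", {"route": 4}), ("1P3D", {"prefill": 1}),
                     ("2P2D", {"prefill": 2})]:
    print(name, plan_topology(4, **kwargs))
```

With 4 nodes and TP=8, the sketch yields DP=4 for the default layout, DP=2 per group for ``--route 2``, and DP=1 per group for ``--route 4``.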
For disaggregated mode, each prefill and decode node runs as an independent
vLLM process with explicit KV transfer configuration:
.. code-block:: bash
# Prefill node
vllm serve ... \
--kv-transfer-config.kv_connector NixlConnector \
--kv-transfer-config.kv_role kv_producer \
--kv-transfer-config.kv_connector_extra_config.backends+ LIBFABRIC
# Decode node
vllm serve ... \
--kv-transfer-config.kv_connector NixlConnector \
--kv-transfer-config.kv_role kv_consumer \
--kv-transfer-config.kv_connector_extra_config.backends+ LIBFABRIC
The router uses ``round_robin`` policy for pure-DP groups and
``consistent_hash`` with ``--vllm-pd-disaggregation`` for PD groups, directing
initial requests to prefill endpoints and subsequent decode traffic to decode
endpoints:
.. code-block:: bash
# Router for pure-DP groups (round-robin across group endpoints)
vllm-router \
--policy round_robin \
--worker-urls http://:8000 http://:8001 \
--host 0.0.0.0 --port 8010
# Router for PD disaggregation (consistent hash with prefill/decode split)
vllm-router \
--policy consistent_hash \
--vllm-pd-disaggregation \
--prefill http://:8000 \
--decode http://:8001 --decode http://:8002 \
--host 0.0.0.0 --port 8010
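To make the difference between the two routing policies concrete, the following Python sketch contrasts round-robin selection with a minimal consistent-hash ring. It is a generic illustration of the two techniques and makes no claim about how ``vllm-router`` implements them; the endpoint URLs, the ``HashRing`` class, and the choice of 64 virtual nodes per endpoint are all assumptions.

```python
import bisect
import hashlib
from itertools import cycle


def round_robin(endpoints):
    """Return a callable that hands out endpoints in rotating order."""
    it = cycle(endpoints)
    return lambda: next(it)


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, endpoints, replicas: int = 64):
        # Each endpoint owns `replicas` points on the ring.
        self._ring = sorted(
            (_hash(f"{ep}#{i}"), ep) for ep in endpoints for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def pick(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._ring[idx][1]


# Hypothetical decode endpoints, used only for this example.
decoders = ["http://decode-a:8001", "http://decode-b:8001", "http://decode-c:8001"]

rr = round_robin(decoders)
print([rr() for _ in range(4)])           # rotates through the list

ring = HashRing(decoders)
print(ring.pick("session-42") == ring.pick("session-42"))  # same key, same endpoint
```

Round-robin spreads requests evenly but keeps no affinity between a request and an endpoint. A hash ring maps the same key to the same endpoint every time, and when an endpoint is removed only the keys it owned are remapped.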
Each container is launched with ``--privileged``, ``--net=host``, and explicit
``/dev/infiniband/uverbs*`` and ``/dev/gdrdrv`` device mounts to enable
GPU-direct RDMA over EFA.
Benchmark Script
----------------
The ``bench.sh`` script wraps ``vllm bench serve`` and handles Docker image
loading transparently. If the ``vllm`` CLI is not available on the host, the
script re-executes itself inside the container. It points the benchmark client
at the router endpoint (or the direct vLLM endpoint for single-group
configurations):
.. code-block:: bash
bash bench.sh -H -p -- \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--dataset-name random \
--random-input-len 512 --random-output-len 256 \
--num-prompts 1024
Experimental Setup
------------------
All experiments run on 4 nodes with 8 GPUs each (TP=8) using
DeepSeek-V2-Lite as the model. The benchmark uses random input/output data
with 1024 prompts via ``vllm bench serve``.
The configurations are:
- **Baseline (data parallelism)**: 4 nodes, TP=8, DP=4. All nodes serve both
prefill and decode. This is the standard data-parallel serving setup.
- **Route 2**: 2 groups of 2 nodes each, TP=8, DP=2 per group. A router
round-robins requests across groups. Each group independently handles both
prefill and decode.
- **Route 4**: 4 groups of 1 node each, TP=8, no data parallelism. A router
distributes requests across all 4 independent nodes.
- **PD 1P3D**: Disaggregated prefill/decode with 1 prefill node and 3 decode
nodes. KV cache is transferred from the prefill node to decode nodes via NIXL.
- **PD 2P2D**: Disaggregated prefill/decode with 2 prefill nodes and 2 decode
nodes.
.. code-block:: bash
# Exp 1: Baseline — 4 nodes, TP=8, pure DP
salloc -N 4 bash vllm.sbatch \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--gpu-memory-utilization 0.9
# Exp 2: 2 groups × 2 nodes, DP=2 per group, router round-robins
salloc -N 4 bash vllm.sbatch --route 2 \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--gpu-memory-utilization 0.9
# Exp 3: 4 groups × 1 node, no DP, router round-robins
salloc -N 4 bash vllm.sbatch --route 4 \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--gpu-memory-utilization 0.9
# Exp 4: 1 prefill + 3 decode
salloc -N 4 bash vllm.sbatch --prefill 1 \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--gpu-memory-utilization 0.9
# Exp 5: 2 prefill + 2 decode
salloc -N 4 bash vllm.sbatch --prefill 2 \
--model /fsx/models/deepseek-ai/DeepSeek-V2-Lite \
--gpu-memory-utilization 0.9
Results
-------
We evaluate each configuration along four metrics: output token throughput,
request throughput, time to first token (TTFT), and inter-token latency (ITL).
Each plot contains two panels — the left panel sweeps input length with a fixed
output length of 256 tokens (prefill-dominated regime), while the right panel
sweeps output length with a fixed input length of 512 tokens (decode-dominated
regime). This allows us to observe how each configuration behaves when the
workload shifts from prefill-heavy to decode-heavy.
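As a reminder of what the two latency metrics measure, the sketch below derives TTFT and ITL from per-token arrival timestamps for a single request. The function names and the sample timestamps are made up for illustration, and ``vllm bench serve`` may differ in detail (for example, in how it aggregates across requests).

```python
from statistics import mean


def ttft(request_sent: float, token_times: list[float]) -> float:
    """Time to first token: delay between sending and the first token arriving."""
    return token_times[0] - request_sent


def inter_token_latencies(token_times: list[float]) -> list[float]:
    """Gaps between consecutive generated tokens."""
    return [b - a for a, b in zip(token_times, token_times[1:])]


# Hypothetical timestamps in seconds for one request.
sent_at = 0.0
arrivals = [0.50, 0.52, 0.55, 0.57]

print("TTFT (s):", ttft(sent_at, arrivals))
print("mean ITL (s):", mean(inter_token_latencies(arrivals)))
```

TTFT is dominated by queueing, prefill, and (in disaggregated mode) KV cache transfer, while ITL reflects only the decode phase, which is why the two metrics can move in opposite directions.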
Microbenchmark: KV Cache Transfer Bandwidth
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before examining end-to-end serving results, we use ``nixlbench`` to measure
the raw NIXL transfer bandwidth over EFA between two nodes. This establishes
an upper bound on KV cache transfer speed and helps contextualize the TTFT
overhead observed in disaggregated configurations.
The benchmark runs in Multi-GPU (MG) mode with all 8 GPUs per node performing
VRAM-to-VRAM transfers over the ``LIBFABRIC`` backend:
.. code-block:: bash
salloc -N 2 bash nixl.sbatch --backend LIBFABRIC \
--initiator_seg_type VRAM --target_seg_type VRAM \
--mode MG --num_initiator_dev 8 --num_target_dev 8
Block Size (B)  Batch Size  B/W (GB/Sec)  Avg Lat. (us)  P99 Tx (us)
--------------------------------------------------------------------
4096            1           0.670064      6.1            47.0
8192            1           1.315392      6.2            45.0
16384           1           2.511416      6.5            47.0
32768           1           4.820423      6.8            50.0
65536           1           8.733224      7.5            56.0
131072          1           12.341950     10.6           52.0
262144          1           23.272188     11.3           59.0
524288          1           43.365764     12.1           62.0
1048576         1           74.816773     14.0           77.0
2097152         1           121.086563    17.3           105.0
4194304         1           180.631395    23.2           146.0
8388608         1           239.037623    35.1           247.0
16777216        1           289.500030    58.0           432.0
33554432        1           327.436372    102.5          796.0
67108864        1           349.608429    192.0          1724.0
**Mapping to DeepSeek-V2-Lite KV cache transfer.** DeepSeek-V2-Lite uses
Multi-head Latent Attention (MLA), which compresses the KV cache into a latent
vector per token per layer. The per-token-per-layer KV cache size is
``(kv_lora_rank + qk_rope_head_dim) × dtype_size = (512 + 64) × 2 = 1,152 bytes``.
For 512 input tokens across 27 layers, the total KV cache is
``1,152 × 512 × 27 = 15,925,248 bytes``, or approximately **15.2 MiB**.
With TP=8, each GPU transfers about **1.9 MiB**, which is closest to the 2 MiB
(2,097,152-byte) row of the table above, at roughly 121 GB/s. Without tensor
parallelism, the full 15.2 MiB transfer is closest to the 16 MiB
(16,777,216-byte) row, at roughly 289 GB/s.
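The arithmetic can be checked with a few lines of Python. The constants come from the text; the variable names are illustrative, and the even split across 8 tensor-parallel ranks follows the article's own assumption. Note that the 15.2 and 1.9 figures correspond to binary megabytes (MiB).

```python
KV_LORA_RANK = 512      # latent KV dimension (from the text)
QK_ROPE_HEAD_DIM = 64   # decoupled RoPE key dimension (from the text)
DTYPE_BYTES = 2         # 2 bytes per element, as in the formula above
NUM_LAYERS = 27
INPUT_TOKENS = 512
TP = 8
MIB = 1024 ** 2

bytes_per_token_layer = (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * DTYPE_BYTES
total_bytes = bytes_per_token_layer * INPUT_TOKENS * NUM_LAYERS
per_gpu_bytes = total_bytes // TP  # assumes an even split across TP ranks

print(f"per token per layer: {bytes_per_token_layer} bytes")
print(f"total KV cache:      {total_bytes} bytes ({total_bytes / MIB:.1f} MiB)")
print(f"per GPU at TP={TP}:    {per_gpu_bytes} bytes ({per_gpu_bytes / MIB:.1f} MiB)")
```

Changing ``INPUT_TOKENS`` shows how quickly the transfer volume grows: at 4096 input tokens the same formula gives eight times the payload per request.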
Output Token Throughput
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/nixl/throughput.png
:alt: Output token throughput comparison
The left panel varies input length with a fixed output length of 256 tokens
(prefill-dominated), while the right panel varies output length with a fixed
input length of 512 tokens (decode-dominated).
For prefill-dominated workloads, Route 4 achieves the highest throughput since
each node operates independently without the overhead of data parallelism
coordination. The disaggregated configurations (PD 1P3D and PD 2P2D) show
competitive throughput at shorter input lengths but degrade at longer inputs
where the prefill nodes become the bottleneck.
For decode-dominated workloads, Route 4 again leads, followed by PD 1P3D.
PD 2P2D shows the lowest throughput in this regime, as its two decode nodes
cannot match the decode capacity of other configurations.
Request Throughput
~~~~~~~~~~~~~~~~~~
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/nixl/req_throughput.png
:alt: Request throughput comparison
Request throughput follows a similar pattern. Route 4 consistently achieves the
highest request throughput of all configurations. The disaggregated PD 1P3D
configuration maintains reasonable request throughput for short inputs but drops
significantly at longer input lengths (4096 tokens), where the single prefill
node becomes saturated.
Time to First Token (TTFT)
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/nixl/ttft.png
:alt: TTFT comparison
TTFT is critical for user-perceived latency. The baseline DP and Route 2
configurations show moderate TTFT that scales with input length. Route 4
achieves the lowest TTFT across all input lengths due to the absence of
cross-node coordination.
The disaggregated configurations exhibit higher TTFT, particularly at longer
input lengths. PD 1P3D shows TTFT exceeding 37 seconds at 4096 input tokens,
as all prefill work funnels through a single node. PD 2P2D improves on this
but still lags behind the non-disaggregated configurations. The additional
latency from KV cache transfer over NIXL contributes to the elevated TTFT.
For decode-dominated workloads (right panel), the differences are smaller. At
short output lengths (256–512 tokens), PD 1P3D shows 1–2 seconds higher TTFT
than the baseline, as the KV cache transfer overhead is proportionally more
significant. At longer output lengths (1024+ tokens), the disaggregated
configurations converge with or improve upon the baseline, as the baseline
suffers from increased prefill/decode contention under heavier concurrent
decode load.
Inter-Token Latency (ITL)
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/nixl/itl.png
:alt: ITL comparison
ITL measures the latency between consecutive generated tokens during the decode
phase. This is where disaggregated serving shows its primary advantage.
In the prefill-dominated regime (left panel), PD 1P3D achieves the lowest ITL
across all input lengths, with mean ITL as low as 10 ms at 4096 input tokens.
By isolating decode nodes from prefill interference, the decode phase runs
uninterrupted. PD 2P2D also shows reduced ITL compared to the baseline, though
the benefit is less pronounced due to having fewer decode nodes. The baseline DP
and Route configurations show higher ITL, particularly at longer input lengths
where prefill and decode contend for the same GPU resources.
In the decode-dominated regime (right panel), Route 4 achieves the lowest ITL
(~25–29 ms) since each node serves independently without cross-node
coordination. Among the disaggregated configurations, PD 1P3D outperforms
PD 2P2D due to its greater decode capacity (3 decode nodes vs. 2), maintaining
ITL around 26–35 ms. PD 2P2D, with only 2 decode nodes, shows ITL comparable
to the baseline (~45–50 ms). As output length increases, ITL gradually rises
across all configurations, reflecting the growing decode load.
Discussion
----------
So, is disaggregated prefill/decode a silver bullet? The answer is clearly no —
at least not under the conditions tested here. All benchmarks use randomly
generated prompts, meaning every request produces a unique KV cache with zero
prefix cache hit rate. This represents a worst-case scenario for disaggregated
serving, where every prefill must be computed from scratch and the full KV cache
must be transferred over the network. In production workloads with shared system
prompts or repeated prefixes, prefix caching on prefill nodes could
substantially reduce redundant computation and transfer volume, potentially
shifting the balance in favor of disaggregation. Even so, the results reveal a
set of sharp trade-offs that make disaggregation a specialized tool rather than
a universal improvement:
- **ITL wins, but throughput depends on scaling**: Disaggregated configurations
deliver dramatically lower inter-token latency — PD 1P3D achieves as low as
10 ms ITL at long input lengths, up to 14× better than the baseline in
prefill-dominated regimes and 1.4–2.4× better in decode-dominated regimes.
The throughput and TTFT degradation observed here is partly an artifact of a
fixed 4-node cluster: dedicating nodes to one role starves the other. In
practice, prefill and decode pools can be scaled independently — adding more
prefill nodes to eliminate the prefill bottleneck, or more decode nodes to
increase token throughput. The challenge is finding the right ratio between
prefill and decode capacity for a given workload, as over-provisioning either
side increases cost without proportional benefit.
- **Prefill bottleneck is a hard constraint**: With a fixed cluster size,
dedicating nodes to prefill reduces decode capacity and vice versa. PD 1P3D
suffers severe prefill saturation at long input lengths (TTFT > 37s at 4096
tokens), while PD 2P2D has fewer decode nodes, limiting decode throughput.
Frameworks such as `NVIDIA Dynamo `_
aim to address this by dynamically scaling prefill and decode pools based on
real-time demand, though this adds operational complexity.
- **Simple routing beats disaggregation on throughput**: Route 4 (pure routing,
no DP, no disaggregation) consistently achieves the highest throughput across
all configurations by eliminating cross-node synchronization entirely. It also
achieves the lowest TTFT in prefill-dominated workloads, though PD 1P3D edges
it out on TTFT in decode-dominated regimes where the fixed 512-token input is
short enough to avoid prefill saturation. This is a surprisingly strong
baseline — for workloads where ITL is not the primary concern, stateless
load-balanced independent nodes outperform both data parallelism and
disaggregated configurations.
- **KV cache transfer is not free**: The NIXL transfer over EFA adds measurable
latency to TTFT in disaggregated configurations. This overhead is amortized
for longer decode sequences but is noticeable for short output lengths,
making disaggregation less attractive for short-response workloads.
In summary, disaggregated prefill/decode aims to optimize both TTFT and ITL by
isolating the two phases, but achieving these goals is not guaranteed. KV cache
transfer over the network introduces additional overhead that can negate the
TTFT benefit, particularly at long input lengths where the transfer volume is
large. While ITL improvements are consistently observed due to the elimination
of prefill interference on decode nodes, the overall serving performance depends
heavily on the prefill-to-decode ratio, workload characteristics, and network
bandwidth. Teams considering this architecture should carefully profile their
input/output length distributions, latency SLAs, and throughput requirements
before committing to the added complexity.
References
----------
.. [1] NVIDIA, "NIXL: NVIDIA Inference Xfer Library," GitHub, 2025.
https://github.com/ai-dynamo/nixl
.. [2] vLLM Project, "vLLM: Easy, fast, and cheap LLM serving," GitHub, 2024.
https://github.com/vllm-project/vllm
.. [3] vLLM Project, "vllm-router: Production-ready router for vLLM," GitHub, 2025.
https://github.com/vllm-project/vllm-router
================================================
FILE: docs/notes/appendix/index.rst
================================================
.. meta::
:description lang=en: Python appendix covering advanced topics including the walrus operator (PEP 572) and Python debugging with GDB
:keywords: Python, Python3, walrus operator, PEP 572, GDB, debugging, advanced Python
Blog
----
This section explores advanced programming topics to help users build a deeper
understanding of complex concepts and practical techniques. Programmers working
in other languages, such as C/C++, often use Python as a versatile debugging
tool. With debuggers like GDB, they may write Python scripts to parse memory
regions, improve output readability, or automate troubleshooting tasks.
More advanced topics and examples can be found in the following articles.
.. toctree::
:maxdepth: 1
disaggregated-prefill-decode
megatron-efa-monitoring
nccl-gin
python-walrus
python-gdb
================================================
FILE: docs/notes/appendix/megatron-efa-monitoring.rst
================================================
.. meta::
:description lang=en: Monitoring EFA network performance with NCCL GIN and Nsys during distributed LLM training on AWS
:keywords: EFA, NCCL, GIN, Nsys, Megatron-LM, distributed training, network monitoring, AWS
Monitoring EFA with NCCL GIN and Nsys
======================================
:Date: 2026-02-28
Abstract
--------
Distributed training at scale requires deep visibility into network behavior to
identify bottlenecks and optimize communication patterns. When training large
language models with Megatron-LM on AWS infrastructure using the Elastic Fabric
Adapter (EFA), understanding network performance becomes critical for achieving
optimal throughput. This article demonstrates how to enable NCCL GPU-Initiated
Networking (GIN) in Megatron-LM using Megatron Bridge and leverage Nsys with
EFA metrics to monitor network behavior during distributed training workloads.
The techniques presented here are based on best practices from AWS re:Invent
2024 [1]_.
Introduction
------------
`Megatron-LM `_ is a widely adopted
framework for training large transformer models using model parallelism,
pipeline parallelism, and data parallelism. When deployed on AWS instances with
EFA, the network fabric provides high-bandwidth, low-latency communication
essential for scaling to hundreds or thousands of GPUs. However, achieving peak
performance requires careful tuning and monitoring of the communication layer.
NCCL GPU-Initiated Networking allows GPUs to initiate network operations
directly without CPU involvement, reducing latency and enabling kernel fusion.
Nsys (NVIDIA Nsight Systems) provides comprehensive profiling of GPU kernels,
CUDA API calls, and network operations. When combined with EFA metrics
collection (``--enable efa_metrics``), Nsys captures detailed network adapter
statistics including bandwidth utilization, packet counts, and error rates,
correlated with GPU execution timelines. This enables practitioners to diagnose
performance issues and validate that the network is operating at expected
capacity.
`Megatron Bridge `_
simplifies the configuration and deployment of Megatron-LM training jobs by
providing a high-level recipe-based interface. This eliminates the need to
manually construct complex command-line arguments and makes it easier to enable
advanced features like NCCL GIN and DeepEP for MoE models. This article
therefore uses Megatron Bridge throughout.
Prerequisites
-------------
This guide assumes the following environment:
- AWS HyperPod or EC2 instances with EFA support (e.g., P5, P5e, P5en)
- NCCL >= v2.29.3-1 with Device API support
- aws-ofi-nccl plugin with GIN support
- Megatron-LM with Megatron-Bridge
We have demonstrated how to use vLLM with NCCL GIN and DeepEP in a previous
article. If you are interested in building NCCL and aws-ofi-nccl from source,
refer to the `NCCL GIN article
`_
in this repository.
Building the Megatron Container
--------------------------------
The Megatron training environment is packaged as a Docker container and
converted to an Enroot squash file for deployment on Slurm clusters. The
container includes NCCL with Device API support, aws-ofi-nccl with GIN support,
and Megatron-LM with Megatron Bridge.
To build the container and create the Enroot image:
.. code-block:: bash
cd src/megatron
make build
This will create a ``megatron-lm+latest.sqsh`` file that can be used with the
Slurm launcher scripts. For details on the container build process, refer to
the `Dockerfile
`_
and `enroot.sh
`_
scripts in the repository.
Enabling NCCL GIN in Megatron Bridge
-------------------------------------
Megatron Bridge recipes provide a declarative way to configure training jobs.
To enable NCCL GIN for MoE models using DeepEP, the following environment
variables are set automatically by the ``srun.sh`` launcher script:
.. code-block:: bash
export DEEP_EP_BACKEND=nccl
export NCCL_GIN_TYPE=2 # proxy-based GIN
export LD_LIBRARY_PATH=/opt/amazon/ofi-nccl/lib:$LD_LIBRARY_PATH
``NCCL_GIN_TYPE=2`` selects the proxy-based implementation, where a CPU thread
mediates GPU-initiated transfers. This mode is currently supported on EFA,
while GPU Direct Async Kernel-Initiated (DAKI) networking (``NCCL_GIN_TYPE=3``)
is not yet available on AWS at the time of writing (February 2026).
The ``srun.sh`` script also configures additional EFA-specific settings for
optimal performance:
.. code-block:: bash
export FI_PROVIDER=efa
export FI_EFA_USE_DEVICE_RDMA=1
export FI_EFA_FORK_SAFE=1
export NCCL_NET_PLUGIN=/opt/amazon/ofi-nccl/lib/libnccl-net-ofi.so
export NCCL_TUNER_PLUGIN=/opt/amazon/ofi-nccl/lib/libnccl-tuner-ofi.so
export NCCL_BUFFSIZE=8388608
export NCCL_P2P_NET_CHUNKSIZE=524288
Launching Megatron Training with DeepEP and NCCL GIN
-----------------------------------------------------
The following example demonstrates how to launch a DeepSeek-V2-Lite pretraining
job with DeepEP enabled for MoE token dispatching. The recipe configures the
model to use expert parallelism across 64 ranks with NCCL GIN for low-latency
all-to-all communication.
.. code-block:: bash
cd src/megatron
# Allocate 8 nodes on Slurm (8 GPUs each, 64 ranks to match expert_model_parallel_size=64)
salloc -N 8
# Launch DeepSeek-V2-Lite with DeepEP and NCCL GIN
./srun.sh recipes/deepseek_v2_lite_pretrain.py \
hf_path=/fsx/models/deepseek-ai/DeepSeek-V2-Lite \
moe_token_dispatcher_type=deepep \
model.tensor_model_parallel_size=1 \
model.expert_model_parallel_size=64 \
model.sequence_parallel=false
The ``moe_token_dispatcher_type=deepep`` argument enables DeepEP as the MoE
dispatcher backend. Under the hood, the recipe maps this shorthand onto the
``flex`` token dispatcher with ``deepep`` as its backend:
.. code-block:: python
cfg.model.moe_token_dispatcher_type = "flex"
cfg.model.moe_flex_dispatcher_backend = "deepep"
cfg.model.moe_enable_deepep = True
cfg.model.moe_shared_expert_overlap = False
When the training job starts, verify that NCCL initializes with GIN enabled by
checking the logs for Device API initialization messages:
.. code-block:: text
[NCCL] Device API initialized
[NCCL] GIN proxy mode enabled (type=2)
[NCCL Backend] LOW LATENCY MODE: Rank 0 connecting to all ranks
[NCCL Backend] Initialized global rank 0/64
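Checking these messages by eye becomes tedious across many ranks. The sketch below scans captured log text for the marker lines shown above. It assumes the log lines have exactly the form in the excerpt, and ``summarize_gin_log`` is a hypothetical helper name.

```python
import re

# Marker lines copied from the log excerpt above.
GIN_MARKERS = (
    "[NCCL] Device API initialized",
    "[NCCL] GIN proxy mode enabled (type=2)",
)
RANK_RE = re.compile(r"\[NCCL Backend\] Initialized global rank (\d+)/(\d+)")


def summarize_gin_log(text):
    """Report missing GIN markers and the ranks that finished initialization."""
    missing = [marker for marker in GIN_MARKERS if marker not in text]
    matches = list(RANK_RE.finditer(text))
    return {
        "gin_ok": not missing,
        "missing": missing,
        "ranks": sorted({int(m.group(1)) for m in matches}),
        "world_sizes": sorted({int(m.group(2)) for m in matches}),
    }


sample = """\
[NCCL] Device API initialized
[NCCL] GIN proxy mode enabled (type=2)
[NCCL Backend] LOW LATENCY MODE: Rank 0 connecting to all ranks
[NCCL Backend] Initialized global rank 0/64
"""
print(summarize_gin_log(sample))
```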
Monitoring EFA with Nsys and EFA Metrics
-----------------------------------------
Nsys (NVIDIA Nsight Systems) provides comprehensive profiling of GPU kernels,
CUDA API calls, and network operations. The ``--enable efa_metrics`` flag
instructs Nsys to sample the EFA device counters (e.g., ``rdmap113s0``,
``rdmap114s0``) at a 10 Hz rate and record adapter statistics, including:
- **TX/RX Bandwidth**: Transmit and receive throughput
- **TX/RX Packets**: Packet counts sent and received
- **Error Counters**: Link errors and dropped packets
Additionally, aws-ofi-nccl uses NVTX annotations to mark NCCL operations in the
timeline, allowing correlation between NCCL collective calls and EFA network
activity. These metrics are embedded in the Nsys timeline and correlated with
GPU kernel execution and NCCL operations, making it easy to identify
communication bottlenecks and validate network saturation.
To profile a Megatron training run with Nsys and capture EFA metrics:
.. code-block:: bash
cd src/megatron
salloc -N 8
./srun.sh --nsys recipes/deepseek_v2_lite_pretrain.py \
hf_path=/fsx/models/deepseek-ai/DeepSeek-V2-Lite \
moe_token_dispatcher_type=deepep \
model.tensor_model_parallel_size=1 \
model.expert_model_parallel_size=64 \
model.sequence_parallel=false \
profiling.use_nsys_profiler=true \
profiling.profile_step_start=10 \
profiling.profile_step_end=15 \
profiling.profile_ranks=[0]
The ``--nsys`` flag enables Nsys profiling with the following configuration:
.. code-block:: bash
nsys profile \
-t cuda,nvtx \
-s none \
--cpuctxsw=none \
--capture-range=cudaProfilerApi \
--capture-range-end=stop \
--enable efa_metrics \
-o nsys-megatron/profile--rank.nsys-rep \
--force-overwrite=true
The ``--enable efa_metrics`` flag is the key parameter that enables EFA adapter
monitoring. Nsys will automatically detect all EFA devices (typically
``rdmap182s0``, ``rdmap183s0``, etc.) and collect statistics at regular intervals
throughout the profiling session.
After profiling completes, the ``.nsys-rep`` files can be downloaded and opened
in Nsight Systems GUI for analysis. The EFA metrics appear as additional rows
in the timeline view, showing bandwidth and packet rate correlated with GPU
kernel execution and NCCL collective operations.
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/deepep-nsys.png
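When reading the bandwidth rows, it is useful to remember that a throughput figure can be derived from two samples of a cumulative byte counter. The sketch below illustrates only that arithmetic. The counter names ``tx_bytes`` and ``rx_bytes`` and the sample values are assumptions for illustration, and it does not describe how Nsys computes its rows internally.

```python
def bandwidth_gbps(prev, curr, interval_s):
    """Turn two cumulative byte-counter snapshots into Gbit/s per direction."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        key: (curr[key] - prev[key]) * 8 / interval_s / 1e9
        for key in ("tx_bytes", "rx_bytes")
    }


# Two hypothetical snapshots taken 0.1 s apart (a 10 Hz sampling period).
t0 = {"tx_bytes": 1_000_000_000, "rx_bytes": 2_000_000_000}
t1 = {"tx_bytes": 2_250_000_000, "rx_bytes": 2_500_000_000}

rates = bandwidth_gbps(t0, t1, interval_s=0.1)
print({key: round(value, 3) for key, value in rates.items()})
```

With these made-up numbers the transmit side moves 1.25 GB in 0.1 s, which is roughly 100 Gbit/s, and the receive side roughly 40 Gbit/s.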
Profiling with Viztracer
-------------------------
For Python-level profiling of the training loop, Megatron Bridge supports
Viztracer, a low-overhead tracing tool that captures function calls and timing
information. This is useful for identifying CPU bottlenecks in data loading,
preprocessing, or scheduler logic that may indirectly impact network
performance.
.. code-block:: bash
salloc -N 2
./srun.sh recipes/deepseek_v2_lite_pretrain.py \
hf_path=/fsx/models/deepseek-ai/DeepSeek-V2-Lite \
train.train_iters=100 \
profiling.use_viztracer=true \
profiling.profile_step_start=10 \
profiling.profile_step_end=15 \
profiling.profile_ranks=[0]
The resulting ``.json`` trace files can be visualized in the Viztracer web UI
or Chrome's ``chrome://tracing`` interface. By enabling ``log_torch``, Viztracer
can capture additional PyTorch-level details such as NCCL stream and CUDA stream
operations, providing visibility into the execution flow of collective
communications and GPU kernels. However, to observe detailed EFA adapter
statistics (bandwidth, packet counts, error counters), Nsys with
``--enable efa_metrics`` remains the required tool.
Conclusion
----------
Nsys profiling with ``--enable efa_metrics`` now provides the capability to
monitor both EFA adapter behavior and NCCL operations simultaneously during
distributed training. This visibility is essential for diagnosing whether long
NCCL operation times are caused by actual EFA transmission delays or other
issues such as CPU bottlenecks, memory contention, or suboptimal NCCL
configuration. By examining the correlated timeline of GPU kernels, NCCL
collectives, and EFA bandwidth utilization, practitioners can pinpoint the root
cause of performance bottlenecks and validate that the network fabric is
operating at expected capacity.
In this article, we demonstrated this monitoring approach using Megatron-LM
with NCCL GIN and DeepEP as an example. The recipe-based approach of Megatron
Bridge simplifies the deployment of complex training configurations, making it
easier to adopt advanced features like DeepEP and NCCL GIN for large-scale MoE
model training while maintaining full observability into network performance.
For complete examples and scripts, refer to the `megatron directory
`_ in this
repository.
References
----------
.. [1] `AWS re:Invent 2024 - CMP335: Drilling down into performance for distributed training `_
================================================
FILE: docs/notes/appendix/nccl-gin.rst
================================================
.. meta::
:description lang=en: Enabling GPU-Initiated Networking for NCCL with DeepEP on AWS using EFA
:keywords: NCCL, GIN, GPU-Initiated Networking, DeepEP, EFA, AWS, MoE, HyperPod
GPU-Initiated Networking for NCCL on AWS
========================================
:Date: 2026-02-22
Abstract
--------
GPU-Initiated Networking (GIN) has attracted significant attention as a key
enabler for kernel fusion in large language model (LLM) training and inference.
Mixture-of-Experts (MoE) architectures, such as DeepSeek-V3 and Qwen3-30B,
require efficient token dispatching and combining across MoE layers.
Conventionally, inter-GPU communication is initiated by the CPU through
collective libraries such as NCCL or Gloo, necessitating explicit GPU
synchronization barriers and additional ``cudaLaunchKernel`` calls that
introduce non-trivial overhead. GPU-Initiated Networking eliminates this
CPU-mediated round-trip by allowing data exchange to occur directly within CUDA
kernels, thereby enabling kernel fusion and efficient CUDA Graph capture for
accelerating end-to-end LLM layer computation. This article demonstrates how to
enable NCCL GIN with DeepEP on AWS HyperPod Slurm using the AWS Elastic Fabric
Adapter (EFA).
Introduction
------------
Prior to 2026, adopting DeepEP as a Mixture-of-Experts dispatch and combine
backend on AWS presented a significant challenge. The DeepEP kernel was
originally built on top of InfiniBand with a customized NVSHMEM implementation,
a transport layer unavailable on AWS infrastructure. This incompatibility
effectively prevented users from leveraging DeepEP on instances equipped with
the Elastic Fabric Adapter (EFA). Recent collaborative efforts by NVIDIA and
Amazon Annapurna Labs have addressed this gap by introducing GPU-Initiated
Networking support in NCCL and the EFA provider, enabling DeepEP to operate
over EFA without relying on InfiniBand (see `DeepEP PR #521
`_ and `aws-ofi-nccl PR #1069
`_). The following experiment
builds upon these contributions to illustrate how to deploy DeepEP with NCCL
GIN on AWS using EFA.
Build DeepEP
------------
Before deploying DeepEP on AWS HyperPod Slurm, several components must be built
from source. First, NCCL >= v2.29.3-1 is required, as this is the minimum
version that exposes the Device API needed for GPU-Initiated Networking. The
build targets ``sm_90`` (NVIDIA H100) and ``sm_100`` (NVIDIA B200) compute
capabilities to ensure compatibility with current-generation GPU instances.
.. code-block:: bash
NCCL_VERSION=v2.29.3-1
git clone -b ${NCCL_VERSION} https://github.com/NVIDIA/nccl.git /opt/nccl \
&& cd /opt/nccl \
&& make -j $(nproc) src.build CUDA_HOME=/usr/local/cuda \
NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_100,code=sm_100"
Optionally, the NCCL Device API examples can be built to verify that
GPU-initiated communication functions correctly in the target environment. In
addition, the latest release of nccl-tests (v2.17.9) ships with a GIN-enabled
microbenchmark for the ``alltoall`` collective, which is useful for validating
inter-GPU bandwidth and latency before running full-scale MoE workloads (see
`nccl-tests alltoall.cu `_).
.. code-block:: bash
## Build NCCL Device API examples
cd /opt/nccl/examples/06_device_api \
&& make -j $(nproc) NCCL_HOME=/opt/nccl/build CUDA_HOME=/usr/local/cuda MPI=1 MPI_HOME=/opt/amazon/openmpi
NCCL_TESTS_VERSION=v2.17.9
git clone -b ${NCCL_TESTS_VERSION} https://github.com/NVIDIA/nccl-tests.git /opt/nccl-tests \
&& cd /opt/nccl-tests \
&& make -j $(nproc) \
MPI=1 \
MPI_HOME=/opt/amazon/openmpi/ \
CUDA_HOME=/usr/local/cuda \
NCCL_HOME=/opt/nccl/build \
NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_100,code=sm_100"
To test DeepEP on HyperPod Slurm, both DeepEP and aws-ofi-nccl must be pinned
to specific commits that include the NCCL GIN transport path. The DeepEP fork
by Aamir Shafi introduces an NCCL-based communication backend as an alternative
to the original NVSHMEM/InfiniBand path, while the aws-ofi-nccl plugin provides
the libfabric-to-NCCL translation layer required for EFA. Note that the NCCL
GIN implementation has since been merged into the aws-ofi-nccl main branch; the
commit hash is pinned here for reproducibility.
.. code-block:: bash
## Install DeepEP with NCCL GIN backend (PR #521)
unset NVSHMEM_DIR NVSHMEM_HOME \
&& export ENABLE_NCCL=1 \
&& export NCCL_DIR=/opt/nccl/build \
&& export LD_LIBRARY_PATH=/opt/nccl/build/lib:$LD_LIBRARY_PATH \
&& export LD_PRELOAD=/opt/nccl/build/lib/libnccl.so.2 \
&& git clone -b nccl https://github.com/aamirshafi/DeepEP.git /opt/DeepEP \
&& cd /opt/DeepEP \
&& git checkout 6d29f34 \
&& python3 setup.py build_ext --inplace \
&& pip install --break-system-packages --no-build-isolation .
AWS_OFI_NCCL_VERSION=5f4202f11db1585d878196db4430aeda0e834a0c
git clone https://github.com/aws/aws-ofi-nccl.git /tmp/aws-ofi-nccl \
&& cd /tmp/aws-ofi-nccl \
&& git checkout ${AWS_OFI_NCCL_VERSION} \
&& ./autogen.sh \
&& ./configure --prefix=/opt/amazon/ofi-nccl \
--with-libfabric=/opt/amazon/efa \
--with-cuda=/usr/local/cuda \
&& make -j$(nproc) \
&& make install \
&& rm -rf /tmp/aws-ofi-nccl
For a complete build with all necessary dependencies, refer to the
`Dockerfile `_
provided in this repository.
Test NCCL GIN
-------------
With the Docker image (or Enroot squash file) prepared in the previous section,
NCCL GIN functionality can be validated on a Slurm cluster. The following
examples demonstrate how to launch the NCCL Device API samples and nccl-tests
benchmarks. The corresponding Slurm wrapper scripts are available under the
`gin `_ directory
in this repository.
.. code-block:: bash
make docker && make save # build a docker image and import an Enroot squash file
# 01_allreduce_lsa (single node only)
salloc -N 1 ./run.enroot /opt/nccl/examples/06_device_api/01_allreduce_lsa/allreduce_lsa
# 01_allreduce_lsa (multi-node) — requires MNNVL (e.g. P6e-GB200), does NOT work over RDMA/EFA
salloc -N 2 ./run.enroot /opt/nccl/examples/06_device_api/01_allreduce_lsa/allreduce_lsa
# 02_alltoall_gin (multi-node)
salloc -N 2 ./run.enroot /opt/nccl/examples/06_device_api/02_alltoall_gin/alltoall_gin
# 03_alltoall_hybrid (multi-node)
salloc -N 2 ./run.enroot /opt/nccl/examples/06_device_api/03_alltoall_hybrid/alltoall_hybrid
The nccl-tests ``alltoall`` benchmark exposes two flags that select the GIN
transport mode and the memory registration strategy: ``-D`` and ``-R``.
The ``-D`` flag selects the device-side implementation for the ``alltoall``
collective:
.. code-block:: text
-D 0 — Host API (default)
-D 1 — NVL simple (LSA/NVLink only)
-D 2 — NVL optimized (LSA/NVLink only)
-D 3 — GIN only (network)
-D 4 — Hybrid (LSA intra-node + GIN inter-node)
The ``-R`` flag controls memory registration. Symmetric memory allocation
(``NCCL_MEM_SHARED``) is required for any device-side implementation
(``-D > 0``), as it maps GPU memory across all ranks to enable direct
remote read and write over the network:
.. code-block:: text
-R 0 — no registration (default)
-R 1 — register memory with ncclMemAlloc
-R 2 — register memory with symmetric memory allocation (NCCL_MEM_SHARED)
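The constraint between the two flags can be captured in a small helper when benchmark commands are generated by a script. This is a sketch under the rules stated above (device-side modes need symmetric memory); ``alltoall_args`` is a hypothetical helper, not part of nccl-tests.

```python
# Mode tables transcribed from the flag descriptions above.
D_MODES = {
    0: "Host API",
    1: "NVL simple",
    2: "NVL optimized",
    3: "GIN only",
    4: "Hybrid (LSA + GIN)",
}
R_MODES = {
    0: "no registration",
    1: "ncclMemAlloc",
    2: "symmetric memory",
}


def alltoall_args(d_mode, r_mode):
    """Build the -D/-R arguments, rejecting combinations the text rules out."""
    if d_mode not in D_MODES:
        raise ValueError(f"unknown -D mode: {d_mode}")
    if r_mode not in R_MODES:
        raise ValueError(f"unknown -R mode: {r_mode}")
    if d_mode > 0 and r_mode != 2:
        raise ValueError("device-side modes (-D > 0) require -R 2")
    return ["-D", str(d_mode), "-R", str(r_mode)]


print(alltoall_args(3, 2))
```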
The following examples launch the nccl-tests ``alltoall_perf`` benchmark in
GIN-only mode (``-D 3``) and hybrid mode (``-D 4``), sweeping message sizes
from 32 MB to 2048 MB. The ``--blocking 0`` flag enables non-blocking
collectives, which is representative of how MoE layers overlap communication
with computation in production workloads:
.. code-block:: bash
# alltoall_perf with GIN (-D 3)
salloc -N 2 ./run.enroot /opt/nccl-tests/build/alltoall_perf \
-D 3 -R 2 -b 32M -e 2048M -f 2 -n 1000 -w 10 --blocking 0
# alltoall_perf with Hybrid LSA+GIN (-D 4)
salloc -N 2 ./run.enroot /opt/nccl-tests/build/alltoall_perf \
-D 4 -R 2 -b 32M -e 2048M -f 2 -n 1000 -w 10 --blocking 0
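To see how many data points such a sweep produces, the begin, end, and factor values can be expanded by hand. The sketch below does that expansion in Python. It treats ``M`` as an opaque unit label and assumes the sweep multiplies the size by the factor until the end value is exceeded.

```python
def sweep_sizes(begin, end, factor):
    """List the message sizes visited by a multiplicative sweep."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    sizes = []
    size = begin
    while size <= end:
        sizes.append(size)
        size *= factor
    return sizes


# -b 32M -e 2048M -f 2
sizes = sweep_sizes(32, 2048, 2)
print(sizes)       # [32, 64, 128, 256, 512, 1024, 2048]
print(len(sizes))  # 7
```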
Serving MoE Models with vLLM and DeepEP over NCCL GIN
-----------------------------------------------------
With NCCL GIN and EFA validated on AWS HyperPod Slurm, this section
demonstrates an end-to-end inference deployment using vLLM with DeepEP as the
MoE all-to-all communication backend. DeepEP's low-latency dispatch and combine
kernels, now operating over NCCL GIN rather than NVSHMEM, enable efficient
expert-parallel inference for large MoE models such as DeepSeek-V3.
The Slurm launch script ``run.sbatch`` is the same one used to launch a vLLM
server in the `vllm example directory
`_. However,
to direct the DeepEP backend to use NCCL GIN, the following environment
variables must be set at launch time:
.. code-block:: bash
DEEP_EP_BACKEND=nccl
NCCL_GIN_TYPE=2 # proxy-based GIN
``NCCL_GIN_TYPE=2`` selects the proxy-based GIN path, in which a CPU-side proxy
thread mediates network transfers on behalf of the GPU. ``NCCL_GIN_TYPE=3``
would enable GPU Direct Async Kernel-Initiated (DAKI) networking, which
bypasses the proxy entirely; however, DAKI is not yet supported on AWS with EFA
at the time of writing.
For additional details on serving configurations and benchmarking, refer to
`llm-serving.rst
`_
or the `vLLM README
`_.
The following example launches a multi-node vLLM inference server for
DeepSeek-V3-0324 with expert parallelism enabled and the DeepEP low-latency
all-to-all backend:
.. code-block:: bash
IMAGE="${PWD}/src/gin/nccl+latest.tar.gz"
MODEL="/fsx/models/deepseek-ai/DeepSeek-V3-0324"
salloc -N 4 bash run.sbatch "${MODEL}" \
--image "${IMAGE}" \
--all2all-backend deepep_low_latency \
--tensor-parallel-size 8 \
--enable-expert-parallel \
--gpu-memory-utilization 0.8 \
--enforce-eager
Upon successful launch, the vLLM server logs confirm that DeepEP is active as
the all-to-all backend and that NCCL GIN is being used for inter-GPU
communication. The key indicators are the selection of
``DeepEPLLAll2AllManager`` and the ``[NCCL Backend]`` initialization messages showing
communicator setup, symmetric memory allocation, and window registration across
all ranks:
.. code-block:: text
...
INFO 02-22 19:06:49 [serve.py:100] Defaulting api_server_count to data_parallel_size (4).
INFO 02-22 19:06:49 [utils.py:325]
INFO 02-22 19:06:49 [utils.py:325] █ █ █▄ ▄█
INFO 02-22 19:06:49 [utils.py:325] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.15.1
INFO 02-22 19:06:49 [utils.py:325] █▄█▀ █ █ █ █ model /fsx/models/deepseek-ai/DeepSeek-V3-0324
INFO 02-22 19:06:49 [utils.py:325] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
INFO 02-22 19:06:49 [utils.py:325]
...
INFO 02-22 19:07:51 [cuda_communicator.py:124] Using DeepEPLLAll2AllManager all2all manager.
...
[NCCL Backend] LOW LATENCY MODE: Rank 0 connecting to all 32 ranks
[NCCL Backend] NCCL version: 2.29.3 (loaded library)
[NCCL Backend] Initializing 2 communicator(s) (qps_per_rank=8) for rank 0/32
[NCCL Backend] Rank 0 successfully initialized 2 communicator(s)
[NCCL Backend] Rank 0 created 2 device communication(s) with 32 barrier sessions each
[NCCL Backend] Initialized global rank 0/32 (comm rank 0/32)
[NCCL Backend - Memory Alloc] Rank 0: Allocated ptr=0xf882000000, size=3816818816
[NCCL Backend - Memory Register] Rank 0: Copying 2 NCCL windows to GPU
[NCCL Backend - Memory Register] Rank 0: Successfully copied windows to GPU
[NCCL Backend - Memory Register] Rank 0: Registered windows for ptr=0xf882000000, size=3816818816
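The allocation sizes in these messages are raw byte counts, which are easier to compare against GPU memory budgets once converted. A minimal sketch, assuming the ``Memory Alloc`` line has exactly the form shown above:

```python
import re

ALLOC_RE = re.compile(r"Rank (\d+): Allocated ptr=(0x[0-9a-fA-F]+), size=(\d+)")

# Line copied from the log excerpt above.
line = "[NCCL Backend - Memory Alloc] Rank 0: Allocated ptr=0xf882000000, size=3816818816"

match = ALLOC_RE.search(line)
rank = int(match.group(1))
ptr = int(match.group(2), 16)
size = int(match.group(3))

print(f"rank {rank}: {size / 2**30:.2f} GiB at {ptr:#x}")
```

For the line above this reports about 3.55 GiB of symmetric memory on rank 0.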
Once the server is ready, inference requests can be issued via the
OpenAI-compatible completions API:
.. code-block:: bash
curl -sf -X POST http://:8000/v1/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "/fsx/models/deepseek-ai/DeepSeek-V3-0324",
"prompt": "Hello",
"max_tokens": 10
}'
# output
{"id":"cmpl-b6e9530a07561f11","object":"text_completion" ... }
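The same request can be issued from Python using only the standard library. The sketch below builds the request object. The host name ``localhost`` is a placeholder, since the host is left unspecified above, and ``build_completion_request`` is a hypothetical helper.

```python
import json
import urllib.request


def build_completion_request(host, model, prompt, max_tokens=10, port=8000):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"http://{host}:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "localhost",  # placeholder host
    "/fsx/models/deepseek-ai/DeepSeek-V3-0324",
    "Hello",
)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```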
Conclusion
----------
This article has demonstrated how to deploy vLLM with DeepEP and NCCL GIN on
AWS HyperPod Slurm using the Elastic Fabric Adapter. As this integration is
still under active development, certain limitations remain at the time of
writing. For instance, although DeepEP's low-latency mode supports CUDA Graph
capture, enabling it by removing ``--enforce-eager`` currently results in a
startup failure in vLLM. Additionally, performance over EFA may not yet match
that of InfiniBand-based deployments, as further optimizations are ongoing.
This article is intended as an early reference for evaluating DeepEP with NCCL
GIN on AWS. For production workloads, it is advisable to wait for official
stable releases from NVIDIA and Amazon Annapurna Labs.
================================================
FILE: docs/notes/appendix/python-gdb.rst
================================================
.. meta::
:description lang=en: Python interpreter in GNU Debugger (GDB)
:keywords: Python, Python3, GDB
==================================
Python Interpreter in GNU Debugger
==================================
:Date: 2025-08-30
Abstract
--------
The GNU Debugger (GDB) is a powerful tool for troubleshooting errors in code.
However, it is hard for beginners to learn, which is why many programmers
prefer to insert ``print`` statements to examine runtime state. Fortunately,
the `GDB Text User Interface (TUI)`_ lets developers review source code and
debug at the same time. Better still, GDB 7 added a built-in **Python
interpreter**, which offers more straightforward ways to customize GDB printers
and commands through a Python API. Through a series of examples, this article
explores advanced debugging techniques that use Python to build tool kits for
GDB.
Introduction
------------
Troubleshooting software bugs is a big challenge for developers. While GDB
provides many debugging commands to inspect a program's runtime state, its
unintuitive usage discourages programmers from using it to solve problems.
Mastering GDB is a long-term process, but getting started is not complicated;
as Yoda put it, you must unlearn what you have learned. This article focuses on
the Python interpreter in GDB and how to put it to use.
Define Commands
---------------
GDB supports customizing commands by using ``define``. It is useful to run a
batch of commands to troubleshoot at the same time. For example, a developer
can display the current frame information by defining a ``sf`` command.
.. code-block:: bash
# define in .gdbinit
define sf
where # find out where the program is
info args # show arguments
info locals # show local variables
end
However, writing a user-defined command may be inconvenient due to limited APIs.
Fortunately, by interacting with Python interpreter in GDB, developers can
utilize Python libraries to establish their debugging tool kits readily. The
following sections show how to use Python to simplify debugging processes.
Dump Memory
-----------
Inspecting a process’s memory information is an effective way to troubleshoot
memory issues. Developers can acquire memory contents by ``info proc mappings``
and ``dump memory``. To simplify these steps, defining a customized command is
useful. However, the implementation is not straightforward by using pure GDB
syntax. Even though GDB supports conditions, processing output is not intuitive.
To solve this problem, using Python API in GDB would be helpful because Python
contains many useful operations for handling strings.
.. code-block:: python
# mem.py
import gdb
import time
import re
class DumpMemory(gdb.Command):
"""Dump memory info into a file."""
def __init__(self):
super().__init__("dm", gdb.COMMAND_USER)
def get_addr(self, p, tty):
"""Get memory addresses."""
cmd = "info proc mappings"
out = gdb.execute(cmd, tty, True)
addrs = []
for l in out.split("\n"):
if re.search(p, l):
s, e, *_ = l.split()
addrs.append((s, e))
return addrs
def dump(self, addrs):
"""Dump memory result."""
if not addrs:
return
for s, e in addrs:
f = int(time.time() * 1000)
gdb.execute(f"dump memory {f}.bin {s} {e}")
def invoke(self, args, tty):
try:
# cat /proc/self/maps
addrs = self.get_addr(args, tty)
# dump memory
self.dump(addrs)
except Exception as e:
print("Usage: dm [pattern]")
DumpMemory()
Running the ``dm`` command will invoke ``DumpMemory.invoke``. By sourcing
or implementing Python scripts in *.gdbinit*, developers can utilize
user-defined commands to trace bugs when a program is running. For example, the
following steps show how to invoke ``DumpMemory`` in GDB.
.. code-block:: bash
(gdb) start
...
(gdb) source mem.py # source commands
(gdb) dm stack # dump stack to ${timestamp}.bin
(gdb) shell ls # ls current dir
1577283091687.bin a.cpp a.out mem.py
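The address-extraction step can be exercised outside GDB by feeding it captured text. The sketch below applies the same idea as ``get_addr`` to a sample of ``info proc mappings`` output. The sample text is illustrative and the exact column layout varies between GDB versions, so the helper ``find_ranges`` only relies on each mapping row starting with two hexadecimal addresses.

```python
import re

# Illustrative output; real addresses and columns depend on the process and GDB version.
SAMPLE = """\
process 4242
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555555000     0x1000        0x0 /root/a.out
      0x7ffff7dd3000     0x7ffff7dfc000    0x29000        0x0 /lib/x86_64-linux-gnu/ld-2.27.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
"""


def find_ranges(output, pattern):
    """Return (start, end) pairs for mapping rows whose text matches pattern."""
    ranges = []
    for line in output.splitlines():
        fields = line.split()
        # Skip headers and blank lines: mapping rows begin with two hex addresses.
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue
        if re.search(pattern, line):
            ranges.append((fields[0], fields[1]))
    return ranges


print(find_ranges(SAMPLE, "stack"))
```

Filtering out header rows before unpacking avoids the ``ValueError`` that an empty or unrelated line would otherwise raise.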
Dump JSON
---------
Parsing JSON is helpful when a developer is inspecting a JSON string in a
running program. GDB can parse a ``std::string`` via ``gdb.parse_and_eval``
and return it as a ``gdb.Value``. By processing ``gdb.Value``, developers can
pass a JSON string into Python ``json`` API and print it in a pretty format.
.. code-block:: python
# dj.py
import gdb
import re
import json
class DumpJson(gdb.Command):
"""Dump std::string as a styled JSON."""
def __init__(self):
super().__init__("dj", gdb.COMMAND_USER)
def get_json(self, args):
"""Parse std::string to JSON string."""
ret = gdb.parse_and_eval(args)
typ = str(ret.type)
if re.match(r"^std::(.*::)?string", typ):
return json.loads(str(ret))
return None
def invoke(self, args, tty):
try:
# string to json string
s = self.get_json(args)
# json string to object
o = json.loads(s)
print(json.dumps(o, indent=2))
except Exception as e:
print(f"Parse json error! {args}")
DumpJson()
The ``dj`` command displays JSON in a more readable format inside GDB, which
helps when a JSON string is large. It can also be used to check whether a
``std::string`` holds valid JSON.
.. code-block:: bash
(gdb) start
(gdb) list
1 #include <string>
2
3 int main(int argc, char *argv[])
4 {
5 std::string json = R"({"foo": "FOO","bar": "BAR"})";
6 return 0;
7 }
...
(gdb) ptype json
type = std::string
(gdb) p json
$1 = "{\"foo\": \"FOO\",\"bar\": \"BAR\"}"
(gdb) source dj.py
(gdb) dj json
{
"foo": "FOO",
"bar": "BAR"
}
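The two ``json.loads`` calls in ``DumpJson`` can look redundant at first. The sketch below reproduces the idea in plain Python, assuming the value prints as a quoted, escaped string like the ``p json`` output above: the first pass decodes the quoted string and the second parses the JSON document it contains.

```python
import json

# Text form of the std::string, as printed by `p json` in the session above.
printed = r'"{\"foo\": \"FOO\",\"bar\": \"BAR\"}"'

inner = json.loads(printed)  # first pass: quoted, escaped text -> plain str
obj = json.loads(inner)      # second pass: JSON document -> dict

print(type(inner).__name__)
print(json.dumps(obj, indent=2))
```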
Highlight Syntax
----------------
Syntax highlighting is useful for developers to trace source code or to
troubleshoot issues. By using `Pygments`_, applying color to the source is easy
without defining ANSI escape codes manually. The following example shows how to
apply color to the ``list`` command output.
.. code-block:: python
import gdb
from pygments import highlight
from pygments.lexers import CLexer
from pygments.formatters import TerminalFormatter
class PrettyList(gdb.Command):
"""Print source code with color."""
def __init__(self):
super().__init__("pl", gdb.COMMAND_USER)
self.lex = CLexer()
self.fmt = TerminalFormatter()
def invoke(self, args, tty):
try:
out = gdb.execute(f"l {args}", tty, True)
print(highlight(out, self.lex, self.fmt))
except Exception as e:
print(e)
PrettyList()
Tracepoints
-----------
Although a developer can insert ``printf``, ``std::cout``, or ``syslog`` calls
to inspect functions, printing messages is not an effective way to debug a
large project. Developers may waste time rebuilding the source and still gain
little information. Even worse, the output may become too noisy to reveal the
problem. In fact, inspecting functions or variables does not require embedding
*print functions* in code. By writing a Python script with the GDB API,
developers can customize breakpoints to trace issues dynamically at runtime.
For example, combining a ``gdb.Breakpoint`` with a ``gdb.Command`` lets
developers collect essential information, such as parameters, call stacks, or
memory usage.
.. code-block:: python
# tp.py
import gdb
tp = {}
class Tracepoint(gdb.Breakpoint):
def __init__(self, *args):
super().__init__(*args)
self.silent = True
self.count = 0
def stop(self):
self.count += 1
frame = gdb.newest_frame()
block = frame.block()
sym_and_line = frame.find_sal()
framename = frame.name()
filename = sym_and_line.symtab.filename
line = sym_and_line.line
# show tracepoint info
print(f"{framename} @ {filename}:{line}")
# show args and vars
for s in block:
if not s.is_argument and not s.is_variable:
continue
typ = s.type
val = s.value(frame)
size = typ.sizeof
name = s.name
print(f"\t{name}({typ}: {val}) [{size}]")
# do not stop at tracepoint
return False
class SetTracepoint(gdb.Command):
def __init__(self):
super().__init__("tp", gdb.COMMAND_USER)
def invoke(self, args, tty):
try:
global tp
tp[args] = Tracepoint(args)
except Exception as e:
print(e)
def finish(event):
for t, p in tp.items():
c = p.count
print(f"Tracepoint '{t}' Count: {c}")
gdb.events.exited.connect(finish)
SetTracepoint()
Instead of inserting ``std::cout`` at the beginning of functions, using a
tracepoint at a function's entry point provides useful information to inspect
arguments, variables, and stacks. For instance, by setting a tracepoint at
``fib``, it is helpful to examine memory usage, stack, and the number of calls.
.. code-block:: cpp
int fib(int n)
{
if (n < 2) {
return 1;
}
return fib(n-1) + fib(n-2);
}
int main(int argc, char *argv[])
{
fib(3);
return 0;
}
The following output shows the result of inspecting the function ``fib``. In
this case, the tracepoints display the information a developer needs, including
argument values, the recursive call flow, and variable sizes. Tracepoints
therefore yield more useful information than ``std::cout`` without modifying
the program.
.. code-block:: bash
(gdb) source tp.py
(gdb) tp main
Breakpoint 1 at 0x647: file a.cpp, line 12.
(gdb) tp fib
Breakpoint 2 at 0x606: file a.cpp, line 3.
(gdb) r
Starting program: /root/a.out
main @ a.cpp:12
argc(int: 1) [4]
argv(char **: 0x7fffffffe788) [8]
fib @ a.cpp:3
n(int: 3) [4]
fib @ a.cpp:3
n(int: 2) [4]
fib @ a.cpp:3
n(int: 1) [4]
fib @ a.cpp:3
n(int: 0) [4]
fib @ a.cpp:3
n(int: 1) [4]
[Inferior 1 (process 5409) exited normally]
Tracepoint 'main' Count: 1
Tracepoint 'fib' Count: 5
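The call order and the final count in this output can be cross-checked without GDB. The following is a pure-Python analogue, not GDB API code: a decorator records each call's argument the way a non-stopping breakpoint would, applied to a Python port of the same ``fib``.

```python
import functools


def traced(func):
    """Record the arguments of every call, similar to a non-stopping breakpoint."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args):
        calls.append(args)
        return func(*args)

    wrapper.calls = calls
    return wrapper


@traced
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


fib(3)
print([args[0] for args in fib.calls])  # [3, 2, 1, 0, 1]
print(len(fib.calls))                   # 5
```

The recorded arguments follow the same order as the ``n`` values printed by the tracepoint, and the length matches ``Count: 5``.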
Profiling
---------
Without inserting timestamps, profiling is still feasible through tracepoints.
By creating a ``gdb.FinishBreakpoint`` when a ``gdb.Breakpoint`` is hit, GDB
sets a temporary breakpoint at the return address of the frame, so a script can
record the current timestamp there and calculate the elapsed time. Note that
profiling via GDB is not precise, because every breakpoint stop adds debugger
overhead to the measured time. Other tools, such as `Linux perf`_ or
`Valgrind`_, provide more useful and accurate information to trace performance
issues.
.. code-block:: python
import gdb
import time
class EndPoint(gdb.FinishBreakpoint):
def __init__(self, breakpoint, *a, **kw):
super().__init__(*a, **kw)
self.silent = True
self.breakpoint = breakpoint
def stop(self):
# normal finish
end = time.time()
start, out = self.breakpoint.stack.pop()
diff = end - start
print(out.strip())
print(f"\tCost: {diff}")
return False
class StartPoint(gdb.Breakpoint):
def __init__(self, *a, **kw):
super().__init__(*a, **kw)
self.silent = True
self.stack = []
def stop(self):
start = time.time()
# start, end, diff
frame = gdb.newest_frame()
sym_and_line = frame.find_sal()
func = frame.function().name
filename = sym_and_line.symtab.filename
line = sym_and_line.line
block = frame.block()
args = []
for s in block:
if not s.is_argument:
continue
name = s.name
typ = s.type
val = s.value(frame)
args.append(f"{name}: {val} [{typ}]")
# format
out = ""
out += f"{func} @ {filename}:{line}\n"
for a in args:
out += f"\t{a}\n"
# append current status to a breakpoint stack
self.stack.append((start, out))
EndPoint(self, internal=True)
return False
class Profile(gdb.Command):
def __init__(self):
super().__init__("prof", gdb.COMMAND_USER)
def invoke(self, args, tty):
try:
StartPoint(args)
except Exception as e:
print(e)
Profile()
The following output shows the profiling result by setting a tracepoint at the
function ``fib``. It is convenient to inspect the function's performance and
stack at the same time.
.. code-block:: bash
(gdb) source prof.py
(gdb) prof fib
Breakpoint 1 at 0x606: file a.cpp, line 3.
(gdb) r
Starting program: /root/a.out
fib(int) @ a.cpp:3
n: 1 [int]
Cost: 0.0007786750793457031
fib(int) @ a.cpp:3
n: 0 [int]
Cost: 0.002572298049926758
fib(int) @ a.cpp:3
n: 2 [int]
Cost: 0.008517265319824219
fib(int) @ a.cpp:3
n: 1 [int]
Cost: 0.0014069080352783203
fib(int) @ a.cpp:3
n: 3 [int]
Cost: 0.01870584487915039
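The results appear innermost-first because start times are kept on a stack and popped as each frame returns. The sketch below is a pure-Python analogue of that pairing, not GDB API code: the push stands in for ``StartPoint.stop`` and the pop for ``EndPoint.stop``.

```python
import time

stack = []     # pending (start_time, n) entries, mirroring StartPoint.stack
finished = []  # (n, elapsed_seconds) in completion order


def fib(n):
    stack.append((time.perf_counter(), n))  # entry: what StartPoint.stop records
    result = 1 if n < 2 else fib(n - 1) + fib(n - 2)
    start, arg = stack.pop()                # return: what EndPoint.stop pops
    finished.append((arg, time.perf_counter() - start))
    return result


fib(3)
print([n for n, _ in finished])  # [1, 0, 2, 1, 3]
```

Because the most recent entry always belongs to the frame that is returning, a last-in, first-out stack pairs each start with the correct finish even under recursion.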
Pretty Print
------------
Although ``set print pretty on`` in GDB offers a better format to inspect
variables, developers may need to parse variable values for readability. Take
the system call ``stat`` as an example. While it provides useful information
about file attributes, raw output values, such as the permission bits, are not
easy to read while debugging. By implementing a user-defined pretty printer,
developers can parse ``struct stat`` and output the information in a readable
format.
.. code-block:: python
import gdb
import pwd
import grp
import stat
import time
from datetime import datetime
class StatPrint:
def __init__(self, val):
self.val = val
def get_filetype(self, st_mode):
if stat.S_ISDIR(st_mode):
return "directory"
if stat.S_ISCHR(st_mode):
return "character device"
if stat.S_ISBLK(st_mode):
return "block device"
if stat.S_ISREG(st_mode):
return "regular file"
if stat.S_ISFIFO(st_mode):
return "FIFO"
if stat.S_ISLNK(st_mode):
return "symbolic link"
if stat.S_ISSOCK(st_mode):
return "socket"
return "unknown"
def get_access(self, st_mode):
out = "-"
info = ("r", "w", "x")
perm = [
(stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
(stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
(stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
]
for pm in perm:
for c, p in zip(pm, info):
out += p if st_mode & c else "-"
return out
def get_time(self, st_time):
tv_sec = int(st_time["tv_sec"])
return datetime.fromtimestamp(tv_sec).isoformat()
def to_string(self):
st = self.val
st_ino = int(st["st_ino"])
st_mode = int(st["st_mode"])
st_uid = int(st["st_uid"])
st_gid = int(st["st_gid"])
st_size = int(st["st_size"])
st_blksize = int(st["st_blksize"])
st_blocks = int(st["st_blocks"])
st_atim = st["st_atim"]
st_mtim = st["st_mtim"]
st_ctim = st["st_ctim"]
out = "{\n"
out += f"Size: {st_size}\n"
out += f"Blocks: {st_blocks}\n"
out += f"IO Block: {st_blksize}\n"
out += f"Inode: {st_ino}\n"
out += f"Access: {self.get_access(st_mode)}\n"
out += f"File Type: {self.get_filetype(st_mode)}\n"
out += f"Uid: ({st_uid}/{pwd.getpwuid(st_uid).pw_name})\n"
out += f"Gid: ({st_gid}/{grp.getgrgid(st_gid).gr_name})\n"
out += f"Access: {self.get_time(st_atim)}\n"
out += f"Modify: {self.get_time(st_mtim)}\n"
out += f"Change: {self.get_time(st_ctim)}\n"
out += "}"
return out
p = gdb.printing.RegexpCollectionPrettyPrinter("sp")
p.add_printer("stat", "^stat$", StatPrint)
o = gdb.current_objfile()
gdb.printing.register_pretty_printer(o, p)
After sourcing the previous Python script, the ``PrettyPrinter`` recognizes
``struct stat`` and prints it in a readable format, so developers can inspect
file attributes directly. This is more convenient than adding helper functions
to the program just to parse and print ``struct stat``.
.. code-block:: bash
(gdb) list 15
10 struct stat st;
11
12 if ((rc = stat("./a.cpp", &st)) < 0) {
13 perror("stat failed.");
14 goto end;
15 }
16
17 rc = 0;
18 end:
19 return rc;
(gdb) source st.py
(gdb) b 17
Breakpoint 1 at 0x762: file a.cpp, line 17.
(gdb) r
Starting program: /root/a.out
Breakpoint 1, main (argc=1, argv=0x7fffffffe788) at a.cpp:17
17 rc = 0;
(gdb) p st
$1 = {
Size: 298
Blocks: 8
IO Block: 4096
Inode: 1322071
Access: -rw-rw-r--
File Type: regular file
Uid: (0/root)
Gid: (0/root)
Access: 2019-12-28T15:53:17
Modify: 2019-12-28T15:53:01
Change: 2019-12-28T15:53:01
}
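The permission string built by ``get_access`` can be sanity-checked outside
GDB. The following is a minimal standalone sketch, not part of the printer
itself; the helper name ``access_string`` is hypothetical, and the sample mode
is an assumption chosen to match the output above. It compares a hand-rolled
loop against the standard library's ``stat.filemode``.

```python
import stat

# Permission bits in display order: user, group, other.
PERMISSION_BITS = (
    (stat.S_IRUSR, "r"), (stat.S_IWUSR, "w"), (stat.S_IXUSR, "x"),
    (stat.S_IRGRP, "r"), (stat.S_IWGRP, "w"), (stat.S_IXGRP, "x"),
    (stat.S_IROTH, "r"), (stat.S_IWOTH, "w"), (stat.S_IXOTH, "x"),
)


def access_string(st_mode):
    """Return an ls-style permission string (hypothetical helper)."""
    out = "-"  # the leading "-" stands for a regular file, as in the printer
    for bit, char in PERMISSION_BITS:
        out += char if st_mode & bit else "-"
    return out


mode = stat.S_IFREG | 0o664  # assumed sample: a regular file with mode 664
print(access_string(mode))
print(stat.filemode(mode))   # standard-library equivalent
```

Both calls should agree for regular files, which makes ``stat.filemode`` a
convenient cross-check when a permission table is written by hand.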
Note that developers can disable a user-defined pretty printer with the
``disable`` command. For example, the previous Python script registers the
printer ``sp`` among the global pretty-printers, so running
``disable pretty-print global sp`` disables it.
.. code-block:: bash
(gdb) disable pretty-print global sp
1 printer disabled
1 of 2 printers enabled
(gdb) i pretty-print
global pretty-printers:
builtin
mpx_bound128
sp [disabled]
stat
Additionally, developers can remove a printer from the current GDB debugging
session if it is no longer required. The following snippet shows how to delete
the ``sp`` printer through ``gdb.pretty_printers.remove``.
.. code-block:: bash
(gdb) python
>import gdb
>for p in gdb.pretty_printers:
> if p.name == "sp":
> gdb.pretty_printers.remove(p)
>end
(gdb) i pretty-print
global pretty-printers:
builtin
mpx_bound128
Conclusion
----------
Integrating a Python interpreter into GDB offers many flexible ways to
troubleshoot issues. While many integrated development environments (IDEs)
embed GDB for visual debugging, GDB itself allows developers to implement their
own commands and parse variables’ output at runtime. By using debugging
scripts, developers can monitor and record the necessary information without
modifying their code. Inserting or enabling debugging code blocks may change a
program’s behavior, so it is a habit worth avoiding. Also, when a problem is
reproduced, GDB can attach to the running process and examine its state without
restarting it. Debugging via GDB is often unavoidable when a challenging issue
emerges. Thanks to the Python integration, writing troubleshooting scripts
becomes more accessible, which lets developers build debugging methods that
suit their own needs.
Reference
---------
1. `Extending GDB using Python`_
2. `gcc/gcc/gdbhooks.py`_
3. `gdbinit/Gdbinit`_
4. `cyrus-and/gdb-dashboard`_
5. `hugsy/gef`_
6. `sharkdp/stack-inspector`_
7. `gdb Debugging Full Example (Tutorial)`_
.. _Pygments: https://pygments.org/
.. _Extending GDB using Python: https://sourceware.org/gdb/onlinedocs/gdb/Python.html
.. _gcc/gcc/gdbhooks.py: https://github.com/gcc-mirror/gcc/blob/master/gcc/gdbhooks.py
.. _hugsy/gef: https://github.com/hugsy/gef
.. _cyrus-and/gdb-dashboard: https://github.com/cyrus-and/gdb-dashboard
.. _gdbinit/Gdbinit: https://github.com/gdbinit/Gdbinit
.. _sharkdp/stack-inspector: https://github.com/sharkdp/stack-inspector
.. _GDB Text User Interface (TUI): https://sourceware.org/gdb/onlinedocs/gdb/TUI.html
.. _Linux perf: https://github.com/torvalds/linux/tree/master/tools/perf
.. _Valgrind: https://valgrind.org/
.. _gdb Debugging Full Example (Tutorial): http://www.brendangregg.com/blog/2016-08-09/gdb-example-ncurses.html
================================================
FILE: docs/notes/appendix/python-walrus.rst
================================================
.. meta::
:description lang=en: Design philosophy of pep 572, the walrus operator
:keywords: Python3, PEP 572, walrus operator
PEP 572 and The Walrus Operator
===============================
:Date: 2025-08-30
Abstract
--------
`PEP 572`_ is one of the most contentious proposals in Python 3 history because
assigning a value within an expression seems unnecessary. Also, developers may
find it hard to distinguish **the walrus operator** (``:=``) from the
assignment operator (``=``). Even though experienced developers can use
"``:=``" smoothly, they may be concerned about the readability of their code.
To better understand the usage of "``:=``," this article discusses its design
philosophy and the kinds of problems it tries to solve.
Introduction
------------
For C/C++ developers, assigning a function's return value to a variable is
common because of error-code-style handling. Handling a function error involves
two steps: checking the return value, and then checking ``errno``. For example,
.. code-block:: cpp
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int rc = -1;
// assign access return to rc and check its value
if ((rc = access("hello_walrus", R_OK)) == -1) {
fprintf(stderr, "%s", strerror(errno));
goto end;
}
rc = 0;
end:
return rc;
}
In this case, the return value of ``access`` is assigned to the variable ``rc``
first. Then, the program compares ``rc`` with ``-1`` to check whether
``access`` succeeded. However, Python did not allow assigning values to
variables within an expression before 3.8. To fill this gap, PEP 572 introduced
the walrus operator. The following Python snippet is equivalent to the previous
C example.
.. code-block:: python
>>> import os
>>> import sys
>>> from ctypes import *
>>> libc = CDLL("libc.dylib", use_errno=True)
>>> access = libc.access
>>> path = create_string_buffer(b"hello_walrus")
>>> if (rc := access(path, os.R_OK)) == -1:
... errno = get_errno()
... print(os.strerror(errno), file=sys.stderr)
...
No such file or directory
Why ``:=`` ?
------------
Developers may confuse "``:=``" with "``=``." In fact, they serve the same
purpose: assigning a value to a variable. Why did Python introduce "``:=``"
instead of reusing "``=``"? What is the benefit of using "``:=``"? One reason
is to make assignment visually distinct from comparison, because mixing them up
is a common mistake among C/C++ developers. For instance,
.. code-block:: cpp
int rc = access("hello_walrus", R_OK);
// rc is unintentionally assigned to -1
if (rc = -1) {
fprintf(stderr, "%s", strerror(errno));
goto end;
}
Rather than being compared, the variable ``rc`` is mistakenly assigned ``-1``.
To prevent this error, some people advocate using `Yoda conditions`_ within an
expression.
.. code-block:: cpp
int rc = access("hello_walrus", R_OK);
// -1 = rc will raise a compile error
if (-1 == rc) {
fprintf(stderr, "%s", strerror(errno));
goto end;
}
However, Yoda style is not very readable, much as Yoda's word order is not
standard English. Also, whereas C/C++ compilers can flag accidental assignments
at compile time via options such as ``-Wparentheses``, it is difficult for the
Python interpreter to detect such mistakes at runtime. Thus, the final result
of PEP 572 was to introduce a new syntax for *assignment expressions*.
The walrus operator was not the first syntax proposed in PEP 572. The original
proposal used ``EXPR as NAME`` to assign values to variables. Unfortunately,
that syntax, like several other alternatives, was rejected for various reasons.
After intense debates, the final decision was ``:=``.
Scopes
------
An assignment expression does not introduce a new scope: its target is bound in
the scope that contains the expression, so the name remains available after the
statement finishes. The purpose of this design is to allow a compact way to
write code.
.. code-block:: python3
>>> if not (env := os.environ.get("HOME")):
... raise KeyError("env HOME does not find!")
...
>>> print(env)
/root
In PEP 572, another benefit is to conveniently capture a "witness" for an
``any()`` or an ``all()`` expression. Although capturing function inputs can
assist an interactive debugger, the advantage is not so obvious, and the
examples lack readability. Therefore, this benefit is not discussed here. Note
that other languages (e.g., C/C++ or Go) may bind an assignment to a narrower
scope. Take Golang as an example.
.. code-block:: go
package main
import (
"fmt"
"os"
)
func main() {
if env := os.Getenv("HOME"); env == "" {
panic(fmt.Sprintf("Home does not find"))
}
fmt.Print(env) // <--- compile error: undefined: env
}
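For comparison, a small Python sketch shows the opposite behavior. The function
name ``scope_demo`` is made up for illustration. A name bound with ``:=``
stays visible after the ``if`` statement, and a walrus target used inside a
comprehension is bound in the enclosing function rather than in the
comprehension's own scope.

```python
def scope_demo(values):
    # The walrus target is bound in the function scope, so it survives the `if`.
    if (count := len(values)) > 2:
        pass
    # Inside a comprehension, `last` is bound in scope_demo itself; only the
    # loop variable `v` stays private to the comprehension.
    doubled = [(last := v * 2) for v in values]
    return count, doubled, last


print(scope_demo([1, 2, 3]))
```

Note that ``last`` would be unbound if ``values`` were empty, which is one
reason to use this pattern sparingly.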
Pitfalls
--------
Although an assignment expression allows writing compact code, there are
several pitfalls when a developer uses it in a list comprehension. A common
``SyntaxError`` comes from rebinding the iteration variable.
.. code-block:: python3
>>> [i := i+1 for i in range(5)] # invalid
Updating an iteration variable reduces readability and introduces bugs. Even
if Python 3.8 had not implemented the walrus operator, a programmer should
avoid reusing iteration variables within a scope.
Another pitfall is that Python prohibits using assignment expressions within a
comprehension under a class scope.
.. code-block:: python3
>>> class Example:
... [(j := i) for i in range(5)] # invalid
...
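This limitation stems from `bpo-3692`_. Name lookup is surprising when a class
declaration contains a list comprehension: only the outermost iterable is
evaluated in the class scope, so other class-level names are not visible inside
the comprehension. To avoid this corner case, an assignment expression is
invalid within a comprehension in a class body.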
.. code-block:: python3
>>> class Foo:
... a = [1, 2, 3]
... b = [4, 5, 6]
... c = [i for i in zip(a, b)] # b is defined
...
>>> class Bar:
... a = [1,2,3]
... b = [4,5,6]
... c = [x * y for x in a for y in b] # b is undefined
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in Bar
File "<stdin>", line 4, in <listcomp>
NameError: name 'b' is not defined
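Both pitfalls are rejected at compile time, so they can be probed without
running the offending code. The sketch below assumes Python 3.8 or later; the
helper name ``compiles`` is hypothetical. It passes each snippet to the
built-in ``compile`` and reports whether a ``SyntaxError`` is raised.

```python
def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


rebind = "[i := i + 1 for i in range(5)]"
in_class = "class Example:\n    [(j := i) for i in range(5)]\n"
in_function = "def f():\n    return [(j := i) for i in range(5)]\n"

print(compiles(rebind))       # rebinding the iteration variable
print(compiles(in_class))     # walrus in a comprehension in a class body
print(compiles(in_function))  # the same comprehension inside a function
```

Only the last snippet compiles, which shows that the restriction is tied to the
class body rather than to comprehensions in general.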
Conclusion
----------
The walrus operator (``:=``) is controversial because it may decrease code
readability. In fact, in the discussion `mail thread `_, Christoph Groth had
considered using "``=``" to implement inline assignment like C/C++. Setting
aside whether "``:=``" is ugly, many developers argue that distinguishing
"``:=``" from "``=``" is difficult because they serve the same purpose but
behave differently. Also, writing compact code is not persuasive enough because
smaller is not always better. However, in some cases, the walrus operator can
enhance readability (if you understand how to use ``:=``). For example,
.. code-block:: python3
buf = b""
while True:
data = read(1024)
if not data:
break
buf += data
By using ``:=``, the previous example can be simplified.
.. code-block:: python3
buf = b""
while (data := read(1024)):
buf += data
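In the snippets above, ``read`` is a placeholder. A runnable version of the
same loop can be sketched with an in-memory stream; ``io.BytesIO`` and the
2500-byte payload are assumptions made only so the example is self-contained.

```python
import io


def read_all(stream, chunk_size=1024):
    """Drain a binary stream chunk by chunk using the walrus operator."""
    buf = b""
    chunks = 0
    # read() returns b"" at end of stream, which is falsy and ends the loop.
    while (data := stream.read(chunk_size)):
        buf += data
        chunks += 1
    return buf, chunks


payload = b"x" * 2500
buf, chunks = read_all(io.BytesIO(payload))
print(len(buf), chunks)
```

The loop condition both fetches the next chunk and tests for end of stream, so
no ``break`` is needed.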
The `Python document`_ and GitHub `issue-8122`_ provide many great examples of
improving code readability with "``:=``". However, the walrus operator should
be used with care. Some cases, such as ``foo(x := 3, cat='vector')``, may
introduce new bugs if developers are not aware of scopes. Although PEP 572
makes it easier to write buggy code, an in-depth understanding of its design
philosophy and useful examples will help us use it to write readable code at
the right time.
References
----------
1. `PEP 572 - Assignment Expressions`_
2. `What’s New In Python 3.8`_
3. `PEP 572 and decision-making in Python`_
4. `The PEP 572 endgame`_
5. `Use assignment expression in stdlib (combined PR)`_
6. `Improper scope in list comprehension, when used in class declaration`_
.. _PEP 572: https://www.python.org/dev/peps/pep-0572/
.. _PEP 572 - Assignment Expressions: https://www.python.org/dev/peps/pep-0572/
.. _What’s New In Python 3.8: https://docs.python.org/3/whatsnew/3.8.html
.. _PEP 572 and decision-making in Python: https://lwn.net/Articles/757713/
.. _The PEP 572 endgame: https://lwn.net/Articles/759558/
.. _Use assignment expression in stdlib (combined PR): https://github.com/python/cpython/pull/8122/files
.. _improper scope in list comprehension, when used in class declaration: https://bugs.python.org/issue3692
.. _Yoda conditions: https://en.wikipedia.org/wiki/Yoda_conditions
.. _bpo-3692: https://bugs.python.org/issue3692
.. _Python document: https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions
.. _issue-8122: https://github.com/python/cpython/pull/8122/files
================================================
FILE: docs/notes/asyncio/index.rst
================================================
.. meta::
:description lang=en: Python asyncio tutorial covering coroutines, event loops, tasks, async/await syntax, networking, and asynchronous programming patterns
:keywords: Python, Python3, asyncio, async, await, coroutine, event loop, asynchronous, concurrent, networking, TCP, UDP
Asyncio
=======
Python's ``asyncio`` module provides infrastructure for writing single-threaded
concurrent code using coroutines, multiplexing I/O access over sockets and other
resources, running network clients and servers, and other related primitives.
Unlike threading, asyncio uses cooperative multitasking, where tasks voluntarily
yield control to allow other tasks to run. This makes it ideal for I/O-bound
applications like web servers, database clients, and network services where
waiting for external resources is the primary bottleneck.
This section covers asyncio from basic concepts to advanced patterns, including
the event loop, coroutines, tasks, synchronization primitives, and real-world
examples like TCP/UDP servers, HTTP clients, and connection pools.
.. toctree::
:maxdepth: 1
python-asyncio-guide
python-asyncio-basic
python-asyncio-server
python-asyncio-advanced
================================================
FILE: docs/notes/asyncio/python-asyncio-advanced.rst
================================================
.. meta::
:description lang=en: Python asyncio advanced - synchronization, queues, subprocesses, debugging, patterns
:keywords: Python, Python3, Asyncio, Synchronization, Queue, Semaphore, Lock, Subprocess, Debugging
=================
Asyncio Advanced
=================
:Source: `src/basic/asyncio_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
Beyond basic coroutines and networking, asyncio provides synchronization
primitives, queues, subprocess management, and debugging tools. This section
covers advanced patterns for building robust async applications, including
producer-consumer patterns, rate limiting, graceful shutdown, and integration
with synchronous code.
Locks
-----
``asyncio.Lock`` prevents multiple coroutines from accessing a shared resource
simultaneously. Unlike threading locks, async locks are acquired with ``await``
(or ``async with``), are not thread-safe, and only work within the same event
loop.
.. code-block:: python
import asyncio
class SharedCounter:
def __init__(self):
self.value = 0
self._lock = asyncio.Lock()
async def increment(self):
async with self._lock:
current = self.value
await asyncio.sleep(0.01) # Simulate work
self.value = current + 1
async def worker(counter, name, count):
for _ in range(count):
await counter.increment()
print(f"{name} done")
async def main():
counter = SharedCounter()
await asyncio.gather(
worker(counter, "A", 100),
worker(counter, "B", 100),
worker(counter, "C", 100),
)
print(f"Final value: {counter.value}") # Should be 300
asyncio.run(main())
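To see why the lock matters, the following sketch runs the same
read-yield-write sequence with and without a lock. The names
``unsafe_increment``, ``safe_increment``, and ``run`` are illustrative, and
``asyncio.sleep(0)`` stands in for real work that yields to the event loop.

```python
import asyncio


async def unsafe_increment(state):
    current = state["value"]
    await asyncio.sleep(0)        # other coroutines run here
    state["value"] = current + 1  # may overwrite a newer value


async def safe_increment(state, lock):
    async with lock:              # read-modify-write is now exclusive
        current = state["value"]
        await asyncio.sleep(0)
        state["value"] = current + 1


async def run(increment, *args, workers=3, rounds=5):
    state = {"value": 0}

    async def worker():
        for _ in range(rounds):
            await increment(state, *args)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return state["value"]


async def main():
    unsafe = await run(unsafe_increment)
    safe = await run(safe_increment, asyncio.Lock())
    return unsafe, safe


print(asyncio.run(main()))
```

The locked run reaches 15 (3 workers times 5 rounds), while the unlocked run
loses updates because each worker writes back a stale value after yielding.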
Semaphores for Rate Limiting
----------------------------
``asyncio.Semaphore`` limits the number of concurrent operations. This is
useful for throttling API calls, capping database connections, or controlling
resource usage. Note that it bounds how many operations run at once rather than
how many start per second.
.. code-block:: python
import asyncio
async def fetch(url, semaphore):
async with semaphore:
print(f"Fetching {url}")
await asyncio.sleep(1) # Simulate network request
return f"Response from {url}"
async def main():
# Limit to 3 concurrent requests
semaphore = asyncio.Semaphore(3)
urls = [f"https://api.example.com/{i}" for i in range(10)]
tasks = [fetch(url, semaphore) for url in urls]
results = await asyncio.gather(*tasks)
for r in results:
print(r)
asyncio.run(main())
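One way to confirm that the semaphore really bounds concurrency is to track
how many jobs are inside the guarded section at once. This is a minimal sketch;
the counters ``active`` and ``peak`` are instrumentation added for the example.

```python
import asyncio


async def main(limit=3, jobs=10):
    semaphore = asyncio.Semaphore(limit)
    active = 0  # jobs currently inside the semaphore
    peak = 0    # highest value `active` ever reached

    async def job():
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # simulated request
            active -= 1

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak


print(asyncio.run(main()))
```

With ten jobs and a limit of three, the peak never exceeds three even though
all jobs are started together.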
Events for Signaling
--------------------
``asyncio.Event`` allows coroutines to wait for a signal from another coroutine.
This is useful for coordinating startup, shutdown, or state changes between
multiple tasks.
.. code-block:: python
import asyncio
async def waiter(event, name):
print(f"{name} waiting for event")
await event.wait()
print(f"{name} got the event!")
async def setter(event):
print("Setting event in 2 seconds...")
await asyncio.sleep(2)
event.set()
print("Event set!")
async def main():
event = asyncio.Event()
await asyncio.gather(
waiter(event, "Task 1"),
waiter(event, "Task 2"),
waiter(event, "Task 3"),
setter(event),
)
asyncio.run(main())
Conditions for Complex Synchronization
--------------------------------------
``asyncio.Condition`` combines a lock with the ability to wait for a condition.
This is useful for producer-consumer patterns where consumers need to wait
for specific conditions.
.. code-block:: python
import asyncio
class Buffer:
def __init__(self, size):
self.buffer = []
self.size = size
self.condition = asyncio.Condition()
async def put(self, item):
async with self.condition:
while len(self.buffer) >= self.size:
await self.condition.wait()
self.buffer.append(item)
self.condition.notify()
async def get(self):
async with self.condition:
while not self.buffer:
await self.condition.wait()
item = self.buffer.pop(0)
self.condition.notify()
return item
async def producer(buffer, name):
for i in range(5):
await buffer.put(f"{name}-{i}")
print(f"Produced: {name}-{i}")
await asyncio.sleep(0.1)
async def consumer(buffer, name):
for _ in range(5):
item = await buffer.get()
print(f"{name} consumed: {item}")
await asyncio.sleep(0.2)
async def main():
buffer = Buffer(size=2)
await asyncio.gather(
producer(buffer, "P1"),
producer(buffer, "P2"),  # two producers supply the ten items the consumers expect
consumer(buffer, "C1"),
consumer(buffer, "C2"),
)
asyncio.run(main())
Queues for Producer-Consumer
----------------------------
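``asyncio.Queue`` is the preferred way to implement producer-consumer patterns.
It handles synchronization internally, and its ``get()`` and ``put()``
coroutines suspend the caller until an item or a free slot is available. They
take no timeout argument; wrap them in ``asyncio.wait_for()`` when a timeout is
needed.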
.. code-block:: python
import asyncio
async def producer(queue, name):
for i in range(5):
item = f"{name}-item-{i}"
await queue.put(item)
print(f"Produced: {item}")
await asyncio.sleep(0.5)
async def consumer(queue, name):
while True:
try:
item = await asyncio.wait_for(queue.get(), timeout=2.0)
print(f"{name} consumed: {item}")
queue.task_done()
await asyncio.sleep(0.1)
except asyncio.TimeoutError:
print(f"{name} timed out, exiting")
break
async def main():
queue = asyncio.Queue(maxsize=3)
producers = [
asyncio.create_task(producer(queue, "P1")),
asyncio.create_task(producer(queue, "P2")),
]
consumers = [
asyncio.create_task(consumer(queue, "C1")),
asyncio.create_task(consumer(queue, "C2")),
]
await asyncio.gather(*producers)
await queue.join() # Wait for all items to be processed
for c in consumers:
c.cancel()
asyncio.run(main())
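An alternative to timeouts and cancellation is to send a sentinel value that
tells each consumer to stop. The sketch below is one possible arrangement, not
the only one; it assumes ``None`` never appears as a real work item, and the
doubling step is a stand-in for actual processing.

```python
import asyncio

STOP = None  # sentinel; assumes None is never a real work item


async def producer(queue, items, n_consumers):
    for item in items:
        await queue.put(item)
    # One sentinel per consumer so every consumer sees a stop signal.
    for _ in range(n_consumers):
        await queue.put(STOP)


async def consumer(queue, results):
    while True:
        item = await queue.get()
        if item is STOP:
            break
        results.append(item * 2)  # stand-in for real processing


async def main():
    queue = asyncio.Queue(maxsize=3)
    results = []
    await asyncio.gather(
        producer(queue, range(6), n_consumers=2),
        consumer(queue, results),
        consumer(queue, results),
    )
    return sorted(results)


print(asyncio.run(main()))
```

Because every coroutine returns on its own, ``gather`` can simply wait for all
of them and no task has to be cancelled.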
Priority Queue
--------------
``asyncio.PriorityQueue`` processes items by priority. Lower priority values
are processed first. Items must be comparable or wrapped in tuples with
priority as the first element.
.. code-block:: python
import asyncio
async def producer(queue):
items = [
(3, "low priority"),
(1, "high priority"),
(2, "medium priority"),
]
for priority, item in items:
await queue.put((priority, item))
print(f"Added: {item} (priority {priority})")
async def consumer(queue):
while not queue.empty():
priority, item = await queue.get()
print(f"Processing: {item} (priority {priority})")
await asyncio.sleep(0.5)
queue.task_done()
async def main():
queue = asyncio.PriorityQueue()
await producer(queue)
await consumer(queue)
asyncio.run(main())
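When two entries share a priority, tuple comparison falls through to the
payload, which fails for payloads that cannot be compared (such as
dictionaries). A common workaround, sketched here with an ``itertools.count``
sequence number as an assumed tie-breaker, also keeps insertion order among
equal priorities.

```python
import asyncio
import itertools


async def main():
    queue = asyncio.PriorityQueue()
    counter = itertools.count()  # monotonically increasing tie-breaker
    jobs = [
        (2, {"name": "b"}),
        (1, {"name": "a"}),
        (2, {"name": "c"}),
        (1, {"name": "d"}),
    ]
    for priority, payload in jobs:
        # (priority, sequence) is always unique, so payloads are never compared.
        await queue.put((priority, next(counter), payload))

    order = []
    while not queue.empty():
        _priority, _seq, payload = await queue.get()
        order.append(payload["name"])
        queue.task_done()
    return order


print(asyncio.run(main()))
```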
Running Subprocesses
--------------------
Asyncio can run and communicate with subprocesses asynchronously. This is
useful for running shell commands, external tools, or parallel processes
without blocking the event loop.
.. code-block:: python
import asyncio
async def run_command(cmd):
proc = await asyncio.create_subprocess_shell(
cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
return {
'cmd': cmd,
'returncode': proc.returncode,
'stdout': stdout.decode().strip(),
'stderr': stderr.decode().strip()
}
async def main():
commands = [
"echo 'Hello World'",
"python --version",
"date",
]
results = await asyncio.gather(*[run_command(c) for c in commands])
for r in results:
print(f"Command: {r['cmd']}")
print(f"Output: {r['stdout']}")
print()
asyncio.run(main())
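When the command includes untrusted input, ``asyncio.create_subprocess_exec``
is usually a safer choice because arguments are passed as a list and never
interpreted by a shell. The sketch below uses ``sys.executable`` as the child
program purely so the example is portable; the helper name ``run_exec`` is
hypothetical.

```python
import asyncio
import sys


async def run_exec(*args):
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def main():
    # The argument contains shell metacharacters, but no shell is involved,
    # so the child simply receives it as one literal string.
    return await run_exec(
        sys.executable, "-c", "import sys; print(sys.argv[1])",
        "hello; echo injected",
    )


print(asyncio.run(main()))
```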
Subprocess with Streaming Output
--------------------------------
For long-running processes, you can stream output line by line instead of
waiting for the process to complete. This is useful for monitoring logs or
progress.
.. code-block:: python
import asyncio
async def stream_subprocess(cmd):
proc = await asyncio.create_subprocess_shell(
cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT
)
while True:
line = await proc.stdout.readline()
if not line:
break
print(f"[{cmd[:20]}] {line.decode().strip()}")
await proc.wait()
return proc.returncode
async def main():
# Run multiple commands and stream their output
await asyncio.gather(
stream_subprocess("for i in 1 2 3; do echo $i; sleep 1; done"),
stream_subprocess("for i in a b c; do echo $i; sleep 0.5; done"),
)
asyncio.run(main())
Graceful Shutdown
-----------------
Proper shutdown handling ensures all tasks complete cleanly and resources
are released. Use signal handlers to catch SIGINT/SIGTERM and cancel tasks
gracefully. Note that ``loop.add_signal_handler()`` is available on Unix only.
.. code-block:: python
import asyncio
import signal
async def worker(name):
try:
while True:
print(f"{name} working...")
await asyncio.sleep(1)
except asyncio.CancelledError:
print(f"{name} cancelled, cleaning up...")
await asyncio.sleep(0.5) # Cleanup time
print(f"{name} cleanup done")
raise
async def main():
loop = asyncio.get_running_loop()
tasks = [
asyncio.create_task(worker("Worker-1")),
asyncio.create_task(worker("Worker-2")),
]
def shutdown():
print("\nShutdown requested...")
for task in tasks:
task.cancel()
loop.add_signal_handler(signal.SIGINT, shutdown)
loop.add_signal_handler(signal.SIGTERM, shutdown)
try:
await asyncio.gather(*tasks)
except asyncio.CancelledError:
print("All tasks cancelled")
asyncio.run(main())
Running Async Code in Threads
-----------------------------
When you need to run async code from synchronous code (e.g., in a callback
or from another thread), use ``asyncio.run_coroutine_threadsafe()``.
.. code-block:: python
import asyncio
import threading
import time
async def async_task(value):
await asyncio.sleep(1)
return value * 2
def thread_function(loop):
# Run async code from a different thread
future = asyncio.run_coroutine_threadsafe(
async_task(21), loop
)
result = future.result(timeout=5)
print(f"Thread got result: {result}")
async def main():
loop = asyncio.get_running_loop()
# Start a thread that will call async code
thread = threading.Thread(target=thread_function, args=(loop,))
thread.start()
# Keep the event loop running
await asyncio.sleep(2)
thread.join()
asyncio.run(main())
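The opposite direction, calling blocking code from a coroutine, can be handled
with ``asyncio.to_thread()`` (Python 3.9+). This sketch assumes a hypothetical
``blocking_io`` function and runs a small ``ticker`` coroutine alongside it to
show that the event loop keeps working while the thread sleeps.

```python
import asyncio
import time


def blocking_io(seconds):
    time.sleep(seconds)  # would block the event loop if called directly
    return f"slept {seconds}s"


async def ticker(log):
    for i in range(3):
        log.append(i)
        await asyncio.sleep(0.01)


async def main():
    log = []
    # The blocking call runs in a worker thread; the ticker runs on the loop.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.05),
        ticker(log),
    )
    return result, log


print(asyncio.run(main()))
```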
Debugging Asyncio
-----------------
Enable debug mode to catch common mistakes like blocking calls, unawaited
coroutines, and slow callbacks. Debug mode adds overhead so use it only
during development.
.. code-block:: python
import asyncio
import logging
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
async def slow_callback():
import time
time.sleep(0.2) # This will trigger a warning in debug mode
async def main():
await slow_callback()
# Method 1: Environment variable
# PYTHONASYNCIODEBUG=1 python script.py
# Method 2: asyncio.run with debug=True
asyncio.run(main(), debug=True)
Custom Event Loop
-----------------
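You can customize the event loop behavior by subclassing or patching. This
is useful for debugging, profiling, or adding custom functionality. The example
below overrides ``_run_once`` and reads ``_ready`` and ``_scheduled``, which
are private CPython implementation details and may change between versions.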
.. code-block:: python
import asyncio
class DebugEventLoop(asyncio.SelectorEventLoop):
def _run_once(self):
# Track number of scheduled callbacks
num_ready = len(self._ready)
num_scheduled = len(self._scheduled)
if num_ready or num_scheduled:
print(f"Ready: {num_ready}, Scheduled: {num_scheduled}")
super()._run_once()
async def task(n):
await asyncio.sleep(n)
print(f"Task {n} done")
# Use custom event loop
loop = DebugEventLoop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(asyncio.gather(
task(0.1),
task(0.2),
task(0.3),
))
finally:
loop.close()
Timeout Patterns
----------------
Different timeout patterns for various use cases: per-operation timeout,
overall timeout, and timeout with fallback.
.. code-block:: python
import asyncio
async def fetch(url, delay):
await asyncio.sleep(delay)
return f"Response from {url}"
async def fetch_with_timeout(url, delay, timeout):
"""Per-operation timeout."""
try:
return await asyncio.wait_for(fetch(url, delay), timeout)
except asyncio.TimeoutError:
return f"Timeout for {url}"
async def fetch_all_with_timeout(urls, timeout):
"""Overall timeout for all operations."""
async def fetch_all():
return await asyncio.gather(*[fetch(u, i) for i, u in enumerate(urls)])
try:
return await asyncio.wait_for(fetch_all(), timeout)
except asyncio.TimeoutError:
return ["Overall timeout"]
async def fetch_with_fallback(url, delay, timeout, fallback):
"""Timeout with fallback value."""
try:
return await asyncio.wait_for(fetch(url, delay), timeout)
except asyncio.TimeoutError:
return fallback
async def main():
# Per-operation timeout
result = await fetch_with_timeout("slow.com", 5, 1)
print(result)
# Timeout with fallback
result = await fetch_with_fallback("slow.com", 5, 1, "cached response")
print(result)
asyncio.run(main())
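A further variation keeps whatever finished before the deadline instead of
discarding everything. The sketch below uses ``asyncio.wait()`` with its
``timeout`` argument, which returns the finished and unfinished tasks without
raising; the URLs and delays are made up for illustration.

```python
import asyncio


async def fetch(url, delay):
    await asyncio.sleep(delay)
    return f"Response from {url}"


async def main():
    tasks = [
        asyncio.create_task(fetch("fast.com", 0.01)),
        asyncio.create_task(fetch("ok.com", 0.02)),
        asyncio.create_task(fetch("slow.com", 5)),
    ]
    # wait() does not cancel anything on timeout; it only reports the split.
    done, pending = await asyncio.wait(tasks, timeout=0.5)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(task.result() for task in done), len(pending)


print(asyncio.run(main()))
```

Unlike ``asyncio.wait_for()``, this pattern leaves cancellation of the
stragglers to the caller, so partial results are preserved.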
Retry Pattern
-------------
Implement retry logic for transient failures with exponential backoff.
This is essential for robust network clients.
.. code-block:: python
import asyncio
import random
class RetryError(Exception):
pass
async def unreliable_operation():
"""Simulates an operation that fails randomly."""
if random.random() < 0.7:
raise ConnectionError("Network error")
return "Success!"
async def retry(coro_func, max_retries=3, base_delay=1.0):
"""Retry with exponential backoff."""
last_exception = None
for attempt in range(max_retries):
try:
return await coro_func()
except Exception as e:
last_exception = e
if attempt < max_retries - 1:
delay = base_delay * (2 ** attempt)
jitter = random.uniform(0, 0.1 * delay)
print(f"Attempt {attempt + 1} failed, retrying in {delay:.2f}s")
await asyncio.sleep(delay + jitter)
raise RetryError(f"Failed after {max_retries} attempts") from last_exception
async def main():
try:
result = await retry(unreliable_operation, max_retries=5)
print(f"Result: {result}")
except RetryError as e:
print(f"All retries failed: {e}")
asyncio.run(main())
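Catching every ``Exception`` also retries programming errors. A narrower
variant, sketched below with hypothetical names (``retry_on``, ``make_flaky``)
and deliberately tiny delays, retries only the listed exception types, caps the
backoff, and re-raises the original error after the last attempt.

```python
import asyncio


async def retry_on(coro_func, exceptions, max_retries=3,
                   base_delay=0.01, max_delay=0.05):
    """Retry only on `exceptions`, with capped exponential backoff."""
    for attempt in range(max_retries):
        try:
            return await coro_func()
        except exceptions:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            await asyncio.sleep(min(max_delay, base_delay * 2 ** attempt))


def make_flaky(failures):
    """Build an operation that fails `failures` times, then succeeds."""
    calls = {"count": 0}

    async def operation():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise ConnectionError("transient")
        return "ok"

    return operation, calls


async def main():
    operation, calls = make_flaky(failures=2)
    result = await retry_on(operation, (ConnectionError,))
    return result, calls["count"]


print(asyncio.run(main()))
```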
Async Context Variable
----------------------
Context variables (Python 3.7+) provide task-local storage, similar to
thread-local storage but for async tasks. Useful for request IDs, user
context, or database connections.
.. code-block:: python
import asyncio
import contextvars
# Create context variable
request_id = contextvars.ContextVar('request_id', default=None)
async def process_request(rid):
request_id.set(rid)
await step1()
await step2()
async def step1():
rid = request_id.get()
print(f"[{rid}] Step 1")
await asyncio.sleep(0.1)
async def step2():
rid = request_id.get()
print(f"[{rid}] Step 2")
await asyncio.sleep(0.1)
async def main():
await asyncio.gather(
process_request("req-001"),
process_request("req-002"),
process_request("req-003"),
)
asyncio.run(main())
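The isolation between tasks can be checked directly. In this sketch (the
``"unset"`` default and the ``handle`` coroutine are assumptions for the
example), each task sets its own value, yields so the others can run, and still
reads back what it set, while the caller's context is left untouched.

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default="unset")


async def handle(rid):
    request_id.set(rid)
    await asyncio.sleep(0)   # let the other task run and set its own value
    return request_id.get()  # still this task's value


async def main():
    # gather() wraps each coroutine in a Task with its own copy of the context.
    seen = await asyncio.gather(handle("req-001"), handle("req-002"))
    return seen, request_id.get()


print(asyncio.run(main()))
```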
TaskGroup (Python 3.11+)
------------------------
``TaskGroup`` provides structured concurrency, ensuring all tasks complete
or are cancelled together. An exception in any task cancels the remaining
tasks in the group, and the errors are raised together as an
``ExceptionGroup``.
.. code-block:: python
import asyncio
async def task(name, delay, should_fail=False):
await asyncio.sleep(delay)
if should_fail:
raise ValueError(f"{name} failed!")
return f"{name} done"
async def main():
try:
async with asyncio.TaskGroup() as tg:
tg.create_task(task("A", 1))
tg.create_task(task("B", 2))
tg.create_task(task("C", 0.5, should_fail=True))
except* ValueError as eg:
for exc in eg.exceptions:
print(f"Caught: {exc}")
# Python 3.11+
asyncio.run(main())
================================================
FILE: docs/notes/asyncio/python-asyncio-basic.rst
================================================
.. meta::
:description lang=en: Python asyncio basics - coroutines, tasks, event loop, async/await syntax
:keywords: Python, Python3, Asyncio, Coroutines, Event Loop, async await, Asynchronous Programming
================
Asyncio Basics
================
:Source: `src/basic/asyncio_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
The ``asyncio`` module, introduced in Python 3.4 and significantly improved in
Python 3.5+ with ``async/await`` syntax, provides a foundation for writing
asynchronous code. Unlike threads, which use preemptive multitasking (the OS
decides when to switch), asyncio uses cooperative multitasking where coroutines
explicitly yield control using ``await``. Because a switch can only happen at an
``await``, many race conditions common in threaded code are avoided, and
program flow is easier to reason about.
Key concepts:
- **Coroutine**: A function defined with ``async def`` that can be paused and resumed
- **Event Loop**: The central scheduler that runs coroutines and handles I/O events
- **Task**: A wrapper around a coroutine that schedules it for execution
- **Future**: A placeholder for a result that will be available later
Running Coroutines with asyncio.run
-----------------------------------
The simplest way to run async code is ``asyncio.run()``, introduced in Python 3.7.
It creates an event loop, runs the coroutine until completion, and cleans up
automatically. This is the recommended entry point for asyncio programs.
.. code-block:: python
import asyncio
async def hello():
print("Hello")
await asyncio.sleep(1)
print("World")
# Python 3.7+
asyncio.run(hello())
For file I/O or other blocking operations, use ``run_in_executor`` to avoid
blocking the event loop:
.. code-block:: python
import asyncio
from concurrent.futures import ThreadPoolExecutor
async def read_file(path):
loop = asyncio.get_running_loop()
with ThreadPoolExecutor() as pool:
with open(path) as f:
return await loop.run_in_executor(pool, f.read)
content = asyncio.run(read_file('/etc/hosts'))
Creating and Managing Tasks
---------------------------
Tasks allow multiple coroutines to run concurrently. When you create a task,
it is scheduled on the event loop right away and starts running as soon as the
current coroutine yields control. Use ``asyncio.create_task()`` (Python 3.7+)
or ``loop.create_task()`` to create tasks.
.. code-block:: python
import asyncio
async def fetch(name, delay):
await asyncio.sleep(delay)
return f"{name} done"
async def main():
# Create tasks - they are scheduled right away and run concurrently
task1 = asyncio.create_task(fetch("A", 2))
task2 = asyncio.create_task(fetch("B", 1))
# Wait for both to complete
result1 = await task1
result2 = await task2
print(result1, result2)
asyncio.run(main())
Gathering Multiple Coroutines
-----------------------------
``asyncio.gather()`` runs multiple coroutines concurrently and collects their
results in order. This is the most common way to start several async operations
at once and wait for all of them to complete.
.. code-block:: python
import asyncio
async def fetch(url, delay):
await asyncio.sleep(delay)
return f"Response from {url}"
async def main():
urls = ["site1.com", "site2.com", "site3.com"]
coros = [fetch(url, i * 0.5) for i, url in enumerate(urls)]
# Run all concurrently, results in same order as input
results = await asyncio.gather(*coros)
for r in results:
print(r)
asyncio.run(main())
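By default, the first exception raised by any coroutine propagates out of
``gather()``. Passing ``return_exceptions=True`` returns exceptions as
ordinary results instead, so one failure does not hide the other responses. The
failing URL below is an assumption used only to trigger an error.

```python
import asyncio


async def fetch(url):
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"Response from {url}"


async def main():
    # Exceptions are placed in the result list at the failing position.
    return await asyncio.gather(
        fetch("site1.com"),
        fetch("bad.example"),
        fetch("site3.com"),
        return_exceptions=True,
    )


for result in asyncio.run(main()):
    if isinstance(result, Exception):
        print(f"Failed: {result}")
    else:
        print(result)
```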
Waiting with Timeout
--------------------
Use ``asyncio.wait_for()`` to set a timeout on async operations. This is
essential for network operations where you don't want to wait indefinitely
for a response that may never come.
.. code-block:: python
import asyncio
async def slow_operation():
await asyncio.sleep(10)
return "done"
async def main():
try:
result = await asyncio.wait_for(slow_operation(), timeout=2.0)
except asyncio.TimeoutError:
print("Operation timed out!")
asyncio.run(main())
Waiting for First Completed
---------------------------
``asyncio.wait()`` provides more control than ``gather()``. You can wait for
the first task to complete, first exception, or all tasks. This is useful
when you want to process results as they become available.
.. code-block:: python
import asyncio
async def fetch(name, delay):
await asyncio.sleep(delay)
return f"{name}: {delay}s"
async def main():
tasks = [
asyncio.create_task(fetch("fast", 1)),
asyncio.create_task(fetch("slow", 3)),
]
# Wait for first to complete
done, pending = await asyncio.wait(
tasks, return_when=asyncio.FIRST_COMPLETED
)
for task in done:
print(f"Completed: {task.result()}")
print(f"Still pending: {len(pending)}")
# Cancel pending tasks
for task in pending:
task.cancel()
asyncio.run(main())
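When the goal is to handle each result as soon as it is ready, ``asyncio.as_completed()`` can be a lighter alternative to ``asyncio.wait()``. The sketch below reuses the ``fetch`` helper from the example above; the shorter delays and the ``finished`` list are illustrative assumptions, not part of the original example.

```python
import asyncio


async def fetch(name, delay):
    await asyncio.sleep(delay)
    return f"{name}: {delay}s"


async def main():
    # Input order is slow-then-fast on purpose
    coros = [fetch("slow", 0.2), fetch("fast", 0.1)]
    finished = []
    # as_completed yields awaitables in completion order, not input order
    for next_done in asyncio.as_completed(coros):
        finished.append(await next_done)
    return finished


results = asyncio.run(main())
print(results)
```

Unlike ``gather()``, the results here follow completion order, so the caller cannot rely on positions matching the inputs.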
Asynchronous Iteration
----------------------
Async iterators allow you to iterate over data that arrives asynchronously,
such as streaming responses or database cursors. Implement ``__aiter__`` and
``__anext__`` methods to create custom async iterators.
.. code-block:: python
import asyncio
class AsyncRange:
"""Async iterator that yields numbers with delays."""
def __init__(self, start, stop):
self.current = start
self.stop = stop
def __aiter__(self):
return self
async def __anext__(self):
if self.current >= self.stop:
raise StopAsyncIteration
await asyncio.sleep(0.5)
value = self.current
self.current += 1
return value
async def main():
async for num in AsyncRange(0, 5):
print(num)
asyncio.run(main())
Asynchronous Context Managers
-----------------------------
Async context managers are essential for managing resources that require
async setup or cleanup, such as database connections, file handles, or
network sessions. Use ``async with`` to ensure proper resource management.
.. code-block:: python
import asyncio
class AsyncConnection:
"""Simulated async database connection."""
async def __aenter__(self):
print("Connecting...")
await asyncio.sleep(1)
print("Connected")
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
print("Disconnecting...")
await asyncio.sleep(0.5)
print("Disconnected")
async def query(self, sql):
await asyncio.sleep(0.1)
return f"Result of: {sql}"
async def main():
async with AsyncConnection() as conn:
result = await conn.query("SELECT * FROM users")
print(result)
asyncio.run(main())
Using @asynccontextmanager
--------------------------
The ``@asynccontextmanager`` decorator (Python 3.7+) provides a simpler way
to create async context managers using generator syntax, similar to the
synchronous ``@contextmanager`` decorator.
.. code-block:: python
import asyncio
from contextlib import asynccontextmanager
@asynccontextmanager
async def managed_resource(name):
print(f"Acquiring {name}")
await asyncio.sleep(0.5)
try:
yield name
finally:
print(f"Releasing {name}")
await asyncio.sleep(0.2)
async def main():
async with managed_resource("database") as resource:
print(f"Using {resource}")
asyncio.run(main())
Running Blocking Code in Executor
---------------------------------
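When you need to call blocking code (file I/O or libraries without async
support), use ``run_in_executor()`` to run it in a thread pool without blocking
the event loop. CPU-intensive functions can be offloaded the same way, but
because of the GIL a thread pool rarely speeds them up; a
``ProcessPoolExecutor`` is usually the better fit for that case.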
.. code-block:: python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
def blocking_io():
"""Simulates blocking I/O operation."""
time.sleep(2)
return "IO complete"
def cpu_bound():
"""Simulates CPU-intensive operation."""
return sum(i * i for i in range(10**6))
async def main():
loop = asyncio.get_running_loop()
# Run in default executor (ThreadPoolExecutor)
result1 = await loop.run_in_executor(None, blocking_io)
print(result1)
# Run in custom executor
with ThreadPoolExecutor(max_workers=4) as pool:
result2 = await loop.run_in_executor(pool, cpu_bound)
print(result2)
asyncio.run(main())
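On Python 3.9 and later, ``asyncio.to_thread()`` covers the common case of running a blocking function in the default thread pool without touching the loop object. A minimal sketch, assuming ``blocking_io`` and its sleep duration stand in for real blocking work:

```python
import asyncio
import time


def blocking_io(seconds):
    """Stand-in for a blocking call (hypothetical helper)."""
    time.sleep(seconds)
    return f"slept {seconds}s"


async def main():
    # Each call runs in a worker thread, so the two sleeps overlap
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2),
        asyncio.to_thread(blocking_io, 0.2),
    )


results = asyncio.run(main())
print(results)
```

This is still thread-based, so it suits blocking I/O rather than CPU-heavy work.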
Async Generators
----------------
Async generators (Python 3.6+) combine generators with async/await, allowing
you to yield values asynchronously. They're useful for streaming data or
implementing async iterators more concisely.
.. code-block:: python
import asyncio
async def async_range(start, stop):
"""Async generator that yields numbers with delays."""
for i in range(start, stop):
await asyncio.sleep(0.5)
yield i
async def main():
async for num in async_range(0, 5):
print(num)
# Async comprehension
results = [x async for x in async_range(0, 3)]
print(results)
asyncio.run(main())
Exception Handling in Tasks
---------------------------
Exceptions in tasks are stored and re-raised when you await the task or
call ``result()``. If a task is never awaited, its exception is only logged
when the task object is garbage-collected, which makes failures easy to miss,
so always await your tasks.
.. code-block:: python
import asyncio
async def failing_task():
await asyncio.sleep(1)
raise ValueError("Something went wrong")
async def main():
task = asyncio.create_task(failing_task())
try:
await task
except ValueError as e:
print(f"Caught exception: {e}")
# Using gather with return_exceptions
tasks = [
asyncio.create_task(asyncio.sleep(1)),
asyncio.create_task(failing_task()),
]
results = await asyncio.gather(*tasks, return_exceptions=True)
for r in results:
if isinstance(r, Exception):
print(f"Task failed: {r}")
else:
print(f"Task succeeded: {r}")
asyncio.run(main())
Cancelling Tasks
----------------
Tasks can be cancelled using ``task.cancel()``. The cancelled task will
raise ``asyncio.CancelledError`` at the next await point. Handle this
exception to perform cleanup when a task is cancelled.
.. code-block:: python
import asyncio
async def long_running():
try:
while True:
print("Working...")
await asyncio.sleep(1)
except asyncio.CancelledError:
print("Task was cancelled, cleaning up...")
raise # Re-raise to mark task as cancelled
async def main():
task = asyncio.create_task(long_running())
await asyncio.sleep(3)
task.cancel()
try:
await task
except asyncio.CancelledError:
print("Task cancellation confirmed")
asyncio.run(main())
================================================
FILE: docs/notes/asyncio/python-asyncio-guide.rst
================================================
.. meta::
:description lang=en: A comprehensive guide to understanding asynchronous programming in Python, from blocking I/O to event loops, callbacks, generators, and async/await syntax
:keywords: Python, Python3, asyncio, coroutine, event loop, async await, asynchronous programming, C10k problem, non-blocking I/O, selectors, generators, callback
================================================
A Hitchhiker's Guide to Asynchronous Programming
================================================
.. contents:: Table of Contents
:backlinks: none
Abstract
--------
The `C10k problem`_ remains a fundamental challenge for programmers seeking to
handle massive concurrent connections efficiently. Traditionally, developers
address extensive I/O operations using **threads**, **epoll**, or **kqueue** to
prevent software from blocking on expensive operations. However, developing
readable and bug-free concurrent code is challenging due to complexities around
data sharing and task dependencies. Even powerful tools like `Valgrind`_ that
help detect deadlocks and race conditions cannot eliminate the time-consuming
debugging process as software scales.
To address these challenges, many programming languages—including Python,
JavaScript, and C++—have developed better libraries, frameworks, and syntaxes
to help programmers manage concurrent tasks properly. Rather than focusing on
how to use modern parallel APIs, this article concentrates on the **design
philosophy** behind asynchronous programming patterns, tracing the evolution
from blocking I/O to the elegant ``async/await`` syntax.
Using threads is the most natural approach for dispatching tasks without
blocking the main thread. However, threads introduce performance overhead from
context switching and require careful locking of critical sections for atomic
operations. While event loops can enhance performance in I/O-bound scenarios,
writing readable event-driven code is challenging due to callback complexity
(commonly known as "callback hell"). Fortunately, Python introduced the
``async/await`` syntax to help developers write understandable code with high
performance. The following figure illustrates how ``async/await`` enables
handling socket connections with the simplicity of threads but the efficiency
of event loops.
.. image:: https://raw.githubusercontent.com/crazyguitar/pysheeet/master/docs/_static/appendix/event-loop-vs-thread.png
Introduction
------------
Handling I/O operations such as network connections is among the most expensive
tasks in any program. Consider a simple TCP blocking echo server (shown below).
If a client connects without sending any data, it blocks all other connections.
Even when clients send data promptly, the server cannot handle concurrent
requests because it wastes significant time waiting for I/O responses from
hardware like network interfaces. Thus, socket programming with concurrency
becomes essential for managing high request volumes.
.. code-block:: python
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 5566))
s.listen(10)
while True:
conn, addr = s.accept()
msg = conn.recv(1024)
conn.send(msg)
One solution to prevent blocking is dispatching tasks to separate threads. The
following example demonstrates handling connections simultaneously using threads.
However, creating numerous threads consumes computing resources without
proportional throughput gains. Worse, applications may waste time waiting for
locks when processing tasks in critical sections. While threads solve blocking
issues, factors like CPU utilization and memory overhead remain critical for
solving the C10k problem. Without creating unlimited threads, the **event loop**
provides an alternative solution for managing connections efficiently.
.. code-block:: python

    import threading
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 5566))
    s.listen(10240)

    def handler(conn):
        while True:
            msg = conn.recv(65535)
            if not msg:  # Empty read: the peer closed the connection
                conn.close()
                break
            conn.send(msg)

    while True:
        conn, addr = s.accept()
        # One thread per connection
        t = threading.Thread(target=handler, args=(conn,))
        t.start()
A simple event-driven socket server comprises three main components: an **I/O
multiplexing module** (e.g., `select`_), a **scheduler** (the loop), and
**callback functions** (event handlers). The following server uses Python's
high-level I/O multiplexing module, `selectors`_, within a loop to check
whether I/O operations are ready. When data becomes available for reading or
writing, the loop retrieves I/O events and executes the appropriate callback
functions—``accept``, ``read``, or ``write``—to complete tasks.
.. code-block:: python
import socket
from selectors import DefaultSelector, EVENT_READ, EVENT_WRITE
from functools import partial
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 5566))
s.listen(10240)
s.setblocking(False)
sel = DefaultSelector()
def accept(s, mask):
conn, addr = s.accept()
conn.setblocking(False)
sel.register(conn, EVENT_READ, read)
def read(conn, mask):
msg = conn.recv(65535)
if not msg:
sel.unregister(conn)
return conn.close()
sel.modify(conn, EVENT_WRITE, partial(write, msg=msg))
def write(conn, mask, msg=None):
if msg:
conn.send(msg)
sel.modify(conn, EVENT_READ, read)
sel.register(s, EVENT_READ, accept)
while True:
events = sel.select()
for e, m in events:
cb = e.data
cb(e.fileobj, m)
Although managing connections via threads may be inefficient, event-loop-based
programs are harder to read and maintain. To enhance code readability, many
programming languages—including Python—introduce abstract concepts such as
**coroutines**, **futures**, and **async/await** to handle I/O multiplexing
elegantly. The following sections explore these concepts and the problems they
solve.
Callback Functions
------------------
Callback functions control data flow at runtime when events occur. However,
preserving state across callbacks is challenging. For example, implementing a
handshake protocol over TCP requires storing previous state somewhere accessible
to subsequent callbacks.
.. code-block:: python
import socket
from selectors import DefaultSelector, EVENT_READ, EVENT_WRITE
from functools import partial
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 5566))
s.listen(10240)
s.setblocking(False)
sel = DefaultSelector()
is_hello = {}
def accept(s, mask):
conn, addr = s.accept()
conn.setblocking(False)
is_hello[conn] = False
sel.register(conn, EVENT_READ, read)
def read(conn, mask):
msg = conn.recv(65535)
if not msg:
sel.unregister(conn)
return conn.close()
# Check whether handshake is successful
if is_hello[conn]:
sel.modify(conn, EVENT_WRITE, partial(write, msg=msg))
return
# Perform handshake
if msg.decode("utf-8").strip() != "hello":
sel.unregister(conn)
return conn.close()
is_hello[conn] = True
def write(conn, mask, msg=None):
if msg:
conn.send(msg)
sel.modify(conn, EVENT_READ, read)
sel.register(s, EVENT_READ, accept)
while True:
events = sel.select()
for e, m in events:
cb = e.data
cb(e.fileobj, m)
Although the ``is_hello`` dictionary stores state to track handshake status,
the code becomes difficult to understand. The underlying logic is actually
simple—equivalent to this blocking version:
.. code-block:: python
def accept(s):
conn, addr = s.accept()
success = handshake(conn)
if not success:
conn.close()
def handshake(conn):
data = conn.recv(65535)
if not data:
return False
if data.decode('utf-8').strip() != "hello":
return False
conn.send(b"hello")
return True
To achieve similar structure in non-blocking code, a function (or task) must
snapshot its current state—including arguments, local variables, and execution
position—when waiting for I/O operations. The scheduler must then be able to
**re-enter** the function and execute remaining code after I/O completes.
Unlike languages like C++, Python achieves this naturally because **generators**
preserve all state and can be re-entered by calling ``next()``. By utilizing
generators, handling I/O operations in a non-blocking manner with readable,
linear code—called *inline callbacks*—becomes possible within an event loop.
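The re-entry idea can be seen without any sockets. The following sketch is a simplified assumption rather than the server code: ``handshake`` is a generator that suspends at each point where real code would wait for I/O, and ``drive`` is a toy scheduler that resumes it. The names ``wait_read`` and ``wait_write`` are hypothetical markers.

```python
def handshake():
    """Generator whose local state survives between suspensions."""
    data = yield "wait_read"       # suspend until bytes are available
    if data.strip() != b"hello":
        return False
    yield "wait_write"             # suspend until the socket is writable
    return True


def drive(gen, incoming):
    """Toy scheduler: re-enter the generator until it finishes."""
    events = []
    try:
        events.append(next(gen))            # run to the first yield
        events.append(gen.send(incoming))   # resume with the received bytes
        gen.send(None)                      # resume after the write step
    except StopIteration as stop:
        return events, stop.value
    return events, None


print(drive(handshake(), b"hello\n"))
print(drive(handshake(), b"bye"))
```

The local variable ``data`` and the position inside ``handshake`` are kept by the generator itself, so no external dictionary such as ``is_hello`` is needed.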
Event Loop
----------
An event loop is a user-space scheduler that manages tasks within a program
instead of relying on operating system thread scheduling. The following snippet
demonstrates a simple event loop handling socket connections asynchronously.
The implementation appends tasks to a FIFO job queue and registers with a
*selector* when I/O operations are not ready. A *generator* preserves task
state, allowing execution to resume without callback functions when I/O results
become available. Understanding how this event loop works reveals that a Python
generator is indeed a form of **coroutine**.
.. code-block:: python
# loop.py
from selectors import DefaultSelector, EVENT_READ, EVENT_WRITE
class Loop:
def __init__(self):
self.sel = DefaultSelector()
self.queue = []
def create_task(self, task):
self.queue.append(task)
def polling(self):
for e, m in self.sel.select(0):
self.queue.append((e.data, None))
self.sel.unregister(e.fileobj)
def is_registered(self, fileobj):
try:
self.sel.get_key(fileobj)
except KeyError:
return False
return True
def register(self, t, data):
if not data:
return False
event_type, fileobj = data
if event_type in (EVENT_READ, EVENT_WRITE):
if self.is_registered(fileobj):
self.sel.modify(fileobj, event_type, t)
else:
self.sel.register(fileobj, event_type, t)
return True
return False
def accept(self, s):
while True:
try:
conn, addr = s.accept()
except BlockingIOError:
yield (EVENT_READ, s)
else:
break
return conn, addr
def recv(self, conn, size):
while True:
try:
msg = conn.recv(size)
except BlockingIOError:
yield (EVENT_READ, conn)
else:
break
return msg
def send(self, conn, msg):
while True:
try:
size = conn.send(msg)
except BlockingIOError:
yield (EVENT_WRITE, conn)
else:
break
return size
def once(self):
self.polling()
unfinished = []
for t, data in self.queue:
try:
data = t.send(data)
except StopIteration:
continue
# Tasks waiting on I/O are resumed by polling(); requeue only the others
if not self.register(t, data):
unfinished.append((t, None))
self.queue = unfinished
def run(self):
while self.queue or self.sel.get_map():
self.once()
By assigning jobs to an event loop, the programming pattern resembles using
threads but with a user-level scheduler. `PEP 380`_ introduced generator
delegation via ``yield from``, allowing a generator to wait for other generators
to complete. The following snippet is far more intuitive and readable than
callback-based I/O handling:
.. code-block:: python
# server.py
# $ python3 server.py &
# $ nc localhost 5566
import socket
from loop import Loop
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 5566))
s.listen(10240)
s.setblocking(False)
loop = Loop()
def handler(conn):
while True:
msg = yield from loop.recv(conn, 1024)
if not msg:
conn.close()
break
yield from loop.send(conn, msg)
def main():
while True:
conn, addr = yield from loop.accept(s)
conn.setblocking(False)
loop.create_task((handler(conn), None))
loop.create_task((main(), None))
loop.run()
Using an event loop with ``yield from`` manages connections without blocking
the main thread—this was how ``asyncio`` worked before Python 3.5. However,
``yield from`` is ambiguous: why does adding ``@asyncio.coroutine`` transform
a generator into a coroutine? Instead of overloading generator syntax for
asynchronous operations, `PEP 492`_ proposed that coroutines should become a
**standalone concept** in Python. This led to the introduction of ``async/await``
syntax, dramatically improving readability for asynchronous programming.
What is a Coroutine?
--------------------
Python's documentation describes coroutines as "a more generalized form of subroutines."
This definition, while technically accurate, can be confusing. Based on our
discussion, an event loop schedules generators to perform specific tasks—similar
to how an OS dispatches jobs to threads. In this context, generators serve as
"routine workers." A **coroutine** is simply a task scheduled by an event loop
within a program, rather than by the operating system.
The following snippet illustrates what the legacy ``@asyncio.coroutine``
decorator did (it was deprecated in Python 3.8 and removed in 3.11). The
decorator turns a plain function into a generator function and wraps it with
``types.coroutine`` for backward compatibility:
.. code-block:: python
import asyncio
import inspect
import types
from functools import wraps
from asyncio.futures import Future
def coroutine(func):
"""Simple prototype of coroutine decorator"""
if inspect.isgeneratorfunction(func):
return types.coroutine(func)
@wraps(func)
def coro(*a, **k):
res = func(*a, **k)
if isinstance(res, Future) or inspect.isgenerator(res):
res = yield from res
return res
return types.coroutine(coro)
@coroutine
def foo():
yield from asyncio.sleep(1)
print("Hello Foo")
loop = asyncio.new_event_loop()
loop.run_until_complete(loop.create_task(foo()))
loop.close()
With Python 3.5+, the ``async def`` syntax creates native coroutines directly,
and ``await`` replaces ``yield from`` for suspending execution. This makes the
intent explicit: ``async def`` declares a coroutine, and ``await`` marks
suspension points where the event loop can switch to other tasks.
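As a rough illustration of that point, the sketch below shows a native-coroutine version of ``foo`` and then drives a second coroutine by hand, the way a scheduler would. The ``suspend`` and ``worker`` helpers are hypothetical names introduced only for this example.

```python
import asyncio
import types


async def foo():
    # Native coroutine: await replaces yield from
    await asyncio.sleep(0.1)
    return "Hello Foo"


@types.coroutine
def suspend(tag):
    # Hypothetical helper: the bare yield hands control back to the driver
    received = yield tag
    return received


async def worker():
    value = await suspend("waiting")
    return f"resumed with {value}"


# Normal usage: asyncio's event loop drives the coroutine
greeting = asyncio.run(foo())

# Manual driving: send() resumes the coroutine at its suspension point
coro = worker()
first = coro.send(None)       # runs until the first suspension
try:
    coro.send(42)             # resume; the coroutine then finishes
except StopIteration as stop:
    outcome = stop.value

print(greeting, first, outcome)
```

The manual part mirrors what the ``Loop.once`` method above does with ``t.send(data)``; only the syntax used to declare the task has changed.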
Conclusion
----------
Asynchronous programming via event loops has become more straightforward and
readable thanks to modern syntax and library support. Most programming
languages, including Python, implement libraries that manage task scheduling
through integration with new syntaxes. While ``async/await`` may seem enigmatic
initially, it provides a way for programmers to develop logical, linear code
structure—similar to using threads—while gaining the performance benefits of
event-driven I/O.
Without callback functions passing state between handlers, programmers no
longer need to worry about preserving local variables and arguments across
asynchronous boundaries. This allows developers to focus on application logic
rather than spending time troubleshooting concurrency issues. The evolution
from callbacks to generators to ``async/await`` represents a significant
advancement in making concurrent programming accessible and maintainable.
References
----------
1. `asyncio — Asynchronous I/O`_
2. `PEP 342 - Coroutines via Enhanced Generators`_
3. `PEP 380 - Syntax for Delegating to a Subgenerator`_
4. `PEP 492 - Coroutines with async and await syntax`_
.. _C10k problem: https://en.wikipedia.org/wiki/C10k_problem
.. _Valgrind: https://valgrind.org/
.. _select: https://docs.python.org/3/library/select.html
.. _selectors: https://docs.python.org/3/library/selectors.html
.. _asyncio — Asynchronous I/O: https://docs.python.org/3/library/asyncio.html
.. _PEP 492: https://www.python.org/dev/peps/pep-0492/
.. _PEP 380: https://www.python.org/dev/peps/pep-0380/
.. _PEP 342 - Coroutines via Enhanced Generators: https://www.python.org/dev/peps/pep-0342/
.. _PEP 492 - Coroutines with async and await syntax: https://www.python.org/dev/peps/pep-0492/
.. _PEP 380 - Syntax for Delegating to a Subgenerator: https://www.python.org/dev/peps/pep-0380/
================================================
FILE: docs/notes/asyncio/python-asyncio-server.rst
================================================
.. meta::
:description lang=en: Python asyncio networking - TCP/UDP servers, HTTP clients, SSL/TLS, protocols
:keywords: Python, Python3, Asyncio, TCP Server, UDP Server, HTTP Client, SSL TLS, Network Programming
===================
Asyncio Networking
===================
:Source: `src/basic/asyncio_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
Asyncio excels at network programming because network I/O is inherently
asynchronous - you send a request and wait for a response. Instead of blocking
a thread while waiting, asyncio allows other tasks to run. This section covers
building TCP/UDP servers and clients, HTTP requests, SSL/TLS encryption, and
the Transport/Protocol API for low-level control.
TCP Echo Server with Streams
----------------------------
The streams API (``asyncio.start_server``, ``open_connection``) provides a
high-level interface for TCP networking. It handles buffering and connection
management for you, while you remain responsible for encoding and decoding the
bytes you send and receive. This makes it the recommended approach for most
applications.
.. code-block:: python
import asyncio
async def handle_client(reader, writer):
addr = writer.get_extra_info('peername')
print(f"Connected: {addr}")
while True:
data = await reader.read(1024)
if not data:
break
message = data.decode()
print(f"Received: {message!r} from {addr}")
writer.write(data)
await writer.drain()
print(f"Disconnected: {addr}")
writer.close()
await writer.wait_closed()
async def main():
server = await asyncio.start_server(
handle_client, 'localhost', 8888
)
addr = server.sockets[0].getsockname()
print(f"Serving on {addr}")
async with server:
await server.serve_forever()
asyncio.run(main())
TCP Client with Streams
-----------------------
The client side uses ``asyncio.open_connection()`` to establish a connection.
The returned reader and writer objects provide async methods for sending and
receiving data.
.. code-block:: python
import asyncio
async def tcp_client(message):
reader, writer = await asyncio.open_connection(
'localhost', 8888
)
print(f"Sending: {message!r}")
writer.write(message.encode())
await writer.drain()
data = await reader.read(1024)
print(f"Received: {data.decode()!r}")
writer.close()
await writer.wait_closed()
asyncio.run(tcp_client("Hello, Server!"))
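To try the server and client together without two terminals, both can run in one event loop. This is a sketch under a few assumptions: it binds to ``127.0.0.1`` on port ``0`` so the OS picks a free port, and the ``roundtrip`` helper is a hypothetical name used only here.

```python
import asyncio


async def handle_client(reader, writer):
    # Echo every chunk back until the client closes its side
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def roundtrip(message):
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port chosen by the OS
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message.encode())
        await writer.drain()
        data = await reader.read(1024)
        writer.close()
        await writer.wait_closed()
    return data.decode()


reply = asyncio.run(roundtrip("Hello, Server!"))
print(reply)
```

Because ``start_server()`` begins accepting connections as soon as it returns, the client can connect without calling ``serve_forever()``.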
Low-Level TCP with Sockets
--------------------------
For more control, you can use raw sockets with the event loop's socket methods.
This approach is useful when you need fine-grained control over socket options
or when integrating with existing socket-based code.
.. code-block:: python
import asyncio
import socket
async def handle_client(loop, conn):
while True:
data = await loop.sock_recv(conn, 1024)
if not data:
break
await loop.sock_sendall(conn, data)
conn.close()
async def server():
loop = asyncio.get_running_loop()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.setblocking(False)
sock.bind(('localhost', 8888))
sock.listen(100)
print("Server listening on localhost:8888")
while True:
conn, addr = await loop.sock_accept(sock)
print(f"Connected: {addr}")
asyncio.create_task(handle_client(loop, conn))
asyncio.run(server())
UDP Echo Server
---------------
UDP is connectionless, so the API is different from TCP. Use
``create_datagram_endpoint()`` with a protocol class to handle UDP packets.
Each packet is independent and may arrive out of order or not at all.
.. code-block:: python
import asyncio
class EchoUDPProtocol(asyncio.DatagramProtocol):
def connection_made(self, transport):
self.transport = transport
def datagram_received(self, data, addr):
message = data.decode()
print(f"Received {message!r} from {addr}")
self.transport.sendto(data, addr)
async def main():
loop = asyncio.get_running_loop()
transport, protocol = await loop.create_datagram_endpoint(
EchoUDPProtocol,
local_addr=('localhost', 9999)
)
print("UDP server listening on localhost:9999")
try:
await asyncio.sleep(3600) # Run for 1 hour
finally:
transport.close()
asyncio.run(main())
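The client side uses the same ``create_datagram_endpoint()`` call with ``remote_addr`` instead of ``local_addr``. The sketch below runs a server and a client in one loop so it is self-contained; ``EchoClient``, the ``done`` future, and the two-second timeout are assumptions made for illustration.

```python
import asyncio


class EchoServer(asyncio.DatagramProtocol):
    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data, addr)


class EchoClient(asyncio.DatagramProtocol):
    def __init__(self, message, done):
        self.message = message
        self.done = done  # future resolved with the echoed datagram

    def connection_made(self, transport):
        transport.sendto(self.message)

    def datagram_received(self, data, addr):
        if not self.done.done():
            self.done.set_result(data)

    def error_received(self, exc):
        if not self.done.done():
            self.done.set_exception(exc)


async def main():
    loop = asyncio.get_running_loop()
    server, _ = await loop.create_datagram_endpoint(
        EchoServer, local_addr=("127.0.0.1", 0)
    )
    port = server.get_extra_info("sockname")[1]  # port chosen by the OS
    done = loop.create_future()
    client, _ = await loop.create_datagram_endpoint(
        lambda: EchoClient(b"ping", done), remote_addr=("127.0.0.1", port)
    )
    try:
        # UDP gives no delivery guarantee, so never wait without a timeout
        return await asyncio.wait_for(done, timeout=2.0)
    finally:
        client.close()
        server.close()


reply = asyncio.run(main())
print(reply)
```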
HTTP Client with SSL
--------------------
Making HTTPS requests requires SSL context configuration. This example shows
how to fetch web pages using low-level streams with proper SSL verification.
.. code-block:: python
import asyncio
import ssl
async def fetch_https(host, path="/"):
# Create SSL context with certificate verification
ctx = ssl.create_default_context()
reader, writer = await asyncio.open_connection(
host, 443, ssl=ctx
)
# Send HTTP request
request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
writer.write(request.encode())
await writer.drain()
# Read response
response = await reader.read()
writer.close()
await writer.wait_closed()
return response.decode()
async def main():
urls = [
("www.python.org", "/"),
("github.com", "/"),
]
tasks = [fetch_https(host, path) for host, path in urls]
responses = await asyncio.gather(*tasks)
for (host, _), resp in zip(urls, responses):
status = resp.split('\r\n')[0]
print(f"{host}: {status}")
asyncio.run(main())
HTTPS Server with SSL
---------------------
Creating an HTTPS server requires SSL certificates. This example shows a
simple HTTPS server that serves static content with TLS encryption.
.. code-block:: python
import asyncio
import ssl
async def handle_request(reader, writer):
request = await reader.read(1024)
response = b"HTTP/1.1 200 OK\r\n"
response += b"Content-Type: text/html\r\n\r\n"
response += b"<h1>Hello HTTPS!</h1>"
writer.write(response)
await writer.drain()
writer.close()
await writer.wait_closed()
async def main():
# Create SSL context
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain('cert.pem', 'key.pem')
server = await asyncio.start_server(
handle_request, 'localhost', 8443, ssl=ctx
)
print("HTTPS server on https://localhost:8443")
async with server:
await server.serve_forever()
# Generate self-signed cert:
# openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
asyncio.run(main())
Transport and Protocol API
--------------------------
The Transport/Protocol API provides low-level control over network connections.
Transports handle the actual I/O while Protocols handle the data processing.
This separation allows for flexible and reusable network code.
.. code-block:: python
import asyncio
class EchoProtocol(asyncio.Protocol):
def connection_made(self, transport):
self.transport = transport
peername = transport.get_extra_info('peername')
print(f"Connection from {peername}")
def data_received(self, data):
print(f"Received: {data.decode()!r}")
self.transport.write(data)
def connection_lost(self, exc):
print("Connection closed")
async def main():
loop = asyncio.get_running_loop()
server = await loop.create_server(
EchoProtocol, 'localhost', 8888
)
async with server:
await server.serve_forever()
asyncio.run(main())
DNS Resolution
--------------
Asyncio provides async DNS resolution through ``loop.getaddrinfo()``, which
runs the blocking lookup in the default executor. This is useful when you need
to resolve hostnames without blocking the event loop.
.. code-block:: python
import asyncio
import socket
async def resolve_host(host, port=80):
loop = asyncio.get_running_loop()
infos = await loop.getaddrinfo(
host, port,
family=socket.AF_UNSPEC,
type=socket.SOCK_STREAM
)
for family, type_, proto, canonname, sockaddr in infos:
ip, port = sockaddr[:2]
family_name = "IPv4" if family == socket.AF_INET else "IPv6"
print(f"{host} -> {ip} ({family_name})")
async def main():
hosts = ["python.org", "github.com", "google.com"]
await asyncio.gather(*[resolve_host(h) for h in hosts])
asyncio.run(main())
Simple HTTP Server
------------------
A minimal HTTP server implementation showing how to parse requests and
send responses. For production use, consider frameworks like aiohttp or
FastAPI.
.. code-block:: python
import asyncio
async def handle_http(reader, writer):
request = await reader.read(1024)
request_line = request.decode().split('\r\n')[0]
method, path, _ = request_line.split(' ')
print(f"{method} {path}")
# Simple routing
if path == '/':
body = b"<h1>Home</h1>"
status = "200 OK"
elif path == '/about':
body = b"<h1>About</h1>"
status = "200 OK"
else:
body = b"<h1>404 Not Found</h1>"
status = "404 Not Found"
response = f"HTTP/1.1 {status}\r\n"
response += f"Content-Length: {len(body)}\r\n"
response += "Content-Type: text/html\r\n\r\n"
writer.write(response.encode() + body)
await writer.drain()
writer.close()
await writer.wait_closed()
async def main():
server = await asyncio.start_server(
handle_http, 'localhost', 8080
)
print("HTTP server on http://localhost:8080")
async with server:
await server.serve_forever()
asyncio.run(main())
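The handler above assumes the request line always has exactly three space-separated parts. A slightly more defensive parser might look like the following sketch; ``parse_request_line`` is a hypothetical helper, and returning ``None`` for malformed input is a design assumption, not something the original server does.

```python
def parse_request_line(raw: bytes):
    """Return (method, path, version), or None if the line is malformed."""
    # Only the first line matters; latin-1 never fails to decode
    line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = line.split()
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, path, version = parts
    return method, path, version


print(parse_request_line(b"GET /about HTTP/1.1\r\nHost: localhost\r\n\r\n"))
print(parse_request_line(b"garbage"))
```

A handler could respond with ``400 Bad Request`` when the helper returns ``None`` instead of raising inside the connection callback.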
Using sendfile for Efficient File Transfer
------------------------------------------
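The ``loop.sendfile()`` method (Python 3.7+) transfers file contents to a
transport using the OS's sendfile syscall where it is available, avoiding
copying data through Python. On platforms without that support, asyncio falls
back to reading and sending the file in chunks.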
.. code-block:: python
import asyncio
async def handle_request(reader, writer):
await reader.read(1024) # Read request
with open('index.html', 'rb') as f:
# Get file size
f.seek(0, 2)
size = f.tell()
f.seek(0)
# Send headers
headers = "HTTP/1.1 200 OK\r\n"
headers += f"Content-Length: {size}\r\n"
headers += "Content-Type: text/html\r\n\r\n"
writer.write(headers.encode())
# Send file efficiently
loop = asyncio.get_running_loop()
await loop.sendfile(writer.transport, f)
writer.close()
await writer.wait_closed()
async def main():
server = await asyncio.start_server(
handle_request, 'localhost', 8080
)
async with server:
await server.serve_forever()
asyncio.run(main())
Connection Pool
---------------
Connection pools reuse connections to avoid the overhead of establishing
new connections for each request. This is essential for high-performance
clients that make many requests to the same server.
.. code-block:: python
import asyncio
from collections import deque
class ConnectionPool:
def __init__(self, host, port, size=5):
self.host = host
self.port = port
self.size = size
self._pool = deque()
self._lock = asyncio.Lock()
async def get(self):
async with self._lock:
if self._pool:
return self._pool.popleft()
# Create new connection
reader, writer = await asyncio.open_connection(
self.host, self.port
)
return reader, writer
async def put(self, reader, writer):
async with self._lock:
if len(self._pool) < self.size:
self._pool.append((reader, writer))
else:
writer.close()
await writer.wait_closed()
async def close(self):
async with self._lock:
while self._pool:
reader, writer = self._pool.popleft()
writer.close()
await writer.wait_closed()
async def fetch(pool, message):
reader, writer = await pool.get()
try:
writer.write(message.encode())
await writer.drain()
data = await reader.read(1024)
return data.decode()
finally:
await pool.put(reader, writer)
async def main():
pool = ConnectionPool('localhost', 8888, size=3)
try:
tasks = [fetch(pool, f"msg{i}") for i in range(10)]
results = await asyncio.gather(*tasks)
for r in results:
print(r)
finally:
await pool.close()
asyncio.run(main())
================================================
FILE: docs/notes/basic/index.rst
================================================
.. meta::
:description lang=en: Python basics cheat sheet covering syntax, data types, functions, classes, generators, typing, and essential Python programming concepts
:keywords: Python, Python3, basics, syntax, data types, functions, classes, generators, typing, list, dict, set, comprehension
Quick Start
===========
This cheat sheet is designed to help developers learn Python syntax from the
ground up. It covers the fundamentals while also introducing common patterns
and idioms that experienced Python developers use, which may feel unfamiliar to
beginners. For instance, constructs like ``for ... else ...`` are rarely seen in
other programming languages. Additionally, we’ll explore interesting topics such
as ``__future__``, typing, and Unicode—concepts you may have heard of but never
fully understood. By working through this cheat sheet, you’ll gain a solid
foundation in Python and learn to write code that feels truly Pythonic.
.. toctree::
:maxdepth: 1
python-basic
python-future
python-func
python-object
python-typing
python-list
python-set
python-dict
python-heap
python-generator
python-unicode
python-rexp
================================================
FILE: docs/notes/basic/python-basic.rst
================================================
.. meta::
:description lang=en: Python basics tutorial covering fundamental syntax, data types, control flow, string formatting, and essential Python programming concepts
:keywords: Python, Python3, basics, syntax, data types, control flow, string formatting, variables, operators, conditionals
============
From Scratch
============
.. contents:: Table of Contents
:backlinks: none
The main goal of this cheat sheet is to collect common, basic semantics and
snippets. It includes syntax that we already know but that remains ambiguous
in our minds, as well as snippets that we search for again and again. In
addition, because **Python 2 has reached its end of life**, most of the
snippets are based on **Python 3** syntax.
Hello world!
------------
When we start to learn a new language, we usually begin by printing
**Hello world!**. Python also offers another way to print the message:
importing the ``__hello__`` module. The source code can be found in
`frozen.c `_.
.. code-block:: python
>>> print("Hello world!")
Hello world!
>>> import __hello__
Hello world!
>>> import __phello__
Hello world!
>>> import __phello__.spam
Hello world!
Python Version
--------------
It is important for a programmer to know the current Python version because
not every syntax works in every version. We can get the Python version with
``python -V`` or through the ``sys`` module.
.. code-block:: python
>>> import sys
>>> print(sys.version)
3.7.1 (default, Nov 6 2018, 18:46:03)
[Clang 10.0.0 (clang-1000.11.45.5)]
We can also use ``platform.python_version`` to get the Python version.
.. code-block:: python
>>> import platform
>>> platform.python_version()
'3.7.1'
Sometimes, checking the current Python version is important because we may want
to enable some features only in specific versions. ``sys.version_info`` provides
more detailed information about the interpreter, and we can compare it with the
version we need.
.. code-block:: python
>>> import sys
>>> sys.version_info >= (3, 6)
True
>>> sys.version_info >= (3, 8)
False
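As a small illustration of version gating, the comparison above can be wrapped in a helper. This is only a sketch: ``at_least`` is a hypothetical name, not part of the standard library, and it relies on the fact that ``sys.version_info`` compares like a tuple.

```python
import sys


def at_least(required, current=None):
    """Return True if `current` (default: the running interpreter) is >= `required`.

    `at_least` is a hypothetical helper used for illustration only.
    """
    if current is None:
        current = sys.version_info
    # Tuples compare element by element, so (3, 7, 1) >= (3, 7) is True.
    return tuple(current) >= tuple(required)


# Pick a code path depending on the interpreter that is running.
if at_least((3, 8)):
    feature = "positional-only parameters are available"
else:
    feature = "positional-only parameters are not available"
```

Passing an explicit ``current`` tuple, e.g. ``at_least((3, 8), (3, 7, 1))``, makes the check easy to test without changing interpreters.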
Ellipsis
--------
`Ellipsis `_ is a
built-in constant. Since Python 3.0, we can write ``...`` for ``Ellipsis``. It
may be the most enigmatic constant in Python. According to the official
documentation, it is mainly used in extended slicing syntax. Nevertheless, there
are other conventions for it in type hinting, stub files, and function bodies.
.. code-block:: python
>>> ...
Ellipsis
>>> ... == Ellipsis
True
>>> type(...)
<class 'ellipsis'>
The following snippet shows that we can use the ellipsis as a placeholder for a
function or a class that has not been implemented yet.
.. code-block:: python
>>> class Foo: ...
...
>>> def foo(): ...
...
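The slicing and type hinting conventions mentioned above can be sketched as follows. ``Probe`` and ``total`` are hypothetical names introduced only for this illustration.

```python
from typing import Tuple


class Probe:
    """Hypothetical helper that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


probe = Probe()
# Inside a subscript, ``...`` is passed through as the Ellipsis object.
key = probe[..., 0]


# In type hints, ``Tuple[int, ...]`` means "a tuple of ints of any length".
def total(values: Tuple[int, ...]) -> int:
    return sum(values)
```

Libraries that support multi-dimensional indexing typically interpret an ``Ellipsis`` in the key as "all remaining dimensions"; the meaning is up to the class that implements ``__getitem__``.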
if ... elif ... else
--------------------
The **if statements** are used to control the code flow. Instead of ``switch``
or ``case`` statements, Python uses an ``if ... elif ... else`` sequence to
control the logic of the code. Although some people propose using a ``dict`` to
emulate ``switch`` statements, this solution may introduce unnecessary overhead,
such as creating disposable dictionaries, and make the code less readable.
Thus, the solution is not recommended.
.. code-block:: python
>>> import random
>>> num = random.randint(0, 10)
>>> if num < 3:
... print("less than 3")
... elif num < 5:
... print("less than 5")
... else:
... print(num)
...
less than 3
for Loop
--------
In Python, we can access an iterable object's items directly through the
**for statement**. If we need both the indexes and the items of an iterable
object such as a list or tuple, using ``enumerate`` is better than
``range(len(iterable))``. Further information can be found in
`Looping Techniques `_.
.. code-block:: python
>>> for val in ["foo", "bar"]:
... print(val)
...
foo
bar
>>> for idx, val in enumerate(["foo", "bar", "baz"]):
... print(idx, val)
...
0 foo
1 bar
2 baz
for ... else ...
----------------
It may look a little weird when we see an ``else`` that belongs to a ``for``
loop for the first time. The ``else`` clause helps us avoid flag variables in
loops. A loop's ``else`` clause runs when no ``break`` occurs.
.. code-block:: python
>>> for _ in range(5):
... pass
... else:
... print("no break")
...
no break
The following snippet shows the difference between using a flag variable and
the ``else`` clause to control the loop. We can see that the ``else`` does not
run when the ``break`` occurs in the loop.
.. code-block:: python
>>> is_break = False
>>> for x in range(5):
... if x % 2 == 0:
... is_break = True
... break
...
>>> if is_break:
... print("break")
...
break
>>> for x in range(5):
... if x % 2 == 0:
... print("break")
... break
... else:
... print("no break")
...
break
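A search loop is a typical place where ``for ... else`` replaces a flag variable. The following is a minimal sketch; ``smallest_divisor`` is a hypothetical function written for this example.

```python
def smallest_divisor(n):
    """Return the smallest divisor of n in [2, sqrt(n)], or None if there is none."""
    for candidate in range(2, int(n ** 0.5) + 1):
        if n % candidate == 0:
            found = candidate
            break
    else:
        # Runs only when the loop finished without hitting ``break``.
        found = None
    return found
```

For example, ``smallest_divisor(15)`` returns ``3``, while ``smallest_divisor(13)`` falls through to the ``else`` clause and returns ``None``.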
Using ``range``
---------------
The problem of ``range`` in Python 2 is that ``range`` may take up a lot of
memory if we need to iterate a loop many times. Consequently, using ``xrange``
is recommended in Python 2.
.. code-block:: python
>>> import platform
>>> import sys
>>> platform.python_version()
'2.7.15'
>>> sys.getsizeof(range(100000000))
800000072
>>> sys.getsizeof(xrange(100000000))
40
In Python 3, the built-in function ``range`` returns an iterable **range object**
instead of a list. Its behavior is the same as ``xrange`` in Python 2.
Therefore, using ``range`` no longer takes up huge amounts of memory when we
want to run a code block many times within a loop. Further information can be
found in PEP `3100 `_.
.. code-block:: python
>>> import platform
>>> import sys
>>> platform.python_version()
'3.7.1'
>>> sys.getsizeof(range(100000000))
48
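Because a range object stores only its start, stop, and step, it can still answer length, indexing, membership, and slicing queries without building a list. A short sketch on Python 3 (the variable names are illustrative):

```python
evens = range(0, 100_000_000, 2)

count = len(evens)        # computed from start/stop/step
tenth = evens[10]         # indexing works without materializing a list
has_seven = 7 in evens    # membership test for ints is arithmetic, not a scan
tail = evens[-3:]         # slicing returns another range object
```

Here ``tail`` is itself a ``range``, so even slices of very large ranges stay cheap.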
while ... else ...
------------------
The ``else`` clause of a while loop serves the same purpose as the ``else``
clause of a for loop. We can observe that the ``else`` does not run when a
``break`` occurs in the while loop.
.. code-block:: python
>>> n = 0
>>> while n < 5:
... if n == 3:
... break
... n += 1
... else:
... print("no break")
...
The ``do while`` Statement
--------------------------
Many programming languages, such as C/C++, Ruby, and JavaScript, provide a
``do while`` statement. Python has no ``do while`` statement. However, we can
place the condition and the ``break`` at the end of a ``while True`` loop to
achieve the same thing.
.. code-block:: python
>>> n = 0
>>> while True:
... n += 1
... if n == 5:
... break
...
>>> n
5
try ... except ... else ...
---------------------------
Most of the time, we handle errors in ``except`` clause and clean up resources
in ``finally`` clause. Interestingly, the ``try`` statement also provides an
``else`` clause for us to avoid catching an exception which was raised by the
code that should not be protected by ``try ... except``. The ``else`` clause
runs when no exception occurs between ``try`` and ``except``.
.. code-block:: python
>>> try:
... print("No exception")
... except:
... pass
... else:
... print("Success")
...
No exception
Success
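To make the order of the clauses concrete, the sketch below records which clauses run for a valid and an invalid input. ``parse_port`` is a hypothetical helper, not something from the original cheat sheet.

```python
def parse_port(text):
    """Parse a port number and record which clauses of the try statement ran."""
    events = []
    try:
        port = int(text)            # only this line is protected
    except ValueError:
        events.append("invalid")
        port = None
    else:
        events.append("parsed")     # runs only when no exception was raised
    finally:
        events.append("done")       # always runs
    return port, events
```

Keeping only ``int(text)`` inside ``try`` means an unexpected ``ValueError`` raised by later code would not be silently swallowed by the ``except`` clause.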
String
------
Unlike languages such as C, Python strings are immutable, so item assignment on
a string is not supported. Therefore, if we need to manipulate a string's
items, e.g., swap items, we have to convert the string to a list and join the
list back into a string after the item assignments finish.
.. code-block:: python
>>> a = "Hello Python"
>>> l = list(a)
>>> l[0], l[6] = 'h', 'p'
>>> ''.join(l)
'hello python'
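The sketch below shows the ``TypeError`` raised by direct item assignment and one alternative that builds a new string through slicing. ``replace_at`` is a hypothetical helper written for this example.

```python
def replace_at(text, index, char):
    """Return a copy of `text` with the character at `index` replaced by `char`."""
    return text[:index] + char + text[index + 1:]


greeting = "Hello Python"

try:
    greeting[0] = "h"          # strings do not support item assignment
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False

lowered = replace_at(replace_at(greeting, 0, "h"), 6, "p")
```

For a handful of replacements slicing is fine; for many edits, the list-and-join approach shown above avoids creating a new string per change.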
List
----
Lists are versatile containers. Python provides a lot of ways such as
**negative index**, **slicing statement**, or **list comprehension** to
manipulate lists. The following snippet shows some common operations of lists.
.. code-block:: python
>>> a = [1, 2, 3, 4, 5]
>>> a[-1] # negative index
5
>>> a[1:] # slicing
[2, 3, 4, 5]
>>> a[1:-1]
[2, 3, 4]
>>> a[1:-1:2]
[2, 4]
>>> a[::-1] # reverse
[5, 4, 3, 2, 1]
>>> a[0] = 0 # set an item
>>> a
[0, 2, 3, 4, 5]
>>> a.append(6) # append an item
>>> a
[0, 2, 3, 4, 5, 6]
>>> del a[-1] # del an item
>>> a
[0, 2, 3, 4, 5]
>>> b = [x for x in range(3)] # list comprehension
>>> b
[0, 1, 2]
>>> a + b # add two lists
[0, 2, 3, 4, 5, 0, 1, 2]
Dict
----
Dictionaries are containers of key-value pairs. Like lists, dictionaries can be
manipulated in many ways, such as **dict comprehensions**. Since Python 3.7,
dictionaries are guaranteed to preserve the insertion order of keys (CPython 3.6
already did so as an implementation detail). The following snippet shows some
common operations on dictionaries.
.. code-block:: python
>>> d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
>>> d
{'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
>>> d['timmy'] = "yellow" # set data
>>> d
{'timmy': 'yellow', 'barry': 'green', 'guido': 'blue'}
>>> del d['guido'] # del data
>>> d
{'timmy': 'yellow', 'barry': 'green'}
>>> 'guido' in d # contain data
False
>>> {k: v for k, v in d.items()} # dict comprehension
{'timmy': 'yellow', 'barry': 'green'}
>>> d.keys() # list all keys
dict_keys(['timmy', 'barry'])
>>> d.values() # list all values
dict_values(['yellow', 'green'])
Function
--------
Defining a function in Python is flexible. We can define a function with
**docstrings**, **default values**, **arbitrary arguments**,
**keyword arguments**, **keyword-only arguments**, and so on. The following
snippet shows some common ways to define functions.
.. code-block:: python
def foo_with_doc():
"""Documentation String."""
def foo_with_arg(arg): ...
def foo_with_args(*arg): ...
def foo_with_kwarg(a, b="foo"): ...
def foo_with_args_kwargs(*args, **kwargs): ...
def foo_with_kwonly(a, b, *, k): ... # python3
def foo_with_annotations(a: int) -> int: ... # python3
Function Annotations
--------------------
Instead of writing docstrings to hint at the types of parameters and return
values, we can denote types with **function annotations**. Function annotations
were introduced in Python 3.0; the details can be found in PEP `3107 `_
and PEP `484 `_. They are an **optional** feature in **Python 3**. Using
function annotations breaks compatibility with **Python 2**; we can work around
this issue with stub files. In addition, we can do static type checking through
`mypy `_.
.. code-block:: python
>>> def fib(n: int) -> int:
... a, b = 0, 1
... for _ in range(n):
... b, a = a + b, b
... return a
...
>>> fib(10)
55
Generators
----------
Python uses the ``yield`` statement to define a **generator function**. In
other words, calling a generator function does not run its body and return a
value; it returns a **generator**, which is an **iterator** that produces the
yielded values on demand.
.. code-block:: python
>>> def fib(n):
... a, b = 0, 1
... for _ in range(n):
... yield a
... b, a = a + b, b
...
>>> g = fib(10)
>>> g
<generator object fib at 0x...>
>>> for f in fib(5):
... print(f)
...
0
1
1
2
3
Generator Delegation
--------------------
Python 3.3 introduced the ``yield from`` expression. It allows a generator to
delegate part of its operations to another generator. In other words, we can
**yield** a sequence **from** other **generators** in the current **generator function**.
Further information can be found in PEP `380 `_.
.. code-block:: python
>>> def fib(n):
... a, b = 0, 1
... for _ in range(n):
... yield a
... b, a = a + b, b
...
>>> def fibonacci(n):
... yield from fib(n)
...
>>> [f for f in fibonacci(5)]
[0, 1, 1, 2, 3]
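Delegation is especially handy for recursive generators. The following sketch flattens nested lists and tuples; ``flatten`` is a hypothetical function used only to illustrate ``yield from``.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested lists/tuples, depth first."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Delegate to a sub-generator for the nested container.
            yield from flatten(item)
        else:
            yield item


flat = list(flatten([1, [2, [3, 4]], (5,)]))
```

Without ``yield from``, the delegation line would need an explicit inner loop: ``for leaf in flatten(item): yield leaf``.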
Class
-----
Python classes support many common features such as **docstrings**, **multiple inheritance**,
**class variables**, **instance variables**, **static methods**, **class methods**, and so on.
Furthermore, Python provides special methods for implementing **iterators**,
**context managers**, etc. The following snippet shows a typical class
definition.
.. code-block:: python
class A: ...
class B: ...
class Foo(A, B):
"""A class document."""
foo = "class variable"
def __init__(self, v):
self.attr = v
self.__private = "private var"
@staticmethod
def bar_static_method(): ...
@classmethod
def bar_class_method(cls): ...
def bar(self):
"""A method document."""
def bar_with_arg(self, arg): ...
def bar_with_args(self, *args): ...
def bar_with_kwarg(self, kwarg="bar"): ...
def bar_with_args_kwargs(self, *args, **kwargs): ...
def bar_with_kwonly(self, *, k): ...
def bar_with_annotations(self, a: int): ...
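The ``self.__private`` attribute above is subject to name mangling: inside the class body, names with two leading underscores are rewritten with the class name as a prefix. A minimal sketch, using a trimmed-down ``Foo`` with a hypothetical ``reveal`` method:

```python
class Foo:
    label = "class variable"

    def __init__(self, v):
        self.attr = v
        self.__private = "private var"   # stored as ``_Foo__private``

    def reveal(self):
        # Inside the class, the mangled name is resolved automatically.
        return self.__private


foo = Foo(1)
mangled = foo._Foo__private              # still reachable from outside
```

Name mangling is meant to avoid accidental clashes in subclasses; it is not an access-control mechanism.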
``async`` / ``await``
---------------------
The ``async`` and ``await`` syntax was introduced in Python 3.5. It was
designed to be used with an event loop. Some other features, such as the
**asynchronous generator**, were implemented in later versions.
A **coroutine function** (``async def``) is used to create a **coroutine** for
an event loop. Python provides a built-in module, **asyncio**, for writing
concurrent code with the ``async``/``await`` syntax. The following snippet
shows a simple example of using **asyncio**. The code must be run on Python 3.7
or above.
.. code-block:: python

    import asyncio

    async def http_ok(r, w):
        head = b"HTTP/1.1 200 OK\r\n"
        head += b"Content-Type: text/html\r\n"
        head += b"\r\n"

        # The HTML markup was lost in extraction; this is a minimal page
        body = b"<html>"
        body += b"<body><h3>Hello world!</h3></body>"
        body += b"</html>"

        _ = await r.read(1024)
        w.write(head + body)
        await w.drain()
        w.close()

    async def main():
        server = await asyncio.start_server(
            http_ok, "127.0.0.1", 8888
        )
        async with server:
            await server.serve_forever()

    asyncio.run(main())

Avoid ``exec`` and ``eval``
---------------------------
The following snippet shows how to use the built-in function ``exec``. However,
using ``exec`` and ``eval`` is not recommended because they introduce security
issues and make code hard for humans to read. Further reading can be found in
`Be careful with exec and eval in Python `_
and `Eval really is dangerous `_.
.. code-block:: python
>>> py = '''
... def fib(n):
... a, b = 0, 1
... for _ in range(n):
... b, a = b + a, b
... return a
... print(fib(10))
... '''
>>> exec(py, globals(), locals())
55
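When the goal is only to parse Python literals (numbers, strings, lists, dicts, and so on) from text, ``ast.literal_eval`` from the standard library is a commonly suggested alternative to ``eval``. The sketch below assumes that use case; ``parse_literal`` is a hypothetical wrapper.

```python
import ast


def parse_literal(text):
    """Parse a Python literal from text; return None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, attribute access, names, etc. are rejected.
        return None


data = parse_literal("[1, 2, {'a': 3}]")
rejected = parse_literal("__import__('os').getcwd()")
```

``literal_eval`` does not execute arbitrary code, but very large or deeply nested input can still exhaust memory or the recursion limit, so untrusted input should be size-limited as well.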
================================================
FILE: docs/notes/basic/python-dict.rst
================================================
.. meta::
:description lang=en: Python dictionary cheat sheet covering creation, manipulation, merging, comprehensions, defaultdict, OrderedDict, and LRU cache with code examples
:keywords: Python, Python3, Python dictionary, Python dict cheat sheet, dict, hashmap, key-value pairs, defaultdict, OrderedDict, dictionary comprehension, LRU cache, dict methods
==========
Dictionary
==========
.. contents:: Table of Contents
:backlinks: none
Dictionaries are one of Python's most powerful and frequently used data structures.
They store key-value pairs and provide O(1) average time complexity for lookups,
insertions, and deletions. Since Python 3.7, dictionaries maintain insertion order
as a language feature. This cheat sheet covers essential dictionary operations,
from basic manipulation to advanced patterns like emulating dictionary behavior
with special methods and implementing an LRU (Least Recently Used) cache.
The source code is available on `GitHub `_.
References
----------
- `Mapping Types — dict `_
- `collections — Container datatypes `_
- `PEP 584 -- Add Union Operators To dict `_
Get All Keys with ``dict.keys()``
---------------------------------
The ``keys()`` method returns a view object containing all dictionary keys.
In Python 3, this is a dynamic view that reflects changes to the dictionary.
.. code-block:: python
>>> a = {"1":1, "2":2, "3":3}
>>> b = {"2":2, "3":3, "4":4}
>>> a.keys()
dict_keys(['1', '2', '3'])
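The "dynamic view" behavior can be seen by keeping a reference to the view and then changing the dictionary. A minimal sketch on Python 3.7+ with illustrative names:

```python
inventory = {"apple": 1, "pear": 2}
keys = inventory.keys()          # a view, not a copy

before = list(keys)
inventory["plum"] = 3            # mutate the dictionary after taking the view
after = list(keys)               # the same view now includes the new key

# Key views also support set operations.
shared = keys & {"pear", "kiwi"}
```

If a stable snapshot is needed, ``list(d)`` or ``tuple(d)`` copies the keys at that moment.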
Get Key-Value Pairs with ``dict.items()``
-----------------------------------------
The ``items()`` method returns key-value pairs as tuples, which is useful for
iterating over both keys and values simultaneously.
.. code-block:: python
>>> a = {"1":1, "2":2, "3":3}
>>> a.items()
dict_items([('1', 1), ('2', 2), ('3', 3)])
Find Common Keys Between Dictionaries
-------------------------------------
Finding keys that exist in multiple dictionaries is a common operation. Using
set intersection is a concise and efficient approach.
.. code-block:: python
>>> a = {"1":1, "2":2, "3":3}
>>> b = {"2":2, "3":3, "4":4}
>>> [_ for _ in a.keys() if _ in b.keys()]
['2', '3']
>>> # better way
>>> c = set(a).intersection(set(b))
>>> sorted(c)
['2', '3']
>>> # or
>>> [_ for _ in a if _ in b]
['2', '3']
Set Default Values with ``setdefault()`` and ``defaultdict``
------------------------------------------------------------
When working with dictionaries, you often need to set default values for missing
keys. Python provides ``setdefault()`` and ``collections.defaultdict`` for this.
.. code-block:: python
>>> # intuitive but not recommended
>>> d = {}
>>> key = "foo"
>>> if key not in d:
... d[key] = []
...
>>> # using d.setdefault(key[, default])
>>> d = {}
>>> key = "foo"
>>> d.setdefault(key, [])
[]
>>> d[key] = 'bar'
>>> d
{'foo': 'bar'}
>>> # using collections.defaultdict
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d["key"]
[]
>>> d["foo"]
[]
>>> d["foo"].append("bar")
>>> d
defaultdict(<class 'list'>, {'key': [], 'foo': ['bar']})
``dict.setdefault(key[, default])`` inserts *key* with the default value and
returns that default if *key* is not in the dictionary. However, if the key
already exists, the method leaves the dictionary unchanged and returns the
existing value.
.. code-block:: python
>>> d = {}
>>> d.setdefault("key", [])
[]
>>> d["key"] = "bar"
>>> d.setdefault("key", [])
'bar'
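A typical use of both tools is grouping items under a computed key. The sketch below groups words by their first letter; the data and variable names are made up for illustration.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# setdefault: create the list on first sight of a key, then append.
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict: the list factory is called automatically for missing keys.
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)
```

One difference worth noting: merely reading ``grouped["z"]`` inserts an empty list, whereas ``by_letter.get("z")`` leaves a plain dict untouched.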
Update Dictionary with ``dict.update()``
----------------------------------------
The ``update()`` method merges another dictionary into the current one. Keys from
the second dictionary overwrite existing keys in the first.
.. code-block:: python
>>> a = {"1":1, "2":2, "3":3}
>>> b = {"2":2, "3":3, "4":4}
>>> a.update(b)
>>> a
{'1': 1, '2': 2, '3': 3, '4': 4}
Merge Two Dictionaries in Python
--------------------------------
There are several ways to merge dictionaries depending on your Python version.
Python 3.9+ also supports the ``|`` operator for dictionary merging.
Python 3.4 or lower
.. code-block:: python
>>> a = {"x": 55, "y": 66}
>>> b = {"a": "foo", "b": "bar"}
>>> c = a.copy()
>>> c.update(b)
>>> c
{'y': 66, 'x': 55, 'b': 'bar', 'a': 'foo'}
Python 3.5 or above
.. code-block:: python
>>> a = {"x": 55, "y": 66}
>>> b = {"a": "foo", "b": "bar"}
>>> c = {**a, **b}
>>> c
{'x': 55, 'y': 66, 'a': 'foo', 'b': 'bar'}
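For the ``|`` operator mentioned above (PEP 584), a minimal sketch follows. It assumes Python 3.9 or above, and the dictionary contents are illustrative.

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090, "debug": True}

# ``|`` builds a new dict; on duplicate keys the right operand wins.
merged = defaults | overrides

# ``|=`` updates a dict in place, like ``update()``.
settings = dict(defaults)
settings |= overrides
```

Neither ``defaults`` nor ``overrides`` is modified by ``|``, which makes it a drop-in replacement for the ``copy()`` plus ``update()`` pattern.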
Emulate a Dictionary with Special Methods
-----------------------------------------
You can create dictionary-like objects by implementing special methods:
``__getitem__``, ``__setitem__``, ``__delitem__``, ``__contains__``, and ``__iter__``.
.. code-block:: python
>>> class EmuDict(object):
... def __init__(self, dict_):
... self._dict = dict_
... def __repr__(self):
... return "EmuDict: " + repr(self._dict)
... def __getitem__(self, key):
... return self._dict[key]
... def __setitem__(self, key, val):
... self._dict[key] = val
... def __delitem__(self, key):
... del self._dict[key]
... def __contains__(self, key):
... return key in self._dict
... def __iter__(self):
... return iter(self._dict.keys())
...
>>> _ = {"1":1, "2":2, "3":3}
>>> emud = EmuDict(_)
>>> emud # __repr__
EmuDict: {'1': 1, '2': 2, '3': 3}
>>> emud['1'] # __getitem__
1
>>> emud['5'] = 5 # __setitem__
>>> emud
EmuDict: {'1': 1, '2': 2, '3': 3, '5': 5}
>>> del emud['2'] # __delitem__
>>> emud
EmuDict: {'1': 1, '3': 3, '5': 5}
>>> for _ in emud:
... print(emud[_], end=' ') # __iter__
... else:
... print()
...
1 3 5
>>> '1' in emud # __contains__
True
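As an alternative sketch, subclassing ``collections.abc.MutableMapping`` requires only five methods (``__getitem__``, ``__setitem__``, ``__delitem__``, ``__iter__``, ``__len__``) and supplies the rest of the dictionary interface as mixin methods. ``MappingDict`` is a hypothetical name, not part of the original example.

```python
from collections.abc import MutableMapping


class MappingDict(MutableMapping):
    """Dictionary-like wrapper; get(), items(), update(), pop() come from the ABC."""

    def __init__(self, data=None):
        self._dict = dict(data or {})

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, val):
        self._dict[key] = val

    def __delitem__(self, key):
        del self._dict[key]

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


m = MappingDict({"1": 1})
m.update({"2": 2})            # mixin method from MutableMapping
missing = m.get("3", 0)       # mixin method from Mapping
pairs = list(m.items())
```

The abstract base class also refuses to instantiate a subclass that forgets one of the required methods, which catches incomplete implementations early.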
Implement LRU Cache with OrderedDict
------------------------------------
An LRU (Least Recently Used) cache evicts the least recently accessed items when
full. ``OrderedDict.move_to_end()`` makes implementation straightforward.
.. code-block:: python
from collections import OrderedDict
class LRU(object):
def __init__(self, maxsize=128):
self._maxsize = maxsize
self._cache = OrderedDict()
def get(self, k):
if k not in self._cache:
return None
self._cache.move_to_end(k)
return self._cache[k]
def put(self, k, v):
if k in self._cache:
self._cache.move_to_end(k)
self._cache[k] = v
if len(self._cache) > self._maxsize:
self._cache.popitem(last=False)
def __str__(self):
return str(self._cache)
def __repr__(self):
return self.__str__()
Note that dictionaries preserve insertion order from Python 3.7. Moreover,
updating a key does not affect the order. Therefore, a dictionary can also
simulate an LRU cache, which is similar to using an OrderedDict.
.. code-block:: python
class LRU(object):
def __init__(self, maxsize=128):
self._maxsize = maxsize
self._cache = {}
def get(self, k):
if k not in self._cache:
return None
self.move_to_end(k)
return self._cache[k]
def put(self, k, v):
if k in self._cache:
self.move_to_end(k)
self._cache[k] = v
if len(self._cache) > self._maxsize:
self.pop()
def pop(self):
it = iter(self._cache.keys())
del self._cache[next(it)]
def move_to_end(self, k):
if k not in self._cache:
return
v = self._cache[k]
del self._cache[k]
self._cache[k] = v
def __str__(self):
return str(self._cache)
def __repr__(self):
return self.__str__()
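When the goal is to memoize function calls rather than to manage a cache object by hand, the standard library's ``functools.lru_cache`` applies the same eviction policy. The sketch below uses a made-up ``square`` function and a ``calls`` list to make evictions visible.

```python
from functools import lru_cache

calls = []  # records every real (uncached) invocation


@lru_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n


# 1 and 2 are misses; 1 is a hit; 3 evicts 2 (least recently used);
# the final 2 is therefore a miss again.
results = [square(1), square(2), square(1), square(3), square(2)]
info = square.cache_info()
```

``cache_info()`` reports hits, misses, and the current size, which is useful for checking that ``maxsize`` fits the workload.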
================================================
FILE: docs/notes/basic/python-func.rst
================================================
.. meta::
:description lang=en: Python function cheat sheet covering function definitions, arguments, decorators, lambda, closures, and functools with code examples
:keywords: Python, Python3, Python function, Python function cheat sheet, decorator, lambda, closure, *args, **kwargs, functools, lru_cache, partial
========
Function
========
.. contents:: Table of Contents
:backlinks: none
A function helps programmers wrap their logic into a reusable task and avoid
duplicate code. In Python, function definitions are so versatile that we can
use many features, such as decorators, annotations, docstrings, and default
arguments, to define a function. This cheat sheet collects many ways to define
a function and demystifies some enigmatic function syntax.
Document Functions
------------------
Documentation provides programmers hints about how a function is supposed to
be used. A docstring gives an expedient way to write a readable document of
functions. The docstring should be placed as the first statement in the function
body, enclosed in triple quotes. It can be accessed via the ``__doc__`` attribute
or the built-in ``help()`` function. PEP `257 `_
defines conventions for docstrings, and tools like ``pydocstyle`` can help
enforce these conventions in your codebase.
.. code-block:: python
>>> def example():
... """This is an example function."""
... print("Example function")
...
>>> example.__doc__
'This is an example function.'
>>> help(example)
Default Arguments
-----------------
Defining a function where the arguments are optional and have a default value
is quite simple in Python. We can just assign values in the definition and make
sure the default arguments appear in the end. When calling the function, you can
omit arguments that have defaults, pass them positionally, or use keyword syntax
to specify them explicitly. This flexibility makes functions more versatile and
easier to use in different contexts.
.. code-block:: python
>>> def add(a, b=0):
... return a + b
...
>>> add(1)
1
>>> add(1, 2)
3
>>> add(1, b=2)
3
.. warning::
Avoid using mutable objects (like lists or dictionaries) as default arguments.
Default argument values are evaluated only once when the function is defined,
not each time the function is called. This means mutable defaults are shared
across all calls, which can lead to unexpected behavior where modifications
persist between function calls.
.. code-block:: python
>>> def bad(items=[]): # DON'T do this
... items.append(1)
... return items
...
>>> bad()
[1]
>>> bad() # unexpected!
[1, 1]
>>> def good(items=None): # DO this instead
... if items is None:
... items = []
... items.append(1)
... return items
Variable Arguments ``*args`` and ``**kwargs``
---------------------------------------------
Python provides a flexible way to handle functions that need to accept a variable
number of arguments. Use ``*args`` to collect any number of positional arguments
into a tuple, and ``**kwargs`` to collect any number of keyword arguments into a
dictionary. These are commonly used when writing wrapper functions, decorators,
or functions that need to pass arguments through to other functions. The names
``args`` and ``kwargs`` are conventions; you can use any valid identifier after
the ``*`` or ``**``.
.. code-block:: python
>>> def example(a, b=None, *args, **kwargs):
... print(a, b)
... print(args)
... print(kwargs)
...
>>> example(1, "var", 2, 3, word="hello")
1 var
(2, 3)
{'word': 'hello'}
Unpack Arguments
----------------
When calling a function, you can use ``*`` to unpack a sequence (like a list or
tuple) into separate positional arguments, and ``**`` to unpack a dictionary into
keyword arguments. This is the inverse of ``*args`` and ``**kwargs`` in function
definitions. Unpacking is particularly useful when you have data in a collection
that you want to pass to a function that expects separate arguments.
.. code-block:: python
>>> def foo(a, b, c='BAZ'):
... print(a, b, c)
...
>>> foo(*("FOO", "BAR"), **{"c": "baz"})
FOO BAR baz
>>> args = [1, 2, 3]
>>> print(*args)
1 2 3
Keyword-Only Arguments
----------------------
Arguments that appear after ``*`` or ``*args`` in a function definition are
keyword-only, meaning they must be passed by name and cannot be passed positionally.
This feature, introduced in Python 3.0, helps prevent errors when functions have
many parameters, as it forces callers to be explicit about which argument they're
providing. Keyword-only arguments can have default values, making them optional.
**New in Python 3.0**
.. code-block:: python
>>> def f(a, b, *, kw):
... print(a, b, kw)
...
>>> f(1, 2, kw=3)
1 2 3
>>> f(1, 2, 3)
Traceback (most recent call last):
TypeError: f() takes 2 positional arguments but 3 were given
>>> # keyword-only with default
>>> def g(a, *, kw=10):
... return a + kw
...
>>> g(5)
15
Positional-Only Arguments
-------------------------
Arguments that appear before ``/`` in a function definition are positional-only,
meaning they cannot be passed by keyword name. This feature, introduced in Python
3.8, is useful when parameter names are not meaningful to callers or when you want
to reserve the flexibility to change parameter names without breaking existing code.
Many built-in functions like ``len()`` and ``pow()`` use positional-only parameters.
You can combine positional-only (``/``) and keyword-only (``*``) in the same function.
**New in Python 3.8**
.. code-block:: python
>>> def f(a, b, /, c):
... print(a, b, c)
...
>>> f(1, 2, 3)
1 2 3
>>> f(1, 2, c=3)
1 2 3
>>> f(a=1, b=2, c=3)
Traceback (most recent call last):
TypeError: f() got some positional-only arguments passed as keyword arguments: 'a, b'
>>> # combining positional-only and keyword-only
>>> def g(a, /, b, *, c):
... return a + b + c
...
>>> g(1, 2, c=3)
6
Annotations
-----------
Function annotations provide a way to attach metadata to function parameters and
return values. While Python doesn't enforce these annotations at runtime, they
serve as documentation and are used by static type checkers like ``mypy`` to catch
type errors before code runs. Annotations are stored in the function's ``__annotations__``
attribute as a dictionary. The ``typing`` module (Python 3.5+) provides additional
types like ``List``, ``Dict``, ``Optional``, and ``Union`` for more expressive type hints.
**New in Python 3.0**
.. code-block:: python
>>> def fib(n: int) -> int:
... a, b = 0, 1
... for _ in range(n):
... b, a = a + b, b
... return a
...
>>> fib(10)
55
>>> fib.__annotations__
{'n': <class 'int'>, 'return': <class 'int'>}
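To illustrate that annotations are not enforced at runtime, the sketch below calls an ``int``-annotated function with a string. ``double`` is a made-up function; ``typing.get_type_hints`` is the standard way to read resolved annotations.

```python
from typing import get_type_hints


def double(x: int) -> int:
    return x * 2


hints = get_type_hints(double)

# No error is raised: the interpreter ignores the annotation and
# simply applies ``*`` to the string.
text = double("ab")
```

A static checker such as ``mypy`` would flag the ``double("ab")`` call, which is exactly the kind of error annotations are meant to surface before the code runs.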
Lambda
------
Lambda expressions create small anonymous functions inline. They are syntactically
restricted to a single expression, which is implicitly returned. Lambdas are useful
for short, throwaway functions, especially as arguments to higher-order functions
like ``sorted()``, ``map()``, ``filter()``, and ``functools.reduce()``. While lambdas can make
code more concise, complex logic should be written as regular named functions for
better readability and debugging.
.. code-block:: python
>>> square = lambda x: x ** 2
>>> square(5)
25
>>> # lambda with multiple arguments
>>> add = lambda a, b: a + b
>>> add(2, 3)
5
>>> # lambda with conditional
>>> max_val = lambda a, b: a if a > b else b
>>> max_val(3, 5)
5
>>> # common use: sorting key
>>> pairs = [(1, 'b'), (2, 'a'), (3, 'c')]
>>> sorted(pairs, key=lambda x: x[1])
[(2, 'a'), (1, 'b'), (3, 'c')]
Callable
--------
In Python, any object that implements the ``__call__`` method is callable, meaning
it can be invoked like a function using parentheses. This includes functions, methods,
lambdas, classes (calling a class creates an instance), and instances of classes that
define ``__call__``. The built-in ``callable()`` function returns ``True`` if an object
appears callable, which is useful for checking before attempting to call an object
to avoid ``TypeError`` exceptions.
.. code-block:: python
>>> callable(print)
True
>>> callable(42)
False
>>> class Adder:
... def __init__(self, n):
... self.n = n
... def __call__(self, x):
... return self.n + x
...
>>> add_five = Adder(5)
>>> callable(add_five)
True
>>> add_five(10)
15
Get Function Name
-----------------
Functions in Python are first-class objects with various attributes that provide
metadata about them. The ``__name__`` attribute contains the function's name as
defined, ``__doc__`` contains the docstring, ``__module__`` indicates which module
the function was defined in, and ``__annotations__`` holds type hints. These
attributes are useful for debugging, logging, and introspection.
.. code-block:: python
>>> def example_function():
... """Example docstring."""
... pass
...
>>> example_function.__name__
'example_function'
>>> example_function.__doc__
'Example docstring.'
>>> example_function.__module__
'__main__'
Closure
-------
A closure is a function that captures and remembers values from its enclosing
lexical scope even after that scope has finished executing. This happens when
a nested function references variables from its outer function. Closures are
powerful for creating function factories (functions that return customized
functions), implementing decorators, and maintaining state without using global
variables or classes. Use the ``nonlocal`` keyword to modify captured variables
from the enclosing scope.
.. code-block:: python
>>> def make_multiplier(n):
... def multiplier(x):
... return x * n
... return multiplier
...
>>> double = make_multiplier(2)
>>> triple = make_multiplier(3)
>>> double(5)
10
>>> triple(5)
15
>>> # closure with mutable state
>>> def make_counter():
... count = 0
... def counter():
... nonlocal count
... count += 1
... return count
... return counter
...
>>> counter = make_counter()
>>> counter()
1
>>> counter()
2
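To see where a closure keeps its captured values, the following sketch inspects the function object itself. It reuses ``make_multiplier`` from above and relies on the standard ``__closure__`` and ``__code__.co_freevars`` attributes, which the text above does not cover.

.. code-block:: python

    def make_multiplier(n):
        def multiplier(x):
            return x * n
        return multiplier

    double = make_multiplier(2)

    # names of the variables captured from the enclosing scope
    free_vars = double.__code__.co_freevars
    # each captured variable lives in a cell object
    cell_value = double.__closure__[0].cell_contents

    print(free_vars)   # ('n',)
    print(cell_value)  # 2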
Generator
---------
Generator functions use the ``yield`` statement to produce a sequence of values
lazily, one at a time, instead of computing all values upfront and storing them
in memory. When called, a generator function returns a generator iterator that
can be iterated over with ``for`` loops or ``next()``. Generators are memory-efficient
for large sequences and can represent infinite sequences. Generator expressions
provide a concise syntax similar to list comprehensions but with lazy evaluation.
.. code-block:: python
>>> def fib(n):
... a, b = 0, 1
... for _ in range(n):
... yield a
... b, a = a + b, b
...
>>> list(fib(10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
>>> # generator expression
>>> squares = (x**2 for x in range(5))
>>> list(squares)
[0, 1, 4, 9, 16]
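Because values are produced on demand, a generator can describe a sequence that never ends. This is a minimal sketch of that idea; ``naturals`` is a hypothetical helper, and ``itertools.islice`` is used to take a finite prefix.

.. code-block:: python

    from itertools import islice

    def naturals(start=0):
        """Yield start, start + 1, start + 2, ... without end."""
        n = start
        while True:
            yield n
            n += 1

    # take only the first five values of the infinite sequence
    first_five = list(islice(naturals(), 5))
    # a generator expression over an infinite generator stays lazy
    evens = list(islice((n for n in naturals() if n % 2 == 0), 3))

    print(first_five)  # [0, 1, 2, 3, 4]
    print(evens)       # [0, 2, 4]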
Decorator
---------
Decorators are a powerful pattern for modifying or extending the behavior of
functions without changing their source code. A decorator is a function that
takes a function as input and returns a new function (usually a wrapper) that
adds some functionality before or after calling the original. The ``@decorator``
syntax is syntactic sugar for ``func = decorator(func)``. Always use ``@wraps``
from ``functools`` in your wrapper function to preserve the original function's
metadata like ``__name__``, ``__doc__``, and ``__annotations__``.
**New in Python 2.4** - PEP `318 `_
.. code-block:: python
>>> from functools import wraps
>>> def log_calls(func):
... @wraps(func)
... def wrapper(*args, **kwargs):
... print(f"Calling {func.__name__}")
... return func(*args, **kwargs)
... return wrapper
...
>>> @log_calls
... def greet(name):
... return f"Hello, {name}!"
...
>>> greet("Alice")
Calling greet
'Hello, Alice!'
>>> # equivalent to:
>>> # greet = log_calls(greet)
.. note::
Always use ``@wraps(func)`` in decorators to preserve the original function's
``__name__``, ``__doc__``, and other attributes. Without it, the decorated
function will have the wrapper's attributes, which makes debugging harder.
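The difference is easy to observe by comparing two otherwise identical decorators. In this sketch, ``plain`` and ``preserving`` are illustrative names; only ``preserving`` applies ``@wraps``.

.. code-block:: python

    from functools import wraps

    def plain(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

    def preserving(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

    @plain
    def foo():
        """Foo docstring."""

    @preserving
    def bar():
        """Bar docstring."""

    names = (foo.__name__, bar.__name__)
    docs = (foo.__doc__, bar.__doc__)
    # @wraps also stores the undecorated function on __wrapped__
    original = bar.__wrapped__

    print(names)  # ('wrapper', 'bar')
    print(docs)   # (None, 'Bar docstring.')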
Decorator with Arguments
------------------------
To create a decorator that accepts arguments, you need an extra layer of nesting.
The outermost function takes the decorator's arguments and returns the actual
decorator. The middle function takes the function being decorated and returns
the wrapper. The innermost function is the wrapper that executes when the decorated
function is called. This pattern is commonly used for decorators like ``@repeat(3)``
or ``@route('/path')``.
.. code-block:: python

    >>> from functools import wraps
    >>> def repeat(times):
    ...     def decorator(func):
    ...         @wraps(func)
    ...         def wrapper(*args, **kwargs):
    ...             result = None  # defined even when times == 0
    ...             for _ in range(times):
    ...                 result = func(*args, **kwargs)
    ...             return result
    ...         return wrapper
    ...     return decorator
    ...
    >>> @repeat(3)
    ... def say_hello():
    ...     print("Hello!")
    ...
    >>> say_hello()
    Hello!
    Hello!
    Hello!
    >>> # equivalent to:
    >>> # say_hello = repeat(3)(say_hello)

Class Decorator
---------------
Decorators can also be implemented as classes instead of functions. A class-based
decorator implements ``__init__`` to receive the decorated function and ``__call__``
to act as the wrapper. This approach is useful when the decorator needs to maintain
state across multiple calls to the decorated function, such as counting calls,
caching results, or tracking timing information.
.. code-block:: python
>>> class CountCalls:
... def __init__(self, func):
... self.func = func
... self.count = 0
... def __call__(self, *args, **kwargs):
... self.count += 1
... return self.func(*args, **kwargs)
...
>>> @CountCalls
... def example():
... return "result"
...
>>> example()
'result'
>>> example()
'result'
>>> example.count
2
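A class-based decorator replaces the function with an instance, so the function's metadata is lost unless it is copied over. One possible approach, sketched here under the assumption that ``functools.update_wrapper`` is acceptable in your code base, is to call it in ``__init__``.

.. code-block:: python

    from functools import update_wrapper

    class CountCalls:
        def __init__(self, func):
            # copy __name__, __doc__, etc. from func onto this instance
            update_wrapper(self, func)
            self.func = func
            self.count = 0

        def __call__(self, *args, **kwargs):
            self.count += 1
            return self.func(*args, **kwargs)

    @CountCalls
    def example():
        """Return a fixed result."""
        return "result"

    example()
    print(example.__name__)  # example
    print(example.__doc__)   # Return a fixed result.
    print(example.count)     # 1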
Cache with ``lru_cache``
------------------------
The ``lru_cache`` decorator from ``functools`` automatically caches function results
based on the arguments passed. When the function is called with the same arguments
again, the cached result is returned instead of recomputing it. This is especially
useful for expensive computations or recursive functions like Fibonacci. The ``maxsize``
parameter limits cache size (use ``None`` for unlimited). Use ``cache_info()`` to
see hit/miss statistics and ``cache_clear()`` to reset the cache.
**New in Python 3.2**
.. code-block:: python
>>> from functools import lru_cache
>>> @lru_cache(maxsize=None)
... def fib(n):
... if n < 2:
... return n
... return fib(n - 1) + fib(n - 2)
...
>>> fib(100)
354224848179261915075
>>> fib.cache_info()
CacheInfo(hits=98, misses=101, maxsize=None, currsize=101)
>>> fib.cache_clear() # clear the cache
**New in Python 3.9** - ``@cache`` is a simpler alias for ``@lru_cache(maxsize=None)``
.. code-block:: python
>>> from functools import cache
>>> @cache
... def factorial(n):
... return n * factorial(n-1) if n else 1
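The effect of a bounded ``maxsize`` can be seen in the statistics. In this small sketch (``square`` is an illustrative function), the cache holds only two entries, so the least recently used result is evicted and has to be recomputed.

.. code-block:: python

    from functools import lru_cache

    @lru_cache(maxsize=2)
    def square(x):
        return x * x

    # 1 and 2 fill the cache, 3 evicts 1, and the final 1 evicts 2
    for value in (1, 2, 3, 1):
        square(value)

    info = square.cache_info()
    print(info)  # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)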
Partial Functions
-----------------
The ``functools.partial`` function creates a new callable with some arguments of
the original function pre-filled. This is useful for adapting functions to interfaces
that expect fewer arguments, creating specialized versions of general functions,
or preparing callback functions. The resulting partial object can be called with
the remaining arguments. You can pre-fill both positional and keyword arguments.
.. code-block:: python
>>> from functools import partial
>>> def power(base, exponent):
... return base ** exponent
...
>>> square = partial(power, exponent=2)
>>> cube = partial(power, exponent=3)
>>> square(5)
25
>>> cube(5)
125
>>> # useful for callbacks
>>> from functools import partial
>>> def greet(greeting, name):
... return f"{greeting}, {name}!"
...
>>> say_hello = partial(greet, "Hello")
>>> say_hello("Alice")
'Hello, Alice!'
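A partial object also exposes what it was built from through its ``func``, ``args``, and ``keywords`` attributes, and keyword arguments given at call time override the pre-filled ones. The sketch below reuses ``power`` from above; ``eight`` is an illustrative name.

.. code-block:: python

    from functools import partial

    def power(base, exponent):
        return base ** exponent

    square = partial(power, exponent=2)
    eight = partial(power, 2, 3)  # every argument pre-filled

    print(square.func is power)    # True
    print(square.args)             # ()
    print(square.keywords)         # {'exponent': 2}
    print(eight())                 # 8
    # a keyword passed at call time replaces the pre-filled one
    print(square(3, exponent=3))   # 27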
``singledispatch`` - Function Overloading
-----------------------------------------
The ``singledispatch`` decorator from ``functools`` enables function overloading
based on the type of the first argument. You define a base function and then
register specialized implementations for different types using the ``@func.register``
decorator. When the function is called, Python automatically dispatches to the
appropriate implementation based on the argument's type. This is useful for writing
generic functions that behave differently for different types.
**New in Python 3.4**
.. code-block:: python
>>> from functools import singledispatch
>>> @singledispatch
... def process(arg):
... return f"Default: {arg}"
...
>>> @process.register(int)
... def _(arg):
... return f"Integer: {arg * 2}"
...
>>> @process.register(list)
... def _(arg):
... return f"List with {len(arg)} items"
...
>>> process("hello")
'Default: hello'
>>> process(5)
'Integer: 10'
>>> process([1, 2, 3])
'List with 3 items'
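Since Python 3.7, ``register`` can also infer the dispatch type from the annotation of the first parameter, so the type does not have to be repeated. A minimal sketch, with ``describe`` as an illustrative function name:

.. code-block:: python

    from functools import singledispatch

    @singledispatch
    def describe(arg):
        return f"Default: {arg}"

    @describe.register
    def _(arg: float):
        # registered for float via the type annotation
        return f"Float: {arg:.1f}"

    @describe.register
    def _(arg: dict):
        return f"Dict with {len(arg)} keys"

    print(describe(1.5))       # Float: 1.5
    print(describe({"a": 1}))  # Dict with 1 keys
    print(describe("x"))       # Default: x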
``reduce`` - Cumulative Operations
----------------------------------
The ``reduce`` function from ``functools`` applies a two-argument function
cumulatively to the items of a sequence, from left to right, reducing the sequence
to a single value. For example, ``reduce(f, [a, b, c, d])`` computes ``f(f(f(a, b), c), d)``.
An optional third argument provides an initial value. While ``reduce`` can be powerful,
built-ins such as ``sum()`` or explicit loops are often more readable for simple cases.
.. code-block:: python
>>> from functools import reduce
>>> # sum of list
>>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15
>>> # product of list
>>> reduce(lambda x, y: x * y, [1, 2, 3, 4, 5])
120
>>> # with initial value
>>> reduce(lambda x, y: x + y, [1, 2, 3], 10)
16
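The lambdas above can be replaced by functions from the ``operator`` module, and ``itertools.accumulate`` shows the intermediate values that ``reduce`` folds together. This is a sketch rather than a recommendation; the initial value also makes ``reduce`` safe on an empty sequence, where it would otherwise raise ``TypeError``.

.. code-block:: python

    from functools import reduce
    from itertools import accumulate
    import operator

    numbers = [1, 2, 3, 4, 5]

    product = reduce(operator.mul, numbers, 1)
    # accumulate yields every intermediate result of the same fold
    running = list(accumulate(numbers, operator.mul))
    # with an initial value, an empty sequence returns that value
    empty_sum = reduce(operator.add, [], 0)

    print(product)    # 120
    print(running)    # [1, 2, 6, 24, 120]
    print(empty_sum)  # 0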
Higher-Order Functions
----------------------
Higher-order functions are functions that take other functions as arguments or
return functions as results. Python provides several built-in higher-order functions
that are commonly used for functional programming patterns. ``map()`` applies a
function to every item in an iterable, ``filter()`` keeps items for which the function
returns a truthy value, and ``sorted()``/``min()``/``max()`` accept a ``key`` function
to customize comparison. ``map()`` and ``filter()`` return iterators, so wrap them in
``list()`` if you need a list; ``sorted()`` returns a list, and ``min()``/``max()``
return a single item.
.. code-block:: python
>>> # map - apply function to each item
>>> list(map(lambda x: x**2, [1, 2, 3, 4]))
[1, 4, 9, 16]
>>> # filter - keep items where function returns True
>>> list(filter(lambda x: x > 2, [1, 2, 3, 4]))
[3, 4]
>>> # sorted with key function
>>> sorted(['banana', 'apple', 'cherry'], key=len)
['apple', 'banana', 'cherry']
>>> # min/max with key function
>>> max(['apple', 'banana', 'cherry'], key=len)
'banana'
================================================
FILE: docs/notes/basic/python-future.rst
================================================
.. meta::
:description lang=en: Python __future__ module guide covering future statements, backward compatibility, and feature backporting from newer Python versions
:keywords: Python, __future__, future statements, backward compatibility, print_function, annotations, division
======
Future
======
.. contents:: Table of Contents
:backlinks: none
`Future statements `_
tell the interpreter to compile the current module using syntax or semantics that
will become standard in a later Python release. In other words, ``from __future__ import feature``
backports a feature from a newer Python version to the current interpreter.
In Python 3, many features such as ``print_function`` are already enabled by default, but
the corresponding future statements are still accepted for backward compatibility.
Future statements are **NOT** ordinary import statements: they change how
Python compiles the code. They **MUST** appear at the top of the file, preceded
only by the module docstring, comments, blank lines, and other future statements.
Otherwise, the Python interpreter raises ``SyntaxError``.
For a more detailed explanation of future statements, see
`PEP 236 - Back to the __future__ `_.
List All New Features
---------------------
`__future__ `_ is a Python
module. We can use it to check which future features the current Python
interpreter can import. Interestingly, ``import __future__`` is **NOT** a future
statement; it is an ordinary import statement.
.. code-block:: python
>>> from pprint import pprint
>>> import __future__
>>> pprint(__future__.all_feature_names)
['nested_scopes',
'generators',
'division',
'absolute_import',
'with_statement',
'print_function',
'unicode_literals',
'barry_as_FLUFL',
'generator_stop',
'annotations']
A future statement not only changes how the compiler treats the module but
also binds the corresponding ``__future__._Feature`` object to the feature's name
in the current namespace.
.. code-block:: python
>>> from __future__ import print_function
>>> print_function
_Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
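The two tuples in the ``_Feature`` repr are release tuples. As a small sketch, they can be read through the documented ``getOptionalRelease()`` and ``getMandatoryRelease()`` methods; ``feature`` is just an illustrative variable name.

.. code-block:: python

    import __future__

    feature = __future__.print_function

    # first release in which the future statement was accepted
    optional = feature.getOptionalRelease()
    # release in which the feature became part of the language
    mandatory = feature.getMandatoryRelease()

    print(optional)   # (2, 6, 0, 'alpha', 2)
    print(mandatory)  # (3, 0, 0, 'alpha', 0)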
Print Function
--------------
Replacing the **print statement** with the **print function** is one of the most
notorious decisions in Python history. However, this change makes ``print``
more flexible and easier to extend. Further information can
be found in PEP `3105 `_.
.. code-block:: python
>>> print "Hello World" # print is a statement
Hello World
>>> from __future__ import print_function
>>> print "Hello World"
File "<stdin>", line 1
print "Hello World"
^
SyntaxError: invalid syntax
>>> print("Hello World") # print becomes a function
Hello World
Unicode
-------
Like the **print function**, making text Unicode by default is another infamous
decision. Nevertheless, text is Unicode in many modern programming languages. This
change compels us to decode text early in order to prevent runtime errors after a
program has been running for a while. Further information can be found in PEP
`3112 `_.
.. code-block:: python
>>> type("Guido") # string type is str in python2
<type 'str'>
>>> from __future__ import unicode_literals
>>> type("Guido") # string type becomes unicode
<type 'unicode'>
Division
--------
It can be counterintuitive that dividing two integers yields an ``int`` or ``long``.
For this reason, Python 3 enables **true division** by default. In Python 2,
however, we have to backport ``division`` to the current interpreter. Further
information can be found in PEP `238 `_.
.. code-block:: python
>>> 1 / 2
0
>>> from __future__ import division
>>> 1 / 2 # return a float (true division)
0.5
>>> 1 // 2 # return an int (floor division)
0
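On Python 3, where true division is already the default, the two operators can be compared directly. The sketch below also shows that ``//`` rounds toward negative infinity rather than toward zero; the variable names are illustrative.

.. code-block:: python

    true_div = 7 / 2         # always a float
    floor_div = 7 // 2       # floor of the quotient
    neg_floor = -7 // 2      # rounds down, not toward zero
    quot, rem = divmod(7, 2) # quotient and remainder together

    print(true_div)    # 3.5
    print(floor_div)   # 3
    print(neg_floor)   # -4
    print(quot, rem)   # 3 1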
Annotations
-----------
Before Python 3.7, an annotation in a class or a function cannot refer to a name
that is not yet defined in the current scope. A common case is a container class
whose methods refer to the class itself.
.. code-block:: python
class Tree(object):
def insert(self, tree: Tree): ...
Example
.. code-block:: bash
$ python3 foo.py
Traceback (most recent call last):
File "foo.py", line 1, in <module>
class Tree(object):
File "foo.py", line 3, in Tree
def insert(self, tree: Tree): ...
NameError: name 'Tree' is not defined
In this case, the class is not defined yet when the method is created, so the
Python interpreter cannot evaluate the annotation at definition time. To solve
this issue, we can write the class name as a string literal.
.. code-block:: python
class Tree(object):
def insert(self, tree: 'Tree'): ...
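A string annotation is stored as plain text and is only turned back into a class on request. The following sketch uses ``typing.get_type_hints`` for that step; it assumes ``Tree`` is defined at module level so the name can be looked up.

.. code-block:: python

    import typing

    class Tree:
        def insert(self, tree: 'Tree'): ...

    # the raw annotation is still the string literal
    raw = Tree.insert.__annotations__
    # get_type_hints evaluates the string in the function's globals
    resolved = typing.get_type_hints(Tree.insert)

    print(raw)                       # {'tree': 'Tree'}
    print(resolved["tree"] is Tree)  # True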
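Since version 3.7, Python provides the future statement ``annotations`` to
postpone the evaluation of annotations. It was originally planned to become the
default behavior in Python 4.0, but that plan was later dropped in favor of the
deferred evaluation described in PEP 649.
For further information please refer to PEP `563 `_.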
.. code-block:: python
from __future__ import annotations
class Tree(object):
def insert(self, tree: Tree): ...
BDFL Retirement
---------------
**New in Python 3.1**
PEP `401 `_ is just an Easter egg.
This feature takes the current interpreter back to the past: it enables the
diamond operator ``<>`` in Python 3 and rejects ``!=``.
.. code-block:: python
>>> 1 != 2
True
>>> from __future__ import barry_as_FLUFL
>>> 1 != 2
File "<stdin>", line 1
1 != 2
^
SyntaxError: with Barry as BDFL, use '<>' instead of '!='
>>> 1 <> 2
True
Braces
------
``braces`` is an Easter egg. The source code can be found on
`future.c `_.
.. code-block:: python
>>> from __future__ import braces
File "<stdin>", line 1
SyntaxError: not a chance
================================================
FILE: docs/notes/basic/python-generator.rst
================================================
.. meta::
:description lang=en: Python generator cheat sheet covering generator functions, generator expressions, yield, yield from, send, async generators, and coroutines with code examples
:keywords: Python, Python3, Python generator, Python generator cheat sheet, yield, yield from, generator expression, async generator, iterator, coroutine, contextmanager
=========
Generator
=========
.. contents:: Table of Contents
:backlinks: none
Generators are a powerful feature in Python for creating iterators. They allow
you to iterate over data without storing the entire sequence in memory, making
them ideal for processing large datasets or infinite sequences. This cheat sheet
covers generator functions, generator expressions, ``yield``, ``yield from``,
sending values to generators, and async generators.
Generator Function vs Generator Expression
------------------------------------------
A generator function is defined like a normal function but uses ``yield`` to
produce a sequence of values. When called, it returns a generator object that
can be iterated over. A generator expression is a compact syntax similar to
list comprehensions but produces values lazily on demand.
.. code-block:: python
# generator function
>>> def gen_func():
... yield 5566
...
>>> g = gen_func()
>>> g
<generator object gen_func at 0x...>
>>> next(g)
5566
# generator expression
>>> g = (x for x in range(3))
>>> next(g)
0
>>> next(g)
1
Yield Values from Generator
---------------------------
The ``yield`` statement produces a value and suspends the generator's execution.
When ``next()`` is called again, execution resumes from where it left off. This
example generates prime numbers by checking divisibility for each candidate.
.. code-block:: python
>>> def prime(n):
... p = 2
... while n > 0:
... for x in range(2, p):
... if p % x == 0:
... break
... else:
... yield p
... n -= 1
... p += 1
...
>>> list(prime(5))
[2, 3, 5, 7, 11]
Unpack Generators
-----------------
Python 3.5+ (PEP 448) allows unpacking generators directly into list and set
displays using the ``*`` operator. Generators can also be unpacked into function
arguments and assignment targets, which provides a convenient way to consume
generator values without explicit iteration.
.. code-block:: python
# PEP 448 - unpacking inside a list
>>> g1 = (x for x in range(3))
>>> g2 = (x**2 for x in range(2))
>>> [1, *g1, 2, *g2]
[1, 0, 1, 2, 2, 0, 1]
# unpacking inside a set
>>> g = (x for x in [5, 5, 6, 6])
>>> {*g}
{5, 6}
# unpacking to variables
>>> g = (x for x in range(3))
>>> a, b, c = g
>>> a, b, c
(0, 1, 2)
# extended unpacking
>>> g = (x for x in range(6))
>>> a, b, *c, d = g
>>> a, b, d
(0, 1, 5)
>>> c
[2, 3, 4]
# unpacking inside a function
>>> print(*(x for x in range(3)))
0 1 2
Iterable Class via Generator
----------------------------
You can make a class iterable by implementing ``__iter__`` as a generator method.
This approach is cleaner than implementing a separate iterator class. The
``__reversed__`` method can also be implemented as a generator to support the
built-in ``reversed()`` function.
.. code-block:: python
>>> class Count:
... def __init__(self, n):
... self._n = n
... def __iter__(self):
... n = self._n
... while n > 0:
... yield n
... n -= 1
... def __reversed__(self):
... n = 1
... while n <= self._n:
... yield n
... n += 1
...
>>> list(Count(5))
[5, 4, 3, 2, 1]
>>> list(reversed(Count(5)))
[1, 2, 3, 4, 5]
Send Values to Generator
------------------------
Generators can receive values through the ``send()`` method. The sent value
becomes the result of the ``yield`` expression inside the generator. Before
sending values, you must start the generator by calling ``next()`` or
``send(None)`` to advance it to the first ``yield``.
.. code-block:: python
>>> def spam():
... msg = yield
... print("Message:", msg)
...
>>> g = spam()
>>> next(g) # start generator
>>> try:
... g.send("Hello World!")
... except StopIteration:
... pass
Message: Hello World!
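``send()`` also returns the next value the generator yields, which allows a two-way exchange. As a minimal sketch, the hypothetical ``running_total`` generator below receives numbers and yields the sum so far.

.. code-block:: python

    def running_total():
        total = 0
        while True:
            # yield the current total, then wait for the next value
            value = yield total
            total += value

    gen = running_total()
    initial = next(gen)          # prime the generator; runs to the first yield
    after_five = gen.send(5)     # resumes, adds 5, yields the new total
    after_seven = gen.send(7)
    gen.close()

    print(initial, after_five, after_seven)  # 0 5 12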
yield from Expression
---------------------
The ``yield from`` expression delegates iteration to another generator or
iterable. It automatically handles forwarding ``send()``, ``throw()``, and
``close()`` calls to the subgenerator, making it ideal for creating generator
pipelines and recursive generators.
.. code-block:: python
>>> def subgen():
... try:
... yield 9527
... except ValueError:
... print("got ValueError")
...
>>> def delegating_gen():
... yield from subgen()
...
>>> g = delegating_gen()
>>> next(g)
9527
>>> try:
... g.throw(ValueError)
... except StopIteration:
... pass
got ValueError
You can chain multiple ``yield from`` expressions together. The
``inspect.getgeneratorstate()`` function helps track the generator's lifecycle
through its states: GEN_CREATED, GEN_RUNNING, GEN_SUSPENDED, and GEN_CLOSED.
.. code-block:: python
# yield from + yield from
>>> import inspect
>>> def subgen():
... yield from range(3)
...
>>> def delegating_gen():
... yield from subgen()
...
>>> g = delegating_gen()
>>> inspect.getgeneratorstate(g)
'GEN_CREATED'
>>> next(g)
0
>>> inspect.getgeneratorstate(g)
'GEN_SUSPENDED'
>>> g.close()
>>> inspect.getgeneratorstate(g)
'GEN_CLOSED'
yield from with Return
----------------------
Generators can return a value using the ``return`` statement. The returned value
is accessible through the ``value`` attribute of the ``StopIteration`` exception.
When using ``yield from``, the return value of the subgenerator becomes the value
of the ``yield from`` expression.
.. code-block:: python
>>> def average():
... total = .0
... count = 0
... while True:
... val = yield
... if not val:
... break
... total += val
... count += 1
... return total / count
...
>>> g = average()
>>> next(g)
>>> g.send(3)
>>> g.send(5)
>>> try:
... g.send(None)
... except StopIteration as e:
... print(e.value)
4.0
.. code-block:: python
>>> def subgen():
... yield 9527
...
>>> def delegating_gen():
... yield from subgen()
... return 5566
...
>>> g = delegating_gen()
>>> next(g)
9527
>>> try:
... next(g)
... except StopIteration as e:
... print(e.value)
5566
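The examples above read the return value from ``StopIteration``. To illustrate the last point of this section, the sketch below assigns the result of ``yield from`` directly; the ``log`` list is only there to make the captured value visible.

.. code-block:: python

    def subgen():
        yield 1
        yield 2
        return "done"

    def delegating_gen(log):
        # the subgenerator's return value becomes the value of yield from
        result = yield from subgen()
        log.append(result)

    log = []
    values = list(delegating_gen(log))

    print(values)  # [1, 2]
    print(log)     # ['done']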
Generate Sequences
------------------
The ``yield from`` expression provides a concise way to yield all values from
an iterable. This is particularly useful for chaining multiple sequences together
or flattening nested structures.
.. code-block:: python
>>> def chain():
... yield from 'ab'
... yield from range(3)
...
>>> list(chain())
['a', 'b', 0, 1, 2]
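Flattening a nested structure is a natural fit for a recursive ``yield from``. This is one possible sketch; ``flatten`` is an illustrative helper that treats only lists and tuples as nested containers.

.. code-block:: python

    def flatten(items):
        for item in items:
            if isinstance(item, (list, tuple)):
                # delegate to a nested generator for the sub-sequence
                yield from flatten(item)
            else:
                yield item

    flat = list(flatten([1, [2, [3, 4]], (5, 6)]))
    print(flat)  # [1, 2, 3, 4, 5, 6]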
What ``RES = yield from EXP`` Does
----------------------------------
This snippet shows the simplified equivalent of what ``yield from`` does
internally, as described in PEP 380. It handles iteration, value passing via
``send()``, and captures the return value from the subgenerator.
.. code-block:: python
# Simplified version (ref: PEP 380)
>>> def subgen():
... for x in range(3):
... yield x
...
>>> def delegating_gen():
... _i = iter(subgen())
... try:
... _y = next(_i)
... except StopIteration as _e:
... RES = _e.value
... else:
... while True:
... _s = yield _y
... try:
... _y = _i.send(_s)
... except StopIteration as _e:
... RES = _e.value
... break
...
>>> list(delegating_gen())
[0, 1, 2]
Check Generator Type
--------------------
Use ``types.GeneratorType`` to check if an object is a generator. This is useful
for writing functions that need to handle generators differently from other
iterables.
.. code-block:: python
>>> from types import GeneratorType
>>> def gen_func():
... yield 5566
...
>>> isinstance(gen_func(), GeneratorType)
True
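The ``inspect`` module offers equivalent checks, including one for the generator function rather than the generator object. A short sketch using ``inspect.isgenerator`` and ``inspect.isgeneratorfunction``:

.. code-block:: python

    import inspect

    def gen_func():
        yield 5566

    # the function that contains yield
    is_func = inspect.isgeneratorfunction(gen_func)
    # the object returned by calling it
    is_gen = inspect.isgenerator(gen_func())
    # generator expressions produce generator objects too
    expr_is_gen = inspect.isgenerator(x for x in range(3))
    # other iterators are not generators
    list_iter_is_gen = inspect.isgenerator(iter([1, 2, 3]))

    print(is_func, is_gen, expr_is_gen, list_iter_is_gen)  # True True True False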
Check Generator State
---------------------
The ``inspect.getgeneratorstate()`` function returns the current state of a
generator. This is helpful for debugging and understanding the generator lifecycle.
The four possible states are: GEN_CREATED (not started), GEN_RUNNING (currently
executing), GEN_SUSPENDED (paused at yield), and GEN_CLOSED (completed or closed).
.. code-block:: python
>>> import inspect
>>> def gen_func():
... yield 9527
...
>>> g = gen_func()
>>> inspect.getgeneratorstate(g)
'GEN_CREATED'
>>> next(g)
9527
>>> inspect.getgeneratorstate(g)
'GEN_SUSPENDED'
>>> g.close()
>>> inspect.getgeneratorstate(g)
'GEN_CLOSED'
Context Manager via Generator
-----------------------------
The ``@contextlib.contextmanager`` decorator transforms a generator function into
a context manager. Code before ``yield`` runs on entering the ``with`` block,
and code after ``yield`` (typically in ``finally``) runs on exit. The yielded
value is bound to the variable after ``as``.
.. code-block:: python
>>> import contextlib
>>> @contextlib.contextmanager
... def mylist():
... try:
... l = [1, 2, 3, 4, 5]
... yield l
... finally:
... print("exit scope")
...
>>> with mylist() as l:
... print(l)
[1, 2, 3, 4, 5]
exit scope
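An exception raised inside the ``with`` block is re-raised at the ``yield``, so the generator can handle it. In this sketch (``managed`` and ``events`` are illustrative names), the generator catches the error and finishes normally, which suppresses the exception.

.. code-block:: python

    from contextlib import contextmanager

    events = []

    @contextmanager
    def managed():
        events.append("enter")
        try:
            yield "resource"
        except ValueError as exc:
            # the exception from the with block arrives here
            events.append(f"handled {exc}")
        finally:
            events.append("exit")

    with managed() as res:
        events.append(res)
        raise ValueError("boom")

    print(events)  # ['enter', 'resource', 'handled boom', 'exit']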
What ``@contextmanager`` Does
-----------------------------
This snippet shows a simplified implementation of how ``@contextmanager`` works
internally. It wraps a generator in a class that implements the context manager
protocol (``__enter__`` and ``__exit__``), handling both normal exit and
exception propagation.
.. code-block:: python

    class GeneratorCM:
        def __init__(self, gen):
            self._gen = gen

        def __enter__(self):
            return next(self._gen)

        def __exit__(self, *exc_info):
            try:
                if exc_info[0] is None:
                    next(self._gen)
                else:
                    # re-raise the exception inside the generator
                    self._gen.throw(exc_info[1])
            except StopIteration:
                # the generator finished, so the exit is complete
                return True
            # the generator yielded a second time
            raise RuntimeError("generator didn't stop")

    def contextmanager(func):
        def run(*a, **k):
            return GeneratorCM(func(*a, **k))
        return run

Profile Code Block
------------------
A practical example of using generator-based context managers to measure
execution time of code blocks. The ``yield`` statement marks the boundary
between setup (recording start time) and teardown (calculating elapsed time).
.. code-block:: python
>>> import time
>>> from contextlib import contextmanager
>>> @contextmanager
... def profile(msg):
... try:
... s = time.time()
... yield
... finally:
... print(f'{msg} cost: {time.time() - s:.2f}s')
...
>>> with profile('block'):
... time.sleep(0.1)
block cost: 0.10s
``yield from`` and ``__iter__``
-------------------------------
When using ``yield from`` with a class instance, Python calls the object's
``__iter__`` method to get an iterator. This allows custom classes to work
seamlessly with ``yield from`` delegation, enabling elegant composition of
iterables.
.. code-block:: python
>>> class FakeGen:
... def __iter__(self):
... n = 0
... while n < 3:
... yield n
... n += 1
... def __reversed__(self):
... n = 2
... while n >= 0:
... yield n
... n -= 1
...
>>> def spam():
... yield from FakeGen()
...
>>> list(spam())
[0, 1, 2]
>>> list(reversed(FakeGen()))
[2, 1, 0]
Closure Using Generator
-----------------------
Generators provide an elegant way to implement closures that maintain state
between calls. Each call to ``next()`` resumes execution and can access and
modify the enclosed variables. This is often cleaner than using ``nonlocal``
or class-based approaches.
.. code-block:: python
# generator version
>>> def closure_gen():
... x = 5566
... while True:
... x += 1
... yield x
...
>>> g = closure_gen()
>>> next(g)
5567
>>> next(g)
5568
Simple Scheduler
----------------
This example demonstrates how generators can be used to implement cooperative
multitasking. Each generator represents a task that yields control back to the
scheduler. The scheduler uses a deque to round-robin between tasks, advancing
each one step at a time.
.. code-block:: python
>>> from collections import deque
>>> def fib(n):
... if n <= 2: return 1
... return fib(n-1) + fib(n-2)
...
>>> def g_fib(n):
... for x in range(1, n + 1):
... yield fib(x)
...
>>> q = deque([g_fib(3), g_fib(5)])
>>> def run():
... while q:
... try:
... t = q.popleft()
... print(next(t))
... q.append(t)
... except StopIteration:
... print("Task done")
...
>>> run()
1
1
1
1
2
2
Task done
3
5
Task done
Simple Round-Robin with Blocking
--------------------------------
A more advanced scheduler that handles I/O blocking using ``select()``. Tasks
yield tuples indicating what operation they're waiting for ('recv' or 'send')
and which socket. The scheduler moves blocked tasks to wait queues and only
runs them when their I/O is ready. This is the foundation of async I/O frameworks.
.. code-block:: python
from collections import deque
from select import select
import socket
tasks = deque()
w_read = {}
w_send = {}
def run():
while any([tasks, w_read, w_send]):
while not tasks:
can_r, can_s, _ = select(w_read, w_send, [])
for _r in can_r:
tasks.append(w_read.pop(_r))
for _w in can_s:
tasks.append(w_send.pop(_w))
try:
task = tasks.popleft()
why, what = next(task)
if why == 'recv':
w_read[what] = task
elif why == 'send':
w_send[what] = task
except StopIteration:
pass
def server():
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('localhost', 5566))
sock.listen(5)
while True:
yield 'recv', sock
conn, addr = sock.accept()
tasks.append(client_handler(conn))
def client_handler(conn):
while True:
yield 'recv', conn
msg = conn.recv(1024)
if not msg: break
yield 'send', conn
conn.send(msg)
conn.close()
tasks.append(server())
run()
Async Generator (Python 3.6+)
-----------------------------
Async generators combine ``async def`` with ``yield`` to create asynchronous
iterators. They can use ``await`` to pause for async operations between yields.
Use ``async for`` to iterate over async generators. This is essential for
streaming data from async sources like network connections or databases.
.. code-block:: python
>>> import asyncio
>>> async def slow_gen(n, t):
... for x in range(n):
... await asyncio.sleep(t)
... yield x
...
>>> async def task(n):
... async for x in slow_gen(n, 0.1):
... print(x)
...
>>> asyncio.run(task(3))
0
1
2
Async Generator with try..finally
---------------------------------
Async generators support ``try..finally`` blocks for cleanup, just like regular
generators. The ``finally`` block executes when the generator is closed or
garbage collected, ensuring resources are properly released even if an exception
occurs during iteration.
.. code-block:: python
>>> import asyncio
>>> async def agen(t):
... try:
... await asyncio.sleep(t)
... yield 1 / 0
... finally:
... print("finally")
...
>>> async def main():
... try:
... g = agen(0.1)
... await g.__anext__()
... except Exception as e:
... print(repr(e))
...
>>> asyncio.run(main())
finally
ZeroDivisionError('division by zero')
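Cleanup does not have to wait for garbage collection. As a minimal sketch, calling ``aclose()`` on a suspended async generator runs its ``finally`` block immediately; the ``events`` list is only used here to record the order.

.. code-block:: python

    import asyncio

    events = []

    async def agen():
        try:
            yield 1
            yield 2
        finally:
            events.append("finally")

    async def main():
        g = agen()
        events.append(await g.__anext__())
        # close the generator early; the finally block runs now
        await g.aclose()
        events.append("closed")

    asyncio.run(main())
    print(events)  # [1, 'finally', 'closed']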
Send and Throw to Async Generator
---------------------------------
Async generators support ``asend()`` to send values and ``athrow()`` to throw
exceptions, similar to regular generators. These methods are coroutines that
must be awaited. This enables two-way communication with async generators for
building complex async data pipelines.
.. code-block:: python
>>> import asyncio
>>> async def agen(n):
... try:
... for x in range(n):
... await asyncio.sleep(0.1)
... val = yield x
... print(f'got: {val}')
... except RuntimeError as e:
... yield repr(e)
...
>>> async def main():
... g = agen(5)
... ret = await g.asend(None) + await g.asend('foo')
... print(ret)
... ret = await g.athrow(RuntimeError('error'))
... print(ret)
...
>>> asyncio.run(main())
got: foo
1
RuntimeError('error')
Async Comprehension (Python 3.6+)
---------------------------------
PEP 530 introduced async comprehensions, allowing ``async for`` in list, set,
and dict comprehensions. This provides a concise way to collect values from
async generators. You can also use ``if`` clauses to filter values and
conditional expressions for transformations.
.. code-block:: python
>>> import asyncio
>>> async def agen(n):
... for x in range(n):
... await asyncio.sleep(0.01)
... yield x
...
>>> async def main():
... ret = [x async for x in agen(5)]
... print(ret)
... ret = [x async for x in agen(5) if x < 3]
... print(ret)
... ret = {f'{x}': x async for x in agen(3)}
... print(ret)
...
>>> asyncio.run(main())
[0, 1, 2, 3, 4]
[0, 1, 2]
{'0': 0, '1': 1, '2': 2}
Simple Async Round-Robin
------------------------
This example shows cooperative multitasking with async generators. Multiple
async generators are scheduled in a deque, and the scheduler awaits each one
in turn using ``__anext__()``. This pattern is useful for interleaving multiple
async data streams fairly.
.. code-block:: python
>>> import asyncio
>>> from collections import deque
>>> async def agen(n):
... for x in range(n):
... await asyncio.sleep(0.1)
... yield x
...
>>> async def main():
... q = deque([agen(3), agen(5)])
... while q:
... try:
... g = q.popleft()
... print(await g.__anext__())
... q.append(g)
... except StopAsyncIteration:
... pass
...
>>> asyncio.run(main())
0
0
1
1
2
2
3
4
Async Generator vs Async Iterator Performance
----------------------------------------------
In CPython, async generators are usually faster than manually implemented async
iterators because the iteration machinery runs at the C level. In the benchmark
below, the async generator is roughly three times faster for an iteration-heavy
workload; exact timings depend on the machine and Python version.
.. code-block:: python
>>> import time
>>> import asyncio
>>> class AsyncIter:
... def __init__(self, n):
... self._n = n
... def __aiter__(self):
... return self
... async def __anext__(self):
... ret = self._n
... if self._n == 0:
... raise StopAsyncIteration
... self._n -= 1
... return ret
...
>>> async def agen(n):
... for i in range(n):
... yield i
...
>>> async def task_agen(n):
... s = time.time()
... async for _ in agen(n): pass
... cost = time.time() - s
... print(f"agen cost time: {cost}")
...
>>> async def task_aiter(n):
... s = time.time()
... async for _ in AsyncIter(n): pass
... cost = time.time() - s
... print(f"aiter cost time: {cost}")
...
>>> n = 10 ** 7
>>> asyncio.run(task_agen(n))
agen cost time: 1.2698817253112793
>>> asyncio.run(task_aiter(n))
aiter cost time: 4.168368101119995
``yield from == await`` Expression
----------------------------------
Before Python 3.5 introduced ``async``/``await`` syntax, coroutines were
implemented using generators with the ``@asyncio.coroutine`` decorator and
``yield from``. For coroutines, ``await`` is essentially equivalent to ``yield from``.
This example shows both the old and new syntax for an echo server. Note that
``@asyncio.coroutine`` was removed in Python 3.11, so the old syntax only runs on
earlier versions.
.. code-block:: python
import asyncio
import socket
loop = asyncio.get_event_loop()
host = 'localhost'
port = 5566
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.setblocking(False)
sock.bind((host, port))
sock.listen(10)
# old syntax (Python 3.4)
@asyncio.coroutine
def echo_server():
while True:
conn, addr = yield from loop.sock_accept(sock)
loop.create_task(handler(conn))
@asyncio.coroutine
def handler(conn):
while True:
msg = yield from loop.sock_recv(conn, 1024)
if not msg:
break
yield from loop.sock_sendall(conn, msg)
conn.close()
# new syntax (Python 3.5+)
async def echo_server():
while True:
conn, addr = await loop.sock_accept(sock)
loop.create_task(handler(conn))
async def handler(conn):
while True:
msg = await loop.sock_recv(conn, 1024)
if not msg:
break
await loop.sock_sendall(conn, msg)
conn.close()
loop.create_task(echo_server())
loop.run_forever()
Simple Compiler Using Generators
--------------------------------
This advanced example from David Beazley demonstrates using generators to
implement a simple expression compiler. It includes a tokenizer, parser, and
evaluator using the visitor pattern with generators for stack-based evaluation.
.. code-block:: python
import re
import types
from collections import namedtuple
tokens = [
    r'(?P<NUMBER>\d+)',
    r'(?P<PLUS>\+)',
    r'(?P<MINUS>-)',
    r'(?P<TIMES>\*)',
    r'(?P<DIVIDE>/)',
    r'(?P<WS>\s+)']
Token = namedtuple('Token', ['type', 'value'])
lex = re.compile('|'.join(tokens))
def tokenize(text):
scan = lex.scanner(text)
gen = (Token(m.lastgroup, m.group())
for m in iter(scan.match, None) if m.lastgroup != 'WS')
return gen
class Node:
_fields = []
def __init__(self, *args):
for attr, value in zip(self._fields, args):
setattr(self, attr, value)
class Number(Node):
_fields = ['value']
class BinOp(Node):
_fields = ['op', 'left', 'right']
def parse(toks):
lookahead, current = next(toks, None), None
def accept(*toktypes):
nonlocal lookahead, current
if lookahead and lookahead.type in toktypes:
current, lookahead = lookahead, next(toks, None)
return True
def expr():
left = term()
while accept('PLUS', 'MINUS'):
left = BinOp(current.value, left)
left.right = term()
return left
def term():
left = factor()
while accept('TIMES', 'DIVIDE'):
left = BinOp(current.value, left)
left.right = factor()
return left
def factor():
if accept('NUMBER'):
return Number(int(current.value))
else:
raise SyntaxError()
return expr()
class NodeVisitor:
def visit(self, node):
stack = [self.genvisit(node)]
ret = None
while stack:
try:
node = stack[-1].send(ret)
stack.append(self.genvisit(node))
ret = None
except StopIteration as e:
stack.pop()
ret = e.value
return ret
def genvisit(self, node):
ret = getattr(self, 'visit_' + type(node).__name__)(node)
if isinstance(ret, types.GeneratorType):
ret = yield from ret
return ret
class Evaluator(NodeVisitor):
def visit_Number(self, node):
return node.value
def visit_BinOp(self, node):
leftval = yield node.left
rightval = yield node.right
if node.op == '+':
return leftval + rightval
elif node.op == '-':
return leftval - rightval
elif node.op == '*':
return leftval * rightval
elif node.op == '/':
return leftval / rightval
def evaluate(exp):
toks = tokenize(exp)
tree = parse(toks)
return Evaluator().visit(tree)
print(evaluate('2 * 3 + 5 / 2')) # 8.5
print(evaluate('+'.join([str(x) for x in range(10000)]))) # 49995000
================================================
FILE: docs/notes/basic/python-heap.rst
================================================
.. meta::
:description lang=en: Python heap and priority queue cheat sheet covering heapq module operations, heap sort algorithm, priority queue implementation with custom comparators, and practical examples
:keywords: Python, Python Cheat Sheet, heap, heapq, priority queue, heap sort, min heap, max heap, Python heapq, nlargest, nsmallest
====
Heap
====
.. contents:: Table of Contents
:backlinks: none
The heapq module provides an implementation of the heap queue algorithm, also
known as the priority queue algorithm. Heaps are binary trees where every parent
node has a value less than or equal to any of its children (min-heap). This
cheat sheet covers heap operations including heap sort, priority queues, merging
sorted iterables, and finding the n largest or smallest elements efficiently.
The source code is available on `GitHub `_.
References
----------
- `heapq — Heap queue algorithm `_
- `queue.PriorityQueue `_
Basic Heap Operations
---------------------
The ``heapq`` module provides functions to create and manipulate heaps. Use
``heapify`` to convert a list into a heap in-place in O(n) time. Use ``heappush``
and ``heappop`` to add and remove elements while maintaining the heap property.
.. code-block:: python
>>> import heapq
>>> # Convert list to heap in-place
>>> h = [5, 1, 3, 2, 6]
>>> heapq.heapify(h)
>>> h[0] # smallest element at root
1
>>> # Push and pop
>>> heapq.heappush(h, 0)
>>> heapq.heappop(h)
0
>>> # Push and pop in one operation
>>> heapq.heappushpop(h, 4) # push 4, then pop smallest
1
>>> # Pop and push in one operation
>>> heapq.heapreplace(h, 0) # pop smallest, then push 0
2
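As a small sketch of what the heap property means for the underlying list, the
helper ``is_min_heap`` below is hypothetical and not part of ``heapq``. It checks
that every parent at index ``i`` is no larger than its children at ``2*i + 1``
and ``2*i + 2``.

```python
import heapq


def is_min_heap(h):
    """Return True if list h satisfies the min-heap invariant."""
    n = len(h)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and h[i] > h[child]:
                return False
    return True


data = [5, 1, 3, 2, 6]
before = is_min_heap(data)   # the raw list is not a heap
heapq.heapify(data)
after = is_min_heap(data)    # heapify establishes the invariant
print(before, after)
```

Only the parent/child relation is guaranteed; a heap is not a fully sorted list.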
Implement Heap Sort with ``heapq``
----------------------------------
Heap sort works by pushing all elements onto a heap and then popping them off
one by one. Since the heap maintains the min-heap property, elements come out
in sorted order. The time complexity is O(n log n).
.. code-block:: python
>>> import heapq
>>> a = [5, 1, 3, 2, 6]
>>> h = []
>>> for x in a:
... heapq.heappush(h, x)
...
>>> x = [heapq.heappop(h) for _ in range(len(a))]
>>> x
[1, 2, 3, 5, 6]
A more efficient approach uses ``heapify`` to convert the list in-place:
.. code-block:: python
>>> import heapq
>>> def heap_sort(items):
... h = items.copy()
... heapq.heapify(h)
... return [heapq.heappop(h) for _ in range(len(h))]
...
>>> heap_sort([5, 1, 3, 2, 6])
[1, 2, 3, 5, 6]
Implement Max Heap
------------------
Python's ``heapq`` only provides a min-heap. To implement a max-heap, negate
the values when pushing and negate again when popping.
.. code-block:: python
>>> import heapq
>>> # Max heap using negation
>>> h = []
>>> for x in [5, 1, 3, 2, 6]:
... heapq.heappush(h, -x)
...
>>> [-heapq.heappop(h) for _ in range(len(h))]
[6, 5, 3, 2, 1]
For custom objects, implement ``__lt__`` with reversed comparison:
.. code-block:: python
import heapq
class MaxHeapItem:
def __init__(self, val):
self.val = val
def __lt__(self, other):
return self.val > other.val # reversed for max heap
h = []
for x in [5, 1, 3]:
heapq.heappush(h, MaxHeapItem(x))
print(heapq.heappop(h).val) # 5 (largest)
Implement Priority Queue with ``heapq``
---------------------------------------
A priority queue processes elements based on their priority rather than insertion
order. Use tuples ``(priority, value)`` where lower numbers indicate higher priority.
.. code-block:: python
>>> import heapq
>>> pq = []
>>> heapq.heappush(pq, (2, "medium"))
>>> heapq.heappush(pq, (1, "high"))
>>> heapq.heappush(pq, (3, "low"))
>>> [heapq.heappop(pq) for _ in range(len(pq))]
[(1, 'high'), (2, 'medium'), (3, 'low')]
For custom objects, implement the ``__lt__`` method to define comparison behavior:
.. code-block:: python
import heapq
class Task:
def __init__(self, priority, name):
self.priority = priority
self.name = name
def __lt__(self, other):
return self.priority < other.priority
def __repr__(self):
return f"Task({self.priority}, {self.name!r})"
h = []
heapq.heappush(h, Task(3, "low"))
heapq.heappush(h, Task(1, "high"))
heapq.heappush(h, Task(2, "medium"))
while h:
print(heapq.heappop(h))
# Task(1, 'high')
# Task(2, 'medium')
# Task(3, 'low')
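When two ``(priority, value)`` tuples share a priority, the comparison falls
through to the values, which fails if they are not orderable (dictionaries, for
example). A common remedy is to add an increasing counter as a tie-breaker. The
sketch below assumes hypothetical helpers ``push`` and ``pop``; they are
illustrative names, not part of ``heapq``.

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker: insertion order
pq = []


def push(priority, payload):
    # The counter guarantees the payload itself is never compared.
    heapq.heappush(pq, (priority, next(counter), payload))


def pop():
    priority, _, payload = heapq.heappop(pq)
    return priority, payload


push(1, {"job": "b"})
push(1, {"job": "a"})   # same priority, and dicts are not orderable
push(0, {"job": "c"})

order = [pop() for _ in range(3)]
print(order)
```

Entries with equal priority come out in insertion order, which also makes the
queue stable.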
Find K Largest or Smallest Elements
-----------------------------------
The ``nlargest`` and ``nsmallest`` functions efficiently find the k largest or
smallest elements. They are more efficient than sorting when k is small relative
to the list size.
.. code-block:: python
>>> import heapq
>>> nums = [5, 1, 8, 3, 9, 2, 7]
>>> heapq.nsmallest(3, nums)
[1, 2, 3]
>>> heapq.nlargest(3, nums)
[9, 8, 7]
Use the ``key`` parameter to extract comparison keys from complex objects:
.. code-block:: python
>>> import heapq
>>> data = [
... {'name': 'Alice', 'score': 85},
... {'name': 'Bob', 'score': 92},
... {'name': 'Charlie', 'score': 78},
... ]
>>> heapq.nlargest(2, data, key=lambda x: x['score'])
[{'name': 'Bob', 'score': 92}, {'name': 'Alice', 'score': 85}]
Merge Sorted Iterables
----------------------
The ``merge`` function merges multiple sorted inputs into a single sorted output.
It returns an iterator, making it memory-efficient for large datasets.
.. code-block:: python
>>> import heapq
>>> a = [1, 3, 5, 7]
>>> b = [2, 4, 6, 8]
>>> c = [0, 9, 10]
>>> list(heapq.merge(a, b, c))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Use ``key`` and ``reverse`` parameters for custom merging:
.. code-block:: python
>>> import heapq
>>> # Merge in descending order
>>> a = [5, 3, 1]
>>> b = [6, 4, 2]
>>> list(heapq.merge(a, b, reverse=True))
[6, 5, 4, 3, 2, 1]
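The ``key`` parameter works in the same way as it does for ``sorted``. The
following sketch uses made-up sample data and assumes each input is already
sorted by the same key; ``merge`` does not sort its inputs.

```python
import heapq

by_age_a = [("amy", 3), ("andy", 10)]    # already sorted by age
by_age_b = [("david", 8), ("bob", 12)]   # already sorted by age

# Merge two age-ordered lists into one age-ordered stream.
merged = list(heapq.merge(by_age_a, by_age_b, key=lambda p: p[1]))
print(merged)
```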
Maintain a Fixed-Size Heap
--------------------------
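To maintain a heap of fixed size k (e.g., tracking the top k elements), keep a
min-heap of at most k items: push while the heap is not full, then use
``heapreplace`` (or ``heappushpop``) to swap out the smallest item whenever a
larger one arrives.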
.. code-block:: python
>>> import heapq
>>> def top_k(items, k):
... """Keep track of k largest elements using min-heap."""
... h = []
... for x in items:
... if len(h) < k:
... heapq.heappush(h, x)
... elif x > h[0]:
... heapq.heapreplace(h, x)
... return sorted(h, reverse=True)
...
>>> top_k([5, 1, 8, 3, 9, 2, 7, 4, 6], 3)
[9, 8, 7]
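For comparison, here is a sketch of the same idea written with ``heappushpop``.
The function name ``top_k_pushpop`` is hypothetical. Because ``heappushpop``
pushes first and then pops the smallest item, no explicit comparison with
``h[0]`` is needed.

```python
import heapq


def top_k_pushpop(items, k):
    """Keep the k largest elements using heappushpop on a min-heap."""
    h = []
    for x in items:
        if len(h) < k:
            heapq.heappush(h, x)
        else:
            # Push x, then drop the smallest; the heap stays at size k.
            heapq.heappushpop(h, x)
    return sorted(h, reverse=True)


result = top_k_pushpop([5, 1, 8, 3, 9, 2, 7, 4, 6], 3)
print(result)
```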
Heap with Index Tracking
------------------------
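``heapq`` has no operation for updating an entry in place. A common workaround
is lazy deletion: keep a dictionary that maps each item to its heap entry, mark
the old entry as removed when the priority changes, and skip removed entries
when popping.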
.. code-block:: python
import heapq
class IndexedHeap:
def __init__(self):
self.heap = []
self.entry_finder = {}
self.REMOVED = '<removed>'  # sentinel marking a lazily deleted entry
def push(self, item, priority):
if item in self.entry_finder:
self.remove(item)
entry = [priority, item]
self.entry_finder[item] = entry
heapq.heappush(self.heap, entry)
def remove(self, item):
entry = self.entry_finder.pop(item)
entry[-1] = self.REMOVED
def pop(self):
while self.heap:
priority, item = heapq.heappop(self.heap)
if item is not self.REMOVED:
del self.entry_finder[item]
return item
raise KeyError('pop from empty heap')
# Usage
h = IndexedHeap()
h.push('task1', 3)
h.push('task2', 1)
h.push('task1', 0) # update priority
print(h.pop()) # task1 (now has priority 0)
================================================
FILE: docs/notes/basic/python-list.rst
================================================
.. meta::
:description lang=en: Python list cheat sheet covering list operations, comprehensions, slicing, sorting, filtering, and common list manipulation patterns with code examples
:keywords: Python, Python3, Python list, Python list cheat sheet, list comprehension, slicing, sorting, filtering, append, extend, iteration
====
List
====
.. contents:: Table of Contents
:backlinks: none
The list is a common data structure for storing objects. Most of the time,
programmers are concerned with getting, setting, searching, filtering, and
sorting items. Sometimes we also walk into common pitfalls related to
memory management. The main goal of this cheat sheet is to collect
these common operations and pitfalls.
Python List Basics and Common Operations
----------------------------------------
There are so many ways that we can manipulate lists in Python. Before we start
to learn those versatile manipulations, the following snippet shows the most
common operations of lists.
.. code-block:: python
>>> a = [1, 2, 3, 4, 5]
>>> # contains
>>> 2 in a
True
>>> # positive index
>>> a[0]
1
>>> # negative index
>>> a[-1]
5
>>> # slicing list[start:end:step]
>>> a[1:]
[2, 3, 4, 5]
>>> a[1:-1]
[2, 3, 4]
>>> a[1:-1:2]
[2, 4]
>>> # reverse
>>> a[::-1]
[5, 4, 3, 2, 1]
>>> a[:0:-1]
[5, 4, 3, 2]
>>> # set an item
>>> a[0] = 0
>>> a
[0, 2, 3, 4, 5]
>>> # append items to list
>>> a.append(6)
>>> a
[0, 2, 3, 4, 5, 6]
>>> a.extend([7, 8, 9])
>>> a
[0, 2, 3, 4, 5, 6, 7, 8, 9]
>>> # delete an item
>>> del a[-1]
>>> a
[0, 2, 3, 4, 5, 6, 7, 8]
>>> # list comprehension
>>> b = [x for x in range(3)]
>>> b
[0, 1, 2]
>>> # add two lists
>>> a + b
[0, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2]
Initialize Lists with Multiplication Operator
---------------------------------------------
Generally speaking, we can create a list with the ``*`` operator if the item in
the list expression is an immutable object.
.. code-block:: python
>>> a = [None] * 3
>>> a
[None, None, None]
>>> a[0] = "foo"
>>> a
['foo', None, None]
However, if the item in the list expression is a mutable object, the ``*``
operator will copy the reference of the item N times. In order to avoid this
pitfall, we should use a list comprehension to initialize a list.
.. code-block:: python
>>> a = [[]] * 3
>>> b = [[] for _ in range(3)]
>>> a[0].append("Hello")
>>> a
[['Hello'], ['Hello'], ['Hello']]
>>> b[0].append("Python")
>>> b
[['Python'], [], []]
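The difference can be checked directly with the ``is`` operator. This short
sketch uses illustrative variable names and compares object identity for both
ways of building the list.

```python
shared = [[]] * 3
separate = [[] for _ in range(3)]

# All three slots of `shared` refer to one list object.
shared_same = all(x is shared[0] for x in shared)
# Each slot of `separate` is its own list object.
separate_same = separate[0] is separate[1]

print(shared_same, separate_same)
```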
Copy Lists: Shallow vs Deep Copy
--------------------------------
Assigning a list to a variable is a common pitfall. The assignment does not
copy the list; the variable only refers to the same list object, which increases
the list's reference count.
.. code-block:: python
>>> import sys
>>> a = [1, 2, 3]
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> b[2] = 123456 # a[2] = 123456
>>> b
[1, 2, 123456]
>>> a
[1, 2, 123456]
There are two types of copy. The first one is called *shallow copy* (non-recursive copy)
and the second one is called *deep copy* (recursive copy). Most of the time, it
is sufficient for us to copy a list by shallow copy. However, if a list is nested,
we have to use a deep copy.
.. code-block:: python
>>> # shallow copy
>>> a = [1, 2]
>>> b = list(a)
>>> b[0] = 123
>>> a
[1, 2]
>>> b
[123, 2]
>>> a = [[1], [2]]
>>> b = list(a)
>>> b[0][0] = 123
>>> a
[[123], [2]]
>>> b
[[123], [2]]
>>> # deep copy
>>> import copy
>>> a = [[1], [2]]
>>> b = copy.deepcopy(a)
>>> b[0][0] = 123
>>> a
[[1], [2]]
>>> b
[[123], [2]]
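Besides ``list(a)``, there are several other spellings of a shallow copy. The
sketch below uses illustrative variable names and checks that all of them create
a new outer list while still sharing the inner lists, whereas ``copy.deepcopy``
does not.

```python
import copy

a = [[1], [2]]
copies = [a[:], a.copy(), list(a), copy.copy(a)]

# Each copy is a new outer list ...
outer_distinct = all(c is not a for c in copies)
# ... but the inner lists are still shared with the original.
inner_shared = all(c[0] is a[0] for c in copies)

deep = copy.deepcopy(a)
deep_inner_shared = deep[0] is a[0]
print(outer_distinct, inner_shared, deep_inner_shared)
```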
Slice Lists with slice Objects
------------------------------
Sometimes our data is concatenated into one large segment, such as a packet. In
this case, we can name ranges of the data with ``slice`` objects, which act
as explanatory variables, instead of writing *slicing expressions* inline.
.. code-block:: python
>>> icmp = (
... b"080062988e2100005bff49c20005767c"
... b"08090a0b0c0d0e0f1011121314151617"
... b"18191a1b1c1d1e1f2021222324252627"
... b"28292a2b2c2d2e2f3031323334353637"
... )
>>> head = slice(0, 32)
>>> data = slice(32, len(icmp))
>>> icmp[head]
b'080062988e2100005bff49c20005767c'
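Named slices become more useful as the number of fields grows. The sketch below
is only an illustration: the field names and offsets are assumptions based on
the usual ICMP echo header layout (type, code, checksum), counted in hex
characters, two per byte.

```python
icmp = (
    b"080062988e2100005bff49c20005767c"
    b"08090a0b0c0d0e0f1011121314151617"
)

# Assumed layout: 1-byte type, 1-byte code, 2-byte checksum.
ICMP_TYPE = slice(0, 2)
ICMP_CODE = slice(2, 4)
ICMP_CHECKSUM = slice(4, 8)

fields = {
    "type": int(icmp[ICMP_TYPE], 16),
    "code": int(icmp[ICMP_CODE], 16),
    "checksum": icmp[ICMP_CHECKSUM],
}
print(fields)

# slice.indices resolves an open-ended slice against a concrete length.
print(slice(32, None).indices(len(icmp)))
```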
Create Lists with List Comprehensions
-------------------------------------
`List comprehensions `_,
proposed in PEP `202 `_,
provide a graceful way to create a new list from another list, sequence,
or any iterable object. In addition, we can sometimes use this expression to
replace ``map`` and ``filter``.
.. code-block:: python
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [(lambda x: x**2)(i) for i in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x for x in range(10) if x > 5]
[6, 7, 8, 9]
>>> [x if x > 5 else 0 for x in range(10)]
[0, 0, 0, 0, 0, 0, 6, 7, 8, 9]
>>> [x + 1 if x < 5 else x + 2 if x > 5 else x + 5 for x in range(10)]
[1, 2, 3, 4, 5, 10, 8, 9, 10, 11]
>>> [(x, y) for x in range(3) for y in range(2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
Unpack Lists into Variables
---------------------------
Sometimes we want to unpack a list into variables to make our code more
readable. In this case, we assign N elements to N variables, as in the
following example.
.. code-block:: python
>>> arr = [1, 2, 3]
>>> a, b, c = arr
>>> a, b, c
(1, 2, 3)
Based on PEP `3132 `_, Python 3 lets us use a
single asterisk to unpack N elements into fewer than N variables; the starred
variable collects the remaining items as a list.
.. code-block:: python
>>> arr = [1, 2, 3, 4, 5]
>>> a, b, *c, d = arr
>>> a, b, d
(1, 2, 5)
>>> c
[3, 4]
Iterate with Index Using enumerate()
------------------------------------
``enumerate`` is a built-in function. It helps us to acquire indexes
(or a count) and elements at the same time without using ``range(len(list))``.
Further information can be found on
`Looping Techniques `_.
.. code-block:: python
>>> for i, v in enumerate(range(3)):
... print(i, v)
...
0 0
1 1
2 2
>>> for i, v in enumerate(range(3), 1): # start = 1
... print(i, v)
...
1 0
2 1
3 2
Combine Lists with zip()
------------------------
`zip `_ enables us to
iterate over items contained in multiple lists at a time. Iteration stops
whenever one of the lists is exhausted. As a result, the length of the
iteration is the same as the shortest list. If this behavior is not desired,
we can use ``itertools.zip_longest`` in **Python 3** or ``itertools.izip_longest``
in **Python 2**.
.. code-block:: python
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> list(zip(a, b))
[(1, 4), (2, 5), (3, 6)]
>>> c = [1]
>>> list(zip(a, b, c))
[(1, 4, 1)]
>>> from itertools import zip_longest
>>> list(zip_longest(a, b, c))
[(1, 4, 1), (2, 5, None), (3, 6, None)]
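``zip`` can also undo itself. As a small sketch with sample data, unpacking a
list of pairs with ``*`` regroups the first items together and the second items
together.

```python
pairs = [(1, 4), (2, 5), (3, 6)]

# Unpacking with * passes each tuple as a separate argument to zip,
# which regroups first items together and second items together.
firsts, seconds = zip(*pairs)
print(firsts, seconds)
```

Note that the results are tuples, not lists.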
Filter List Items
-----------------
`filter `_ is a
built-in function that helps us remove unwanted items. In **Python 2**,
``filter`` returns a list. However, in **Python 3**, ``filter`` returns an
*iterator*. Note that a *list comprehension* or *generator
expression* provides a more concise way to remove items.
.. code-block:: python
>>> [x for x in range(5) if x > 1]
[2, 3, 4]
>>> l = ['1', '2', 3, 'Hello', 4]
>>> f = lambda x: isinstance(x, int)
>>> filter(f, l)
<filter object at 0x...>
>>> list(filter(f, l))
[3, 4]
>>> list((i for i in l if f(i)))
[3, 4]
Implement Stack with List
-------------------------
Python does not need a separate stack data structure because ``list`` provides
``append`` and ``pop`` methods, which enable us to use a list as
a stack.
.. code-block:: python
>>> stack = []
>>> stack.append(1)
>>> stack.append(2)
>>> stack.append(3)
>>> stack
[1, 2, 3]
>>> stack.pop()
3
>>> stack.pop()
2
>>> stack
[1]
Check Membership with in Operator
---------------------------------
We can implement the ``__contains__`` method to make a class support the ``in``
operator. It is a common way to emulate membership tests for custom
classes.
.. code-block:: python
class Stack:
def __init__(self):
self.__list = []
def push(self, val):
self.__list.append(val)
def pop(self):
return self.__list.pop()
def __contains__(self, item):
return True if item in self.__list else False
stack = Stack()
stack.push(1)
print(1 in stack)
print(0 in stack)
Example
.. code-block:: bash
$ python stack.py
True
False
Access Items with __getitem__ and __setitem__
---------------------------------------------
Making custom classes perform get and set operations like lists is simple. We
can implement a ``__getitem__`` method and a ``__setitem__`` method to enable
a class to retrieve and overwrite data by index. In addition, if we want to use
the function, ``len``, to calculate the number of elements, we can implement a
``__len__`` method.
.. code-block:: python
class Stack:
def __init__(self):
self.__list = []
def push(self, val):
self.__list.append(val)
def pop(self):
return self.__list.pop()
def __repr__(self):
return "{}".format(self.__list)
def __len__(self):
return len(self.__list)
def __getitem__(self, idx):
return self.__list[idx]
def __setitem__(self, idx, val):
self.__list[idx] = val
stack = Stack()
stack.push(1)
stack.push(2)
print("stack:", stack)
stack[0] = 3
print("stack:", stack)
print("num items:", len(stack))
Example
.. code-block:: bash
$ python stack.py
stack: [1, 2]
stack: [3, 2]
num items: 2
Delegate Iteration with __iter__
--------------------------------
If a custom container class holds a list and we want iterations to work on the
container, we can implement an ``__iter__`` method to delegate iterations to
the list. Note that the method, ``__iter__``, should return an *iterator object*,
so we cannot return the list directly; otherwise, Python raises a ``TypeError``.
.. code-block:: python
class Stack:
def __init__(self):
self.__list = []
def push(self, val):
self.__list.append(val)
def pop(self):
return self.__list.pop()
def __iter__(self):
return iter(self.__list)
stack = Stack()
stack.push(1)
stack.push(2)
for s in stack:
print(s)
Example
.. code-block:: bash
$ python stack.py
1
2
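To see the ``TypeError`` mentioned above, the following sketch compares two
hypothetical classes, ``BadStack`` and ``GoodStack``. Only the second one wraps
the list with ``iter()``.

```python
class BadStack:
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return self._items          # a list is iterable, not an iterator


class GoodStack(BadStack):
    def __iter__(self):
        return iter(self._items)    # iter() returns a real iterator


try:
    list(BadStack([1, 2]))
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
print(list(GoodStack([1, 2])))
```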
Sort Lists with sort() and sorted()
-----------------------------------
Python lists provide a built-in ``list.sort`` method, which sorts a list
`in-place `_ rather than
building a new list. The method returns ``None`` to avoid confusion with
``sorted``, and it is only available on ``list`` objects.
.. code-block:: python
>>> l = [5, 4, 3, 2, 1]
>>> l.sort()
>>> l
[1, 2, 3, 4, 5]
>>> l.sort(reverse=True)
>>> l
[5, 4, 3, 2, 1]
The ``sorted`` function does not modify the iterable it receives. Instead,
it returns a new sorted list. Using ``sorted`` is safer than ``list.sort`` when
the original order must be preserved or the input is immutable, such as a
tuple. Another difference between ``list.sort`` and ``sorted`` is that
``sorted`` accepts any **iterable object**.
.. code-block:: python
>>> l = [5, 4, 3, 2, 1]
>>> new = sorted(l)
>>> new
[1, 2, 3, 4, 5]
>>> l
[5, 4, 3, 2, 1]
>>> d = {3: 'andy', 2: 'david', 1: 'amy'}
>>> sorted(d) # sort iterable
[1, 2, 3]
To sort a list whose elements are tuples, ``operator.itemgetter`` is
helpful because it builds a key function for the ``key`` parameter. Note
that the keys should be comparable with each other; otherwise, sorting raises a ``TypeError``.
.. code-block:: python
>>> from operator import itemgetter
>>> l = [('andy', 10), ('david', 8), ('amy', 3)]
>>> l.sort(key=itemgetter(1))
>>> l
[('amy', 3), ('david', 8), ('andy', 10)]
``operator.itemgetter`` is useful because the function returns a getter
that can be applied to any object with a ``__getitem__`` method. For
example, a list whose elements are dictionaries can be sorted with
``operator.itemgetter`` because every element has ``__getitem__``.
.. code-block:: python
>>> from pprint import pprint
>>> from operator import itemgetter
>>> l = [
... {'name': 'andy', 'age': 10},
... {'name': 'david', 'age': 8},
... {'name': 'amy', 'age': 3},
... ]
>>> l.sort(key=itemgetter('age'))
>>> pprint(l)
[{'age': 3, 'name': 'amy'},
{'age': 8, 'name': 'david'},
{'age': 10, 'name': 'andy'}]
If the elements of a list are neither comparable nor provide a ``__getitem__``
method, we can pass a custom key function instead.
.. code-block:: python
>>> class Node(object):
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=lambda x: x.val)
>>> nodes
[Node(1), Node(2), Node(3)]
>>> nodes.sort(key=lambda x: x.val, reverse=True)
>>> nodes
[Node(3), Node(2), Node(1)]
The above snippet can be simplified by using ``operator.attrgetter``. The
function returns an attribute getter based on the attribute's name. Note that
the attribute should be comparable; otherwise, ``sorted`` or ``list.sort`` will
raise ``TypeError``.
.. code-block:: python
>>> from operator import attrgetter
>>> class Node(object):
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=attrgetter('val'))
>>> nodes
[Node(1), Node(2), Node(3)]
If an object has a ``__lt__`` method, the object is comparable, and
``sorted`` or ``list.sort`` does not need a key function. A list or any
iterable of such objects can be sorted directly.
.. code-block:: python
>>> class Node(object):
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return f"Node({self.val})"
... def __lt__(self, other):
... return self.val - other.val < 0
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]
If a class does not define ``__lt__``, it is possible to patch the method onto
the class after its declaration. After the patching, its instances become
comparable.
.. code-block:: python
>>> class Node(object):
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return f"Node({self.val})"
...
>>> Node.__lt__ = lambda s, o: s.val < o.val
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]
Note that ``sorted`` and ``list.sort`` in Python 3 do not support the ``cmp``
parameter, which is **ONLY** valid in Python 2. If it is necessary to
use an old comparison function, e.g., in legacy code, ``functools.cmp_to_key``
is useful since it converts a comparison function to a key function.
.. code-block:: python
>>> from functools import cmp_to_key
>>> class Node(object):
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=cmp_to_key(lambda x,y: x.val - y.val))
>>> nodes
[Node(1), Node(2), Node(3)]
Maintain Sorted List with bisect
--------------------------------
The `bisect `_ module provides
functions to maintain a list in sorted order without having to sort the list
after each insertion. It uses a binary search algorithm, making insertions
efficient for large lists.
.. code-block:: python
import bisect
class Foo(object):
def __init__(self, k):
self.k = k
def __eq__(self, rhs):
return self.k == rhs.k
def __ne__(self, rhs):
return self.k != rhs.k
def __lt__(self, rhs):
return self.k < rhs.k
def __gt__(self, rhs):
return self.k > rhs.k
def __le__(self, rhs):
return self.k <= rhs.k
def __ge__(self, rhs):
return self.k >= rhs.k
def __repr__(self):
return f"Foo({self.k})"
def __str__(self):
return self.__repr__()
foo = [Foo(1), Foo(3), Foo(2), Foo(0)]
bar = []
for x in foo:
bisect.insort(bar, x)
print(bar) # [Foo(0), Foo(1), Foo(2), Foo(3)]
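Writing all six comparison methods by hand is repetitive. As a sketch,
``functools.total_ordering`` can derive the missing ones from ``__eq__`` and
``__lt__``. The class ``Bar`` is a hypothetical counterpart of ``Foo``.

```python
import bisect
from functools import total_ordering


@total_ordering
class Bar:
    """Counterpart of Foo with fewer hand-written comparison methods."""

    def __init__(self, k):
        self.k = k

    def __eq__(self, rhs):
        return self.k == rhs.k

    def __lt__(self, rhs):
        return self.k < rhs.k

    def __repr__(self):
        return f"Bar({self.k})"


items = []
for x in [Bar(1), Bar(3), Bar(2), Bar(0)]:
    bisect.insort(items, x)   # insort keeps items sorted on every insert

print(items)
```

``bisect.insort`` itself only relies on ``<``, so the derived methods mainly help
other code that compares these objects.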
Create Nested Lists Correctly
-----------------------------
When creating nested lists (2D lists or matrices), we should use list
comprehension to ensure each inner list is a separate object. The following
snippet shows the correct way to create a 2D list.
.. code-block:: python
>>> # create a list of size 3
>>> [0] * 3
[0, 0, 0]
>>> # create a 3x3 2D list
>>> [[0] * 3 for _ in range(3)]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Note that we should avoid creating a multi-dimensional list via the following
snippet because every row refers to the same inner list object.
.. code-block:: python
>>> a = [[0] * 3] * 3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[1][1] = 2
>>> a
[[0, 2, 0], [0, 2, 0], [0, 2, 0]]
Implement Circular Buffer with deque
------------------------------------
`collections.deque `_
is a double-ended queue that supports adding and removing elements from both ends
efficiently. By setting ``maxlen``, we can create a circular buffer that automatically
discards old elements when new ones are added.
.. code-block:: python
>>> from collections import deque
>>> d = deque(maxlen=8)
>>> for x in range(9):
... d.append(x)
...
>>> d
deque([1, 2, 3, 4, 5, 6, 7, 8], maxlen=8)
The following example shows how to implement a ``tail`` function similar to
the Unix command using ``deque``.
.. code-block:: python
>>> from collections import deque
>>> def tail(path, n=10):
... with open(path) as f:
... return deque(f, n)
...
>>> tail("/etc/hosts")
Split List into Chunks
----------------------
Sometimes, we need to split a list into smaller chunks of a specific size.
The following generator function yields successive chunks from the list.
.. code-block:: python
>>> def chunk(lst, n):
... for i in range(0, len(lst), n):
... yield lst[i:i+n]
...
>>> a = [1, 2, 3, 4, 5, 6, 7, 8]
>>> list(chunk(a, 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]
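The slicing approach needs ``len`` and indexing, so it does not work on
generators or other one-pass iterables. A possible variant, sketched here with
the hypothetical name ``chunk_iter``, uses ``itertools.islice`` instead.

```python
from itertools import islice


def chunk_iter(iterable, n):
    """Yield lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, n))   # take the next n items, if any
        if not block:
            return
        yield block


chunks = list(chunk_iter((x * x for x in range(7)), 3))
print(chunks)
```

On Python 3.12 and later, ``itertools.batched`` offers similar behavior and
yields tuples.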
Group Consecutive Elements with itertools.groupby
-------------------------------------------------
`itertools.groupby `_
groups consecutive elements in an iterable that have the same key. It is useful
for run-length encoding or grouping sorted data.
.. code-block:: python
>>> import itertools
>>> s = "AAABBCCCCC"
>>> for k, v in itertools.groupby(s):
... print(k, list(v))
...
A ['A', 'A', 'A']
B ['B', 'B']
C ['C', 'C', 'C', 'C', 'C']
>>> # group by key
>>> x = [('gp1', 'a'), ('gp2', 'b'), ('gp2', 'c')]
>>> for k, v in itertools.groupby(x, lambda x: x[0]):
... print(k, list(v))
...
gp1 [('gp1', 'a')]
gp2 [('gp2', 'b'), ('gp2', 'c')]
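As a sketch of the two uses mentioned above, the hypothetical helper ``rle``
performs run-length encoding. The second part shows why data usually has to be
sorted by the same key before grouping.

```python
import itertools


def rle(s):
    """Run-length encode a string as (char, count) pairs."""
    return [(k, len(list(g))) for k, g in itertools.groupby(s)]


encoded = rle("AAABBCCCCC")
print(encoded)

# groupby only merges adjacent items, so unsorted input splits groups.
words = ["apple", "bob", "avocado", "banana"]
first = lambda w: w[0]
unsorted_keys = [k for k, _ in itertools.groupby(words, first)]
sorted_keys = [k for k, _ in itertools.groupby(sorted(words, key=first), first)]
print(unsorted_keys, sorted_keys)
```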
Binary Search in Sorted List
----------------------------
Binary search is an efficient algorithm for finding an item in a sorted list.
The following snippet shows how to implement binary search using ``bisect_left``.
.. code-block:: python
>>> from bisect import bisect_left
>>> def binary_search(arr, x, lo=0, hi=None):
...     if hi is None:
...         hi = len(arr)
...     pos = bisect_left(arr, x, lo, hi)
...     return pos if pos != hi and arr[pos] == x else -1
...
>>> a = [1, 1, 1, 2, 3]
>>> binary_search(a, 1)
0
>>> binary_search(a, 2)
3
Find Lower Bound with bisect_left
---------------------------------
``bisect_left`` returns the leftmost position where an element can be inserted
to keep the list sorted. This is equivalent to finding the lower bound.
.. code-block:: python
>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_left(a, 3)
2
>>> bisect.bisect_left(a, 3.5)
4
Find Upper Bound with bisect_right
----------------------------------
``bisect_right`` (or ``bisect``) returns the rightmost position where an element
can be inserted to keep the list sorted. This is equivalent to finding the upper bound.
.. code-block:: python
>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_right(a, 3)
4
>>> bisect.bisect_right(a, 3.5)
4
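Taken together, the lower and upper bounds delimit the run of equal elements.
As a sketch, the hypothetical helper ``count_sorted`` counts occurrences in a
sorted list without scanning it.

```python
import bisect


def count_sorted(arr, x):
    """Count occurrences of x in a sorted list via its two bounds."""
    return bisect.bisect_right(arr, x) - bisect.bisect_left(arr, x)


a = [1, 2, 3, 3, 4, 5]
print(count_sorted(a, 3), count_sorted(a, 3.5))
```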
Sort Tuples Lexicographically
-----------------------------
Python compares tuples and lists lexicographically by default. This means it
compares the first elements, and if they are equal, it compares the second
elements, and so on.
.. code-block:: python
>>> # Python compares tuples lexicographically
>>> a = [(1,2), (1,1), (1,0), (2,1)]
>>> a.sort()
>>> a
[(1, 0), (1, 1), (1, 2), (2, 1)]
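Lexicographic comparison also makes multi-key sorting straightforward: return a
tuple from the key function. The sketch below uses made-up records and sorts by
score in descending order, then by name in ascending order.

```python
records = [("amy", 3), ("andy", 10), ("david", 3), ("bob", 10)]

# Negate the numeric field so that plain ascending tuple comparison
# yields descending scores, with names breaking ties alphabetically.
ranked = sorted(records, key=lambda r: (-r[1], r[0]))
print(ranked)
```

The negation trick only applies to numeric fields. For other types, two stable
sorts in sequence are an alternative.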
Implement Trie (Prefix Tree)
----------------------------
A `Trie `_ (prefix tree) is a tree data
structure used for efficient retrieval of keys in a dataset of strings. The
following snippet shows a compact implementation using ``defaultdict``.
.. code-block:: python
>>> from functools import reduce
>>> from collections import defaultdict
>>> Trie = lambda: defaultdict(Trie)
>>> prefixes = ['abc', 'de', 'g']
>>> trie = Trie()
>>> end = True
>>> for p in prefixes:
... reduce(dict.__getitem__, p, trie)[end] = p
...
>>> # search prefix: is word a prefix of a stored key?
>>> def find(trie, word):
... curr = trie
... for c in word:
... if c not in curr:
... return False
... curr = curr[c]
... return True
...
>>> find(trie, "abcdef")
False
>>> find(trie, "abc")
True
>>> find(trie, "ab")
True
>>> # search word: does p start with one of the stored keys?
>>> def find(trie, p):
... curr = trie
... for c in p:
... if c not in curr or True in curr:
... break
... curr = curr[c]
... return True if True in curr else False
...
>>> find(trie, "abcdef")
True
>>> find(trie, "abc")
True
>>> find(trie, "ab")
False
================================================
FILE: docs/notes/basic/python-object.rst
================================================
.. meta::
:description lang=en: Python class cheat sheet covering magic methods, property decorators, inheritance, context managers, and OOP design patterns with code examples
:keywords: Python, Python3, Python class, Python OOP cheat sheet, magic methods, property decorator, context manager, singleton, abstract class, descriptor, inheritance
=====
Class
=====
.. contents:: Table of Contents
:backlinks: none
Python is an object-oriented programming language. This cheat sheet covers
class definitions, inheritance, magic methods, property decorators, context
managers, and common design patterns. Understanding these concepts is essential
for writing clean, maintainable Python code.
List Attributes with dir()
--------------------------
The ``dir()`` function returns a list of all attributes and methods of an object.
This is useful for introspection and discovering what operations are available.
.. code-block:: python
>>> dir(list) # check all attr of list
['__add__', '__class__', ...]
Check Type with isinstance()
----------------------------
Use ``isinstance()`` to check if an object is an instance of a class or its
subclasses. This is preferred over ``type()`` comparison because it supports
inheritance.
.. code-block:: python
>>> ex = 10
>>> isinstance(ex, int)
True
>>> isinstance(ex, (int, float)) # check multiple types
True
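A short sketch of why ``isinstance()`` is preferred over comparing ``type()``: an exact type comparison ignores inheritance. The classes ``Animal`` and ``Dog`` are illustrative only.

```python
class Animal:
    pass


class Dog(Animal):
    pass


d = Dog()

# Exact type comparison fails for subclasses.
print(type(d) is Animal)      # False

# isinstance() follows the inheritance chain.
print(isinstance(d, Animal))  # True

# bool is a subclass of int, so this is True as well.
print(isinstance(True, int))  # True
```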
Check Inheritance with issubclass()
-----------------------------------
Use ``issubclass()`` to check if a class is a subclass of another class.
.. code-block:: python
>>> class Animal: pass
>>> class Dog(Animal): pass
>>> issubclass(Dog, Animal)
True
>>> issubclass(Dog, object)
True
Get Class Name
--------------
Access the class name through the ``__class__.__name__`` attribute.
.. code-block:: python
>>> class ExampleClass:
... pass
...
>>> ex = ExampleClass()
>>> ex.__class__.__name__
'ExampleClass'
Has / Get / Set Attributes
--------------------------
Python provides built-in functions to dynamically access and modify object
attributes at runtime.
.. code-block:: python
>>> class Example:
... def __init__(self):
... self.name = "ex"
...
>>> ex = Example()
>>> hasattr(ex, "name")
True
>>> getattr(ex, 'name')
'ex'
>>> setattr(ex, 'name', 'example')
>>> ex.name
'example'
>>> getattr(ex, 'missing', 'default') # with default
'default'
Declare Class with type()
-------------------------
Classes can be created dynamically using ``type()``. This is useful for
metaprogramming and creating classes at runtime.
.. code-block:: python
>>> def greet(self):
... return f"Hello, I'm {self.name}"
...
>>> Person = type('Person', (object,), {
... 'name': 'Anonymous',
... 'greet': greet
... })
>>> p = Person()
>>> p.greet()
"Hello, I'm Anonymous"
This is equivalent to:
.. code-block:: python
>>> class Person:
... name = 'Anonymous'
... def greet(self):
... return f"Hello, I'm {self.name}"
__new__ vs __init__
-------------------
``__new__`` creates the instance; ``__init__`` initializes it. ``__init__`` is
only called if ``__new__`` returns an instance of the class.
.. code-block:: python
>>> class Example:
... def __new__(cls, arg):
... print(f'__new__ {arg}')
... return super().__new__(cls)
... def __init__(self, arg):
... print(f'__init__ {arg}')
...
>>> o = Example("Hello")
__new__ Hello
__init__ Hello
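The second rule can be observed directly. In this sketch (the class name ``Skipped`` is hypothetical), ``__new__`` returns a ``str`` rather than an instance of the class, so ``__init__`` never runs.

```python
class Skipped:
    def __new__(cls, arg):
        # Return an object that is NOT an instance of cls.
        return str(arg)

    def __init__(self, arg):
        # Would raise if it were ever called.
        raise AssertionError("__init__ should not run")


obj = Skipped(42)
print(type(obj).__name__, obj)  # str 42
```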
__str__ and __repr__
--------------------
``__str__`` returns a human-readable string, ``__repr__`` returns an unambiguous
representation for debugging. When ``__str__`` is not defined, ``__repr__`` is used.
.. code-block:: python
>>> class Vector:
... def __init__(self, x, y):
... self.x, self.y = x, y
... def __repr__(self):
... return f"Vector({self.x}, {self.y})"
... def __str__(self):
... return f"({self.x}, {self.y})"
...
>>> v = Vector(1, 2)
>>> repr(v)
'Vector(1, 2)'
>>> str(v)
'(1, 2)'
>>> print(v)
(1, 2)
Comparison Magic Methods
------------------------
Implement comparison operators by defining magic methods. Use
``functools.total_ordering`` to generate all comparisons from ``__eq__`` and one other.
.. code-block:: python
>>> from functools import total_ordering
>>> @total_ordering
... class Number:
... def __init__(self, val):
... self.val = val
... def __eq__(self, other):
... return self.val == other.val
... def __lt__(self, other):
... return self.val < other.val
...
>>> Number(1) < Number(2)
True
>>> Number(2) >= Number(1)
True
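The ``Number`` example above assumes ``other`` is always a ``Number``. A slightly more defensive sketch returns ``NotImplemented`` for foreign types so Python can fall back to its default handling; the ``Version`` class is illustrative only.

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key == other.key

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key < other.key


print(Version(1, 2) <= Version(1, 10))  # True (derived from __lt__ and __eq__)
print(Version(1, 2) == "1.2")           # False (no AttributeError)
print(min(Version(2, 0), Version(1, 5)).key)  # (1, 5)
```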
Arithmetic Magic Methods
------------------------
Implement arithmetic operators to make objects work with ``+``, ``-``, ``*``, etc.
.. code-block:: python
>>> class Vector:
... def __init__(self, x, y):
... self.x, self.y = x, y
... def __add__(self, other):
... return Vector(self.x + other.x, self.y + other.y)
... def __mul__(self, scalar):
... return Vector(self.x * scalar, self.y * scalar)
... def __repr__(self):
... return f"Vector({self.x}, {self.y})"
...
>>> Vector(1, 2) + Vector(3, 4)
Vector(4, 6)
>>> Vector(1, 2) * 3
Vector(3, 6)
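With only ``__mul__`` defined, ``Vector(1, 2) * 3`` works but ``3 * Vector(1, 2)`` raises ``TypeError``. A minimal sketch of the reflected method ``__rmul__``, which Python tries when the left operand does not know how to handle the right one:

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __mul__(self, scalar):
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(self.x * scalar, self.y * scalar)

    # Reflected multiplication: called for scalar * vector.
    __rmul__ = __mul__

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


print(Vector(1, 2) * 3)  # Vector(3, 6)
print(3 * Vector(1, 2))  # Vector(3, 6)
```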
Callable with __call__
----------------------
Implement ``__call__`` to make instances callable like functions. This is useful
for creating function-like objects that maintain state.
.. code-block:: python
>>> class Multiplier:
... def __init__(self, factor):
... self.factor = factor
... def __call__(self, x):
... return x * self.factor
...
>>> double = Multiplier(2)
>>> double(5)
10
>>> callable(double)
True
@property Decorator
-------------------
Use ``@property`` to define getters, setters, and deleters for managed attributes.
This allows attribute access syntax while running custom code.
.. code-block:: python
>>> class Circle:
... def __init__(self, radius):
... self._radius = radius
... @property
... def radius(self):
... return self._radius
... @radius.setter
... def radius(self, value):
... if value < 0:
... raise ValueError("Radius must be non-negative")
... self._radius = value
... @property
... def area(self):
... return 3.14159 * self._radius ** 2
...
>>> c = Circle(5)
>>> c.area
78.53975
>>> c.radius = 10
>>> c.radius
10
Descriptor Protocol
-------------------
Descriptors control attribute access at the class level. They implement
``__get__``, ``__set__``, and/or ``__delete__`` methods.
.. code-block:: python
>>> class Positive:
... def __init__(self, name):
... self.name = name
... def __get__(self, obj, objtype=None):
... return obj.__dict__[self.name]
... def __set__(self, obj, value):
... if value < 0:
... raise ValueError("Must be non-negative")
... obj.__dict__[self.name] = value
...
>>> class Example:
... x = Positive('x')
... def __init__(self, x):
... self.x = x
...
>>> ex = Example(10)
>>> ex.x
10
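In the example above the attribute name is passed to the descriptor by hand (``Positive('x')``). On Python 3.6+, ``__set_name__`` lets the descriptor learn its name automatically. The sketch below is a variation under that assumption; ``NonNegative`` and ``Account`` are hypothetical names.

```python
class NonNegative:
    def __set_name__(self, owner, name):
        # Called once when the owning class is created.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class itself
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError(f"{self.name} must be non-negative")
        obj.__dict__[self.name] = value


class Account:
    balance = NonNegative()      # no need to repeat the name

    def __init__(self, balance):
        self.balance = balance


acct = Account(10)
print(acct.balance)              # 10
try:
    acct.balance = -1
except ValueError as exc:
    print(exc)                   # balance must be non-negative
```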
Context Manager Protocol
------------------------
Context managers implement ``__enter__`` and ``__exit__`` to manage resources
with the ``with`` statement. This ensures proper cleanup even if exceptions occur.
.. code-block:: python
class ManagedFile:
def __init__(self, filename):
self.filename = filename
def __enter__(self):
self.file = open(self.filename, 'r')
return self.file
def __exit__(self, exc_type, exc_val, exc_tb):
self.file.close()
return False # don't suppress exceptions
with ManagedFile('example.txt') as f:
content = f.read()
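The return value of ``__exit__`` decides whether an exception propagates. As a sketch of the opposite choice from ``ManagedFile``, the hypothetical ``Swallow`` class below returns ``True`` for selected exception types, which suppresses them.

```python
class Swallow:
    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_val
            return True   # suppress the exception
        return False      # let anything else propagate


with Swallow(ZeroDivisionError) as s:
    1 / 0                 # raised, then suppressed by __exit__

print(type(s.caught).__name__)  # ZeroDivisionError
```

The standard library offers the same behavior as ``contextlib.suppress``.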
Using contextlib
----------------
The ``contextlib`` module provides utilities for creating context managers
without writing a full class.
.. code-block:: python
from contextlib import contextmanager
@contextmanager
def managed_file(filename):
f = open(filename, 'r')
try:
yield f
finally:
f.close()
with managed_file('example.txt') as f:
content = f.read()
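To see the order in which a ``@contextmanager`` generator runs without touching the file system, here is a small sketch (the ``tag`` helper is hypothetical). Code before ``yield`` plays the role of ``__enter__``; the ``finally`` block plays the role of ``__exit__``.

```python
from contextlib import contextmanager


@contextmanager
def tag(name, out):
    out.append(f"<{name}>")        # setup, like __enter__
    try:
        yield out
    finally:
        out.append(f"</{name}>")   # cleanup, runs even on error


events = []
with tag("p", events):
    events.append("hello")

print(events)  # ['<p>', 'hello', '</p>']
```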
@staticmethod and @classmethod
------------------------------
``@staticmethod`` defines a method that doesn't access instance or class.
``@classmethod`` receives the class as the first argument, useful for
alternative constructors.
.. code-block:: python
>>> class Date:
... def __init__(self, year, month, day):
... self.year, self.month, self.day = year, month, day
... @classmethod
... def from_string(cls, date_string):
... year, month, day = map(int, date_string.split('-'))
... return cls(year, month, day)
... @staticmethod
... def is_valid(date_string):
... try:
... y, m, d = map(int, date_string.split('-'))
... return 1 <= m <= 12 and 1 <= d <= 31
... except ValueError:
... return False
...
>>> d = Date.from_string('2024-01-15')
>>> d.year
2024
>>> Date.is_valid('2024-13-01')
False
Abstract Base Classes with abc
------------------------------
Use ``abc`` module to define abstract base classes that cannot be instantiated
and require subclasses to implement certain methods.
.. code-block:: python
>>> from abc import ABC, abstractmethod
>>> class Shape(ABC):
... @abstractmethod
... def area(self):
... pass
...
>>> class Rectangle(Shape):
... def __init__(self, width, height):
... self.width, self.height = width, height
... def area(self):
... return self.width * self.height
...
>>> r = Rectangle(3, 4)
>>> r.area()
12
>>> Shape() # raises TypeError
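A runnable sketch of the ``TypeError`` mentioned above. It also shows that a subclass which does not override every abstract method is itself abstract; ``Incomplete`` is a hypothetical class for illustration.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        ...


class Incomplete(Shape):
    pass  # area() is not implemented


results = {}
for cls in (Shape, Incomplete):
    try:
        cls()
        results[cls.__name__] = "ok"
    except TypeError:
        results[cls.__name__] = "TypeError"

print(results)  # {'Shape': 'TypeError', 'Incomplete': 'TypeError'}
```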
The Diamond Problem (MRO)
-------------------------
Python uses Method Resolution Order (MRO) to resolve the diamond problem in
multiple inheritance. Use ``ClassName.mro()`` to see the resolution order.
.. code-block:: python
>>> class A:
... def method(self):
... return "A"
...
>>> class B(A):
... def method(self):
... return "B"
...
>>> class C(A):
... def method(self):
... return "C"
...
>>> class D(B, C):
... pass
...
>>> D().method()
'B'
>>> D.mro()
[<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
Singleton Pattern
-----------------
Singleton ensures only one instance of a class exists. Implement using
``__new__`` or a decorator.
.. code-block:: python
class Singleton:
_instance = None
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance
a = Singleton()
b = Singleton()
print(a is b) # True
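The decorator variant mentioned above could look like the following sketch. The names ``singleton`` and ``Config`` are hypothetical, and this simple version is not thread-safe.

```python
def singleton(cls):
    instances = {}  # one cached instance per decorated class

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class Config:
    def __init__(self):
        self.values = {}


a = Config()
b = Config()
print(a is b)  # True
```

One trade-off of this approach: ``Config`` now refers to a function, so ``isinstance(a, Config)`` no longer works. The ``__new__`` version keeps the name bound to the class.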
Using __slots__
---------------
``__slots__`` restricts instance attributes and reduces memory usage by avoiding
``__dict__`` per instance.
.. code-block:: python
>>> class Point:
... __slots__ = ['x', 'y']
... def __init__(self, x, y):
... self.x, self.y = x, y
...
>>> p = Point(1, 2)
>>> p.x
1
>>> p.z = 3 # raises AttributeError
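A runnable sketch of both effects described above: instances have no ``__dict__``, and assigning an undeclared attribute raises ``AttributeError``.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(1, 2)
print(hasattr(p, "__dict__"))  # False

try:
    p.z = 3
except AttributeError:
    print("cannot set undeclared attribute 'z'")
```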
Common Magic Methods Reference
------------------------------
.. code-block:: python
# Object Creation and Representation
__new__(cls, ...) # create instance
__init__(self, ...) # initialize instance
__del__(self) # destructor
__repr__(self) # repr(obj)
__str__(self) # str(obj)
# Comparison
__eq__(self, other) # ==
__ne__(self, other) # !=
__lt__(self, other) # <
__le__(self, other) # <=
__gt__(self, other) # >
__ge__(self, other) # >=
# Arithmetic
__add__(self, other) # +
__sub__(self, other) # -
__mul__(self, other) # *
__truediv__(self, other) # /
__floordiv__(self, other)# //
__mod__(self, other) # %
__pow__(self, other) # **
# Container
__len__(self) # len(obj)
__getitem__(self, key) # obj[key]
__setitem__(self, k, v) # obj[key] = value
__delitem__(self, key) # del obj[key]
__contains__(self, item) # item in obj
__iter__(self) # iter(obj)
# Attribute Access
__getattr__(self, name) # obj.name (when not found)
__setattr__(self, n, v) # obj.name = value
__delattr__(self, name) # del obj.name
# Callable
__call__(self, ...) # obj()
# Context Manager
__enter__(self) # with obj
__exit__(self, ...) # exit with block
# Descriptor
__get__(self, obj, type) # descriptor access
__set__(self, obj, val) # descriptor assignment
__delete__(self, obj) # descriptor deletion
================================================
FILE: docs/notes/basic/python-rexp.rst
================================================
.. meta::
:description lang=en: Python regex cheat sheet covering re module, pattern matching, groups, lookahead, lookbehind, substitution, and common regex patterns with code examples
:keywords: Python, Python3, Python regex, Python regex cheat sheet, regular expression, re module, pattern matching, findall, search, match, sub, lookahead, lookbehind, named groups
==================
Regular Expression
==================
.. contents:: Table of Contents
:backlinks: none
Regular expressions (regex) are powerful tools for pattern matching and text
manipulation. Python's ``re`` module provides comprehensive support for regex
operations. This cheat sheet covers basic matching, groups, lookaround assertions,
substitution, and common patterns for validating emails, URLs, IP addresses, etc.
Basic Operations
----------------
The ``re`` module provides several functions for pattern matching. Use ``search()``
to find the first match anywhere in the string, ``match()`` to match at the
beginning, and ``fullmatch()`` to match the entire string.
.. code-block:: python
>>> import re
>>> # search - find anywhere in string
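>>> re.search(r'\d+', 'abc123def')
<re.Match object; span=(3, 6), match='123'>
>>> # match - match at beginning only
>>> re.match(r'\d+', '123abc')
<re.Match object; span=(0, 3), match='123'>
>>> re.match(r'\d+', 'abc123') is None
True
>>> # fullmatch - match entire string
>>> re.fullmatch(r'\d+', '123')
<re.Match object; span=(0, 3), match='123'>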
>>> re.fullmatch(r'\d+', '123abc') is None
True
``re.findall()`` - Find All Matches
-----------------------------------
The ``findall()`` function returns all non-overlapping matches as a list of
strings. If the pattern has groups, it returns a list of tuples.
.. code-block:: python
>>> # find all words
>>> source = "Hello World Ker HAHA"
>>> re.findall(r'[\w]+', source)
['Hello', 'World', 'Ker', 'HAHA']
>>> # find all digits
>>> re.findall(r'\d+', 'a1b22c333')
['1', '22', '333']
>>> # with groups - returns tuples
>>> re.findall(r'(\w+)=(\d+)', 'a=1 b=2 c=3')
[('a', '1'), ('b', '2'), ('c', '3')]
``re.split()`` - Split by Pattern
---------------------------------
The ``split()`` function splits a string by pattern occurrences. Use ``maxsplit``
to limit the number of splits.
.. code-block:: python
>>> re.split(r'\s+', 'a b c')
['a', 'b', 'c']
>>> re.split(r'[,;]', 'a,b;c,d')
['a', 'b', 'c', 'd']
>>> re.split(r'(\s+)', 'a b c') # keep delimiters
['a', ' ', 'b', ' ', 'c']
>>> re.split(r'\s+', 'a b c d', maxsplit=2)
['a', 'b', 'c d']
Group Matching
--------------
Parentheses ``(...)`` create capturing groups. Use ``group()`` to access matched
groups. Group 0 is the entire match, group 1 is the first parenthesized group, etc.
.. code-block:: python
>>> m = re.search(r'(\d{4})-(\d{2})-(\d{2})', '2016-01-01')
>>> m.groups()
('2016', '01', '01')
>>> m.group() # entire match
'2016-01-01'
>>> m.group(1) # first group
'2016'
>>> m.group(2, 3) # multiple groups
('01', '01')
# Nested groups - numbered left to right by opening parenthesis
>>> m = re.search(r'(((\d{4})-\d{2})-\d{2})', '2016-01-01')
>>> m.groups()
('2016-01-01', '2016-01', '2016')
Non-Capturing Group ``(?:...)``
-------------------------------
Use ``(?:...)`` when you need grouping for alternation or quantifiers but don't
need to capture the match. This improves performance and keeps group numbering clean.
.. code-block:: python
>>> url = 'http://stackoverflow.com/'
>>> # non-capturing group for protocol
>>> m = re.search(r'(?:http|ftp)://([^/\r\n]+)(/[^\r\n]*)?', url)
>>> m.groups()
('stackoverflow.com', '/')
>>> # capturing group - protocol is captured
>>> m = re.search(r'(http|ftp)://([^/\r\n]+)(/[^\r\n]*)?', url)
>>> m.groups()
('http', 'stackoverflow.com', '/')
Named Groups ``(?P<name>...)``
------------------------------
Named groups make patterns more readable and allow access by name instead of
number. Use ``(?P<name>...)`` to define and ``(?P=name)`` for back reference.
.. code-block:: python
>>> pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
>>> m = re.search(pattern, '2016-01-01')
>>> m.group('year')
'2016'
>>> m.group('month')
'01'
>>> m.groupdict()
{'year': '2016', 'month': '01', 'day': '01'}
# named back reference
>>> re.search(r'^(?P<char>[a-z])(?P=char)', 'aa')
<re.Match object; span=(0, 2), match='aa'>
>>> re.search(r'^(?P<char>[a-z])(?P=char)', 'ab') is None
True
Back Reference ``\1``, ``\2``
-----------------------------
Back references match the same text as a previous capturing group. Use ``\1``
for the first group, ``\2`` for the second, etc.
.. code-block:: python
>>> # match repeated characters
>>> re.search(r'([a-z])\1', 'aa') is not None
True
>>> re.search(r'([a-z])\1', 'ab') is not None
False
>>> # match HTML tags with matching close tag
>>> pattern = r'<([^>]+)>[\s\S]*?</\1>'
>>> re.search(pattern, '<div>test</div>') is not None
True
>>> re.search(pattern, '<div>test</span>') is not None
False
Substitute with ``re.sub()``
----------------------------
The ``sub()`` function replaces pattern matches with a replacement string.
Use ``\1``, ``\2`` in the replacement to reference captured groups.
.. code-block:: python
>>> # basic substitution
>>> re.sub(r'[a-z]', ' ', '1a2b3c')
'1 2 3 '
>>> # substitute with group reference
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', '2016-01-01')
'01/01/2016'
>>> # using function as replacement
>>> re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b2c3')
'a2b4c6'
>>> # camelCase to snake_case
>>> def to_snake(s):
... s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', s)
... return re.sub(r'([a-z])([A-Z])', r'\1_\2', s).lower()
...
>>> to_snake('CamelCase')
'camel_case'
>>> to_snake('SimpleHTTPServer')
'simple_http_server'
Lookahead and Lookbehind
------------------------
Lookaround assertions match a position without consuming characters. They are
useful for matching patterns based on context.
+---------------+---------------------+---------------------------+
| Notation      | Name                | Description               |
+===============+=====================+===========================+
| ``(?=...)``   | Positive lookahead  | Followed by ...           |
+---------------+---------------------+---------------------------+
| ``(?!...)``   | Negative lookahead  | Not followed by ...       |
+---------------+---------------------+---------------------------+
| ``(?<=...)``  | Positive lookbehind | Preceded by ...           |
+---------------+---------------------+---------------------------+
| ``(?<!...)``  | Negative lookbehind | Not preceded by ...       |
+---------------+---------------------+---------------------------+
.. code-block:: python
>>> # positive lookahead - find word before @
>>> re.findall(r'\w+(?=@)', 'user@example.com')
['user']
>>> # negative lookahead - find digits not followed by px
>>> re.findall(r'\d+(?!px)', '12px 34em 56')
['1', '34', '56']
>>> # positive lookbehind - find digits after $
>>> re.findall(r'(?<=\$)\d+', '$100 $200')
['100', '200']
>>> # negative lookbehind - find digits not after $
>>> re.findall(r'(?<!\$)\b\d+', '$100 200')
['200']
>>> # insert space before groups of 3 digits from right
>>> re.sub(r'(?=(\d{3})+$)', ' ', '12345678')
'12 345 678'
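Lookbehind and lookahead can be combined in one pattern. The sketch below groups digits with a separator; the helper name ``group_thousands`` is hypothetical. The lookbehind requires a digit on the left, so no separator is inserted at the very start of the string.

```python
import re


def group_thousands(digits, sep=","):
    # (?<=\d)          a digit must precede this position
    # (?=(?:\d{3})+$)  a multiple of three digits remains up to the end
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', sep, digits)


print(group_thousands('12345678'))  # 12,345,678
print(group_thousands('123456'))    # 123,456
print(group_thousands('12'))        # 12
```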
Compile Pattern for Reuse
-------------------------
Use ``re.compile()`` to create a reusable pattern object. The module-level
functions already cache recently compiled patterns, so the main gains are
readability and skipping the cache lookup when a pattern is used many times.
.. code-block:: python
>>> pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
>>> pattern.search('Date: 2024-01-15')
<re.Match object; span=(6, 16), match='2024-01-15'>
>>> pattern.findall('2024-01-15 and 2024-02-20')
['2024-01-15', '2024-02-20']
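A compiled pattern also exposes ``finditer()``, which yields match objects rather than strings. This sketch (variable names are illustrative) shows how to get positions and groups for each match.

```python
import re

date_re = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
text = '2024-01-15 and 2024-02-20'

found = [(m.span(), m.groups()) for m in date_re.finditer(text)]
for span, parts in found:
    print(span, parts)
# (0, 10) ('2024', '01', '15')
# (15, 25) ('2024', '02', '20')
```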
Regex Flags
-----------
Flags modify pattern behavior. Common flags include ``re.IGNORECASE`` (``re.I``),
``re.MULTILINE`` (``re.M``), ``re.DOTALL`` (``re.S``), and ``re.VERBOSE`` (``re.X``).
.. code-block:: python
>>> # case insensitive
>>> re.findall(r'[a-z]+', 'Hello World', re.I)
['Hello', 'World']
>>> # multiline - ^ and $ match line boundaries
>>> re.findall(r'^\w+', 'line1\nline2', re.M)
['line1', 'line2']
>>> # dotall - . matches newline
>>> re.search(r'a.b', 'a\nb', re.S)
<re.Match object; span=(0, 3), match='a\nb'>
>>> # verbose - allow comments and whitespace
>>> pattern = re.compile(r'''
... \d{4} # year
... -
... \d{2} # month
... -
... \d{2} # day
... ''', re.X)
>>> pattern.match('2024-01-15')
<re.Match object; span=(0, 10), match='2024-01-15'>
Compare HTML Tags
-----------------
Common patterns for matching different types of HTML tags.
+------------+--------------+--------------+
| Tag Type   | Pattern      | Example      |
+============+==============+==============+
| All tags   | <[^>]+>      | <br />, <a>  |
+------------+--------------+--------------+
| Open tag   | <[^/>][^>]*> | <a>, <table> |
+------------+--------------+--------------+
| Close tag  | </[^>]+>     | </p>, </a>   |
+------------+--------------+--------------+
| Self-close | <[^/>]+/>    | <br />       |
+------------+--------------+--------------+
.. code-block:: python
>>> # open tag
>>> re.search(r'<[^/>][^>]*>', '<table>') is not None
True
>>> re.search(r'<[^/>][^>]*>', '</table>') is not None
False
>>> # close tag
>>> re.search(r'</[^>]+>', '</table>') is not None
True
>>> # self-closing tag
>>> re.search(r'<[^/>]+/>', '<br />') is not None
True
Match Email Address
-------------------
A pattern for validating email addresses. Note that fully RFC-compliant email
validation is extremely complex; this covers common cases.
.. code-block:: python
>>> pattern = re.compile(r'^[\w.+-]+@[\w-]+\.[\w.-]+$')
>>> pattern.match('hello.world@example.com') is not None
True
>>> pattern.match('user+tag@sub.domain.org') is not None
True
>>> pattern.match('invalid@') is not None
False
Match URL
---------
A pattern for matching URLs with optional protocol, domain, and path.
.. code-block:: python
>>> pattern = re.compile(r'''
... ^(https?://)? # optional protocol
... ([\da-z.-]+) # domain
... \.([a-z.]{2,6}) # TLD
... ([/\w.-]*)*/?$ # path
... ''', re.X | re.I)
>>> pattern.match('https://www.example.com/path') is not None
True
>>> pattern.match('example.com') is not None
True
Match IP Address
----------------
A pattern for validating IPv4 addresses (0.0.0.0 to 255.255.255.255).
.. code-block:: python
>>> pattern = re.compile(r'''
... ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
... (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
... ''', re.X)
>>> pattern.match('192.168.1.1') is not None
True
>>> pattern.match('255.255.255.0') is not None
True
>>> pattern.match('256.0.0.0') is not None
False
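When the goal is validation rather than practicing regex, the standard library ``ipaddress`` module is an alternative worth knowing. A minimal sketch (the helper name ``is_ipv4`` is hypothetical):

```python
import ipaddress


def is_ipv4(text):
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:   # AddressValueError is a subclass of ValueError
        return False
    return True


print(is_ipv4('192.168.1.1'))  # True
print(is_ipv4('256.0.0.0'))    # False
print(is_ipv4('1.2.3'))        # False
```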
Match MAC Address
-----------------
A pattern for validating MAC addresses in colon-separated format.
.. code-block:: python
>>> pattern = re.compile(r'^([0-9a-f]{2}:){5}[0-9a-f]{2}$', re.I)
>>> pattern.match('3c:38:51:05:03:1e') is not None
True
>>> pattern.match('AA:BB:CC:DD:EE:FF') is not None
True
Match Phone Number
------------------
Patterns for common phone number formats.
.. code-block:: python
>>> # US phone number
>>> pattern = re.compile(r'^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$')
>>> pattern.match('123-456-7890') is not None
True
>>> pattern.match('(123) 456-7890') is not None
True
>>> pattern.match('+1 123 456 7890') is not None
True
Match Password Strength
-----------------------
Pattern to validate password with minimum requirements: at least 8 characters,
one uppercase, one lowercase, one digit, and one special character.
.. code-block:: python
>>> pattern = re.compile(r'''
... ^(?=.*[a-z]) # at least one lowercase
... (?=.*[A-Z]) # at least one uppercase
... (?=.*\d) # at least one digit
... (?=.*[@$!%*?&]) # at least one special char
... [A-Za-z\d@$!%*?&]{8,}$ # at least 8 chars
... ''', re.X)
>>> pattern.match('Passw0rd!') is not None
True
>>> pattern.match('weakpass') is not None
False
Simple Lexer
------------
Using regex to build a simple tokenizer for arithmetic expressions. This
demonstrates using named groups and the undocumented ``scanner()`` method of
compiled patterns for lexical analysis.
.. code-block:: python
>>> from collections import namedtuple
>>> tokens = [
... r'(?P<NUMBER>\d+)',
... r'(?P<PLUS>\+)',
... r'(?P<MINUS>-)',
... r'(?P<TIMES>\*)',
... r'(?P<DIVIDE>/)',
... r'(?P<WS>\s+)'
... ]
>>> lex = re.compile('|'.join(tokens))
>>> Token = namedtuple('Token', ['type', 'value'])
>>> def tokenize(text):
... scan = lex.scanner(text)
... return (Token(m.lastgroup, m.group())
... for m in iter(scan.match, None) if m.lastgroup != 'WS')
...
>>> list(tokenize('9 + 5 * 2'))
[Token(type='NUMBER', value='9'), Token(type='PLUS', value='+'), Token(type='NUMBER', value='5'), Token(type='TIMES', value='*'), Token(type='NUMBER', value='2')]
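The ``scanner()``-based tokenizer stops silently at the first character that matches no token. A variation using the documented ``finditer()`` plus a catch-all group can report such characters instead. This is a sketch; the ``MISMATCH`` group and the ``TOKEN_SPEC`` layout are assumptions for illustration, not part of the original lexer.

```python
import re
from collections import namedtuple

Token = namedtuple('Token', ['type', 'value'])

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('PLUS', r'\+'),
    ('MINUS', r'-'),
    ('TIMES', r'\*'),
    ('DIVIDE', r'/'),
    ('WS', r'\s+'),
    ('MISMATCH', r'.'),   # catch-all: any other single character
]
LEX = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))


def tokenize(text):
    for m in LEX.finditer(text):
        kind = m.lastgroup
        if kind == 'WS':
            continue      # skip whitespace
        if kind == 'MISMATCH':
            raise SyntaxError(
                f'unexpected character {m.group()!r} at position {m.start()}')
        yield Token(kind, m.group())


print([t.type for t in tokenize('9 + 5 * 2')])
# ['NUMBER', 'PLUS', 'NUMBER', 'TIMES', 'NUMBER']
```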
Common Patterns Reference
-------------------------
.. code-block:: python
# Digits only
r'^\d+$'
# Alphanumeric
r'^[a-zA-Z0-9]+$'
# Username (3-16 chars, alphanumeric, underscore, hyphen)
r'^[a-zA-Z0-9_-]{3,16}$'
# Hex color
r'^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$'
# Date (YYYY-MM-DD)
r'^\d{4}-\d{2}-\d{2}$'
# Time (HH:MM:SS)
r'^\d{2}:\d{2}:\d{2}$'
# Slug (URL-friendly string)
r'^[a-z0-9]+(?:-[a-z0-9]+)*$'
# Remove HTML tags
re.sub(r'<[^>]+>', '', html)
# Extract domain from URL
re.search(r'https?://([^/]+)', url).group(1)
# Find all hashtags
re.findall(r'#\w+', text)
# Find all @mentions
re.findall(r'@\w+', text)
================================================
FILE: docs/notes/basic/python-set.rst
================================================
.. meta::
:description lang=en: Python set cheat sheet covering set comprehensions, set operations (union, intersection, difference), removing duplicates, subsets, supersets, and frozenset with code examples
:keywords: Python, Python3, Python set, Python set cheat sheet, set comprehension, set operations, union, intersection, difference, symmetric difference, frozenset, subset, superset
===
Set
===
.. contents:: Table of Contents
:backlinks: none
Sets are unordered collections of unique elements in Python. They provide O(1)
average time complexity for membership testing and support mathematical set
operations like union, intersection, and difference. This cheat sheet covers
set comprehensions, common set operations, uniquifying lists, and the immutable
frozenset type.
The source code is available on `GitHub `_.
References
----------
- `Set Types — set, frozenset `_
- `Sets `_
Create a Set
------------
Create sets using curly braces ``{}`` or the ``set()`` constructor. Note that
empty curly braces ``{}`` create a dict, not a set.
.. code-block:: python
>>> s = {1, 2, 3}
>>> s
{1, 2, 3}
>>> s = set([1, 2, 2, 3])
>>> s
{1, 2, 3}
>>> empty = set() # not {}
>>> type(empty)
<class 'set'>
Create Sets with Set Comprehension
----------------------------------
Like list comprehensions, set comprehensions provide a concise way to create sets.
The syntax uses curly braces ``{}`` instead of square brackets.
.. code-block:: python
>>> a = [1, 2, 5, 6, 6, 6, 7]
>>> s = {x for x in a}
>>> s
{1, 2, 5, 6, 7}
>>> s = {x for x in a if x > 3}
>>> s
{5, 6, 7}
>>> s = {x ** 2 for x in range(5)}
>>> s
{0, 1, 4, 9, 16}
Remove Duplicates from a List
-----------------------------
Converting a list to a set automatically removes duplicate elements. This is one
of the most common use cases for sets.
.. code-block:: python
>>> a = [1, 2, 2, 2, 3, 4, 5, 5]
>>> list(set(a))
[1, 2, 3, 4, 5]
To preserve the original order, use ``dict.fromkeys()`` (Python 3.7+):
.. code-block:: python
>>> a = [3, 1, 2, 1, 3, 2]
>>> list(dict.fromkeys(a))
[3, 1, 2]
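When duplicates should be detected by a derived key (for example, case-insensitively), ``dict.fromkeys()`` is not enough. A minimal order-preserving sketch with a ``seen`` set follows; the helper name ``unique_by`` is hypothetical.

```python
def unique_by(items, key=lambda x: x):
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


print(unique_by([3, 1, 2, 1, 3, 2]))              # [3, 1, 2]
print(unique_by(['a', 'A', 'b'], key=str.lower))  # ['a', 'b']
```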
Add Items to a Set
------------------
Use ``add()`` to add a single element, or ``update()`` to add multiple elements.
.. code-block:: python
>>> s = {1, 2, 3}
>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.update([5, 6, 7])
>>> s
{1, 2, 3, 4, 5, 6, 7}
>>> s |= {8, 9} # same as update
>>> s
{1, 2, 3, 4, 5, 6, 7, 8, 9}
Remove Items from a Set
-----------------------
Use ``remove()`` to remove an element (raises KeyError if not found), or
``discard()`` to remove without error. Use ``pop()`` to remove an arbitrary element.
.. code-block:: python
>>> s = {1, 2, 3, 4, 5}
>>> s.remove(3)
>>> s
{1, 2, 4, 5}
>>> s.discard(10) # no error if not found
>>> s.pop() # remove arbitrary element
1
>>> s.clear() # remove all
>>> s
set()
Union with ``|`` Operator
-------------------------
The union of two sets contains all elements from both sets. Use the ``|`` operator
or the ``union()`` method.
.. code-block:: python
>>> a = {1, 2, 3}
>>> b = {3, 4, 5}
>>> a | b
{1, 2, 3, 4, 5}
>>> a.union(b)
{1, 2, 3, 4, 5}
>>> a | b | {6, 7} # multiple sets
{1, 2, 3, 4, 5, 6, 7}
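One practical difference between the operator and the method: ``union()`` accepts any iterable, while ``|`` requires a set (or frozenset) on both sides. A short sketch:

```python
a = {1, 2, 3}

# The method accepts lists, tuples, and other iterables.
merged = a.union([3, 4], (5,))
print(sorted(merged))  # [1, 2, 3, 4, 5]

# The operator rejects non-set operands.
try:
    a | [3, 4]
except TypeError:
    print("TypeError: | needs a set on both sides")
```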
Intersection with ``&`` Operator
--------------------------------
The intersection of two sets contains only elements that exist in both sets.
Use the ``&`` operator or the ``intersection()`` method.
.. code-block:: python
>>> a = {1, 2, 3, 4}
>>> b = {3, 4, 5, 6}
>>> a & b
{3, 4}
>>> a.intersection(b)
{3, 4}
Find Common Elements Between Lists
----------------------------------
Finding common items between two lists is a practical application of set
intersection.
.. code-block:: python
>>> a = [1, 1, 2, 3]
>>> b = [3, 5, 5, 6]
>>> list(set(a) & set(b))
[3]
Difference with ``-`` Operator
------------------------------
The difference of two sets contains elements that are in the first set but not
in the second. Use the ``-`` operator or the ``difference()`` method.
.. code-block:: python
>>> a = {1, 2, 3, 4}
>>> b = {3, 4, 5, 6}
>>> a - b
{1, 2}
>>> b - a
{5, 6}
Symmetric Difference with ``^`` Operator
----------------------------------------
The symmetric difference contains elements that are in either set, but not in
both. Use the ``^`` operator or the ``symmetric_difference()`` method.
.. code-block:: python
>>> a = {1, 2, 3}
>>> b = {3, 4, 5}
>>> a ^ b
{1, 2, 4, 5}
Check Subset with ``<=`` Operator
---------------------------------
Use ``<=`` or ``issubset()`` to check if all elements of one set are in another.
Use ``<`` for proper subset (subset but not equal).
.. code-block:: python
>>> a = {1, 2}
>>> b = {1, 2, 3, 4}
>>> a <= b # a is subset of b
True
>>> a < b # a is proper subset
True
>>> a <= a # equal sets
True
>>> a < a # not proper subset
False
Check Superset with ``>=`` Operator
-----------------------------------
Use ``>=`` or ``issuperset()`` to check if a set contains all elements of another.
.. code-block:: python
>>> a = {1, 2, 3, 4}
>>> b = {1, 2}
>>> a >= b # a is superset of b
True
>>> a > b # a is proper superset
True
Check Disjoint Sets
-------------------
Two sets are disjoint if they have no elements in common. Use ``isdisjoint()``
to check.
.. code-block:: python
>>> a = {1, 2, 3}
>>> b = {4, 5, 6}
>>> a.isdisjoint(b)
True
>>> c = {3, 4, 5}
>>> a.isdisjoint(c)
False
Membership Testing
------------------
Sets provide O(1) average time complexity for membership testing, making them
much faster than lists for this operation.
.. code-block:: python
>>> s = {1, 2, 3, 4, 5}
>>> 3 in s
True
>>> 10 in s
False
>>> 10 not in s
True
Frozenset - Immutable Set
-------------------------
``frozenset`` is an immutable version of set. It can be used as a dictionary key
or as an element of another set.
.. code-block:: python
>>> fs = frozenset([1, 2, 3])
>>> fs
frozenset({1, 2, 3})
>>> fs.add(4) # raises AttributeError
AttributeError: 'frozenset' object has no attribute 'add'
Use frozenset as dictionary key:
.. code-block:: python
>>> d = {frozenset([1, 2]): "a", frozenset([3, 4]): "b"}
>>> d[frozenset([1, 2])]
'a'
Use frozenset in a set:
.. code-block:: python
>>> s = {frozenset([1, 2]), frozenset([3, 4])}
>>> frozenset([1, 2]) in s
True
Set Operations Summary
----------------------
.. code-block:: python
# Creation
s = {1, 2, 3} # literal
s = set([1, 2, 3]) # from iterable
s = {x for x in range(5)} # comprehension
# Add/Remove
s.add(x) # add single element
s.update([x, y]) # add multiple elements
s.remove(x) # remove (KeyError if missing)
s.discard(x) # remove (no error if missing)
s.pop() # remove arbitrary element
s.clear() # remove all
# Set Operations
a | b # union
a & b # intersection
a - b # difference
a ^ b # symmetric difference
# Comparisons
a <= b # subset
a < b # proper subset
a >= b # superset
a > b # proper superset
a.isdisjoint(b) # no common elements
# Membership
x in s # O(1) lookup
x not in s
================================================
FILE: docs/notes/basic/python-typing.rst
================================================
.. meta::
:description lang=en: Python typing cheat sheet covering type hints, annotations, generics, protocols, TypeVar, and mypy type checking with code examples
:keywords: Python, Python3, Python typing, Python type hints cheat sheet, type annotations, generics, Protocol, TypeVar, mypy, static typing
======
Typing
======
.. contents:: Table of Contents
:backlinks: none
PEP `484 `_ introduced type hints and specifies what a type system should look
like in Python 3. To understand the design philosophy behind type hints, it
also helps to read PEP `483 `_, which explains why Python introduced a type
system. The main goal of this cheat sheet is to show common uses of type hints
in Python 3.
Without type check
-------------------
.. code-block:: python
def fib(n):
a, b = 0, 1
for _ in range(n):
yield a
b, a = a + b, b
print([n for n in fib(3.6)])
output:
.. code-block:: bash
# errors will not be detected until runtime
$ python fib.py
Traceback (most recent call last):
File "fib.py", line 8, in <module>
print([n for n in fib(3.6)])
File "fib.py", line 8, in <listcomp>
print([n for n in fib(3.6)])
File "fib.py", line 3, in fib
for _ in range(n):
TypeError: 'float' object cannot be interpreted as an integer
With type check
----------------
.. code-block:: python
# give a type hint
from typing import Generator
def fib(n: int) -> Generator[int, None, None]:
a: int = 0
b: int = 1
for _ in range(n):
yield a
b, a = a + b, b
print([n for n in fib(3.6)])
output:
.. code-block:: bash
# errors will be detected before running
$ mypy --strict fib.py
fib.py:12: error: Argument 1 to "fib" has incompatible type "float"; expected "int"
Basic types
-----------
.. code-block:: python
import io
import re
from collections import deque, namedtuple
from typing import (
Dict,
List,
Tuple,
Set,
Deque,
NamedTuple,
IO,
Pattern,
Match,
Text,
Optional,
Sequence,
Iterable,
Mapping,
MutableMapping,
Any,
)
# without initializing
x: int
# any type
y: Any
y = 1
y = "1"
# built-in
var_int: int = 1
var_str: str = "Hello Typing"
var_byte: bytes = b"Hello Typing"
var_bool: bool = True
var_float: float = 1.
var_unicode: Text = u'\u2713'
# could be none
var_could_be_none: Optional[int] = None
var_could_be_none = 1
# collections
var_set: Set[int] = {i for i in range(3)}
var_dict: Dict[str, str] = {"foo": "Foo"}
var_list: List[int] = [i for i in range(3)]
var_static_length_Tuple: Tuple[int, int, int] = (1, 2, 3)
var_dynamic_length_Tuple: Tuple[int, ...] = tuple(i for i in range(3, 10))
var_deque: Deque[int] = deque([1, 2, 3])
var_nametuple: NamedTuple = namedtuple('P', ['x', 'y'])
# io
var_io_str: IO[str] = io.StringIO("Hello String")
var_io_byte: IO[bytes] = io.BytesIO(b"Hello Bytes")
var_io_file_str: IO[str] = open(__file__)
var_io_file_byte: IO[bytes] = open(__file__, 'rb')
# re
p: Pattern = re.compile("(https?)://([^/\r\n]+)(/[^\r\n]*)?")
m: Optional[Match] = p.match("https://www.python.org/")
# duck types: list-like
var_seq_list: Sequence[int] = [1, 2, 3]
var_seq_tuple: Sequence[int] = (1, 2, 3)
var_iter_list: Iterable[int] = [1, 2, 3]
var_iter_tuple: Iterable[int] = (1, 2, 3)
# duck types: dict-like
var_map_dict: Mapping[str, str] = {"foo": "Foo"}
var_mutable_dict: MutableMapping[str, str] = {"bar": "Bar"}
Functions
----------
.. code-block:: python
from typing import Callable
# function
def gcd(a: int, b: int) -> int:
while b:
a, b = b, a % b
return a
# callback
def fun(cb: Callable[[int, int], int]) -> int:
return cb(55, 66)
# lambda
f: Callable[[int], int] = lambda x: x * 2
Classes
--------
.. code-block:: python
from typing import ClassVar, Dict, List
class Foo:
x: int = 1 # instance variable. default = 1
y: ClassVar[str] = "class var" # class variable
def __init__(self) -> None:
self.i: List[int] = [0]
def foo(self, a: int, b: str) -> Dict[int, str]:
return {a: b}
foo = Foo()
foo.x = 123
print(foo.x)
print(foo.i)
print(Foo.y)
print(foo.foo(1, "abc"))
Generator
----------
.. code-block:: python
from typing import Generator, Iterator
# Generator[YieldType, SendType, ReturnType]
def fib(n: int) -> Generator[int, None, None]:
a: int = 0
b: int = 1
while n > 0:
yield a
b, a = a + b, b
n -= 1
g: Generator = fib(10)
i: Iterator[int] = (x for x in range(3))
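The ``SendType`` and ``ReturnType`` parameters only matter when a generator receives values through ``send()`` or returns a value. The sketch below is one way to exercise all three parameters. The ``averager`` function and its negative-number stop signal are hypothetical, chosen only for illustration.

```python
from typing import Generator


# Generator[YieldType, SendType, ReturnType]
def averager() -> Generator[float, float, str]:
    """Yield the running average of the values sent in."""
    total = 0.0
    count = 0
    average = 0.0
    while True:
        value = yield average
        if value < 0:  # hypothetical stop signal: a negative number
            return f"averaged {count} values"
        total += value
        count += 1
        average = total / count


gen = averager()
next(gen)                  # prime the generator; it yields the initial 0.0
first = gen.send(10.0)     # running average after one value
second = gen.send(20.0)    # running average after two values
try:
    gen.send(-1.0)
except StopIteration as stop:
    summary = stop.value   # the value of ReturnType
```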
Asynchronous Generator
-----------------------
.. code-block:: python
import asyncio
from typing import AsyncGenerator, AsyncIterator
async def fib(n: int) -> AsyncGenerator:
a: int = 0
b: int = 1
while n > 0:
await asyncio.sleep(0.1)
yield a
b, a = a + b, b
n -= 1
async def main() -> None:
async for f in fib(10):
print(f)
ag: AsyncIterator = (f async for f in fib(10))
asyncio.run(main())  # Python 3.7+
Context Manager
---------------
.. code-block:: python
from typing import ContextManager, Generator, IO
from contextlib import contextmanager
@contextmanager
def open_file(name: str) -> Generator:
f = open(name)
yield f
f.close()
cm: ContextManager[IO] = open_file(__file__)
with cm as f:
print(f.read())
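A ``@contextmanager`` function can also be annotated with ``Iterator[...]``, and wrapping the ``yield`` in ``try``/``finally`` keeps the cleanup running when the ``with`` body raises. This is a minimal sketch under those assumptions. The temporary file exists only to make the example self-contained.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import IO, Iterator


@contextmanager
def open_file(name: str) -> Iterator[IO[str]]:
    f = open(name)
    try:
        yield f
    finally:
        f.close()  # runs even if the with-body raises


# Create a throwaway file so the example does not depend on __file__.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
path = tmp.name

with open_file(path) as f:
    content = f.read()
closed = f.closed
os.remove(path)
```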
Asynchronous Context Manager
-----------------------------
.. code-block:: python
import asyncio
from typing import AsyncContextManager, AsyncGenerator, IO
from contextlib import asynccontextmanager
# need python 3.7 or above
@asynccontextmanager
async def open_file(name: str) -> AsyncGenerator:
await asyncio.sleep(0.1)
f = open(name)
yield f
await asyncio.sleep(0.1)
f.close()
async def main() -> None:
acm: AsyncContextManager[IO] = open_file(__file__)
async with acm as f:
print(f.read())
asyncio.run(main())
Avoid ``None`` access
----------------------
.. code-block:: python
import re
from typing import Pattern, Dict, Optional
# like c++
# std::regex url("(https?)://([^/\r\n]+)(/[^\r\n]*)?");
# std::regex color("^#?([a-f0-9]{6}|[a-f0-9]{3})$");
url: Pattern = re.compile("(https?)://([^/\r\n]+)(/[^\r\n]*)?")
color: Pattern = re.compile("^#?([a-f0-9]{6}|[a-f0-9]{3})$")
x: Dict[str, Pattern] = {"url": url, "color": color}
y: Optional[Pattern] = x.get("baz", None)
print(y.match("https://www.python.org/"))
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:15: error: Item "None" of "Optional[Pattern[Any]]" has no attribute "match"
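One way to satisfy the checker is to test for ``None`` before using the value, which narrows ``Optional[Pattern[str]]`` to ``Pattern[str]``. The sketch below assumes a hypothetical helper named ``match_with``; it is not part of the original snippet.

```python
import re
from typing import Dict, Optional, Pattern

url: Pattern[str] = re.compile(r"(https?)://([^/\r\n]+)(/[^\r\n]*)?")
patterns: Dict[str, Pattern[str]] = {"url": url}


def match_with(name: str, text: str) -> Optional[str]:
    """Return the matched text, or None if the pattern or the match is missing."""
    pattern = patterns.get(name)
    if pattern is None:  # narrows Optional[Pattern[str]] to Pattern[str]
        return None
    m = pattern.match(text)
    return m.group(0) if m is not None else None
```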
Positional-only arguments
--------------------------
.. code-block:: python
# define arguments with names beginning with __
def fib(__n: int) -> int: # positional only arg
a, b = 0, 1
for _ in range(__n):
b, a = a + b, b
return a
def gcd(*, a: int, b: int) -> int: # keyword only arg
while b:
a, b = b, a % b
return a
print(fib(__n=10)) # error
print(gcd(10, 5)) # error
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:1: note: "fib" defined here
foo.py:14: error: Unexpected keyword argument "__n" for "fib"
foo.py:15: error: Too many positional arguments for "gcd"
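Since Python 3.8, PEP 570 provides dedicated syntax for positional-only parameters: everything before a ``/`` in the signature must be passed by position. The following sketch assumes Python 3.8 or later and mirrors the two functions above. Unlike the ``__`` naming convention, the interpreter itself enforces the rule at runtime.

```python
# Assumes Python 3.8+ (PEP 570).
def fib(n: int, /) -> int:  # n is positional-only
    a, b = 0, 1
    for _ in range(n):
        b, a = a + b, b
    return a


def gcd(*, a: int, b: int) -> int:  # a and b are keyword-only
    while b:
        a, b = b, a % b
    return a


print(fib(10))          # ok
print(gcd(a=10, b=5))   # ok
# fib(n=10) and gcd(10, 5) both raise TypeError at runtime
```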
Multiple return values
-----------------------
.. code-block:: python
from typing import Tuple, Iterable, Union
def foo(x: int, y: int) -> Tuple[int, int]:
return x, y
# or
def bar(x: int, y: str) -> Iterable[Union[int, str]]:
# XXX: declaring the return type this way is not recommended
return x, y
a: int
b: int
a, b = foo(1, 2) # ok
c, d = bar(3, "bar") # ok
Optional Type
----------------------------------
.. code-block:: python
from typing import List, Union
def first(l: List[Union[int, None]]) -> Union[int, None]:
return None if len(l) == 0 else l[0]
first([None])
# equal to
from typing import List, Optional
def first(l: List[Optional[int]]) -> Optional[int]:
return None if len(l) == 0 else l[0]
first([None])
Be careful of ``Optional``
---------------------------
.. code-block:: python
from typing import Optional
def fib(n):
a, b = 0, 1
for _ in range(n):
b, a = a + b, b
return a
def cal(n: Optional[int]) -> None:
print(fib(n))
cal(None)
output:
.. code-block:: bash
# mypy will not detect errors
$ mypy foo.py
Explicitly declare the parameter type:
.. code-block:: python
from typing import Optional
def fib(n: int) -> int: # declare n to be int
a, b = 0, 1
for _ in range(n):
b, a = a + b, b
return a
def cal(n: Optional[int]) -> None:
print(fib(n))
output:
.. code-block:: bash
# mypy can detect the error even though we do not check for None
$ mypy --strict foo.py
foo.py:11: error: Argument 1 to "fib" has incompatible type "Optional[int]"; expected "int"
Be careful of casting
----------------------
.. code-block:: python
from typing import cast, Optional
def gcd(a: int, b: int) -> int:
while b:
a, b = b, a % b
return a
def cal(a: Optional[int], b: Optional[int]) -> None:
# XXX: Avoid casting
ca, cb = cast(int, a), cast(int, b)
print(gcd(ca, cb))
cal(None, None)
output:
.. code-block:: bash
# mypy will not detect type errors
$ mypy --strict foo.py
Forward references
-------------------
According to PEP 484, if we want to reference a type before it has been declared,
we have to use a **string literal** to indicate that a type of that name is
defined later in the file.
.. code-block:: python
from typing import Optional
class Tree:
def __init__(
self, data: int,
left: Optional["Tree"], # Forward references.
right: Optional["Tree"]
) -> None:
self.data = data
self.left = left
self.right = right
.. note::
In some cases, mypy does not complain about a forward reference that fails at runtime.
Get further information from `Issue#948`_.
.. _Issue\#948: https://github.com/python/mypy/issues/948
.. code-block:: python
class A:
def __init__(self, a: A) -> None: # should fail
self.a = a
output:
.. code-block:: bash
$ mypy --strict type.py
$ echo $?
0
$ python type.py # get runtime fail
Traceback (most recent call last):
File "type.py", line 1, in <module>
class A:
File "type.py", line 2, in A
def __init__(self, a: A) -> None: # should fail
NameError: name 'A' is not defined
Postponed Evaluation of Annotations
-----------------------------------
**New in Python 3.7**
- PEP 563_ - Postponed Evaluation of Annotations
.. _563: https://www.python.org/dev/peps/pep-0563/
Before Python 3.7
.. code-block:: python
>>> class A:
... def __init__(self, a: A) -> None:
... self._a = a
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in A
NameError: name 'A' is not defined
Python 3.7 and later
.. code-block:: python
>>> from __future__ import annotations
>>> class A:
... def __init__(self, a: A) -> None:
... self._a = a
...
.. note::
An annotation can only refer to names that already exist in the current scope
when it is evaluated, so a plain (non-string) **forward reference** fails if the
name is not defined yet. PEP 563 originally planned to make **postponed evaluation
of annotations** the default in a later release; that plan was superseded by the
deferred evaluation of PEP 649, which is implemented in Python 3.14.
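String and postponed annotations are stored unevaluated, so code that needs the real types at runtime has to resolve them. A minimal sketch using ``typing.get_type_hints`` follows. The ``Tree`` class is a reduced, illustrative version of the earlier forward-reference example, and it is assumed to be defined at module level.

```python
from typing import Optional, get_type_hints


class Tree:
    def __init__(self, data: int, left: "Optional[Tree]" = None) -> None:
        self.data = data
        self.left = left


# The raw annotation is still the string that was written in the source.
raw = Tree.__init__.__annotations__["left"]

# get_type_hints() evaluates the string in the function's global namespace.
hints = get_type_hints(Tree.__init__)
```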
Type Alias
----------
Like ``typedef`` or ``using`` in C/C++
.. code-block:: cpp
#include <iostream>
#include <string>
#include <regex>
#include <vector>
typedef std::string Url;
template<typename T> using Vector = std::vector<T>;
int main(int argc, char *argv[])
{
Url url = "https://python.org";
std::regex p("(https?)://([^/\r\n]+)(/[^\r\n]*)?");
bool m = std::regex_match(url, p);
Vector<int> v = {1, 2};
std::cout << m << std::endl;
for (auto it : v) std::cout << it << std::endl;
return 0;
}
Type aliases are defined by simple variable assignments
.. code-block:: python
import re
from typing import Pattern, List
# Like typedef and using in C/C++
# PEP 484 recommends capitalizing alias names
Url = str
url: Url = "https://www.python.org/"
p: Pattern = re.compile("(https?)://([^/\r\n]+)(/[^\r\n]*)?")
m = p.match(url)
Vector = List[int]
v: Vector = [1, 2]
Using NewType
---------------------
Unlike an alias, ``NewType`` creates a distinct type for the type checker, although its values are identical to the original type at runtime.
.. code-block:: python
from sqlalchemy import Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from typing import NewType, Any
# check mypy #2477
Base: Any = declarative_base()
# create a new type
Id = NewType('Id', int) # not an alias; the type checker treats it as a distinct type
class User(Base):
__tablename__ = 'User'
id = Column(Integer, primary_key=True)
age = Column(Integer, nullable=False)
name = Column(String, nullable=False)
def __init__(self, id: Id, age: int, name: str) -> None:
self.id = id
self.age = age
self.name = name
# create users
user1 = User(Id(1), 62, "Guido van Rossum") # ok
user2 = User(2, 48, "David M. Beazley") # error
output:
.. code-block:: bash
$ python foo.py
$ mypy --ignore-missing-imports foo.py
foo.py:24: error: Argument 1 to "User" has incompatible type "int"; expected "Id"
Further reading:
- `Issue\#1284`_
.. _`Issue\#1284`: https://github.com/python/mypy/issues/1284
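The runtime behavior of ``NewType`` can be checked without any third-party dependency. In the sketch below, ``UserId`` and ``lookup`` are hypothetical names used only for illustration.

```python
from typing import NewType

UserId = NewType("UserId", int)


def lookup(user_id: UserId) -> str:
    return f"user-{user_id}"


uid = UserId(42)          # for the type checker: a UserId
runtime_type = type(uid)  # at runtime: a plain int

print(lookup(uid))        # ok
# lookup(42) runs fine, but mypy reports an incompatible type "int"
```

Because calling ``UserId(42)`` simply returns its argument, the distinction exists only for the static type checker.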
Using ``TypeVar`` as template
------------------------------
Like C++ ``template <typename T>``
.. code-block:: cpp
#include <iostream>
template <typename T>
T add(T x, T y) {
return x + y;
}
int main(int argc, char *argv[])
{
std::cout << add(1, 2) << std::endl;
std::cout << add(1., 2.) << std::endl;
return 0;
}
Python using ``TypeVar``
.. code-block:: python
from typing import TypeVar
T = TypeVar("T")
def add(x: T, y: T) -> T:
return x + y
add(1, 2)
add(1., 2.)
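A ``TypeVar`` is most useful when it ties a return type to an argument type. The following sketch uses a hypothetical ``first`` helper to show the checker inferring a different ``T`` for each call.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    """Return the first element; the result type follows the element type."""
    return items[0]


n: int = first([1, 2, 3])  # T is inferred to be int
s: str = first("abc")      # T is inferred to be str
```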
Using ``TypeVar`` and ``Generic`` as class template
----------------------------------------------------
Like C++ ``template <typename T> class``
.. code-block:: cpp
#include <iostream>
template <typename T>
class Foo {
public:
Foo(T foo) {
foo_ = foo;
}
T Get() {
return foo_;
}
private:
T foo_;
};
int main(int argc, char *argv[])
{
Foo<int> f(123);
std::cout << f.Get() << std::endl;
return 0;
}
Define a generic class in Python
.. code-block:: python
from typing import Generic, TypeVar
T = TypeVar("T")
class Foo(Generic[T]):
def __init__(self, foo: T) -> None:
self.foo = foo
def get(self) -> T:
return self.foo
f: Foo[str] = Foo("Foo")
v: int = f.get()
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:13: error: Incompatible types in assignment (expression has type "str", variable has type "int")
Scoping rules for ``TypeVar``
------------------------------
- A ``TypeVar`` used in different generic functions is inferred separately for each function.
.. code-block:: python
from typing import TypeVar
T = TypeVar("T")
def foo(x: T) -> T:
return x
def bar(y: T) -> T:
return y
a: int = foo(1) # ok: T is inferred to be int
b: int = bar("2") # error: T is inferred to be str
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:12: error: Incompatible types in assignment (expression has type "str", variable has type "int")
- A ``TypeVar`` used in a generic class refers to the same type in all of its methods.
.. code-block:: python
from typing import TypeVar, Generic
T = TypeVar("T")
class Foo(Generic[T]):
def foo(self, x: T) -> T:
return x
def bar(self, y: T) -> T:
return y
f: Foo[int] = Foo()
a: int = f.foo(1) # ok: T is inferred to be int
b: str = f.bar("2") # error: T is expected to be int
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:15: error: Incompatible types in assignment (expression has type "int", variable has type "str")
foo.py:15: error: Argument 1 to "bar" of "Foo" has incompatible type "str"; expected "int"
- A ``TypeVar`` used in a method that does not match any parameter declared in ``Generic`` can be inferred to be a different type on each call.
.. code-block:: python
from typing import TypeVar, Generic
T = TypeVar("T")
S = TypeVar("S")
class Foo(Generic[T]): # S does not match params
def foo(self, x: T, y: S) -> S:
return y
def bar(self, z: S) -> S:
return z
f: Foo[int] = Foo()
a: str = f.foo(1, "foo") # S is inferred to be str
b: int = f.bar(12345678) # S is inferred to be int
output:
.. code-block:: bash
$ mypy --strict foo.py
- A ``TypeVar`` should not appear in the body of a method/function if it is unbound.
.. code-block:: python
from typing import TypeVar, Generic
T = TypeVar("T")
S = TypeVar("S")
def foo(x: T) -> None:
a: T = x # ok
b: S = 123 # error: invalid type
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:8: error: Invalid type "foo.S"
Restricting to a fixed set of possible types
----------------------------------------------
``T = TypeVar('T', ClassA, ...)`` means we create a **type variable with a value restriction**.
.. code-block:: python
from typing import TypeVar
# restrict T = int or T = float
T = TypeVar("T", int, float)
def add(x: T, y: T) -> T:
return x + y
add(1, 2)
add(1., 2.)
add("1", 2)
add("hello", "world")
output:
.. code-block:: bash
# mypy can detect wrong type
$ mypy --strict foo.py
foo.py:10: error: Value of type variable "T" of "add" cannot be "object"
foo.py:11: error: Value of type variable "T" of "add" cannot be "str"
``TypeVar`` with an upper bound
--------------------------------
``T = TypeVar('T', bound=BaseClass)`` means we create a **type variable with an upper bound**.
The concept is similar to **polymorphism** in C++.
.. code-block:: cpp
#include <iostream>
class Shape {
public:
Shape(double width, double height) {
width_ = width;
height_ = height;
};
virtual double Area() = 0;
protected:
double width_;
double height_;
};
class Rectangle: public Shape {
public:
Rectangle(double width, double height)
:Shape(width, height)
{};
double Area() {
return width_ * height_;
};
};
class Triangle: public Shape {
public:
Triangle(double width, double height)
:Shape(width, height)
{};
double Area() {
return width_ * height_ / 2;
};
};
double Area(Shape &s) {
return s.Area();
}
int main(int argc, char *argv[])
{
Rectangle r(1., 2.);
Triangle t(3., 4.);
std::cout << Area(r) << std::endl;
std::cout << Area(t) << std::endl;
return 0;
}
As in C++, create a base class and a ``TypeVar`` bound to that base class.
The static type checker then accepts every subclass wherever the type variable is used.
.. code-block:: python
from typing import TypeVar
class Shape:
def __init__(self, width: float, height: float) -> None:
self.width = width
self.height = height
def area(self) -> float:
return 0
class Rectangle(Shape):
def area(self) -> float:
width: float = self.width
height: float = self.height
return width * height
class Triangle(Shape):
def area(self) -> float:
width: float = self.width
height: float = self.height
return width * height / 2
S = TypeVar("S", bound=Shape)
def area(s: S) -> float:
return s.area()
r: Rectangle = Rectangle(1, 2)
t: Triangle = Triangle(3, 4)
i: int = 5566
print(area(r))
print(area(t))
print(area(i))
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:40: error: Value of type variable "S" of "area" cannot be "int"
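A bound ``TypeVar`` differs from annotating with the base class directly because the checker keeps the concrete subclass in the return type. The sketch below assumes a hypothetical ``largest`` helper and an ``is_square`` method added only for illustration.

```python
from typing import List, TypeVar


class Shape:
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return 0.0


class Rectangle(Shape):
    def area(self) -> float:
        return self.width * self.height

    def is_square(self) -> bool:  # exists only on Rectangle
        return self.width == self.height


S = TypeVar("S", bound=Shape)


def largest(shapes: List[S]) -> S:
    """Return the shape with the largest area, keeping its concrete type."""
    return max(shapes, key=lambda s: s.area())


biggest = largest([Rectangle(1, 2), Rectangle(3, 3)])
square = biggest.is_square()  # accepted: biggest is a Rectangle, not just a Shape
```

Had ``largest`` been annotated as ``List[Shape] -> Shape``, the call to ``is_square`` would be rejected by the checker.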
@overload
----------
Sometimes, we use ``Union`` to declare that a function may return one of several
types. However, the type checker cannot tell which of those types a particular
call returns. The following snippet shows that the type checker cannot determine
which type is correct.
.. code-block:: python
from typing import List, Union
class Array(object):
def __init__(self, arr: List[int]) -> None:
self.arr = arr
def __getitem__(self, i: Union[int, str]) -> Union[int, str]:
if isinstance(i, int):
return self.arr[i]
if isinstance(i, str):
return str(self.arr[int(i)])
arr = Array([1, 2, 3, 4, 5])
x: int = arr[1]
y: str = arr["2"]
output:
.. code-block:: bash
$ mypy --strict foo.py
foo.py:16: error: Incompatible types in assignment (expression has type "Union[int, str]", variable has type "int")
foo.py:17: error: Incompatible types in assignment (expression has type "Union[int, str]", variable has type "str")
Although we can use ``cast`` to work around the problem, ``cast`` is not safe and cannot catch typos.
.. code-block:: python
from typing import List, Union, cast
class Array(object):
def __init__(self, arr: List[int]) -> None:
self.arr = arr
def __getitem__(self, i: Union[int, str]) -> Union[int, str]:
if isinstance(i, int):
return self.arr[i]
if isinstance(i, str):
return str(self.arr[int(i)])
arr = Array([1, 2, 3, 4, 5])
x: int = cast(int, arr[1])
y: str = cast(str, arr[2]) # typo. we want to assign arr["2"]
output:
.. code-block:: bash
$ mypy --strict foo.py
$ echo $?
0
Using ``@overload`` can solve the problem. We can declare the return type explicitly.
.. code-block:: python
from typing import List, Union, overload
class Array(object):
def __init__(self, arr: List[int]) -> None:
self.arr = arr
@overload
def __getitem__(self, i: str) -> str:
...
@overload
def __getitem__(self, i: int) -> int:
...
def __getitem__(self, i: Union[int, str]) -> Union[int, str]:
if isinstance(i, int):
return self.arr[i]
if isinstance(i, str):
return str(self.arr[int(i)])
arr = Array([1, 2, 3, 4, 5])
x: int = arr[1]
y: str = arr["2"]
output:
.. code-block:: bash
$ mypy --strict foo.py
$ echo $?
0
.. warning::
According to PEP 484, the ``@overload`` decorator is **for the type checker only**; it does not implement
real overloading as in C++/Java. Thus, we have to provide exactly one non-``@overload``
implementation. At runtime, calling an ``@overload``-decorated function directly raises ``NotImplementedError``.
.. code-block:: python
from typing import List, Union, overload
class Array(object):
def __init__(self, arr: List[int]) -> None:
self.arr = arr
@overload
def __getitem__(self, i: Union[int, str]) -> Union[int, str]:
if isinstance(i, int):
return self.arr[i]
if isinstance(i, str):
return str(self.arr[int(i)])
arr = Array([1, 2, 3, 4, 5])
try:
x: int = arr[1]
except NotImplementedError as e:
print("NotImplementedError")
output:
.. code-block:: bash
$ python foo.py
NotImplementedError
Stub Files
----------
Stub files are like the header files that we usually use to define interfaces in C/C++.
In Python, we can put a stub file in the same directory as the module or point mypy at a stub directory with ``export MYPYPATH=${stubs}``.
First, we need to create a stub file (interface file) for the module.
.. code-block:: bash
$ mkdir fib
$ touch fib/__init__.py fib/__init__.pyi
Then, define the interface of the function in ``__init__.pyi`` and implement the module.
.. code-block:: python
# fib/__init__.pyi
def fib(n: int) -> int: ...
# fib/__init__.py
def fib(n):
a, b = 0, 1
for _ in range(n):
b, a = a + b, b
return a
Then, write a test.py for testing ``fib`` module.
.. code-block:: python
# test.py
import sys
from pathlib import Path
p = Path(__file__).parent / "fib"
sys.path.append(str(p))
from fib import fib
print(fib(10.0))
output:
.. code-block:: bash
$ mypy --strict test.py
test.py:10: error: Argument 1 to "fib" has incompatible type "float"; expected "int"
================================================
FILE: docs/notes/basic/python-unicode.rst
================================================
.. meta::
:description lang=en: Python Unicode tutorial covering string encoding, decoding, UTF-8, ASCII, bytes conversion, and character handling in Python 3
:keywords: Python, Python3, Unicode, UTF-8, encoding, decoding, bytes, string, ASCII, character, codec
=======
Unicode
=======
.. contents:: Table of Contents
:backlinks: none
The main goal of this cheat sheet is to collect some common snippets which are
related to Unicode. In Python 3, strings are represented as Unicode instead of
bytes. Further information can be found in PEP `3100 `_.
**ASCII** code is the most well-known standard which defines numeric codes
for characters. The numeric values only define 128 characters originally,
so ASCII only contains control codes, digits, lowercase letters, uppercase
letters, etc. However, it is not enough to represent the accented characters,
Chinese characters, emoji, and other symbols used around the world.
Therefore, **Unicode** was developed to solve this issue. Like ASCII, it assigns
a numeric *code point* to each character, but it has room for up to
1,111,998 characters.
String
------
In Python 2, strings are represented as *bytes*, not *Unicode*. Python provides
several kinds of string literals, such as Unicode strings and raw strings.
To declare a Unicode string in Python 2, we add the ``u`` prefix to the
string literal.
.. code-block:: python
>>> s = 'Café' # byte string
>>> s
'Caf\xc3\xa9'
>>> type(s)
<type 'str'>
>>> u = u'Café' # unicode string
>>> u
u'Caf\xe9'
>>> type(u)
<type 'unicode'>
In Python 3, strings are represented as *Unicode*. If we want to represent a
byte string, we add the ``b`` prefix to the string literal. Note that early
Python 3 versions (3.0-3.2) do not support the ``u`` prefix. To ease the
migration of Unicode-aware applications from Python 2, Python 3.3 supports
the ``u`` prefix for string literals again. Further information can
be found in PEP `414 `_.
.. code-block:: python
>>> s = 'Café'
>>> type(s)
<class 'str'>
>>> s
'Café'
>>> s.encode('utf-8')
b'Caf\xc3\xa9'
>>> s.encode('utf-8').decode('utf-8')
'Café'
Characters
----------
Python 2 treats every character of a ``str`` as a byte. In this case, the length
of a string may not equal the number of characters. For example,
the length of ``Café`` is 5, not 4, because ``é`` is encoded as a 2-byte
character in UTF-8.
.. code-block:: python
>>> s = 'Café'
>>> print([_c for _c in s])
['C', 'a', 'f', '\xc3', '\xa9']
>>> len(s)
5
>>> s = u'Café'
>>> print([_c for _c in s])
[u'C', u'a', u'f', u'\xe9']
>>> len(s)
4
Python 3 treats every character of a string as a Unicode code point. The length of
a string always equals the number of code points it contains.
.. code-block:: python
>>> s = 'Café'
>>> print([_c for _c in s])
['C', 'a', 'f', 'é']
>>> len(s)
4
>>> bs = bytes(s, encoding='utf-8')
>>> print(bs)
b'Caf\xc3\xa9'
>>> len(bs)
5
Porting unicode(s, 'utf-8')
---------------------------
The `unicode() `_
built-in function was removed in Python 3 so what is the best way to convert
the expression ``unicode(s, 'utf-8')`` so it works in both Python 2 and 3?
In Python 2:
.. code-block:: python
>>> s = 'Café'
>>> unicode(s, 'utf-8')
u'Caf\xe9'
>>> s.decode('utf-8')
u'Caf\xe9'
>>> unicode(s, 'utf-8') == s.decode('utf-8')
True
In Python 3:
.. code-block:: python
>>> s = 'Café'
>>> s.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
So, the real answer is...
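One possible approach, offered here as an assumption rather than as the answer the author had in mind, is a small helper that decodes byte strings and returns text unchanged. The name ``to_text`` is hypothetical.

```python
def to_text(s, encoding="utf-8"):
    """Return s as a Unicode string, decoding it first if it is a byte string."""
    if isinstance(s, bytes):  # in Python 2, ``bytes`` is an alias of ``str``
        return s.decode(encoding)
    return s


decoded = to_text(b"Caf\xc3\xa9")  # byte string in, text out
unchanged = to_text(u"Caf\xe9")    # text in, same text out
```

Because ``bytes.decode`` exists in both Python 2 and Python 3, the same function body works on either interpreter.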
Unicode Code Point
------------------
`ord `_ is a built-in function that returns the Unicode code point of a
given character. If we want to check the code point of a character, we can
use ``ord``.
.. code-block:: python
>>> s = u'Café'
>>> for _c in s: print('U+%04x' % ord(_c))
...
U+0043
U+0061
U+0066
U+00e9
>>> u = '中文'
>>> for _c in u: print('U+%04x' % ord(_c))
...
U+4e2d
U+6587
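The built-in ``chr`` is the inverse of ``ord``, and the standard ``unicodedata`` module can report the official name of a code point. A short sketch, with illustrative variable names:

```python
import unicodedata

s = "Caf\u00e9"  # 'Café', written with an escape sequence

code_points = [ord(c) for c in s]                     # characters to integers
round_trip = "".join(chr(cp) for cp in code_points)   # integers to characters
name = unicodedata.name("\u00e9")                     # official Unicode name

for c in s:
    print("U+%04X" % ord(c), unicodedata.name(c))
```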
Encoding
--------
Converting a Unicode string into a *byte string* is called encoding.
.. code-block:: python
>>> s = u'Café'
>>> type(s.encode('utf-8'))
<class 'bytes'>
Decoding
---------
Converting a *byte string* back into a Unicode string is called decoding.
.. code-block:: python
>>> s = bytes('Café', encoding='utf-8')
>>> s.decode('utf-8')
'Café'
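The encoding chosen determines which bytes are produced, and decoding only round-trips when the same encoding is used on both sides. A small sketch of this, with illustrative variable names:

```python
s = "Caf\u00e9"  # 'Café', written with an escape sequence

utf8 = s.encode("utf-8")        # 'é' takes two bytes
latin1 = s.encode("latin-1")    # 'é' takes one byte
utf16 = s.encode("utf-16-le")   # two bytes per character for this string

# Decoding with the matching codec restores the original string.
same_utf8 = utf8.decode("utf-8") == s
same_latin1 = latin1.decode("latin-1") == s

# Decoding with a different codec silently yields other characters.
mojibake = utf8.decode("latin-1")
```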
Unicode Normalization
---------------------
Some characters can be represented in two equivalent forms. For example, the
character ``é`` can be written as ``e ́`` (Canonical Decomposition: ``e`` followed
by a combining accent) or ``é`` (Canonical Composition). In this case, comparing
two strings may give unexpected results even though they look alike. Therefore,
we can normalize both strings to the same Unicode form before comparing them.
.. code-block:: python
# python 3
>>> u1 = 'Café' # unicode string
>>> u2 = 'Cafe\u0301'
>>> u1, u2
('Café', 'Café')
>>> len(u1), len(u2)
(4, 5)
>>> u1 == u2
False
>>> u1.encode('utf-8') # get u1 byte string
b'Caf\xc3\xa9'
>>> u2.encode('utf-8') # get u2 byte string
b'Cafe\xcc\x81'
>>> from unicodedata import normalize
>>> s1 = normalize('NFC', u1) # get u1 NFC format
>>> s2 = normalize('NFC', u2) # get u2 NFC format
>>> s1 == s2
True
>>> s1.encode('utf-8'), s2.encode('utf-8')
(b'Caf\xc3\xa9', b'Caf\xc3\xa9')
>>> s1 = normalize('NFD', u1) # get u1 NFD format
>>> s2 = normalize('NFD', u2) # get u2 NFD format
>>> s1, s2
('Café', 'Café')
>>> s1 == s2
True
>>> s1.encode('utf-8'), s2.encode('utf-8')
(b'Cafe\xcc\x81', b'Cafe\xcc\x81')
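The comparison above can be wrapped in a helper so that callers do not have to remember to normalize. This is a minimal sketch; ``same_text`` is a hypothetical name, and NFC is chosen arbitrarily since either form works as long as both sides use the same one.

```python
from unicodedata import normalize


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return normalize("NFC", a) == normalize("NFC", b)


composed = "Caf\u00e9"      # precomposed 'é'
decomposed = "Cafe\u0301"   # 'e' followed by a combining acute accent

plain_equal = composed == decomposed                # raw comparison
normalized_equal = same_text(composed, decomposed)  # normalized comparison
```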
Avoid ``UnicodeDecodeError``
----------------------------
Python raises ``UnicodeDecodeError`` when a byte string cannot be decoded to Unicode
code points. To avoid this exception, we can pass *replace*,
*backslashreplace*, or *ignore* as the ``errors`` argument of `decode `_.
.. code-block:: python
>>> u = b"\xff"
>>> u.decode('utf-8', 'strict')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
>>> # use U+FFFD, REPLACEMENT CHARACTER
>>> u.decode('utf-8', "replace")
'\ufffd'
>>> # inserts a \xNN escape sequence
>>> u.decode('utf-8', "backslashreplace")
'\\xff'
>>> # leave the character out of the Unicode result
>>> u.decode('utf-8', "ignore")
''
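The same ``errors`` argument exists on ``str.encode``, where it controls what happens to characters that the target encoding cannot represent. The sketch below encodes to ASCII purely as an illustration.

```python
s = "Caf\u00e9"  # 'Café'; 'é' has no ASCII representation

try:
    s.encode("ascii")  # the default is errors="strict"
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = s.encode("ascii", "replace")            # substitutes '?'
ignored = s.encode("ascii", "ignore")              # drops the character
escaped = s.encode("ascii", "backslashreplace")    # inserts a \xNN escape
xml_ref = s.encode("ascii", "xmlcharrefreplace")   # inserts an XML character reference
```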
Long String
-----------
The following snippet shows common ways to declare a multi-line string in
Python.
.. code-block:: python
# original long string
s = 'This is a very very very long python string'
# Adjacent string literals joined with a line-continuation backslash
s = "This is a very very very " \
"long python string"
# Using parentheses
s = (
"This is a very very very "
"long python string"
)
# Using the + operator
s = (
"This is a very very very " +
"long python string"
)
# Using triple-quote with an escaping backslash
s = '''This is a very very very \
long python string'''
================================================
FILE: docs/notes/concurrency/index.rst
================================================
.. meta::
:description lang=en: Python concurrency tutorial covering threading, multiprocessing, locks, semaphores, queues, process pools, and concurrent.futures
:keywords: Python, Python3, threading, multiprocessing, concurrency, parallel, lock, semaphore, queue, ThreadPoolExecutor, ProcessPoolExecutor, GIL
Concurrency
===========
Python provides multiple approaches for concurrent execution to handle CPU-bound
and I/O-bound tasks efficiently. The ``threading`` module enables lightweight
concurrent execution within a single process, while ``multiprocessing`` bypasses
the Global Interpreter Lock (GIL) by using separate processes for true parallelism.
The ``concurrent.futures`` module offers a high-level interface that abstracts
the differences between threads and processes behind a unified API.
Understanding when to use each approach is crucial: threads excel at I/O-bound
tasks (network requests, file operations) where the GIL is released during
waiting, while processes are better for CPU-bound tasks (computation, data
processing) where true parallel execution is needed.
.. toctree::
:maxdepth: 1
python-threading
python-multiprocessing
python-futures
================================================
FILE: docs/notes/concurrency/python-futures.rst
================================================
.. meta::
:description lang=en: Python concurrent.futures tutorial covering ThreadPoolExecutor, ProcessPoolExecutor, Future objects, callbacks, and high-level parallel execution patterns
:keywords: Python, Python3, concurrent.futures, ThreadPoolExecutor, ProcessPoolExecutor, Future, executor, submit, map, as_completed, parallel
==================
concurrent.futures
==================
:Source: `src/basic/concurrency_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
The ``concurrent.futures`` module provides a high-level interface for
asynchronously executing callables using threads or processes. It abstracts
the differences between threading and multiprocessing behind a unified API,
making it easy to switch between them. The module introduces two key concepts:
**Executors** that manage pools of workers, and **Futures** that represent
the eventual result of an asynchronous operation.
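Because both executors share the same interface, code can be written against an executor class and switched later. The following is a minimal sketch of that idea. ``run_all`` and ``square`` are hypothetical names, and only ``ThreadPoolExecutor`` is exercised here.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def run_all(executor_cls, fn, items, workers=4):
    """Run fn over items with the given executor class; results keep input order."""
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(fn, items))


results = run_all(ThreadPoolExecutor, square, range(5))
print(results)
```

Passing ``ProcessPoolExecutor`` instead should work the same way, provided the function is picklable and the call sits under an ``if __name__ == "__main__":`` guard.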
ThreadPoolExecutor Basics
-------------------------
``ThreadPoolExecutor`` manages a pool of threads that execute tasks concurrently.
Use it for I/O-bound tasks like network requests, file operations, or database
queries where threads spend time waiting for external resources.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor
import time
def fetch_url(url):
"""Simulate fetching a URL."""
time.sleep(1) # Simulate network delay
return f"Content from {url}"
urls = ["http://site1.com", "http://site2.com", "http://site3.com"]
# Sequential - takes ~3 seconds
start = time.time()
results = [fetch_url(url) for url in urls]
print(f"Sequential: {time.time() - start:.2f}s")
# Concurrent - takes ~1 second
start = time.time()
with ThreadPoolExecutor(max_workers=3) as executor:
results = list(executor.map(fetch_url, urls))
print(f"Concurrent: {time.time() - start:.2f}s")
ProcessPoolExecutor Basics
--------------------------
``ProcessPoolExecutor`` manages a pool of processes for true parallel execution.
Use it for CPU-bound tasks like data processing, calculations, or image
manipulation where you need to utilize multiple CPU cores.
.. code-block:: python
from concurrent.futures import ProcessPoolExecutor
import time
def cpu_intensive(n):
"""CPU-bound computation."""
return sum(i * i for i in range(n))
if __name__ == "__main__":
numbers = [10**7] * 4
# Sequential
start = time.time()
results = [cpu_intensive(n) for n in numbers]
print(f"Sequential: {time.time() - start:.2f}s")
# Parallel with processes
start = time.time()
with ProcessPoolExecutor(max_workers=4) as executor:
results = list(executor.map(cpu_intensive, numbers))
print(f"Parallel: {time.time() - start:.2f}s")
Using submit() and Future Objects
---------------------------------
The ``submit()`` method schedules a callable and returns a ``Future`` object
immediately. The Future represents the pending result and provides methods to
check status, get the result, or cancel the task. This gives more control than
``map()`` for handling individual tasks.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor
import time
def task(name, duration):
time.sleep(duration)
return f"{name} completed in {duration}s"
with ThreadPoolExecutor(max_workers=3) as executor:
# Submit tasks - returns Future immediately
future1 = executor.submit(task, "Task A", 2)
future2 = executor.submit(task, "Task B", 1)
future3 = executor.submit(task, "Task C", 3)
# Check if done (non-blocking)
print(f"Task A done: {future1.done()}")
# Get result (blocking)
print(future2.result()) # Waits for completion
print(future1.result())
print(future3.result())
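A ``Future`` also captures exceptions raised by the task. The sketch below, using a hypothetical ``divide`` task, shows that ``exception()`` returns the error while ``result()`` re-raises it in the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    return a / b


with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divide, 6, 3)
    bad = executor.submit(divide, 1, 0)

    value = ok.result()       # blocks until done, then returns the value
    error = bad.exception()   # blocks until done, then returns the exception object

    try:
        bad.result()          # re-raises the task's exception in this thread
        reraised = False
    except ZeroDivisionError:
        reraised = True
```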
Processing Results as They Complete
-----------------------------------
``as_completed()`` yields futures as they complete, regardless of submission
order. This is useful when you want to process results as soon as they're
available rather than waiting for all tasks to finish.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
import random
def fetch_data(source_id):
delay = random.uniform(0.5, 2.0)
time.sleep(delay)
return f"Data from source {source_id} (took {delay:.2f}s)"
sources = range(5)
with ThreadPoolExecutor(max_workers=5) as executor:
# Submit all tasks
future_to_source = {
executor.submit(fetch_data, src): src
for src in sources
}
# Process results as they complete
for future in as_completed(future_to_source):
source = future_to_source[future]
try:
result = future.result()
print(f"Source {source}: {result}")
except Exception as e:
print(f"Source {source} failed: {e}")
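For contrast, ``executor.map()`` always yields results in input order, whereas ``as_completed()`` follows completion order. The following is a minimal sketch; the ``delayed_echo`` helper and its sleep durations are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def delayed_echo(delay):
    """Hypothetical task: sleep, then return the delay it was given."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() preserves the order of the input iterable
    ordered = list(executor.map(delayed_echo, delays))

    # as_completed() yields each future as soon as it finishes
    futures = [executor.submit(delayed_echo, d) for d in delays]
    by_completion = [f.result() for f in as_completed(futures)]

print(f"map order:          {ordered}")
print(f"as_completed order: {by_completion}")
```

Prefer ``map()`` when results must line up with inputs, and ``as_completed()`` when the fastest results should be handled first.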
Using wait() for Completion Control
-----------------------------------
``wait()`` blocks until specified futures complete. You can wait for all tasks,
the first task, or the first exception. This provides fine-grained control
over when to proceed.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED
import time
def task(task_id, duration):
time.sleep(duration)
return f"Task {task_id} done"
with ThreadPoolExecutor(max_workers=3) as executor:
futures = [
executor.submit(task, 1, 3),
executor.submit(task, 2, 1),
executor.submit(task, 3, 2),
]
# Wait for first to complete
done, not_done = wait(futures, return_when=FIRST_COMPLETED)
print(f"First completed: {done.pop().result()}")
print(f"Still running: {len(not_done)}")
# Wait for all remaining
done, not_done = wait(not_done, return_when=ALL_COMPLETED)
for f in done:
print(f"Completed: {f.result()}")
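The "first exception" mode mentioned above can be sketched as follows. The ``flaky`` helper and its ``fail`` flag are hypothetical names introduced only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION
import time

def flaky(task_id, duration, fail=False):
    """Hypothetical task that optionally raises after sleeping."""
    time.sleep(duration)
    if fail:
        raise RuntimeError(f"Task {task_id} failed")
    return task_id

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(flaky, 1, 0.5),
        executor.submit(flaky, 2, 0.1, fail=True),
        executor.submit(flaky, 3, 0.5),
    ]
    # Returns as soon as any future raises (or once all have finished)
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
    failed = [f for f in done if f.exception() is not None]
    print(f"Failed early: {len(failed)}, still pending: {len(not_done)}")
```

Note that ``wait()`` does not cancel the remaining futures; the ``with`` block still waits for them on exit.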
Adding Callbacks to Futures
---------------------------
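Callbacks are functions that run automatically when a future completes.
They let you process results without blocking the main thread and are handy
for chaining operations. The callback receives the future as its only argument.
With ``ThreadPoolExecutor`` it runs in the worker thread that finished the task,
or immediately in the calling thread if the future is already done.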
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor
import time
def compute(n):
time.sleep(1)
return n * n
def on_complete(future):
"""Callback executed when future completes."""
try:
result = future.result()
print(f"Callback: result is {result}")
except Exception as e:
print(f"Callback: task failed with {e}")
with ThreadPoolExecutor(max_workers=3) as executor:
for i in range(5):
future = executor.submit(compute, i)
future.add_done_callback(on_complete)
# Main thread continues while callbacks fire
print("Main thread: tasks submitted")
time.sleep(2)
print("Main thread: done waiting")
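When a callback needs extra context, ``functools.partial`` can bind it ahead of time. This is a minimal sketch, assuming a hypothetical ``record`` callback and a shared ``results`` dict guarded by a lock, since callbacks may run on different worker threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import threading

results = {}
results_lock = threading.Lock()

def compute(n):
    return n * n

def record(label, future):
    """Store the result under a label once the future finishes."""
    with results_lock:  # callbacks may run on different worker threads
        results[label] = future.result()

with ThreadPoolExecutor(max_workers=3) as executor:
    for i in range(5):
        future = executor.submit(compute, i)
        # partial() binds the label; the future is passed as the last argument
        future.add_done_callback(partial(record, f"job-{i}"))

print(results)
```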
Exception Handling
------------------
Exceptions raised in tasks are captured and re-raised when you call
``result()``. You can also check for exceptions using ``exception()``.
Always handle exceptions to prevent silent failures.
.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def risky_task(n):
        if n == 3:
            raise ValueError(f"Bad value: {n}")
        return n * 2

    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(risky_task, i): i for i in range(5)}
        for future in as_completed(futures):
            n = futures[future]
            try:
                result = future.result()
                print(f"Task {n}: {result}")
            except ValueError as e:
                print(f"Task {n} failed: {e}")

        # Alternative: check exception without raising.
        # exception() blocks until the task finishes and returns None on
        # success; calling result() here would re-raise the ValueError.
        future = executor.submit(risky_task, 3)
        error = future.exception()
        if error is not None:
            print(f"Exception occurred: {error}")
Timeout Handling
----------------
Both ``result()`` and ``as_completed()`` accept timeout parameters. If a task
doesn't complete within the timeout, a ``TimeoutError`` is raised. The timeout
only stops the caller from waiting: the task itself keeps running, and leaving
the ``with`` block still waits for it to finish.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed
import time
def slow_task(duration):
time.sleep(duration)
return f"Completed after {duration}s"
with ThreadPoolExecutor(max_workers=2) as executor:
future = executor.submit(slow_task, 5)
try:
# Wait max 2 seconds for result
result = future.result(timeout=2)
print(result)
except TimeoutError:
print("Task timed out!")
# Note: task continues running in background
# Timeout with as_completed
futures = [executor.submit(slow_task, i) for i in [1, 3, 5]]
try:
for future in as_completed(futures, timeout=2):
print(future.result())
except TimeoutError:
print("Some tasks didn't complete in time")
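Because a timeout does not stop the underlying task, long-running work usually needs a cooperative stop signal. This is a minimal sketch using ``threading.Event``; the ``stoppable_task`` function and its ``max_steps`` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import threading
import time

def stoppable_task(stop_event, max_steps=50):
    """Hypothetical task that checks a stop flag between small steps."""
    steps = 0
    while steps < max_steps and not stop_event.is_set():
        time.sleep(0.05)  # one unit of work
        steps += 1
    return steps

stop = threading.Event()
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(stoppable_task, stop)
    try:
        future.result(timeout=0.2)
        timed_out = False
    except TimeoutError:
        timed_out = True
        stop.set()  # ask the task to finish early
    steps_done = future.result()  # returns soon after the flag is set

print(f"Timed out: {timed_out}, steps completed: {steps_done}")
```

The task decides when it is safe to stop, so partial work can be cleaned up before it returns.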
Cancelling Tasks
----------------
Tasks can be cancelled before they start executing using ``cancel()``. Once
a task has started, it cannot be cancelled. Check ``cancelled()`` to see if
cancellation succeeded.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor
import time
def long_task(n):
time.sleep(2)
return n
with ThreadPoolExecutor(max_workers=1) as executor:
# Submit multiple tasks to single worker
future1 = executor.submit(long_task, 1)
future2 = executor.submit(long_task, 2) # Queued, not started
future3 = executor.submit(long_task, 3) # Queued, not started
time.sleep(0.1) # Let first task start
# Try to cancel queued tasks
cancelled2 = future2.cancel()
cancelled3 = future3.cancel()
print(f"Future 2 cancelled: {cancelled2}") # True
print(f"Future 3 cancelled: {cancelled3}") # True
print(f"Future 1 cancelled: {future1.cancel()}") # False (already running)
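On Python 3.9 and later, ``shutdown(cancel_futures=True)`` cancels every task that is still queued in one call. In this sketch, the ``started`` event is an illustrative addition that makes sure the first task is already running before shutdown is requested.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

started = threading.Event()

def long_task(n):
    started.set()  # signal that a task has begun executing
    time.sleep(0.3)
    return n

executor = ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(long_task, n) for n in range(3)]

started.wait()  # the first task is now running
# Requires Python 3.9+: queued tasks are cancelled, running ones finish
executor.shutdown(wait=True, cancel_futures=True)

states = [f.cancelled() for f in futures]
print(f"Cancelled flags: {states}")
```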
Executor Context Manager
------------------------
Using executors as context managers (``with`` statement) ensures proper cleanup.
When exiting the context, ``shutdown(wait=True)`` is called automatically,
which waits for all pending tasks to complete before returning.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor
import time
def task(n):
time.sleep(1)
return n * 2
# Context manager - automatic cleanup
with ThreadPoolExecutor(max_workers=3) as executor:
futures = [executor.submit(task, i) for i in range(5)]
# Executor waits for all tasks when exiting 'with' block
print("All tasks completed")
# Manual management (not recommended)
executor = ThreadPoolExecutor(max_workers=3)
try:
futures = [executor.submit(task, i) for i in range(5)]
finally:
executor.shutdown(wait=True) # Must call explicitly
Map with Chunking
-----------------
For large iterables, ``map()`` can be more efficient with chunking. The
``chunksize`` parameter groups items together, reducing overhead from
inter-process communication when using ``ProcessPoolExecutor``. It has no
effect with ``ThreadPoolExecutor``.
.. code-block:: python
from concurrent.futures import ProcessPoolExecutor
import time
def process_item(x):
return x * x
if __name__ == "__main__":
items = range(100000)
# Without chunking - more IPC overhead
start = time.time()
with ProcessPoolExecutor(max_workers=4) as executor:
results = list(executor.map(process_item, items))
print(f"No chunking: {time.time() - start:.2f}s")
# With chunking - less IPC overhead
start = time.time()
with ProcessPoolExecutor(max_workers=4) as executor:
results = list(executor.map(process_item, items, chunksize=1000))
print(f"With chunking: {time.time() - start:.2f}s")
Real-World Example: Parallel Downloads
--------------------------------------
This example demonstrates a practical use case: downloading multiple pages
concurrently, reporting each result as it arrives, with per-request timeouts
and error handling.
.. code-block:: python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request
import time
def download(url, timeout=10):
"""Download URL content with timeout."""
try:
with urllib.request.urlopen(url, timeout=timeout) as response:
content = response.read()
return url, len(content), None
except Exception as e:
return url, 0, str(e)
urls = [
"https://www.python.org",
"https://www.github.com",
"https://www.google.com",
"https://httpbin.org/delay/5", # Slow endpoint
]
print("Starting downloads...")
start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
future_to_url = {executor.submit(download, url): url for url in urls}
for future in as_completed(future_to_url, timeout=15):
url, size, error = future.result()
if error:
print(f"FAILED: {url} - {error}")
else:
print(f"OK: {url} - {size} bytes")
print(f"Total time: {time.time() - start:.2f}s")
================================================
FILE: docs/notes/concurrency/python-multiprocessing.rst
================================================
.. meta::
:description lang=en: Python multiprocessing tutorial covering process creation, pools, shared memory, inter-process communication, and parallel CPU-bound task execution
:keywords: Python, Python3, multiprocessing, Process, Pool, Queue, Pipe, shared memory, parallel, CPU-bound, GIL bypass
===============
Multiprocessing
===============
:Source: `src/basic/concurrency_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
The ``multiprocessing`` module enables true parallel execution by spawning
separate Python processes, each with its own Python interpreter and memory
space. Unlike threads, processes bypass the Global Interpreter Lock (GIL),
making multiprocessing ideal for CPU-bound tasks that need to utilize multiple
CPU cores. The trade-off is higher overhead for process creation and
inter-process communication compared to threads.
Creating Processes
------------------
Creating processes is similar to creating threads. Each process runs in its
own memory space, so changes to variables in one process don't affect others.
Use ``start()`` to begin execution and ``join()`` to wait for completion.
.. code-block:: python
from multiprocessing import Process
import os
def worker(name):
print(f"Worker {name}, PID: {os.getpid()}")
if __name__ == "__main__":
processes = []
for i in range(4):
p = Process(target=worker, args=(i,))
processes.append(p)
p.start()
for p in processes:
p.join()
print(f"Main process PID: {os.getpid()}")
Process Pool
------------
A ``Pool`` manages a collection of worker processes and distributes tasks among
them. This is more efficient than creating a new process for each task, as
processes are reused. The pool provides methods like ``map()``, ``apply()``,
and their async variants for different use cases.
.. code-block:: python
from multiprocessing import Pool
import time
def cpu_intensive(n):
"""Simulate CPU-bound work."""
total = 0
for i in range(n):
total += i * i
return total
if __name__ == "__main__":
numbers = [10**6, 10**6, 10**6, 10**6]
# Sequential execution
start = time.time()
results = [cpu_intensive(n) for n in numbers]
print(f"Sequential: {time.time() - start:.2f}s")
# Parallel execution with Pool
start = time.time()
with Pool(4) as pool:
results = pool.map(cpu_intensive, numbers)
print(f"Parallel: {time.time() - start:.2f}s")
Pool Methods
------------
The Pool class provides several methods for distributing work. ``map()`` applies
a function to each item in an iterable and returns results in order. ``apply()``
calls a function with arguments and blocks until complete. The ``_async``
variants return immediately with an ``AsyncResult`` object.
.. code-block:: python
from multiprocessing import Pool
def square(x):
return x * x
def add(a, b):
return a + b
if __name__ == "__main__":
with Pool(4) as pool:
# map - apply function to iterable
results = pool.map(square, range(10))
print(f"map: {results}")
# starmap - unpack arguments from iterable
pairs = [(1, 2), (3, 4), (5, 6)]
results = pool.starmap(add, pairs)
print(f"starmap: {results}")
# apply_async - non-blocking single call
result = pool.apply_async(square, (10,))
print(f"apply_async: {result.get()}")
# map_async - non-blocking map
result = pool.map_async(square, range(5))
print(f"map_async: {result.get()}")
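When result order does not matter, ``imap_unordered()`` yields results lazily as workers finish, which suits aggregations over large inputs. This is a minimal sketch; the ``main`` wrapper and the ``chunksize`` value are illustrative choices, not tuned recommendations.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def main():
    with Pool(2) as pool:
        # Results arrive in completion order, so only use this when
        # the consumer (here, sum) does not depend on ordering
        return sum(pool.imap_unordered(square, range(10), chunksize=3))

if __name__ == "__main__":
    print(f"Sum of squares: {main()}")
```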
Sharing Data with Queue
-----------------------
Processes don't share memory by default. ``multiprocessing.Queue`` provides a
thread- and process-safe way to exchange data between processes. It's the
recommended approach for most inter-process communication scenarios.
.. code-block:: python
from multiprocessing import Process, Queue
def producer(q, items):
for item in items:
q.put(item)
print(f"Produced: {item}")
q.put(None) # Sentinel
def consumer(q):
while True:
item = q.get()
if item is None:
break
print(f"Consumed: {item}")
if __name__ == "__main__":
q = Queue()
items = list(range(5))
p1 = Process(target=producer, args=(q, items))
p2 = Process(target=consumer, args=(q,))
p1.start()
p2.start()
p1.join()
p2.join()
Sharing Data with Pipe
----------------------
A ``Pipe`` creates a two-way communication channel between two processes. It's
simpler and faster than Queue for point-to-point communication but only
supports two endpoints. Each end can send and receive data.
.. code-block:: python
from multiprocessing import Process, Pipe
def sender(conn):
conn.send("Hello from sender")
conn.send([1, 2, 3])
response = conn.recv()
print(f"Sender received: {response}")
conn.close()
def receiver(conn):
msg = conn.recv()
print(f"Receiver got: {msg}")
data = conn.recv()
print(f"Receiver got: {data}")
conn.send("Thanks!")
conn.close()
if __name__ == "__main__":
parent_conn, child_conn = Pipe()
p1 = Process(target=sender, args=(parent_conn,))
p2 = Process(target=receiver, args=(child_conn,))
p1.start()
p2.start()
p1.join()
p2.join()
Shared Memory with Value and Array
----------------------------------
For simple shared state, ``Value`` and ``Array`` provide shared memory that
multiple processes can access. These are faster than Queue/Pipe for frequently
accessed data but require careful synchronization to avoid race conditions.
.. code-block:: python
from multiprocessing import Process, Value, Array
def increment(counter):
for _ in range(10000):
with counter.get_lock():
counter.value += 1
def modify_array(arr):
for i in range(len(arr)):
arr[i] = arr[i] * 2
if __name__ == "__main__":
# Shared integer
counter = Value('i', 0) # 'i' = signed int
processes = [Process(target=increment, args=(counter,)) for _ in range(4)]
for p in processes:
p.start()
for p in processes:
p.join()
print(f"Counter: {counter.value}") # 40000
# Shared array
arr = Array('d', [1.0, 2.0, 3.0, 4.0]) # 'd' = double
p = Process(target=modify_array, args=(arr,))
p.start()
p.join()
print(f"Array: {list(arr)}") # [2.0, 4.0, 6.0, 8.0]
Manager for Complex Shared Objects
----------------------------------
A ``Manager`` provides a way to share more complex Python objects (lists, dicts)
between processes. The manager runs a server process that holds the actual
objects, and other processes access them through proxies. This is slower than
Value/Array but supports richer types such as lists, dicts, and namespaces.
.. code-block:: python
from multiprocessing import Process, Manager
def worker(shared_dict, shared_list, worker_id):
shared_dict[worker_id] = worker_id * 10
shared_list.append(worker_id)
if __name__ == "__main__":
with Manager() as manager:
shared_dict = manager.dict()
shared_list = manager.list()
processes = []
for i in range(4):
p = Process(target=worker, args=(shared_dict, shared_list, i))
processes.append(p)
p.start()
for p in processes:
p.join()
print(f"Dict: {dict(shared_dict)}")
print(f"List: {list(shared_list)}")
Process Synchronization
-----------------------
Multiprocessing provides the same synchronization primitives as threading:
``Lock``, ``RLock``, ``Semaphore``, ``Event``, ``Condition``, and ``Barrier``.
These work across processes instead of threads.
.. code-block:: python
from multiprocessing import Process, Lock, Value
def safe_increment(counter, lock):
for _ in range(10000):
with lock:
counter.value += 1
if __name__ == "__main__":
lock = Lock()
counter = Value('i', 0)
processes = [
Process(target=safe_increment, args=(counter, lock))
for _ in range(4)
]
for p in processes:
p.start()
for p in processes:
p.join()
print(f"Counter: {counter.value}") # 40000
Daemon Processes
----------------
Like daemon threads, daemon processes are terminated when the main process
exits. They're useful for background tasks that shouldn't prevent program
termination. Set ``daemon=True`` before calling ``start()``.
.. code-block:: python
from multiprocessing import Process
import time
def background_task():
while True:
print("Background process running...")
time.sleep(1)
if __name__ == "__main__":
p = Process(target=background_task, daemon=True)
p.start()
time.sleep(3)
print("Main process exiting, daemon will be terminated")
Handling Process Termination
----------------------------
``terminate()`` asks a process to stop by sending ``SIGTERM`` on Unix, while
``kill()`` sends ``SIGKILL``, which cannot be caught. On Windows both use
``TerminateProcess()``. By default neither runs the child's exit handlers or
``finally`` clauses, so install a signal handler in the child if it needs to
clean up before exiting.

.. code-block:: python

    from multiprocessing import Process
    import signal
    import sys
    import time

    def handle_sigterm(signum, frame):
        print("Graceful shutdown")
        sys.exit(0)

    def long_running_task():
        # Without a handler, SIGTERM ends the process immediately
        signal.signal(signal.SIGTERM, handle_sigterm)
        while True:
            print("Working...")
            time.sleep(1)

    if __name__ == "__main__":
        p = Process(target=long_running_task)
        p.start()
        time.sleep(3)

        # Request termination (SIGTERM on Unix)
        p.terminate()
        p.join(timeout=2)

        # Force kill if still alive
        if p.is_alive():
            p.kill()
            p.join()
        print(f"Exit code: {p.exitcode}")
ProcessPoolExecutor
-------------------
``concurrent.futures.ProcessPoolExecutor`` provides a higher-level interface
for process pools that's consistent with ``ThreadPoolExecutor``. It's often
easier to use than ``multiprocessing.Pool`` and integrates well with the
futures pattern.
.. code-block:: python
from concurrent.futures import ProcessPoolExecutor, as_completed
def compute(n):
return sum(i * i for i in range(n))
if __name__ == "__main__":
numbers = [10**6, 10**6, 10**6, 10**6]
with ProcessPoolExecutor(max_workers=4) as executor:
# Submit individual tasks
futures = [executor.submit(compute, n) for n in numbers]
# Process results as they complete
for future in as_completed(futures):
print(f"Result: {future.result()}")
# Or use map for ordered results
results = list(executor.map(compute, numbers))
print(f"All results: {results}")
Comparing Threads vs Processes
------------------------------
Choose threads for I/O-bound tasks (network, file I/O) where the GIL is
released during waiting. Choose processes for CPU-bound tasks that need true
parallelism. This example demonstrates the performance difference.
.. code-block:: python
from threading import Thread
from multiprocessing import Pool
import time
def cpu_bound(n):
"""CPU-intensive task."""
return sum(i * i for i in range(n))
if __name__ == "__main__":
n = 10**7
count = 4
# Sequential
start = time.time()
for _ in range(count):
cpu_bound(n)
print(f"Sequential: {time.time() - start:.2f}s")
# Threads (limited by GIL)
start = time.time()
threads = [Thread(target=cpu_bound, args=(n,)) for _ in range(count)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Threads: {time.time() - start:.2f}s")
# Processes (true parallelism)
start = time.time()
with Pool(count) as pool:
pool.map(cpu_bound, [n] * count)
print(f"Processes: {time.time() - start:.2f}s")
================================================
FILE: docs/notes/concurrency/python-threading.rst
================================================
.. meta::
:description lang=en: Python threading tutorial covering thread creation, synchronization primitives, locks, semaphores, events, conditions, and thread-safe data structures
:keywords: Python, Python3, threading, Thread, Lock, RLock, Semaphore, Event, Condition, synchronization, GIL, concurrent, parallel
=========
Threading
=========
:Source: `src/basic/concurrency_.py `_
.. contents:: Table of Contents
:backlinks: none
Introduction
------------
The ``threading`` module provides a high-level interface for creating and
managing threads in Python. Threads are lightweight units of execution that
share the same memory space within a process, making them efficient for I/O-bound
tasks where the program spends time waiting for external resources. However,
due to CPython's Global Interpreter Lock (GIL), threads cannot achieve true
parallelism for CPU-bound tasks: only one thread can execute Python bytecode
at a time. For CPU-intensive work, consider using ``multiprocessing`` instead.
Creating Threads
----------------
There are two primary ways to create threads: subclassing ``Thread`` or passing
a target function. The function-based approach is more flexible and commonly
used, while subclassing is useful when you need to encapsulate thread state
and behavior in a class.
.. code-block:: python
from threading import Thread
# Method 1: Subclass Thread
class Worker(Thread):
def __init__(self, worker_id):
super().__init__()
self.worker_id = worker_id
def run(self):
print(f"Worker {self.worker_id} running")
# Method 2: Pass target function (preferred)
def task(worker_id):
print(f"Task {worker_id} running")
# Using subclass
t1 = Worker(1)
t1.start()
t1.join()
# Using target function
t2 = Thread(target=task, args=(2,))
t2.start()
t2.join()
Thread with Return Value
------------------------
Threads don't directly return values from their target functions. To get results
back, you can use shared mutable objects, queues, or store results as instance
attributes when subclassing Thread.
.. code-block:: python
from threading import Thread
from queue import Queue
def compute(n, results):
"""Store result in shared dict."""
results[n] = n * n
# Using shared dictionary
results = {}
threads = []
for i in range(5):
t = Thread(target=compute, args=(i, results))
threads.append(t)
t.start()
for t in threads:
t.join()
print(results) # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
# Using Queue (thread-safe)
def compute_queue(n, q):
q.put((n, n * n))
q = Queue()
threads = []
for i in range(5):
t = Thread(target=compute_queue, args=(i, q))
threads.append(t)
t.start()
for t in threads:
t.join()
while not q.empty():
n, result = q.get()
print(f"{n}: {result}")
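The third option mentioned above, storing the result as an instance attribute on a ``Thread`` subclass, can be sketched as follows. The ``SquareWorker`` class is a hypothetical name used only for illustration.

```python
from threading import Thread

class SquareWorker(Thread):
    """Hypothetical worker that keeps its result on the instance."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.result = None

    def run(self):
        self.result = self.n * self.n

workers = [SquareWorker(i) for i in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # the result is only safe to read after join()

squares = [w.result for w in workers]
print(squares)
```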
Daemon Threads
--------------
Daemon threads run in the background and are stopped abruptly when all
non-daemon threads have finished. They're useful for background tasks that
shouldn't prevent the program from exiting, such as monitoring. Because they
get no chance to clean up, avoid them for work that must release resources or
finish writing data.
.. code-block:: python
from threading import Thread
import time
def background_task():
while True:
print("Background task running...")
time.sleep(1)
# Daemon thread - won't prevent program exit
t = Thread(target=background_task, daemon=True)
t.start()
# Main thread work
time.sleep(3)
print("Main thread done, daemon will be killed")
Lock - Mutual Exclusion
-----------------------
A ``Lock`` is the simplest synchronization primitive that prevents multiple
threads from accessing a shared resource simultaneously. Always use locks when
modifying shared state to prevent race conditions. The context manager syntax
(``with lock:``) is preferred as it guarantees the lock is released even if
an exception occurs.
.. code-block:: python
from threading import Thread, Lock
counter = 0
lock = Lock()
def increment(n):
global counter
for _ in range(n):
with lock: # Acquire and release automatically
counter += 1
threads = [Thread(target=increment, args=(100000,)) for _ in range(5)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Counter: {counter}") # Always 500000 with lock
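When blocking forever is not acceptable, ``acquire()`` accepts a ``timeout`` and reports whether the lock was obtained. In this minimal sketch, the ``try_update`` function and the ``outcome`` dict are hypothetical.

```python
from threading import Thread, Lock

lock = Lock()
outcome = {}

def try_update():
    # acquire() returns False if the lock is not obtained in time
    acquired = lock.acquire(timeout=0.1)
    outcome["acquired"] = acquired
    if acquired:
        try:
            pass  # the critical section would go here
        finally:
            lock.release()

with lock:  # the main thread holds the lock longer than the timeout
    t = Thread(target=try_update)
    t.start()
    t.join()

print(outcome)
```

The explicit ``try``/``finally`` is needed here because the ``with`` statement cannot express a timed acquisition.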
RLock - Reentrant Lock
----------------------
An ``RLock`` (reentrant lock) can be acquired multiple times by the same thread
without causing a deadlock. This is essential when a thread needs to call
methods that also acquire the same lock, such as in recursive functions or
when methods call other methods on the same object.
.. code-block:: python
from threading import Thread, RLock
class Counter:
def __init__(self):
self.value = 0
self.lock = RLock()
def increment(self):
with self.lock:
self.value += 1
def increment_twice(self):
with self.lock: # First acquisition
self.increment() # Second acquisition - OK with RLock
self.increment()
counter = Counter()
threads = [Thread(target=counter.increment_twice) for _ in range(100)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Value: {counter.value}") # 200
Semaphore - Resource Limiting
-----------------------------
A ``Semaphore`` limits the number of threads that can access a resource
concurrently. Unlike a lock which allows only one thread, a semaphore with
count N allows up to N threads to proceed. This is useful for connection pools,
rate limiting, or controlling access to limited resources.
.. code-block:: python
from threading import Thread, Semaphore
import time
# Allow max 3 concurrent connections
connection_pool = Semaphore(3)
def access_database(thread_id):
print(f"Thread {thread_id} waiting for connection...")
with connection_pool:
print(f"Thread {thread_id} connected")
time.sleep(1) # Simulate database work
print(f"Thread {thread_id} disconnected")
threads = [Thread(target=access_database, args=(i,)) for i in range(10)]
for t in threads:
t.start()
for t in threads:
t.join()
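One way to confirm that a semaphore really caps concurrency is to record the peak number of threads inside the guarded section. This is a minimal sketch; the ``stats`` dict and ``use_resource`` function are hypothetical names.

```python
from threading import Thread, Semaphore, Lock
import time

slots = Semaphore(3)
stats_lock = Lock()
stats = {"active": 0, "peak": 0}

def use_resource():
    with slots:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)  # simulate work while holding a slot
        with stats_lock:
            stats["active"] -= 1

threads = [Thread(target=use_resource) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Peak concurrency: {stats['peak']}")
```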
Event - Thread Signaling
------------------------
An ``Event`` is a simple signaling mechanism that allows one thread to signal
other threads that something has happened. Threads can wait for the event to
be set, and one thread can set or clear the event. This is useful for
coordinating startup, shutdown, or state changes between threads.
.. code-block:: python
from threading import Thread, Event
import time
ready = Event()
def worker(worker_id):
print(f"Worker {worker_id} waiting for signal...")
ready.wait() # Block until event is set
print(f"Worker {worker_id} starting work")
def coordinator():
print("Coordinator preparing...")
time.sleep(2)
print("Coordinator: All systems go!")
ready.set() # Signal all waiting threads
threads = [Thread(target=worker, args=(i,)) for i in range(3)]
threads.append(Thread(target=coordinator))
for t in threads:
t.start()
for t in threads:
t.join()
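``Event.wait()`` also accepts a timeout and returns whether the event was set, which makes it a convenient interruptible sleep for shutdown signaling. This is a minimal sketch, assuming a hypothetical ``heartbeat`` worker.

```python
from threading import Thread, Event
import time

stop = Event()
ticks = []

def heartbeat(interval):
    # wait() returns True once the event is set, False on timeout
    while not stop.wait(timeout=interval):
        ticks.append(len(ticks))

t = Thread(target=heartbeat, args=(0.05,))
t.start()

time.sleep(0.3)  # let the worker tick for a while
stop.set()       # the worker wakes up immediately and exits its loop
t.join()

print(f"Ticks recorded: {len(ticks)}")
```

Unlike a daemon thread, this worker exits at a well-defined point and can clean up before returning.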
Condition - Complex Synchronization
-----------------------------------
A ``Condition`` combines a lock with the ability to wait for and notify about
state changes. It's essential for producer-consumer patterns where threads
need to wait for specific conditions (like "buffer not empty" or "buffer not
full") before proceeding.
.. code-block:: python
from threading import Thread, Condition
import time
items = []
condition = Condition()
def producer():
for i in range(5):
time.sleep(0.5)
with condition:
items.append(i)
print(f"Produced: {i}")
condition.notify() # Wake up one waiting consumer
def consumer():
while True:
with condition:
while not items: # Wait until items available
condition.wait()
item = items.pop(0)
print(f"Consumed: {item}")
if item == 4:
break
t1 = Thread(target=producer)
t2 = Thread(target=consumer)
t1.start()
t2.start()
t1.join()
t2.join()
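The explicit ``while not items: condition.wait()`` loop can also be written with ``wait_for()``, which re-checks a predicate after every wake-up. This is a minimal sketch; the ``buffer`` and ``consumed`` names are illustrative.

```python
from threading import Thread, Condition
from collections import deque

buffer = deque()
condition = Condition()
consumed = []

def producer(count):
    for i in range(count):
        with condition:
            buffer.append(i)
            condition.notify()

def consumer(count):
    for _ in range(count):
        with condition:
            # Returns immediately if the predicate is already true
            condition.wait_for(lambda: len(buffer) > 0)
            consumed.append(buffer.popleft())

t_consumer = Thread(target=consumer, args=(5,))
t_producer = Thread(target=producer, args=(5,))
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()

print(consumed)
```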
Barrier - Synchronization Point
-------------------------------
A ``Barrier`` blocks a specified number of threads until all of them have
reached the barrier point, then releases them all simultaneously. This is
useful when you need multiple threads to complete a phase before any can
proceed to the next phase.
.. code-block:: python
from threading import Thread, Barrier
import time
import random
barrier = Barrier(3)
def worker(worker_id):
# Phase 1: Initialization
print(f"Worker {worker_id} initializing...")
time.sleep(random.uniform(0.5, 2))
print(f"Worker {worker_id} waiting at barrier")
barrier.wait() # Wait for all threads
# Phase 2: All threads proceed together
print(f"Worker {worker_id} proceeding")
threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
t.start()
for t in threads:
t.join()
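A ``Barrier`` can also run an ``action`` callable once per cycle, and ``wait()`` returns a distinct index to each thread. This is a minimal sketch with hypothetical ``phase_log`` and ``arrival_indexes`` lists.

```python
from threading import Thread, Barrier

phase_log = []
arrival_indexes = []

def on_all_arrived():
    # Runs once, in one of the waiting threads, before they are released
    phase_log.append("phase 1 complete")

barrier = Barrier(3, action=on_all_arrived)

def worker():
    index = barrier.wait()  # a different int in range(3) for each thread
    arrival_indexes.append(index)

threads = [Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(phase_log, sorted(arrival_indexes))
```

The returned index is handy for electing a single thread to do follow-up work, for example ``if index == 0:``.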
Timer - Delayed Execution
-------------------------
A ``Timer`` is a thread that executes a function after a specified delay.
It can be cancelled before it fires. This is useful for timeouts, delayed
cleanup, or scheduling one-time tasks.
.. code-block:: python
from threading import Timer
def delayed_task():
print("Task executed after delay")
# Execute after 2 seconds
timer = Timer(2.0, delayed_task)
timer.start()
# Can be cancelled before it fires
# timer.cancel()
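Cancellation can be sketched end to end as follows. The ``fired`` list is a hypothetical marker used only to show that the function never ran.

```python
from threading import Timer

fired = []

def delayed_task():
    fired.append(True)

timer = Timer(5.0, delayed_task)
timer.start()

timer.cancel()  # only effective while the timer is still waiting
timer.join()    # returns promptly because the wait was interrupted

print(f"Task ran: {bool(fired)}")
```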
Thread-Local Data
-----------------
``threading.local()`` provides thread-local storage where each thread has its
own independent copy of the data. This is useful for storing per-thread state
without passing it through function arguments, such as database connections
or request context in web applications.
.. code-block:: python
from threading import Thread, local
# Each thread sees its own independent attributes on this object
thread_data = local()
def worker(worker_id):
thread_data.value = worker_id
process()
def process():
# Access thread-local data without passing as argument
print(f"Processing with value: {thread_data.value}")
threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
t.start()
for t in threads:
t.join()
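To make the isolation visible, the sketch below sets a value in the main thread and shows that worker threads neither see nor overwrite it. The ``seen`` dict is a hypothetical helper for collecting what each worker observed.

```python
from threading import Thread, local

thread_data = local()
seen = {}

def worker(worker_id):
    # This assignment is visible only inside the current thread
    thread_data.value = worker_id
    seen[worker_id] = thread_data.value

thread_data.value = "main"

threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_value = thread_data.value  # untouched by the workers
print(main_value, seen)
```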
Producer-Consumer with Queue
----------------------------
The ``queue.Queue`` class provides a thread-safe FIFO queue that handles all
locking internally. This is the recommended way to communicate between threads
in a producer-consumer pattern, as it eliminates the need for manual
synchronization.
.. code-block:: python
from threading import Thread
from queue import Queue
import time
def producer(q, items):
for item in items:
time.sleep(0.1)
q.put(item)
print(f"Produced: {item}")
q.put(None) # Sentinel to signal completion
def consumer(q):
while True:
item = q.get()
if item is None:
break
print(f"Consumed: {item}")
q.task_done()
q = Queue(maxsize=5) # Bounded queue
items = list(range(10))
t1 = Thread(target=producer, args=(q, items))
t2 = Thread(target=consumer, args=(q,))
t1.start()
t2.start()
t1.join()
t2.join()
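With several consumers, each one needs its own sentinel, and ``q.join()`` can be used to wait until every queued item has been processed. This is a minimal sketch; ``NUM_CONSUMERS`` and the ``processed`` list are illustrative.

```python
from threading import Thread
from queue import Queue

NUM_CONSUMERS = 3
q = Queue()
processed = []

def consumer():
    while True:
        item = q.get()
        try:
            if item is None:  # sentinel: stop this consumer
                break
            processed.append(item * 2)
        finally:
            q.task_done()  # mark every get(), including the sentinel

consumers = [Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for c in consumers:
    c.start()

for item in range(10):
    q.put(item)
q.join()  # blocks until every queued item has been marked done

for _ in consumers:
    q.put(None)  # one sentinel per consumer
for c in consumers:
    c.join()

print(sorted(processed))
```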
Deadlock Example and Prevention
-------------------------------
Deadlock occurs when two or more threads are waiting for each other to release
locks, creating a circular dependency. The classic example is when thread A
holds lock 1 and waits for lock 2, while thread B holds lock 2 and waits for
lock 1. Prevent deadlocks by always acquiring locks in a consistent order.
.. code-block:: python
from threading import Thread, Lock
import time
lock1 = Lock()
lock2 = Lock()
# DEADLOCK EXAMPLE - DON'T DO THIS
def task_a_bad():
with lock1:
print("Task A acquired lock1")
time.sleep(0.1)
with lock2: # Waits for lock2
print("Task A acquired lock2")
def task_b_bad():
with lock2:
print("Task B acquired lock2")
time.sleep(0.1)
with lock1: # Waits for lock1 - DEADLOCK!
print("Task B acquired lock1")
# CORRECT - Always acquire locks in same order
def task_a_good():
with lock1:
with lock2:
print("Task A acquired both locks")
def task_b_good():
with lock1: # Same order as task_a
with lock2:
print("Task B acquired both locks")
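When callers cannot agree on an order up front, one option is to derive it at runtime. The sketch below sorts locks by ``id()`` before acquiring them. This is one possible convention rather than a standard library feature, and ``with_both`` is a hypothetical helper.

```python
from threading import Thread, Lock
from contextlib import ExitStack

lock1 = Lock()
lock2 = Lock()
completed = []

def with_both(name, first, second):
    # Sort by id() so every thread uses the same acquisition order
    ordered = sorted((first, second), key=id)
    with ExitStack() as stack:
        for lock in ordered:
            stack.enter_context(lock)  # released in reverse order on exit
        completed.append(name)

# The callers pass the locks in opposite orders, yet cannot deadlock
t_a = Thread(target=with_both, args=("A", lock1, lock2))
t_b = Thread(target=with_both, args=("B", lock2, lock1))
t_a.start()
t_b.start()
t_a.join()
t_b.join()

print(sorted(completed))
```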
Understanding the GIL
---------------------
The Global Interpreter Lock (GIL) is a mutex that protects access to Python
objects, preventing multiple threads from executing Python bytecode
simultaneously. This means threads don't provide speedup for CPU-bound tasks.
However, the GIL is released during I/O operations, making threads effective
for I/O-bound tasks.
.. code-block:: python
from threading import Thread
import time
def cpu_bound(n):
"""CPU-bound task - GIL limits parallelism."""
count = 0
for i in range(n):
count += i
return count
def io_bound(seconds):
"""I/O-bound task - GIL released during sleep."""
time.sleep(seconds)
# CPU-bound: threads won't help (may be slower due to GIL contention)
start = time.time()
threads = [Thread(target=cpu_bound, args=(10**7,)) for _ in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"CPU-bound threaded: {time.time() - start:.2f}s")
# I/O-bound: threads help significantly
start = time.time()
threads = [Thread(target=io_bound, args=(1,)) for _ in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"I/O-bound threaded: {time.time() - start:.2f}s") # ~1s, not 4s
================================================
FILE: docs/notes/database/index.rst
================================================
.. meta::
:description lang=en: Python SQLAlchemy tutorial covering database connections, ORM models, relationships, queries, joins, and advanced query patterns
:keywords: Python, Python3, SQLAlchemy, database, SQL, ORM, query, join, relationship, session, model, PostgreSQL, MySQL, SQLite
========
Database
========
Working with databases is a core skill for most Python applications, from web
development to data analysis. SQLAlchemy is Python's most popular database
toolkit, providing both a low-level SQL expression language (Core) and a
high-level Object-Relational Mapper (ORM). This section covers SQLAlchemy from
basic connections and queries to advanced patterns like relationships, eager
loading, and complex joins. Whether you're building a simple script or a
large-scale application, these examples will help you interact with databases
efficiently and safely.
.. toctree::
:maxdepth: 1
python-sqlalchemy
python-sqlalchemy-orm
python-sqlalchemy-query
================================================
FILE: docs/notes/database/python-sqlalchemy-orm.rst
================================================
.. meta::
:description lang=en: SQLAlchemy ORM tutorial covering declarative models, sessions, relationships, and object-relational mapping patterns
:keywords: Python, SQLAlchemy, ORM, database, model, session, relationship, declarative, object-relational mapping
==============
SQLAlchemy ORM
==============
.. contents:: Table of Contents
:backlinks: none
SQLAlchemy's Object-Relational Mapper (ORM) provides a high-level abstraction that
allows you to work with database tables as Python classes and rows as objects. The
ORM builds on top of SQLAlchemy Core and adds features like identity mapping, unit
of work pattern, and relationship management. This approach lets you write database
code in a more Pythonic way, focusing on objects and their relationships rather than
SQL statements. The ORM is ideal for applications with complex domain models where
you want to leverage object-oriented programming patterns.
Define Models with Declarative Base
-----------------------------------
The declarative system is the most common way to define ORM models in SQLAlchemy.
You create a base class with ``declarative_base()`` (imported from
``sqlalchemy.orm`` since SQLAlchemy 1.4) and define your models as subclasses.
Each model class represents a database table, with ``Column`` class attributes
defining its columns and ``__tablename__`` naming the table. SQLAlchemy 2.0 still
supports this form but recommends subclassing ``DeclarativeBase`` for new code.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String
>>> from sqlalchemy.orm import declarative_base
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... email = Column(String(100))
... def __repr__(self):
... return f"User(id={self.id}, name='{self.name}')"
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
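The same model can also be written in the typed declarative style. This is a
minimal sketch, assuming SQLAlchemy 2.0 or later; treating ``email`` as nullable
through ``Optional[str]`` is an assumption made for illustration, not something
the model above specifies.

.. code-block:: python

    from typing import Optional

    from sqlalchemy import String, create_engine
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


    class Base(DeclarativeBase):
        pass


    class User(Base):
        __tablename__ = "users"

        # Mapped[...] annotations drive both typing and nullability.
        id: Mapped[int] = mapped_column(primary_key=True)
        name: Mapped[str] = mapped_column(String(50))  # NOT NULL
        email: Mapped[Optional[str]] = mapped_column(String(100))  # nullable


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)

    column_names = [c.name for c in User.__table__.columns]
    print(column_names)

A non-optional ``Mapped[str]`` annotation produces a ``NOT NULL`` column, which
differs from a bare ``Column(String(50))`` where columns are nullable by default.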
Session Basics
--------------
The ``Session`` is the primary interface for persistence operations in the ORM.
It manages a "holding zone" for objects you've loaded or associated with it, and
handles the communication with the database. Sessions track changes to objects
and synchronize them with the database when you call ``commit()``. The recommended
pattern is to use ``sessionmaker`` to create a session factory, then create sessions
as needed. Always close sessions when done to release database connections.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> try:
... user = User(name="Alice")
... session.add(user)
... session.commit()
... print(f"Created user with id: {user.id}")
... finally:
... session.close()
Created user with id: 1
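The ``try``/``finally`` pattern can also be expressed with context managers.
The sketch below assumes SQLAlchemy 1.4 or later, where both ``Session`` objects
and ``sessionmaker.begin()`` support the ``with`` statement.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    # begin() commits on success, rolls back on an exception, then closes.
    with Session.begin() as session:
        session.add(User(name="Alice"))

    # A plain "with Session()" only closes the session at the end of the block.
    with Session() as session:
        names = session.execute(select(User.name)).scalars().all()

    print(names)
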
Add and Commit Objects
----------------------
To persist new objects to the database, add them to the session with ``add()``, or
with ``add_all()`` for multiple objects. Objects remain in a "pending" state until
the session flushes them; ``commit()`` flushes all pending changes and then commits
the transaction. If an error occurs, call ``rollback()`` to undo all changes since
the last commit. Once flushed, auto-generated values like primary keys are
available on the objects.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> # Add single object
>>> user1 = User(name="Alice")
>>> session.add(user1)
>>> # Add multiple objects
>>> users = [User(name="Bob"), User(name="Carol")]
>>> session.add_all(users)
>>> session.commit()
>>> print([u.id for u in [user1] + users])
[1, 2, 3]
>>> session.close()
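The rollback path can be sketched as follows. The ``unique=True`` constraint on
``name`` is an assumption added here only to provoke a failing commit; it is not
part of the model above.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50), unique=True)  # assumption: forces an error below


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(User(name="Alice"))
    session.commit()

    rolled_back = False
    try:
        session.add(User(name="Alice"))  # duplicate name violates the constraint
        session.commit()
    except IntegrityError:
        session.rollback()  # discard the failed transaction; session is usable again
        rolled_back = True

    names = session.execute(select(User.name)).scalars().all()
    print(rolled_back, names)
    session.close()
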
Query Objects
-------------
SQLAlchemy 2.0 uses ``select()`` with ``session.execute()`` for queries, replacing
the legacy ``session.query()`` API. The ``select()`` construct accepts model classes
or specific columns. ``execute()`` returns rows; chain ``scalars()`` to get the first
column of each row, which for ``select(User)`` is the model instance. The result
supports iteration, ``all()`` for a list, ``first()`` for the first result or
``None``, and ``one()``, which raises unless exactly one row is returned.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... age = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... User(name="Alice", age=30),
... User(name="Bob", age=25),
... User(name="Carol", age=35)])
>>> session.commit()
>>> # Get all users
>>> users = session.execute(select(User)).scalars().all()
>>> print([u.name for u in users])
['Alice', 'Bob', 'Carol']
>>> # Filter with where()
>>> user = session.execute(select(User).where(User.age > 28)).scalars().first()
>>> print(user.name)
Alice
>>> session.close()
Filter Queries
--------------
The ``where()`` method accepts filter conditions using column comparisons. SQLAlchemy
overloads Python operators to generate SQL: ``==`` becomes ``=``, ``!=`` becomes ``<>``,
and so on. For complex conditions, use ``and_()``, ``or_()``, and ``not_()`` from
SQLAlchemy. Columns also provide methods like ``in_()``, ``like()``, ``between()``,
``is_()``, and ``is_not()`` (named ``isnot()`` before SQLAlchemy 1.4) for
SQL-specific operations.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, and_, or_
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... age = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... User(name="Alice", age=30),
... User(name="Bob", age=25),
... User(name="Carol", age=35),
... User(name="Fred", age=30)])
>>> session.commit()
>>> # AND condition
>>> stmt = select(User).where(and_(User.age >= 30, User.name.like("A%")))
>>> print([u.name for u in session.execute(stmt).scalars()])
['Alice']
>>> # OR condition
>>> stmt = select(User).where(or_(User.name == "Alice", User.name == "Bob"))
>>> print([u.name for u in session.execute(stmt).scalars()])
['Alice', 'Bob']
>>> # IN clause
>>> stmt = select(User).where(User.age.in_([25, 35]))
>>> print([u.name for u in session.execute(stmt).scalars()])
['Bob', 'Carol']
>>> session.close()
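The remaining operators mentioned above can be exercised in the same way. This
is a small sketch; the ``names()`` helper and the row with a missing age are
hypothetical additions for illustration.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, not_, select
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        age = Column(Integer)  # nullable, so a row may have no age


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([
        User(name="Alice", age=30),
        User(name="Bob", age=25),
        User(name="Carol", age=None),
    ])
    session.commit()


    def names(condition):
        """Return matching user names in a stable order (hypothetical helper)."""
        stmt = select(User.name).where(condition).order_by(User.name)
        return session.execute(stmt).scalars().all()


    in_range = names(User.age.between(20, 28))  # age BETWEEN 20 AND 28
    no_age = names(User.age.is_(None))          # age IS NULL
    not_a = names(not_(User.name.like("A%")))   # NOT (name LIKE 'A%')
    print(in_range, no_age, not_a)
    session.close()
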
Update Objects
--------------
To update objects, simply modify their attributes and call ``commit()``. The session
tracks changes to loaded objects automatically through a mechanism called "dirty
tracking". When you commit, SQLAlchemy generates UPDATE statements only for changed
attributes. You can also use bulk updates with ``update()`` for efficiency when
modifying many rows without loading them into memory.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, update
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add(User(name="Alice"))
>>> session.commit()
>>> # Update via object modification
>>> user = session.execute(select(User).where(User.name == "Alice")).scalars().first()
>>> user.name = "Alicia"
>>> session.commit()
>>> # Verify update
>>> user = session.execute(select(User)).scalars().first()
>>> print(user.name)
Alicia
>>> session.close()
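A bulk update with the ``update()`` construct might look like the sketch below.
The ``age`` column is an assumption added for illustration, and the statement
runs in a fresh session so that no loaded objects need to be synchronized.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, select, update
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        age = Column(Integer)  # assumption: not part of the model above


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    with Session.begin() as session:
        session.add_all([
            User(name="Alice", age=30),
            User(name="Bob", age=25),
            User(name="Carol", age=35),
        ])

    with Session.begin() as session:
        # A single UPDATE statement; no User objects are loaded into memory.
        result = session.execute(
            update(User).where(User.age >= 30).values(age=User.age + 1)
        )
        updated = result.rowcount

    with Session() as session:
        rows = session.execute(select(User.name, User.age))
        ages = {name: age for name, age in rows}

    print(updated, ages)
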
Delete Objects
--------------
To delete objects, use ``session.delete()`` followed by ``commit()``. The session
will generate a DELETE statement for the object. For bulk deletes without loading
objects, use the ``delete()`` construct with ``session.execute()``. Be careful with
cascading deletes when objects have relationships - SQLAlchemy can automatically
delete related objects based on your cascade configuration.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, delete
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([User(name="Alice"), User(name="Bob"), User(name="Carol")])
>>> session.commit()
>>> # Delete via object
>>> user = session.execute(select(User).where(User.name == "Bob")).scalars().first()
>>> session.delete(user)
>>> session.commit()
>>> # Verify deletion
>>> users = session.execute(select(User)).scalars().all()
>>> print([u.name for u in users])
['Alice', 'Carol']
>>> session.close()
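A bulk delete with the ``delete()`` construct can be sketched as follows. It
issues one DELETE statement and does not load the affected objects, so ORM-level
cascades configured on relationships are not applied to them.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, delete, select
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    with Session.begin() as session:
        session.add_all([User(name="Alice"), User(name="Bob"), User(name="Carol")])

    with Session.begin() as session:
        # DELETE FROM users WHERE name IN ('Bob', 'Carol')
        result = session.execute(
            delete(User).where(User.name.in_(["Bob", "Carol"]))
        )
        deleted = result.rowcount

    with Session() as session:
        remaining = session.execute(select(User.name)).scalars().all()

    print(deleted, remaining)
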
One-to-Many Relationship
------------------------
Relationships define how tables are connected. A one-to-many relationship means one
record in the parent table can have multiple related records in the child table.
Use ``relationship()`` on the parent side and ``ForeignKey`` on the child side.
The ``back_populates`` parameter creates a bidirectional relationship, allowing
navigation from both sides. SQLAlchemy handles the foreign key management automatically.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... posts = relationship("Post", back_populates="author")
>>> class Post(Base):
... __tablename__ = "posts"
... id = Column(Integer, primary_key=True)
... title = Column(String(100))
... user_id = Column(Integer, ForeignKey("users.id"))
... author = relationship("User", back_populates="posts")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> user = User(name="Alice")
>>> user.posts.append(Post(title="First Post"))
>>> user.posts.append(Post(title="Second Post"))
>>> session.add(user)
>>> session.commit()
>>> # Access relationship
>>> user = session.execute(select(User)).scalars().first()
>>> print([p.title for p in user.posts])
['First Post', 'Second Post']
>>> session.close()
Many-to-Many Relationship
-------------------------
Many-to-many relationships require an association table that contains foreign keys
to both related tables. Define the association table using ``Table``, then use
``relationship()`` with the ``secondary`` parameter pointing to it. Both sides can
have a relationship, and SQLAlchemy manages the association table entries automatically
when you add or remove items from the relationship collections.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, Table, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> # Association table
>>> student_course = Table(
... "student_course", Base.metadata,
... Column("student_id", Integer, ForeignKey("students.id"), primary_key=True),
... Column("course_id", Integer, ForeignKey("courses.id"), primary_key=True))
>>> class Student(Base):
... __tablename__ = "students"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... courses = relationship("Course", secondary=student_course, back_populates="students")
>>> class Course(Base):
... __tablename__ = "courses"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... students = relationship("Student", secondary=student_course, back_populates="courses")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> math = Course(name="Math")
>>> physics = Course(name="Physics")
>>> alice = Student(name="Alice", courses=[math, physics])
>>> bob = Student(name="Bob", courses=[math])
>>> session.add_all([alice, bob])
>>> session.commit()
>>> # Query relationships
>>> math = session.execute(select(Course).where(Course.name == "Math")).scalars().first()
>>> print([s.name for s in math.students])
['Alice', 'Bob']
>>> session.close()
Self-Referential Relationship
-----------------------------
Self-referential relationships connect a table to itself, useful for hierarchical
data like organizational charts, categories, or threaded comments. Use ``ForeignKey``
pointing to the same table and ``relationship()`` with ``remote_side`` to indicate
which side is the "parent". This pattern allows you to model tree structures where
each node can have a parent and multiple children.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> class Employee(Base):
... __tablename__ = "employees"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... manager_id = Column(Integer, ForeignKey("employees.id"))
... manager = relationship("Employee", remote_side=[id], backref="subordinates")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> ceo = Employee(name="CEO")
>>> session.add(ceo)
>>> session.flush()
>>> manager = Employee(name="Manager", manager_id=ceo.id)
>>> session.add(manager)
>>> session.flush()
>>> worker1 = Employee(name="Worker1", manager_id=manager.id)
>>> worker2 = Employee(name="Worker2", manager_id=manager.id)
>>> session.add_all([worker1, worker2])
>>> session.commit()
>>> # Navigate hierarchy
>>> mgr = session.execute(select(Employee).where(Employee.name == "Manager")).scalars().first()
>>> print(f"Manager: {mgr.name}, Boss: {mgr.manager.name}")
Manager: Manager, Boss: CEO
>>> print(f"Subordinates: {[e.name for e in mgr.subordinates]}")
Subordinates: ['Worker1', 'Worker2']
>>> session.close()
Cascade Deletes
---------------
Cascade options control what happens to related objects when a parent is deleted
or modified. The ``cascade`` parameter on ``relationship()`` accepts a comma-separated
string of cascade rules. A common setting is ``"all, delete-orphan"``: ``all``
propagates session operations, including delete, from parent to children, and
``delete-orphan`` also deletes a child once it is removed from the parent's
collection. The ORM applies these rules when the session flushes, so they prevent
orphaned rows for changes made through the session; they are separate from a
database-level ``ON DELETE CASCADE``.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> class Parent(Base):
... __tablename__ = "parents"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... children = relationship("Child", back_populates="parent",
... cascade="all, delete-orphan")
>>> class Child(Base):
... __tablename__ = "children"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... parent_id = Column(Integer, ForeignKey("parents.id"))
... parent = relationship("Parent", back_populates="children")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> parent = Parent(name="Parent1")
>>> parent.children = [Child(name="Child1"), Child(name="Child2")]
>>> session.add(parent)
>>> session.commit()
>>> # Delete parent - children are also deleted
>>> session.delete(parent)
>>> session.commit()
>>> children = session.execute(select(Child)).scalars().all()
>>> print(len(children))
0
>>> session.close()
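The ``delete-orphan`` half of the rule can be seen by removing a child from the
collection while keeping the parent. A minimal sketch, reusing the model shapes
from the example above:

.. code-block:: python

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
    from sqlalchemy.orm import declarative_base, relationship, sessionmaker

    Base = declarative_base()


    class Parent(Base):
        __tablename__ = "parents"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        children = relationship("Child", back_populates="parent",
                                cascade="all, delete-orphan")


    class Child(Base):
        __tablename__ = "children"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        parent_id = Column(Integer, ForeignKey("parents.id"))
        parent = relationship("Parent", back_populates="children")


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    child1, child2 = Child(name="Child1"), Child(name="Child2")
    parent = Parent(name="Parent1", children=[child1, child2])
    session.add(parent)
    session.commit()

    # Removing a child orphans it; delete-orphan deletes its row on flush.
    parent.children.remove(child1)
    session.commit()

    remaining = session.execute(select(Child.name)).scalars().all()
    parents = session.execute(select(Parent.name)).scalars().all()
    print(remaining, parents)
    session.close()
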
Eager Loading
-------------
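By default, SQLAlchemy uses lazy loading for relationships, executing a new query
when you first access related objects. This can cause the "N+1 query problem" when
iterating over many objects. Eager loading fetches related objects up front:
``joinedload()`` adds a JOIN to the same query, while ``selectinload()`` emits one
additional ``SELECT ... IN`` query. Use ``joinedload()`` for many-to-one references
or small collections, and ``selectinload()`` for larger collections to avoid
multiplying rows in the join. When ``joinedload()`` targets a collection, call
``unique()`` on the result to collapse the duplicated parent rows.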
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship, joinedload
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... posts = relationship("Post", back_populates="author")
>>> class Post(Base):
... __tablename__ = "posts"
... id = Column(Integer, primary_key=True)
... title = Column(String(100))
... user_id = Column(Integer, ForeignKey("users.id"))
... author = relationship("User", back_populates="posts")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> user = User(name="Alice")
>>> user.posts = [Post(title="Post1"), Post(title="Post2")]
>>> session.add(user)
>>> session.commit()
>>> # Eager load posts with user in single query
>>> stmt = select(User).options(joinedload(User.posts))
>>> user = session.execute(stmt).scalars().unique().first()
>>> print([p.title for p in user.posts]) # No additional query
['Post1', 'Post2']
>>> session.close()
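For comparison, a sketch of ``selectinload()`` on the same kind of model. It
loads all users with one query and all of their posts with a second
``SELECT ... IN`` query, so ``unique()`` is not needed.

.. code-block:: python

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
    from sqlalchemy.orm import (declarative_base, relationship, selectinload,
                                sessionmaker)

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        posts = relationship("Post", back_populates="author")


    class Post(Base):
        __tablename__ = "posts"
        id = Column(Integer, primary_key=True)
        title = Column(String(100))
        user_id = Column(Integer, ForeignKey("users.id"))
        author = relationship("User", back_populates="posts")


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    with Session.begin() as session:
        session.add_all([
            User(name="Alice", posts=[Post(title="Post1"), Post(title="Post2")]),
            User(name="Bob", posts=[Post(title="Post3")]),
        ])

    with Session() as session:
        stmt = select(User).options(selectinload(User.posts)).order_by(User.id)
        users = session.execute(stmt).scalars().all()
        # The posts collections are already populated; no per-user query runs here.
        titles = {u.name: sorted(p.title for p in u.posts) for u in users}

    print(titles)
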
Hybrid Properties
-----------------
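Hybrid properties allow you to define Python properties that work both at the instance
level (in Python) and at the class level (in SQL queries). This is useful for computed
attributes that you want to filter or sort by in database queries. Use the
``@hybrid_property`` decorator. The same function body runs against an instance and
against the class, so it must use operations that are valid on both values and
columns, such as ``+``; when it cannot (an f-string, for example), supply a separate
SQL expression with ``@<name>.expression``.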
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> from sqlalchemy.ext.hybrid import hybrid_property
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... first_name = Column(String(50))
... last_name = Column(String(50))
...
... @hybrid_property
... def full_name(self):
... return self.first_name + " " + self.last_name
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add(User(first_name="Alice", last_name="Smith"))
>>> session.commit()
>>> user = session.execute(select(User)).scalars().first()
>>> print(user.full_name)
Alice Smith
>>> session.close()
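To filter on a hybrid attribute in SQL, the class-level form has to produce a
column expression. The sketch below keeps Python string formatting for instances
and adds a separate SQL expression; the second user row is a hypothetical
addition for illustration.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.ext.hybrid import hybrid_property
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        first_name = Column(String(50))
        last_name = Column(String(50))

        @hybrid_property
        def full_name(self):
            # Instance level: ordinary Python string formatting.
            return f"{self.first_name} {self.last_name}"

        @full_name.expression
        def full_name(cls):
            # Class level: SQL string concatenation on the columns.
            return cls.first_name + " " + cls.last_name


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([
        User(first_name="Alice", last_name="Smith"),
        User(first_name="Bob", last_name="Jones"),
    ])
    session.commit()

    # The comparison is rendered in SQL, not evaluated in Python.
    stmt = select(User).where(User.full_name == "Bob Jones")
    matches = [u.full_name for u in session.execute(stmt).scalars()]
    print(matches)
    session.close()
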
Event Hooks
-----------
SQLAlchemy provides an event system that lets you hook into various ORM operations
like before/after insert, update, or delete. Use the ``@event.listens_for()``
decorator to register event handlers. Events are useful for auditing, validation,
automatic timestamps, or triggering side effects. Common mapper-level events include
``before_insert``, ``after_insert``, ``before_update``, ``after_update``,
``before_delete``, and ``after_delete``; they fire once per object during a flush.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, DateTime, select, event
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> from datetime import datetime
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... created_at = Column(DateTime)
... updated_at = Column(DateTime)
>>> @event.listens_for(User, "before_insert")
... def set_created_at(mapper, connection, target):
... target.created_at = datetime.now()
... target.updated_at = datetime.now()
>>> @event.listens_for(User, "before_update")
... def set_updated_at(mapper, connection, target):
... target.updated_at = datetime.now()
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> user = User(name="Alice")
>>> session.add(user)
>>> session.commit()
>>> print(user.created_at is not None)
True
>>> session.close()
================================================
FILE: docs/notes/database/python-sqlalchemy-query.rst
================================================
.. meta::
:description lang=en: SQLAlchemy advanced query patterns including joins, subqueries, aggregations, window functions, and performance optimization
:keywords: Python, SQLAlchemy, query, join, subquery, aggregate, window function, CTE, performance, optimization
========================
SQLAlchemy Query Recipes
========================
.. contents:: Table of Contents
:backlinks: none
This section covers advanced query patterns and recipes for SQLAlchemy. While the
basics of querying are covered in the ORM section, real-world applications often
require more sophisticated queries involving joins across multiple tables, subqueries,
aggregations, and performance optimizations. These patterns help you write efficient
database queries while maintaining readable Python code. Understanding these techniques
is essential for building scalable applications that interact with relational databases.
Order By
--------
The ``order_by()`` method sorts query results by one or more columns. Pass column
objects directly for ascending order, or wrap them in ``desc()`` for descending
order. Pass several columns to add secondary sort keys. SQLAlchemy also supports
``nulls_first()`` and ``nulls_last()`` (named ``nullsfirst()`` and ``nullslast()``
before version 1.4) to control how NULL values are sorted, which is particularly
useful when dealing with optional fields.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, desc
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... age = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... User(name="Alice", age=30),
... User(name="Bob", age=25),
... User(name="Carol", age=30)])
>>> session.commit()
>>> # Ascending order
>>> stmt = select(User).order_by(User.age)
>>> print([u.name for u in session.execute(stmt).scalars()])
['Bob', 'Alice', 'Carol']
>>> # Descending order
>>> stmt = select(User).order_by(desc(User.age), User.name)
>>> print([u.name for u in session.execute(stmt).scalars()])
['Alice', 'Carol', 'Bob']
>>> session.close()
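NULL ordering can be sketched without a database by compiling the statement to a
string. This assumes SQLAlchemy 1.4 or later for the ``nulls_last()`` spelling;
whether the clause executes depends on the backend (older SQLite releases, for
instance, do not accept ``NULLS LAST``).

.. code-block:: python

    from sqlalchemy import Column, Integer, String, select
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        age = Column(Integer)


    # Oldest first, users with no age at the end, ties broken by name.
    stmt = select(User.name).order_by(User.age.desc().nulls_last(), User.name)
    sql = str(stmt)
    print(sql)
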
Limit and Offset
----------------
Use ``limit()`` to restrict the number of results and ``offset()`` to skip rows,
enabling pagination. These methods translate directly to SQL LIMIT and OFFSET clauses.
For large datasets, consider using keyset pagination (filtering by the last seen ID)
instead of offset-based pagination, as OFFSET can become slow with large offsets
since the database must scan and discard rows.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([User(name=f"User{i}") for i in range(10)])
>>> session.commit()
>>> # First page (3 items)
>>> stmt = select(User).order_by(User.id).limit(3)
>>> print([u.name for u in session.execute(stmt).scalars()])
['User0', 'User1', 'User2']
>>> # Second page
>>> stmt = select(User).order_by(User.id).limit(3).offset(3)
>>> print([u.name for u in session.execute(stmt).scalars()])
['User3', 'User4', 'User5']
>>> session.close()
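Keyset pagination can be sketched as a loop that remembers the last seen ID.
The ``page_size``, ``last_id``, and ``pages`` names are illustrative, and the
approach assumes a unique, monotonically ordered key such as the primary key.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([User(name=f"User{i}") for i in range(10)])
    session.commit()

    page_size = 3
    last_id = 0  # smaller than any real ID
    pages = []
    while True:
        # WHERE id > :last_id lets the database seek by index instead of
        # scanning and discarding OFFSET rows.
        stmt = (
            select(User)
            .where(User.id > last_id)
            .order_by(User.id)
            .limit(page_size)
        )
        rows = session.execute(stmt).scalars().all()
        if not rows:
            break
        pages.append([u.name for u in rows])
        last_id = rows[-1].id

    print(pages)
    session.close()
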
Group By and Aggregates
-----------------------
Use ``group_by()`` with aggregate functions like ``func.count()``, ``func.sum()``,
``func.avg()``, ``func.min()``, and ``func.max()`` for grouped calculations. The
``having()`` method filters groups after aggregation, similar to SQL's HAVING clause.
When selecting both regular columns and aggregates, all non-aggregate columns must
be included in the GROUP BY clause.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, func
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class Sale(Base):
... __tablename__ = "sales"
... id = Column(Integer, primary_key=True)
... product = Column(String(50))
... amount = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... Sale(product="A", amount=100),
... Sale(product="A", amount=150),
... Sale(product="B", amount=200),
... Sale(product="B", amount=50)])
>>> session.commit()
>>> # Group by with sum
>>> stmt = select(Sale.product, func.sum(Sale.amount).label("total"))\
... .group_by(Sale.product)
>>> for row in session.execute(stmt):
... print(f"{row.product}: {row.total}")
A: 250
B: 250
>>> # Having clause
>>> stmt = select(Sale.product, func.count().label("count"))\
... .group_by(Sale.product).having(func.count() > 1)
>>> print(session.execute(stmt).fetchall())
[('A', 2), ('B', 2)]
>>> session.close()
Join Queries
------------
SQLAlchemy provides several ways to join tables. For ORM models with relationships,
use ``join()`` which automatically uses the foreign key. For explicit join conditions,
pass the condition as the second argument. Use ``outerjoin()`` for LEFT OUTER JOIN.
The ``select_from()`` method specifies the FROM clause when needed for complex joins.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... orders = relationship("Order", back_populates="user")
>>> class Order(Base):
... __tablename__ = "orders"
... id = Column(Integer, primary_key=True)
... product = Column(String(50))
... user_id = Column(Integer, ForeignKey("users.id"))
... user = relationship("User", back_populates="orders")
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> alice = User(name="Alice")
>>> alice.orders = [Order(product="Book"), Order(product="Pen")]
>>> bob = User(name="Bob")
>>> session.add_all([alice, bob])
>>> session.commit()
>>> # Inner join
>>> stmt = select(User.name, Order.product).join(Order)
>>> print(session.execute(stmt).fetchall())
[('Alice', 'Book'), ('Alice', 'Pen')]
>>> # Left outer join (includes users without orders)
>>> stmt = select(User.name, Order.product).outerjoin(Order)
>>> print(session.execute(stmt).fetchall())
[('Alice', 'Book'), ('Alice', 'Pen'), ('Bob', None)]
>>> session.close()
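An explicit join condition combined with ``select_from()`` might look like the
sketch below, which counts orders per user. The explicit primary key values are
set only to keep the example short.

.. code-block:: python

    from sqlalchemy import (Column, ForeignKey, Integer, String, create_engine,
                            func, select)
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))


    class Order(Base):
        __tablename__ = "orders"
        id = Column(Integer, primary_key=True)
        product = Column(String(50))
        user_id = Column(Integer, ForeignKey("users.id"))


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([
        User(id=1, name="Alice"),
        User(id=2, name="Bob"),
        Order(product="Book", user_id=1),
        Order(product="Pen", user_id=1),
    ])
    session.commit()

    stmt = (
        select(User.name, func.count(Order.id).label("n_orders"))
        .select_from(User)                           # left side of the join
        .outerjoin(Order, User.id == Order.user_id)  # explicit ON clause
        .group_by(User.id, User.name)
        .order_by(User.name)
    )
    counts = [tuple(row) for row in session.execute(stmt)]
    print(counts)
    session.close()

Counting ``Order.id`` rather than ``*`` makes users without orders report zero,
because the outer join fills their order columns with NULL.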
Subqueries
----------
Subqueries are queries nested inside other queries. Use ``subquery()`` to create
a subquery that can be used in the FROM clause, or ``scalar_subquery()`` for single-value
subqueries in SELECT or WHERE clauses. Subqueries are useful for complex filtering,
computing derived values, or when you need to reference aggregated data in the main query.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, func
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... score = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... User(name="Alice", score=85),
... User(name="Bob", score=90),
... User(name="Carol", score=75)])
>>> session.commit()
>>> # Scalar subquery: users with above-average score
>>> avg_score = select(func.avg(User.score)).scalar_subquery()
>>> stmt = select(User).where(User.score > avg_score)
>>> print([u.name for u in session.execute(stmt).scalars()])
['Alice', 'Bob']
>>> session.close()
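A ``subquery()`` used in the FROM clause can be sketched as follows. The
``Sale`` model and its data are hypothetical; the subquery's columns are reached
through its ``.c`` attribute.

.. code-block:: python

    from sqlalchemy import Column, Integer, String, create_engine, func, select
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class Sale(Base):
        __tablename__ = "sales"
        id = Column(Integer, primary_key=True)
        product = Column(String(50))
        amount = Column(Integer)


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([
        Sale(product="A", amount=100),
        Sale(product="A", amount=150),
        Sale(product="B", amount=50),
        Sale(product="B", amount=100),
    ])
    session.commit()

    # Derived table: one row per product with its total.
    totals = (
        select(Sale.product, func.sum(Sale.amount).label("total"))
        .group_by(Sale.product)
        .subquery("totals")
    )

    # Outer query filters on the aggregated column.
    stmt = (
        select(totals.c.product, totals.c.total)
        .where(totals.c.total > 200)
        .order_by(totals.c.product)
    )
    big_sellers = [tuple(row) for row in session.execute(stmt)]
    print(big_sellers)
    session.close()
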
Common Table Expressions (CTE)
------------------------------
CTEs (WITH clauses) improve query readability by naming subqueries. They're especially
useful for recursive queries and when the same subquery is referenced multiple times.
Use ``cte()`` to create a CTE from a select statement. For a recursive query, pass
``recursive=True`` and attach the self-referencing part with ``union_all()``; this is
powerful for hierarchical data like organizational charts or category trees.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, func
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class Sale(Base):
... __tablename__ = "sales"
... id = Column(Integer, primary_key=True)
... region = Column(String(50))
... amount = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... Sale(region="East", amount=100),
... Sale(region="East", amount=200),
... Sale(region="West", amount=150)])
>>> session.commit()
>>> # CTE for regional totals
>>> regional_totals = select(
... Sale.region,
... func.sum(Sale.amount).label("total")
... ).group_by(Sale.region).cte("regional_totals")
>>> # Use CTE in main query
>>> stmt = select(regional_totals).where(regional_totals.c.total > 200)
>>> print(session.execute(stmt).fetchall())
[('East', 300)]
>>> session.close()
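A recursive CTE over a hierarchy can be sketched as below. The ``Employee``
model, the ``depth`` column, and the explicit IDs are assumptions made for
illustration.

.. code-block:: python

    from sqlalchemy import (Column, ForeignKey, Integer, String, create_engine,
                            literal, select)
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()


    class Employee(Base):
        __tablename__ = "employees"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        manager_id = Column(Integer, ForeignKey("employees.id"))


    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([
        Employee(id=1, name="CEO", manager_id=None),
        Employee(id=2, name="Manager", manager_id=1),
        Employee(id=3, name="Worker", manager_id=2),
    ])
    session.commit()

    # Anchor member: the root of the tree (no manager).
    tree = (
        select(Employee.id, Employee.name, literal(0).label("depth"))
        .where(Employee.manager_id.is_(None))
        .cte("tree", recursive=True)
    )

    # Recursive member: employees whose manager is already in the CTE.
    tree = tree.union_all(
        select(Employee.id, Employee.name, (tree.c.depth + 1).label("depth"))
        .where(Employee.manager_id == tree.c.id)
    )

    stmt = select(tree.c.name, tree.c.depth).order_by(tree.c.depth)
    levels = [tuple(row) for row in session.execute(stmt)]
    print(levels)
    session.close()
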
Exists and Correlated Subqueries
--------------------------------
The ``exists()`` construct creates an EXISTS subquery, which returns true if the
subquery returns any rows. This is efficient for checking the existence of related
records without loading them. Correlated subqueries reference columns from the outer
query, allowing row-by-row comparisons. Use ``correlate()`` to explicitly specify
which tables the subquery should correlate with.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select, exists
>>> from sqlalchemy.orm import declarative_base, sessionmaker, relationship
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> class Order(Base):
... __tablename__ = "orders"
... id = Column(Integer, primary_key=True)
... user_id = Column(Integer, ForeignKey("users.id"))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([User(name="Alice"), User(name="Bob")])
>>> session.commit()
>>> alice = session.execute(select(User).where(User.name == "Alice")).scalars().first()
>>> session.add(Order(user_id=alice.id))
>>> session.commit()
>>> # Users with orders
>>> has_orders = exists().where(Order.user_id == User.id)
>>> stmt = select(User).where(has_orders)
>>> print([u.name for u in session.execute(stmt).scalars()])
['Alice']
>>> # Users without orders
>>> stmt = select(User).where(~has_orders)
>>> print([u.name for u in session.execute(stmt).scalars()])
['Bob']
>>> session.close()
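The ``exists()`` example relies on automatic correlation. A possible sketch of an explicitly correlated scalar subquery, using ``correlate()`` to count orders per user, could look like the following. The Core tables and sample rows here are assumptions chosen to mirror the models above.

.. code-block:: python

    from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, func, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    orders = Table(
        "orders", metadata,
        Column("id", Integer, primary_key=True),
        Column("user_id", Integer, ForeignKey("users.id")),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}])
        conn.execute(insert(orders), [{"user_id": 1}, {"user_id": 1}])

    # The subquery refers to users.id from the enclosing query;
    # correlate(users) keeps "users" out of the subquery's own FROM list
    order_count = (
        select(func.count(orders.c.id))
        .where(orders.c.user_id == users.c.id)
        .correlate(users)
        .scalar_subquery()
    )

    stmt = select(users.c.name, order_count.label("order_count")).order_by(users.c.id)
    with engine.connect() as conn:
        rows = [tuple(row) for row in conn.execute(stmt)]

    print(rows)  # [('Alice', 2), ('Bob', 0)]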
Union Queries
-------------
Use ``union()`` to combine results from multiple SELECT statements, removing duplicates.
Use ``union_all()`` to keep all rows including duplicates, which is faster when you
know there are no duplicates or don't care about them. The queries must have the same
number of columns with compatible types. Unions are useful for combining data from
different tables or different filtered views of the same table.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, union_all
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class Customer(Base):
... __tablename__ = "customers"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> class Supplier(Base):
... __tablename__ = "suppliers"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([Customer(name="Alice"), Customer(name="Bob")])
>>> session.add_all([Supplier(name="Acme"), Supplier(name="Bob")])
>>> session.commit()
>>> # Union all names
>>> stmt = union_all(
... select(Customer.name),
... select(Supplier.name))
>>> print(sorted([row[0] for row in session.execute(stmt)]))
['Acme', 'Alice', 'Bob', 'Bob']
>>> session.close()
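For comparison, a short sketch that runs ``union()`` and ``union_all()`` over the same two selects makes the duplicate handling visible. It uses Core tables with the same hypothetical customer and supplier names as above.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, insert, select, union, union_all

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    customers = Table(
        "customers", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    suppliers = Table(
        "suppliers", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(customers), [{"name": "Alice"}, {"name": "Bob"}])
        conn.execute(insert(suppliers), [{"name": "Acme"}, {"name": "Bob"}])

    selects = [select(customers.c.name), select(suppliers.c.name)]

    with engine.connect() as conn:
        # UNION drops the second "Bob"; UNION ALL keeps both
        deduplicated = sorted(row[0] for row in conn.execute(union(*selects)))
        everything = sorted(row[0] for row in conn.execute(union_all(*selects)))

    print(deduplicated)  # ['Acme', 'Alice', 'Bob']
    print(everything)    # ['Acme', 'Alice', 'Bob', 'Bob']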
Case Expressions
----------------
The ``case()`` construct creates SQL CASE expressions for conditional logic in queries.
It's useful for computed columns, conditional aggregations, and data transformations.
Pass each branch as a (condition, result) tuple in a separate positional argument, with an optional ``else_`` for the default
value. Case expressions can be used in SELECT, WHERE, ORDER BY, and other clauses.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, case
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... score = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... User(name="Alice", score=95),
... User(name="Bob", score=75),
... User(name="Carol", score=55)])
>>> session.commit()
>>> # Grade based on score
>>> grade = case(
... (User.score >= 90, "A"),
... (User.score >= 70, "B"),
... else_="C")
>>> stmt = select(User.name, grade.label("grade"))
>>> for row in session.execute(stmt):
... print(f"{row.name}: {row.grade}")
Alice: A
Bob: B
Carol: C
>>> session.close()
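Conditional aggregation is one of the uses named above. As an illustrative sketch, wrapping ``case()`` in ``func.sum()`` counts rows per condition in a single pass. The pass mark of 70 and the labels ``passed`` and ``failed`` are assumptions for this example.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import case, create_engine, func, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
        Column("score", Integer),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(users), [
            {"name": "Alice", "score": 95},
            {"name": "Bob", "score": 75},
            {"name": "Carol", "score": 55},
        ])

    # Each CASE yields 1 for matching rows and 0 otherwise, so SUM counts matches
    passed = func.sum(case((users.c.score >= 70, 1), else_=0)).label("passed")
    failed = func.sum(case((users.c.score < 70, 1), else_=0)).label("failed")

    with engine.connect() as conn:
        row = conn.execute(select(passed, failed)).one()

    print(row.passed, row.failed)  # 2 1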
Distinct and Count Distinct
---------------------------
Use ``distinct()`` to remove duplicate rows from results. For counting unique values,
combine ``func.count()`` with ``distinct()``. The ``distinct()`` method can be applied
to the entire query or to specific columns. This is essential for accurate counting
when dealing with joined tables that may produce duplicate rows.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, func, distinct
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class Order(Base):
... __tablename__ = "orders"
... id = Column(Integer, primary_key=True)
... customer = Column(String(50))
... product = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... Order(customer="Alice", product="Book"),
... Order(customer="Alice", product="Pen"),
... Order(customer="Bob", product="Book")])
>>> session.commit()
>>> # Distinct customers
>>> stmt = select(Order.customer).distinct()
>>> print(session.execute(stmt).fetchall())
[('Alice',), ('Bob',)]
>>> # Count distinct products
>>> stmt = select(func.count(distinct(Order.product)))
>>> print(session.execute(stmt).scalar())
2
>>> session.close()
Aliased Tables
--------------
Use ``aliased()`` to create aliases for tables, allowing you to reference the same
table multiple times in a query with different names. This is essential for self-joins
and queries that need to compare rows within the same table. Aliases create independent
references that can have different join conditions and filters.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker, aliased
>>> Base = declarative_base()
>>> class Employee(Base):
... __tablename__ = "employees"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
... salary = Column(Integer)
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([
... Employee(name="Alice", salary=50000),
... Employee(name="Bob", salary=60000),
... Employee(name="Carol", salary=55000)])
>>> session.commit()
>>> # Find employees earning more than Alice
>>> alice_alias = aliased(Employee, name="alice")
>>> stmt = select(Employee.name).select_from(Employee).join(
... alice_alias, alice_alias.name == "Alice"
... ).where(Employee.salary > alice_alias.salary)
>>> print([row[0] for row in session.execute(stmt)])
['Bob', 'Carol']
>>> session.close()
Bulk Operations
---------------
For inserting or updating many rows, bulk operations are much faster than adding
objects one by one. ``session.bulk_insert_mappings()`` and
``session.bulk_update_mappings()`` accept lists of plain dictionaries and skip most of
the ORM's unit of work bookkeeping; SQLAlchemy 2.0 documents both as legacy features.
The approach favored in 2.0 is to pass a list of dictionaries to ``execute()`` together
with an ``insert()`` construct.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, select, insert, func
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> # Bulk insert with Core (fastest)
>>> data = [{"name": f"User{i}"} for i in range(100)]
>>> _ = session.execute(insert(User), data)
>>> session.commit()
>>> # Verify
>>> count = session.execute(select(func.count()).select_from(User)).scalar()
>>> print(count)
100
>>> session.close()
Raw SQL with Text
-----------------
When you need to execute raw SQL that's difficult to express with SQLAlchemy's
constructs, use ``text()`` to wrap SQL strings. Always use bound parameters (``:name``
syntax) instead of string formatting to prevent SQL injection. The ``text()`` construct
can be used with both Core and ORM queries, and results can be mapped to ORM objects
using ``from_statement()``.
.. code-block:: python
>>> from sqlalchemy import create_engine, Column, Integer, String, text, select
>>> from sqlalchemy.orm import declarative_base, sessionmaker
>>> Base = declarative_base()
>>> class User(Base):
... __tablename__ = "users"
... id = Column(Integer, primary_key=True)
... name = Column(String(50))
>>> engine = create_engine("sqlite:///:memory:")
>>> Base.metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add_all([User(name="Alice"), User(name="Bob")])
>>> session.commit()
>>> # Raw SQL with parameters
>>> result = session.execute(
... text("SELECT * FROM users WHERE name = :name"),
... {"name": "Alice"})
>>> print(result.fetchall())
[(1, 'Alice')]
>>> # Map raw SQL to ORM objects
>>> stmt = select(User).from_statement(text("SELECT * FROM users ORDER BY name"))
>>> users = session.execute(stmt).scalars().all()
>>> print([u.name for u in users])
['Alice', 'Bob']
>>> session.close()
================================================
FILE: docs/notes/database/python-sqlalchemy.rst
================================================
.. meta::
:description lang=en: SQLAlchemy Core tutorial covering database connections, engine creation, metadata, table definitions, and SQL expression language
:keywords: Python, SQLAlchemy, database, SQL, engine, metadata, table, connection, transaction, Core API
=================
SQLAlchemy Basics
=================
.. contents:: Table of Contents
:backlinks: none
SQLAlchemy is a widely used database toolkit and Object-Relational Mapping (ORM)
library for Python. It provides a full suite of well-known enterprise-level persistence
patterns, designed for efficient and high-performing database access. SQLAlchemy is
divided into two main components: the Core (low-level SQL abstraction) and the ORM
(high-level object mapping). This cheat sheet covers the Core API, which provides a
SQL Expression Language that allows you to construct SQL statements in Python code
while remaining database-agnostic. The Core is ideal when you need fine-grained control
over SQL queries or when working with existing database schemas.
Create an Engine
----------------
The ``Engine`` is the starting point for any SQLAlchemy application. It represents
the connection pool and dialect for a particular database, managing connectivity
and translating Python code into database-specific SQL. The ``create_engine()``
function takes a database URL that specifies the database type, credentials, host,
and database name. SQLAlchemy supports many databases including SQLite, PostgreSQL,
MySQL, Oracle, and Microsoft SQL Server through different dialects.
.. code-block:: python
>>> from sqlalchemy import create_engine
>>> # SQLite in-memory database (great for testing)
>>> engine = create_engine("sqlite:///:memory:")
>>> # SQLite file-based database
>>> engine = create_engine("sqlite:///mydb.sqlite")
>>> # PostgreSQL
>>> engine = create_engine("postgresql://user:pass@localhost/dbname")
>>> # MySQL
>>> engine = create_engine("mysql+pymysql://user:pass@localhost/dbname")
Database URL Format
-------------------
SQLAlchemy uses RFC-1738 style URLs to specify database connections. The URL format
provides a standardized way to specify all connection parameters including the database
driver, authentication credentials, host address, port number, and database name.
Understanding this format is essential for configuring connections to different
database systems. The ``make_url()`` function can parse and construct these URLs
programmatically.
.. code-block:: python
>>> from sqlalchemy import make_url
>>> # Format: dialect+driver://username:password@host:port/database
>>> url = make_url("postgresql://user:pass@localhost:5432/mydb")
>>> url.drivername
'postgresql'
>>> url.username
'user'
>>> url.host
'localhost'
>>> url.database
'mydb'
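Going the other way, a URL can be built from its parts instead of parsed from a string. The sketch below uses ``URL.create()``, which is handy when a password contains characters such as ``@`` or ``/`` that would otherwise need manual escaping. The credentials and driver name are placeholders, and no database connection is made.

.. code-block:: python

    from sqlalchemy.engine import URL, make_url

    # Placeholder credentials; the password contains characters that need escaping
    url = URL.create(
        drivername="postgresql+psycopg2",
        username="user",
        password="p@ss/word",
        host="localhost",
        port=5432,
        database="mydb",
    )

    # Render with the real password (escaped) and with it masked for logging
    rendered = url.render_as_string(hide_password=False)
    masked = url.render_as_string(hide_password=True)

    # Parsing the rendered string recovers the original password
    round_trip = make_url(rendered)
    print(round_trip.password)  # p@ss/word
    print(masked)

The URL object can be passed directly to ``create_engine()`` in place of a string.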
Connect and Execute Raw SQL
---------------------------
While SQLAlchemy encourages using its SQL Expression Language, you can also execute
raw SQL strings directly. This is useful for complex queries that are difficult to
express in SQLAlchemy's API, or when migrating existing SQL code. The ``text()``
function wraps raw SQL strings and allows parameter binding for security. Always
use parameter binding instead of string formatting to prevent SQL injection attacks.
.. code-block:: python
>>> from sqlalchemy import create_engine, text
>>> engine = create_engine("sqlite:///:memory:")
>>> with engine.connect() as conn:
... conn.execute(text("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)"))
... conn.execute(text("INSERT INTO test (name) VALUES (:name)"), {"name": "Alice"})
... conn.commit()
... result = conn.execute(text("SELECT * FROM test"))
... print(result.fetchall())
[(1, 'Alice')]
Transaction Management
----------------------
Transactions ensure that a series of database operations either all succeed or all
fail together, maintaining data integrity. SQLAlchemy offers two main styles: "begin
once" blocks with ``engine.begin()``, and manual control with ``commit()`` and
``rollback()`` on a connection or on the transaction object returned by ``conn.begin()``.
Used as a context manager, ``engine.begin()`` commits when the block completes
successfully and rolls back if an exception is raised.
.. code-block:: python
>>> from sqlalchemy import create_engine, text
>>> engine = create_engine("sqlite:///:memory:")
>>> # Using begin() for automatic commit/rollback
>>> with engine.begin() as conn:
... conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
... conn.execute(text("INSERT INTO users (name) VALUES ('Bob')"))
... # Commits automatically if no exception
>>> # Manual transaction control
>>> with engine.connect() as conn:
... trans = conn.begin()
... try:
... conn.execute(text("INSERT INTO users (name) VALUES ('Carol')"))
... trans.commit()
... except Exception:
... trans.rollback()
... raise
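To see the rollback behavior directly, the following sketch raises an exception inside an ``engine.begin()`` block and then checks that the insert did not persist. The ``RuntimeError`` is a simulated failure introduced only for this illustration.

.. code-block:: python

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite:///:memory:")

    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))

    try:
        with engine.begin() as conn:
            conn.execute(
                text("INSERT INTO users (name) VALUES (:name)"), {"name": "Dave"}
            )
            # Simulated failure: leaving the block via an exception rolls back
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    with engine.connect() as conn:
        count = conn.execute(text("SELECT COUNT(*) FROM users")).scalar()

    print(count)  # 0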
Define Tables with Metadata
---------------------------
``MetaData`` is a container that holds information about database tables and other
schema constructs. You can define tables programmatically using the ``Table`` class,
specifying columns with their types and constraints. This approach is part of
SQLAlchemy Core and gives you explicit control over the table structure. The metadata
can then create all defined tables in the database with ``create_all()``, which
generates the appropriate DDL statements for your database dialect.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table(
... "users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)),
... Column("email", String(100))
... )
>>> metadata.create_all(engine)
>>> # Check table columns
>>> [c.name for c in users.columns]
['id', 'name', 'email']
Reflect Existing Tables
-----------------------
Table reflection allows SQLAlchemy to load table definitions from an existing database
schema automatically. This is useful when working with legacy databases or when you
want to avoid duplicating schema definitions. The ``reflect()`` method on ``MetaData``
reads the database schema and creates ``Table`` objects for all tables found. You can
also reflect individual tables by passing the ``autoload_with`` parameter to ``Table``.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, text
>>> engine = create_engine("sqlite:///:memory:")
>>> # Create a table first
>>> with engine.begin() as conn:
... conn.execute(text("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"))
>>> # Reflect the table
>>> metadata = MetaData()
>>> metadata.reflect(bind=engine)
>>> list(metadata.tables.keys())
['products']
>>> products = metadata.tables['products']
>>> [c.name for c in products.columns]
['id', 'name', 'price']
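Reflecting a single table with ``autoload_with`` might look like the sketch below. The second table, ``audit_log``, is a made-up name added only to show that it is not loaded into the ``MetaData``.

.. code-block:: python

    from sqlalchemy import MetaData, Table, create_engine, text

    engine = create_engine("sqlite:///:memory:")
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
        ))
        conn.execute(text("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, note TEXT)"))

    metadata = MetaData()
    # Only "products" is read from the database; audit_log is left alone
    products = Table("products", metadata, autoload_with=engine)

    column_names = [c.name for c in products.columns]
    loaded = list(metadata.tables)

    print(column_names)  # ['id', 'name', 'price']
    print(loaded)        # ['products']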
Inspect Database Schema
-----------------------
The ``inspect()`` function provides a powerful way to examine database schema details
at runtime. The inspector can retrieve information about tables, columns, indexes,
foreign keys, and other database objects. This is particularly useful for database
administration tasks, schema migrations, and debugging. The inspector works with
any database supported by SQLAlchemy and provides a consistent API across different
database systems.
.. code-block:: python
>>> from sqlalchemy import create_engine, inspect, MetaData, Table, Column, Integer, String
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)))
>>> metadata.create_all(engine)
>>> inspector = inspect(engine)
>>> inspector.get_table_names()
['users']
>>> inspector.get_columns('users') # doctest: +ELLIPSIS
[{'name': 'id', ...}, {'name': 'name', ...}]
Insert Data
-----------
The ``insert()`` construct creates an INSERT statement for a table. Specify values
with the ``values()`` method, either as keyword arguments or as a dictionary. For bulk
inserts, pass a list of dictionaries to ``execute()``; SQLAlchemy then sends the rows
through the driver's batch ("executemany") execution path. On backends that support
RETURNING, the ``returning()`` method can retrieve auto-generated values like primary keys after insertion.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, insert
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)))
>>> metadata.create_all(engine)
>>> # Single insert
>>> with engine.begin() as conn:
... conn.execute(insert(users).values(name="Alice"))
... # Bulk insert
... conn.execute(insert(users), [{"name": "Bob"}, {"name": "Carol"}])
>>> with engine.connect() as conn:
... result = conn.execute(users.select())
... print(result.fetchall())
[(1, 'Alice'), (2, 'Bob'), (3, 'Carol')]
Select Data
-----------
The ``select()`` construct creates SELECT statements with a Pythonic API. You can
specify which columns to retrieve, add WHERE clauses with ``where()``, order results
with ``order_by()``, and limit results with ``limit()`` and ``offset()``. The SQL
Expression Language uses Python operators like ``==``, ``!=``, ``>``, ``<`` which
are overloaded to generate SQL conditions. Literal values in these conditions are sent
as bound parameters rather than interpolated into the SQL string, which guards against
SQL injection.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select, insert
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)),
... Column("age", Integer))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(users), [
... {"name": "Alice", "age": 30},
... {"name": "Bob", "age": 25},
... {"name": "Carol", "age": 35}])
>>> with engine.connect() as conn:
... # Select all
... result = conn.execute(select(users))
... print(result.fetchall())
... # Select with condition
... result = conn.execute(select(users).where(users.c.age > 28))
... print(result.fetchall())
[(1, 'Alice', 30), (2, 'Bob', 25), (3, 'Carol', 35)]
[(1, 'Alice', 30), (3, 'Carol', 35)]
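The example above covers ``where()`` only. A brief sketch of ``order_by()``, ``limit()``, and ``offset()`` on the same sample data could look like this; the variable names ``oldest_two`` and ``second_page`` are illustrative.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
        Column("age", Integer),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(users), [
            {"name": "Alice", "age": 30},
            {"name": "Bob", "age": 25},
            {"name": "Carol", "age": 35},
        ])

    with engine.connect() as conn:
        # Two oldest users, oldest first
        stmt = select(users.c.name).order_by(users.c.age.desc()).limit(2)
        oldest_two = conn.execute(stmt).scalars().all()

        # Skip the youngest user, then take the next two by ascending age
        stmt = select(users.c.name).order_by(users.c.age).offset(1).limit(2)
        second_page = conn.execute(stmt).scalars().all()

    print(oldest_two)   # ['Carol', 'Alice']
    print(second_page)  # ['Alice', 'Carol']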
Update Data
-----------
The ``update()`` construct creates UPDATE statements. Use ``where()`` to specify
which rows to update and ``values()`` to set new column values. Without a WHERE
clause, all rows in the table will be updated. The ``returning()`` method can
retrieve the updated values. For bulk updates with different values per row,
use ``bindparam()`` to create parameterized statements.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
>>> from sqlalchemy import select, insert, update
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}])
... conn.execute(update(users).where(users.c.name == "Alice").values(name="Alicia"))
>>> with engine.connect() as conn:
... result = conn.execute(select(users))
... print(result.fetchall())
[(1, 'Alicia'), (2, 'Bob')]
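For per-row values, a sketch with ``bindparam()`` might look like the following. The parameter names ``old_name`` and ``new_name`` are arbitrary choices for this example; they are deliberately different from the column name ``name``, since a bind parameter that shares a column's name would clash with the one SQLAlchemy generates for ``values()``.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import bindparam, create_engine, insert, select, update

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    metadata.create_all(engine)

    # One parameterized UPDATE, executed once per dictionary
    stmt = (
        update(users)
        .where(users.c.name == bindparam("old_name"))
        .values(name=bindparam("new_name"))
    )

    with engine.begin() as conn:
        conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}])
        conn.execute(stmt, [
            {"old_name": "Alice", "new_name": "Alicia"},
            {"old_name": "Bob", "new_name": "Robert"},
        ])

    with engine.connect() as conn:
        rows = [tuple(row) for row in conn.execute(select(users).order_by(users.c.id))]

    print(rows)  # [(1, 'Alicia'), (2, 'Robert')]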
Delete Data
-----------
The ``delete()`` construct creates DELETE statements. Always use ``where()`` to
specify which rows to delete, unless you intend to delete all rows. Like other
DML statements, ``delete()`` supports ``returning()`` to retrieve deleted rows.
Be careful with DELETE statements as they permanently remove data from the database.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
>>> from sqlalchemy import select, insert, delete
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}])
... conn.execute(delete(users).where(users.c.name == "Bob"))
>>> with engine.connect() as conn:
... result = conn.execute(select(users))
... print(result.fetchall())
[(1, 'Alice'), (3, 'Carol')]
SQL Expression Language
-----------------------
SQLAlchemy's SQL Expression Language provides a Pythonic way to construct SQL
statements. Column objects support comparison operators (``==``, ``!=``, ``>``, ``<``),
logical operators (``&`` for AND, ``|`` for OR), and methods like ``in_()``,
``like()``, ``between()``, and ``is_()``. These expressions are composable and
can be combined to build complex queries while maintaining readability and type safety.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
>>> from sqlalchemy import select, insert, and_, or_
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)),
... Column("age", Integer))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(users), [
... {"name": "Alice", "age": 30},
... {"name": "Bob", "age": 25},
... {"name": "Carol", "age": 35}])
>>> with engine.connect() as conn:
... # AND condition
... stmt = select(users).where(and_(users.c.age > 25, users.c.age < 35))
... print(conn.execute(stmt).fetchall())
... # OR condition
... stmt = select(users).where(or_(users.c.name == "Alice", users.c.name == "Bob"))
... print(conn.execute(stmt).fetchall())
... # IN clause
... stmt = select(users).where(users.c.name.in_(["Alice", "Carol"]))
... print(conn.execute(stmt).fetchall())
[(1, 'Alice', 30)]
[(1, 'Alice', 30), (2, 'Bob', 25)]
[(1, 'Alice', 30), (3, 'Carol', 35)]
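The operator forms and the remaining methods from the paragraph above can be sketched as follows. The helper ``names()`` and the extra row with a missing age are assumptions added for illustration. Note the parentheses around each comparison: ``&`` and ``|`` bind more tightly than ``>`` or ``==`` in Python.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
        Column("age", Integer),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(users), [
            {"name": "Alice", "age": 30},
            {"name": "Bob", "age": 25},
            {"name": "Carol", "age": 35},
            {"name": "Dave", "age": None},
        ])

    def names(condition):
        """Hypothetical helper: names of matching users, ordered by id."""
        stmt = select(users.c.name).where(condition).order_by(users.c.id)
        with engine.connect() as conn:
            return conn.execute(stmt).scalars().all()

    both = names((users.c.age > 25) & (users.c.age < 35))       # AND
    either = names((users.c.name == "Bob") | (users.c.age > 30))  # OR
    negated = names(~users.c.name.in_(["Alice", "Bob"]))        # NOT IN
    pattern = names(users.c.name.like("%l%"))                   # LIKE
    in_range = names(users.c.age.between(25, 30))               # BETWEEN
    missing = names(users.c.age.is_(None))                      # IS NULL

    print(both)      # ['Alice']
    print(either)    # ['Bob', 'Carol']
    print(negated)   # ['Carol', 'Dave']
    print(pattern)   # ['Alice', 'Carol']
    print(in_range)  # ['Alice', 'Bob']
    print(missing)   # ['Dave']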
Join Tables
-----------
The ``join()`` method creates JOIN clauses between tables. SQLAlchemy can automatically
determine join conditions based on foreign key relationships, or you can specify
them explicitly. Use ``select_from()`` to specify the joined tables in a SELECT
statement. SQLAlchemy renders INNER JOIN by default, LEFT OUTER JOIN with
``isouter=True``, and FULL OUTER JOIN with ``full=True``. There is no dedicated RIGHT
OUTER JOIN option; swap the order of the tables and use a LEFT OUTER JOIN instead.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, ForeignKey
>>> from sqlalchemy import select, insert
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata,
... Column("id", Integer, primary_key=True),
... Column("name", String(50)))
>>> orders = Table("orders", metadata,
... Column("id", Integer, primary_key=True),
... Column("user_id", Integer, ForeignKey("users.id")),
... Column("product", String(50)))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}])
... conn.execute(insert(orders), [
... {"user_id": 1, "product": "Book"},
... {"user_id": 1, "product": "Pen"},
... {"user_id": 2, "product": "Laptop"}])
>>> with engine.connect() as conn:
... stmt = select(users.c.name, orders.c.product).select_from(
... users.join(orders))
... print(conn.execute(stmt).fetchall())
[('Alice', 'Book'), ('Alice', 'Pen'), ('Bob', 'Laptop')]
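The inner join above would drop a user with no orders. A small sketch with ``isouter=True`` keeps such rows, filling the missing side with ``None``. The sample data here is an assumption in which Bob has placed no orders.

.. code-block:: python

    from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    orders = Table(
        "orders", metadata,
        Column("id", Integer, primary_key=True),
        Column("user_id", Integer, ForeignKey("users.id")),
        Column("product", String(50)),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(users), [{"name": "Alice"}, {"name": "Bob"}])
        conn.execute(insert(orders), [{"user_id": 1, "product": "Book"}])

    # LEFT OUTER JOIN: the ON clause is still inferred from the foreign key
    stmt = (
        select(users.c.name, orders.c.product)
        .select_from(users.join(orders, isouter=True))
        .order_by(users.c.id)
    )

    with engine.connect() as conn:
        rows = [tuple(row) for row in conn.execute(stmt)]

    print(rows)  # [('Alice', 'Book'), ('Bob', None)]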
Aggregate Functions
-------------------
SQLAlchemy provides functions for SQL aggregates like ``count()``, ``sum()``,
``avg()``, ``min()``, and ``max()`` in the ``sqlalchemy.func`` namespace. These
can be used in SELECT statements and combined with ``group_by()`` for grouped
aggregations. The ``func`` object is a special namespace that generates SQL
function calls for any function name you access on it.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
>>> from sqlalchemy import select, insert, func
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> sales = Table("sales", metadata,
... Column("id", Integer, primary_key=True),
... Column("product", String(50)),
... Column("amount", Integer))
>>> metadata.create_all(engine)
>>> with engine.begin() as conn:
... conn.execute(insert(sales), [
... {"product": "A", "amount": 100},
... {"product": "A", "amount": 150},
... {"product": "B", "amount": 200}])
>>> with engine.connect() as conn:
... # Count all rows
... result = conn.execute(select(func.count()).select_from(sales))
... print(result.scalar())
... # Sum with group by
... stmt = select(sales.c.product, func.sum(sales.c.amount)).group_by(sales.c.product)
... print(conn.execute(stmt).fetchall())
3
[('A', 250), ('B', 200)]
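The other aggregates work the same way. As a sketch on the same sample data, ``min()``, ``max()``, and ``avg()`` can be selected together per group; the labels are illustrative, and the exact numeric type returned for the average may depend on the database driver.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import create_engine, func, insert, select

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    sales = Table(
        "sales", metadata,
        Column("id", Integer, primary_key=True),
        Column("product", String(50)),
        Column("amount", Integer),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(insert(sales), [
            {"product": "A", "amount": 100},
            {"product": "A", "amount": 150},
            {"product": "B", "amount": 200},
        ])

    stmt = (
        select(
            sales.c.product,
            func.min(sales.c.amount).label("smallest"),
            func.max(sales.c.amount).label("largest"),
            func.avg(sales.c.amount).label("average"),
        )
        .group_by(sales.c.product)
        .order_by(sales.c.product)
    )

    with engine.connect() as conn:
        rows = conn.execute(stmt).all()

    for row in rows:
        print(row.product, row.smallest, row.largest, row.average)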
Drop Tables
-----------
Tables can be dropped using the ``drop()`` method on a ``Table`` object or
``drop_all()`` on ``MetaData`` to drop all tables. The ``checkfirst`` parameter
prevents errors if the table doesn't exist. Be careful with these operations in
production as they permanently delete data and schema. Always back up your database
before dropping tables.
.. code-block:: python
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, inspect
>>> engine = create_engine("sqlite:///:memory:")
>>> metadata = MetaData()
>>> users = Table("users", metadata, Column("id", Integer, primary_key=True))
>>> products = Table("products", metadata, Column("id", Integer, primary_key=True))
>>> metadata.create_all(engine)
>>> inspector = inspect(engine)
>>> sorted(inspector.get_table_names())
['products', 'users']
>>> # Drop single table
>>> users.drop(engine)
>>> sorted(inspect(engine).get_table_names())
['products']
>>> # Drop all tables
>>> metadata.drop_all(engine)
>>> inspect(engine).get_table_names()
[]
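The effect of ``checkfirst`` is easiest to see by dropping the same table more than once. In this sketch, the flag ``drop_failed`` is a name introduced for illustration; it records whether an unconditional second drop raised an error.

.. code-block:: python

    from sqlalchemy import Column, Integer, MetaData, Table, create_engine, inspect
    from sqlalchemy.exc import OperationalError

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    users = Table("users", metadata, Column("id", Integer, primary_key=True))
    metadata.create_all(engine)

    users.drop(engine)                   # table exists, so this succeeds
    users.drop(engine, checkfirst=True)  # table is gone; the check skips the DROP

    try:
        users.drop(engine)               # no check: DROP TABLE is issued and fails
        drop_failed = False
    except OperationalError:
        drop_failed = True

    remaining = inspect(engine).get_table_names()

    print(drop_failed)  # True
    print(remaining)    # []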
================================================
FILE: docs/notes/extension/cpp-from-python.rst
================================================
.. meta::
:description lang=en: Learn modern C++ syntax from Python - side-by-side comparison of Python and C++ code snippets
:keywords: C++, Python, C++11, C++14, C++17, C++20, modern C++, syntax comparison, learn C++
=======================
Learn C++ from Python
=======================
.. contents:: Table of Contents
:backlinks: none
Modern C++ (C++11, C++14, C++17, C++20) has evolved to include features that make it
syntactically similar to Python, making the transition easier for Python developers.
This comprehensive guide provides side-by-side comparisons and 1-1 mappings between
Python and modern C++ code snippets, covering essential programming concepts like
variables, data structures, functions, lambdas, classes, and algorithms.
Whether you're a Python developer looking to learn C++ for performance optimization,
system programming, or expanding your programming skills, this tutorial demonstrates
how familiar Python patterns translate to modern C++ syntax. Many popular frameworks
like PyTorch, TensorFlow, and NumPy use C or C++ extensions for performance-critical operations,
especially in deep learning, LLM training, and CUDA GPU programming. Understanding C++
enables you to write custom extensions, optimize bottlenecks, and contribute to these
high-performance libraries.
To learn more about C++ programming, refer to this `C++ cheatsheet `_
for additional reference and best practices.
**Complete working examples:** See `cpp_from_py.cpp `_
for runnable code with integrated Google Test suite. Each function includes Doxygen comments showing the equivalent Python code.
Hello World
-----------
The traditional first program in any language. Both Python and C++ can print text to
the console, though C++ requires including the iostream library and a main function.
**Python**
.. code-block:: python
print("Hello, World!")
**C++**
.. code-block:: cpp
#include <iostream>
int main() {
std::cout << "Hello, World!" << std::endl;
return 0;
}
Variables
---------
Modern C++ supports automatic type inference with the ``auto`` keyword, making variable
declarations as concise as Python. The compiler deduces types from initialization values.
**Python**
.. code-block:: python
x = 10
y = 3.14
name = "Alice"
is_valid = True
**C++**
.. code-block:: cpp
auto x = 10;
auto y = 3.14;
auto name = "Alice";
auto is_valid = true;
Lists and Vectors
-----------------
Python lists and C++ vectors are dynamic arrays that can grow and shrink. Both support
indexing, appending elements, and querying size. C++ vectors require specifying the element
type, but modern C++ can infer it from initialization.
**Python**
.. code-block:: python
numbers = [1, 2, 3, 4, 5]
numbers.append(6)
print(numbers[0])
print(len(numbers))
**C++**
.. code-block:: cpp
#include <vector>
std::vector<int> numbers = {1, 2, 3, 4, 5};
numbers.push_back(6);
std::cout << numbers[0] << std::endl;
std::cout << numbers.size() << std::endl;
Array Slicing and Access
-------------------------
Python supports powerful slicing syntax with negative indices and ranges. C++ doesn't have
built-in slicing, but you can use iterators or create subvectors. Negative indexing requires
manual calculation from the end.
**Python**
.. code-block:: python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(numbers[0])
print(numbers[-1])
print(numbers[2:5])
print(numbers[:3])
print(numbers[7:])
print(numbers[::2])
print(numbers[::-1])
**C++**
.. code-block:: cpp
#include <iostream>
#include <vector>
std::vector<int> numbers = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
std::cout << numbers[0] << std::endl;
std::cout << numbers[numbers.size() - 1] << std::endl;
std::vector<int> slice1(numbers.begin() + 2, numbers.begin() + 5);
std::vector<int> slice2(numbers.begin(), numbers.begin() + 3);
std::vector<int> slice3(numbers.begin() + 7, numbers.end());
std::vector<int> every_second;
for (size_t i = 0; i < numbers.size(); i += 2) {
every_second.push_back(numbers[i]);
}
std::vector<int> reversed(numbers.rbegin(), numbers.rend());
Dictionaries and Maps
---------------------
Dictionaries in Python and maps in C++ store key-value pairs. Both allow insertion, lookup,
and modification using bracket notation. C++ maps keep keys sorted, while Python dicts
maintain insertion order (Python 3.7+).
**Python**
.. code-block:: python
ages = {"Alice": 30, "Bob": 25}
ages["Charlie"] = 35
print(ages["Alice"])
**C++**
.. code-block:: cpp
#include <map>