Repository: srstevenson/nb-clean Branch: main Commit: 1e21d56623ba Files: 35 Total size: 86.7 KB Directory structure: gitextract_9zcbq9pw/ ├── .github/ │ ├── CODEOWNERS │ ├── CONTRIBUTING.md │ ├── dependabot.yml │ └── workflows/ │ └── ci.yml ├── .gitignore ├── .pre-commit-hooks.yaml ├── .prettierrc.toml ├── .python-version ├── LICENSE ├── README.md ├── justfile ├── pyproject.toml ├── src/ │ └── nb_clean/ │ ├── __init__.py │ ├── __main__.py │ ├── cli.py │ └── py.typed └── tests/ ├── conftest.py ├── notebooks/ │ ├── clean.ipynb │ ├── clean_with_cell_metadata.ipynb │ ├── clean_with_counts.ipynb │ ├── clean_with_empty_cells.ipynb │ ├── clean_with_notebook_metadata.ipynb │ ├── clean_with_outputs.ipynb │ ├── clean_with_outputs_with_counts.ipynb │ ├── clean_with_tags_metadata.ipynb │ ├── clean_with_tags_special_metadata.ipynb │ ├── clean_without_empty_cells.ipynb │ ├── clean_without_notebook_metadata.ipynb │ ├── dirty.ipynb │ ├── dirty_empty_octave.ipynb │ └── dirty_with_version.ipynb ├── test_check_notebook.py ├── test_clean_notebook.py ├── test_cli.py └── test_git_integration.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/CODEOWNERS ================================================ * @srstevenson ================================================ FILE: .github/CONTRIBUTING.md ================================================ # Contributing Thanks for considering contributing! The following is a set of guidelines for doing so. They're guidelines rather than rules, so follow your best judgement, but reading them will help make the contribution process easier and more effective for both you and the maintainers. ## Reporting issues GitHub issues are used for managing bug reports and feature requests, except security vulnerabilities: these should be emailed to the maintainers instead. 
Search for existing issues before creating a new one, to ensure your problem hasn't already been reported. If it has, you're welcome to comment on the existing issue with extra information that might help reproduce and fix the problem, or sharing why a feature would be useful, but refrain from "+1" type comments. Duplicate issues will be closed with a reference to the existing issue. In your report describe what you did, what you expected to happen, and what happened instead. Provide a [minimal reproducible example][mre] that the maintainers can run. Provide as much detail as you can in your description of the problem, including the version of the project you're using, and details of your operating system and environment, and other information which might help diagnose the problem, such as what you've already tried to fix it. ## Contributing changes ### Planning When you contribute a new change, the responsibility for maintenance is (by default) transferred to the existing project maintainers. The benefit of the contribution must be weighed against the cost of maintaining it. If you're considering contributing a non-trivial bugfix or feature, discuss the changes you plan to make before you start coding by opening an issue. This ensures your proposed change will be accepted, and provides the maintainers the opportunity to help you. ### Implementation Changes are managed using GitHub pull requests. If you're new to pull requests, read the [documentation][pr docs] to learn how they work. [uv] is used for managing dependencies and packaging, and you will need it installed. If you're not familiar with uv, we suggest reading its documentation before you begin. After cloning the repository, you can implement your changes as follows: 1. Install the project and its dependencies into an isolated virtual environment with `uv sync`. 2. Before making your changes, run the tests with `just test`, and ensure they pass. 
This checks that your development environment is correctly configured and that there are no outstanding issues before you start coding. If they don't pass, you can open a GitHub issue for help debugging. 3. Check out a new branch for your changes, branching from `main` and giving it a sensible name. 4. Implement your changes. 5. If you introduced new functionality or fixed a bug, add appropriate automated tests to prevent future regressions. 6. Ensure you've updated any docstrings or documentation files (including `README.md`) which are affected by your change. 7. Run the formatter, linter and type checker with `just fmt lint`, and tests with `just test`, and fix any problems. 8. Commit your changes, following [these guidelines][commit guidelines] for your commit messages. 9. Fork the base repository on GitHub, push your branch to your fork, and open a pull request against the base repository. Make sure your pull request has a clear title and description. The easier your changes are to understand, the easier it is for the maintainers to approve and merge them. 10. Your pull request will be reviewed by the maintainers and either merged, or feedback will be provided on changes that are required. 
[commit guidelines]: https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html [mre]: https://stackoverflow.com/help/minimal-reproducible-example [pr docs]: https://docs.github.com/en/github/collaborating-with-pull-requests [uv]: https://docs.astral.sh/uv/ ================================================ FILE: .github/dependabot.yml ================================================ version: 2 updates: - package-ecosystem: "pip" directory: "/" schedule: interval: "monthly" cooldown: default-days: 7 - package-ecosystem: "github-actions" directory: "/" schedule: interval: "monthly" cooldown: default-days: 7 ================================================ FILE: .github/workflows/ci.yml ================================================ name: CI on: push: branches: [main] pull_request: jobs: checks: name: Run checks runs-on: ubuntu-slim strategy: matrix: python: - "3.10" - "3.11" - "3.12" - "3.13" - "3.14" env: UV_PYTHON: ${{ matrix.python }} steps: - uses: actions/checkout@v6 - name: Setup uv uses: astral-sh/setup-uv@v7 - name: Setup Python uses: actions/setup-python@v6 with: python-version: ${{ matrix.python }} - name: Install dependencies run: uv sync --dev - name: Run formatter run: uv run ruff format --check . - name: Run linter run: uv run ruff check . - name: Run type checker run: uv run ty check . 
- name: Run tests run: uv run coverage run -m pytest - name: Print test coverage report run: uv run coverage report ================================================ FILE: .gitignore ================================================ *.egg-info/ .ipynb_checkpoints/ /.coverage /build/ /coverage.xml /dist/ __pycache__/ ================================================ FILE: .pre-commit-hooks.yaml ================================================ - id: nb-clean name: nb-clean entry: nb-clean clean language: python types_or: [jupyter] minimum_pre_commit_version: 2.9.2 ================================================ FILE: .prettierrc.toml ================================================ proseWrap = "always" ================================================ FILE: .python-version ================================================ 3.10 ================================================ FILE: LICENSE ================================================ Copyright © Scott Stevenson Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ================================================ FILE: README.md ================================================

[![License](https://img.shields.io/github/license/srstevenson/nb-clean?label=License&color=blue)](https://github.com/srstevenson/nb-clean/blob/main/LICENSE) [![GitHub release](https://img.shields.io/github/v/release/srstevenson/nb-clean?label=GitHub)](https://github.com/srstevenson/nb-clean) [![PyPI version](https://img.shields.io/pypi/v/nb-clean?label=PyPI)](https://pypi.org/project/nb-clean/) [![Python versions](https://img.shields.io/pypi/pyversions/nb-clean?label=Python)](https://pypi.org/project/nb-clean/) [![CI status](https://github.com/srstevenson/nb-clean/workflows/CI/badge.svg)](https://github.com/srstevenson/nb-clean/actions) nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine if a notebook is clean or not, which can be used as a check in your continuous integration pipelines. Jupyter notebooks contain execution metadata that changes every time you run a cell, including execution counts, timestamps, and output data. When committed to version control, these elements create unnecessary diff noise, make meaningful code review difficult, and can accidentally expose sensitive information in cell outputs. By cleaning notebooks before committing, you preserve only the essential code and markdown content, leading to cleaner diffs, more focused reviews, and better collaboration. For a detailed discussion of the challenges notebooks present for version control and collaborative development, see my [PyCon UK 2017 talk][pycon talk] and accompanying [blog post][blog post]. > [!NOTE] > > nb-clean 2.0.0 introduced a new command line interface to make cleaning > notebooks in place easier. 
If you upgrade from a previous release, you'll need > to migrate to the new interface as described under > [Migrating to nb-clean 2](#migrating-to-nb-clean-2). ## Installation nb-clean requires Python 3.10 or later. To run the latest release of nb-clean in an ephemeral virtual environment, use [uv]: ```bash uvx nb-clean ``` To add nb-clean as a dependency to a Python project managed with uv, use: ```bash uv add --dev nb-clean ``` ## Command line usage ### Understanding notebook metadata Jupyter notebooks contain several types of metadata that nb-clean can handle: **Cell metadata** includes information attached to individual cells, such as tags, slideshow settings, and execution timing. Cell metadata fields like `collapsed`, `scrolled`, `deletable`, and `editable` control notebook interface behaviour, whilst `tags` and custom fields support workflow automation. **Notebook metadata** contains document-level information including the kernel specification, language version, and notebook format version. The language version information (`metadata.language_info.version`) frequently changes between Python versions and creates unnecessary version control noise. **Execution metadata** encompasses execution counts for code cells and their outputs, along with execution timestamps and output data. This metadata changes every time you run cells, regardless of whether the actual code has changed. ### Checking You can check if a notebook is clean with: ```bash nb-clean check notebook.ipynb ``` You can also process notebooks through standard input and output streams, which is useful for integrating with shell pipelines or processing notebooks without writing to disk: ```bash nb-clean check < notebook.ipynb ``` When reading from standard input, nb-clean processes the notebook content directly without accessing the filesystem. 
This approach is particularly useful for automated workflows, continuous integration pipelines, or when you want to check notebooks without creating temporary files. The check can be run with the following flags: - To check for empty cells use `--remove-empty-cells` or the short form `-e`. - To ignore cell metadata use `--preserve-cell-metadata` or the short form `-m`. This will ignore all metadata fields. You can also pass a list of fields to ignore with `--preserve-cell-metadata field1 field2` or `-m field1 field2`. Note that when _not_ passing a list of fields, either the `-m` or `--preserve-cell-metadata` flag must be passed _after_ the notebook paths to process, or the notebook paths should be preceded with `--` so they are not interpreted as metadata fields. - To ignore cell outputs use `--preserve-cell-outputs` or the short form `-o`. - To ignore cell execution counts use `--preserve-execution-counts` or the short form `-c`. - To ignore language version notebook metadata use `--preserve-notebook-metadata` or the short form `-n`. - To check the notebook does not contain any notebook metadata use `--remove-all-notebook-metadata` or the short form `-M`. For example, to check if a notebook is clean whilst ignoring notebook metadata: ```bash nb-clean check --preserve-notebook-metadata notebook.ipynb ``` To check if a notebook is clean whilst ignoring all cell metadata: ```bash nb-clean check --preserve-cell-metadata -- notebook.ipynb ``` To check if a notebook is clean whilst ignoring only the `tags` cell metadata field: ```bash nb-clean check --preserve-cell-metadata tags -- notebook.ipynb ``` nb-clean will exit with status code 0 if the notebook is clean, and status code 1 if it is not. nb-clean will also print details of cell execution counts, metadata, outputs, and empty cells it finds. Note that the conflicting options `--preserve-notebook-metadata` and `--remove-all-notebook-metadata` cannot be used together, as they represent contradictory instructions. 
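
The note above about placing `-m` after the notebook paths, or separating the paths with `--`, follows from how an option that accepts zero or more values consumes arguments. The following is a minimal sketch using a hypothetical `argparse` parser (the `demo` program name and the `parse` helper are assumptions for illustration, not nb-clean's actual parser) that reproduces the behaviour:

```python
import argparse

# Hypothetical parser for illustration only; nb-clean's real CLI may be
# structured differently. The option takes zero or more field names.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("inputs", nargs="*")
parser.add_argument("-m", "--preserve-cell-metadata", nargs="*", default=None)


def parse(argv):
    """Return (metadata fields, notebook paths) parsed from argv."""
    namespace = parser.parse_args(argv)
    return namespace.preserve_cell_metadata, list(namespace.inputs or [])


# Without a separator, the path is swallowed as a metadata field name.
swallowed = parse(["-m", "notebook.ipynb"])

# With `--`, the option receives no fields and the path stays a path.
separated = parse(["-m", "--", "notebook.ipynb"])

# Passing the flag after the path has the same effect as using `--`.
trailing = parse(["notebook.ipynb", "-m"])

# Explicit fields followed by `--` keep fields and paths apart.
with_field = parse(["-m", "tags", "--", "notebook.ipynb"])

for label, result in [
    ("swallowed", swallowed),
    ("separated", separated),
    ("trailing", trailing),
    ("with_field", with_field),
]:
    print(label, result)
```

In this sketch an empty field list stands for "all cell metadata", matching the `-m` behaviour described above when no fields are given.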
### Cleaning (interactive) You can clean a Jupyter notebook with: ```bash nb-clean clean notebook.ipynb ``` This cleans the notebook in place. You can also pass the notebook content on standard input, in which case the cleaned notebook is written to standard output: ```bash nb-clean clean < original.ipynb > cleaned.ipynb ``` The cleaning can be run with the following flags: - To remove empty cells use `--remove-empty-cells` or the short form `-e`. - To preserve cell metadata use `--preserve-cell-metadata` or the short form `-m`. This will preserve all metadata fields. You can also pass a list of fields to preserve with `--preserve-cell-metadata field1 field2` or `-m field1 field2`. Note that when _not_ passing a list of fields, either the `-m` or `--preserve-cell-metadata` flag must be passed _after_ the notebook paths to process, or the notebook paths should be preceded with `--` so they are not interpreted as metadata fields. - To preserve cell outputs use `--preserve-cell-outputs` or the short form `-o`. - To preserve cell execution counts use `--preserve-execution-counts` or the short form `-c`. - To preserve notebook metadata (such as language version) use `--preserve-notebook-metadata` or the short form `-n`. - To remove all notebook metadata use `--remove-all-notebook-metadata` or the short form `-M`. For example, to clean a notebook whilst preserving notebook metadata: ```bash nb-clean clean --preserve-notebook-metadata notebook.ipynb ``` To clean a notebook whilst preserving all cell metadata: ```bash nb-clean clean --preserve-cell-metadata -- notebook.ipynb ``` To clean a notebook whilst preserving only the `tags` cell metadata field: ```bash nb-clean clean --preserve-cell-metadata tags -- notebook.ipynb ``` #### Directory processing Both the `check` and `clean` commands can operate on directories as well as individual notebook files. 
When you provide a directory path, nb-clean will recursively find all `.ipynb` files within that directory and process them. For example: ```bash nb-clean check notebooks/ ``` or ```bash nb-clean clean experiments/ ``` This is particularly useful for batch processing entire project directories or ensuring all notebooks in a repository are clean. ### Cleaning (Git filter) To add a filter to an existing Git repository to automatically clean notebooks when they're staged, run the following from the working tree: ```bash nb-clean add-filter ``` This will configure a filter to remove cell execution counts, metadata, and outputs. The same flags as described above for [interactive cleaning](#cleaning-interactive) can be passed to customise the behaviour. The Git filter operates by configuring the `filter.nb-clean.clean` setting in your repository's local Git configuration and adding the line `*.ipynb filter=nb-clean` to `.git/info/attributes`. This ensures that all notebook files are automatically processed through nb-clean when staged for commit. The filter configuration is local to the repository and won't affect your global or system Git settings. To remove the filter, run: ```bash nb-clean remove-filter ``` ### Cleaning (Jujutsu) nb-clean can be used to clean notebooks tracked with [Jujutsu] rather than Git. Configure Jujutsu to use nb-clean as a fix tool by adding the following snippet to `~/.config/jj/config.toml`: ```toml [fix.tools.nb-clean] command = ["nb-clean", "clean"] patterns = ["glob:'**/*.ipynb'"] ``` The same flags as described above for [interactive cleaning](#cleaning-interactive) can be appended to the `command` array to customise the behaviour. Tracked notebooks can then be cleaned by running `jj fix`. See the [Jujutsu documentation][jujutsu docs] for further details of how to invoke and configure fix tools. ### Cleaning (pre-commit hook) nb-clean can also be used as a [pre-commit] hook. 
You may prefer this to the Git filter if your project already uses the pre-commit framework. Note that the Git filter and pre-commit hook work differently, with different effects on your working directory. The pre-commit hook operates on the notebook on disk, cleaning the copy in your working directory. The Git filter cleans notebooks as they are added to the index, leaving the copy in your working directory dirty. This means cell outputs are still visible to you in your local Jupyter instance when using the Git filter, but not when using the pre-commit hook. After installing [pre-commit], add the nb-clean hook by adding the following snippet to `.pre-commit-config.yaml` in the root of your repository: ```yaml repos: - repo: https://github.com/srstevenson/nb-clean rev: 4.0.1 hooks: - id: nb-clean ``` You can pass additional arguments to nb-clean with an `args` array. The following example shows how to preserve only two specific metadata fields. Note that, in the example, the final item `--` in the arg list is mandatory. The option `--preserve-cell-metadata` may take an arbitrary number of field arguments, and the `--` argument is needed to separate them from notebook filenames, which `pre-commit` will append to the list of arguments. ```yaml repos: - repo: https://github.com/srstevenson/nb-clean rev: 4.0.1 hooks: - id: nb-clean args: - --remove-empty-cells - --preserve-cell-metadata - tags - slideshow - -- ``` Run `pre-commit install` to ensure the hook is installed, and `pre-commit autoupdate` to update the hook to the latest release of nb-clean. ### Preserving all nbformat metadata To ignore or preserve specifically the metadata defined in the [`nbformat` documentation](https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata), use the following options: `--preserve-cell-metadata collapsed scrolled deletable editable format name tags jupyter execution`. 
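
As a rough illustration of what preserving only these fields means, the sketch below filters a cell's metadata down to the listed names using plain dictionaries. The `keep_fields` helper and the `custom_tool` key are hypothetical names chosen for this example; the sketch does not call nb-clean itself.

```python
# Cell metadata field names taken from the option list above.
NBFORMAT_CELL_FIELDS = (
    "collapsed",
    "scrolled",
    "deletable",
    "editable",
    "format",
    "name",
    "tags",
    "jupyter",
    "execution",
)


def keep_fields(metadata, preserve):
    """Return a copy of `metadata` containing only keys listed in `preserve`."""
    return {
        field: value for field, value in metadata.items() if field in preserve
    }


# A made-up cell metadata dictionary; `custom_tool` stands in for any field
# that the nbformat documentation does not define.
cell_metadata = {"tags": ["keep"], "collapsed": True, "custom_tool": {"id": 1}}
filtered = keep_fields(cell_metadata, NBFORMAT_CELL_FIELDS)
print(filtered)
```

Fields outside the list, such as the hypothetical `custom_tool` entry, are dropped, while `tags` and `collapsed` survive.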
## Python library usage nb-clean can be used programmatically as a Python library, allowing integration into other tools. ```python import nbformat import nb_clean # Load a notebook with open("notebook.ipynb") as f: notebook = nbformat.read(f, as_version=nbformat.NO_CONVERT) # Check if the notebook is clean is_clean = nb_clean.check_notebook( notebook, preserve_cell_outputs=True, filename="notebook.ipynb" ) # Clean the notebook cleaned_notebook = nb_clean.clean_notebook( notebook, remove_empty_cells=True, preserve_cell_metadata=["tags", "slideshow"] ) ``` The library functions accept the same configuration options as the command-line interface. The `check_notebook()` function returns a boolean indicating whether the notebook is clean, whilst `clean_notebook()` returns a cleaned copy of the notebook. ## Migrating to nb-clean 2 The following table maps from the command line interface of nb-clean 1.6.0 to that of nb-clean >=2.0.0. The examples in the table use long flags, but short flags can also be used instead. 

| Description                                 | nb-clean 1.6.0                                                   | nb-clean >=2.0.0                                            |
| ------------------------------------------- | ---------------------------------------------------------------- | ----------------------------------------------------------- |
| Clean notebook                              | `nb-clean clean --input notebook.ipynb \| sponge notebook.ipynb` | `nb-clean clean notebook.ipynb`                             |
| Clean notebook (remove empty cells)         | `nb-clean clean --input notebook.ipynb --remove-empty`           | `nb-clean clean --remove-empty-cells notebook.ipynb`        |
| Clean notebook (preserve all cell metadata) | `nb-clean clean --input notebook.ipynb --preserve-metadata`      | `nb-clean clean --preserve-cell-metadata -- notebook.ipynb` |
| Check notebook                              | `nb-clean check --input notebook.ipynb`                          | `nb-clean check notebook.ipynb`                             |
| Check notebook (check for empty cells)      | `nb-clean check --input notebook.ipynb --remove-empty`           | `nb-clean check --remove-empty-cells notebook.ipynb`        |
| Check notebook (ignore all cell metadata)   | `nb-clean check --input notebook.ipynb --preserve-metadata`      | `nb-clean check --preserve-cell-metadata -- notebook.ipynb` |
| Add Git filter to clean notebooks           | `nb-clean configure-git`                                         | `nb-clean add-filter`                                       |
| Remove Git filter                           | `nb-clean unconfigure-git`                                       | `nb-clean remove-filter`                                    |

## Copyright

Copyright © Scott Stevenson. nb-clean is distributed under the terms of the [ISC license].
[blog post]: https://srstevenson.com/posts/jupyter-notebooks-and-collaboration/ [isc license]: https://opensource.org/licenses/ISC [jujutsu docs]: https://jj-vcs.github.io/jj/latest/cli-reference/#jj-fix [jujutsu]: https://jj-vcs.github.io/jj/ [pre-commit]: https://pre-commit.com/ [pycon talk]: https://www.youtube.com/watch?v=J3k3HkVnd2c [uv]: https://docs.astral.sh/uv/ ================================================ FILE: justfile ================================================ # show this help message (default) help: @just -l # format with ruff fmt: uv run ruff check --fix uv run ruff format # lint with ruff and type-check with ty lint: uv run ruff check uv run ruff format --check uv run ty check # run tests with pytest and report coverage test: uv run coverage run -m pytest uv run coverage report ================================================ FILE: pyproject.toml ================================================ [project] name = "nb-clean" version = "4.0.1" description = "Clean Jupyter notebooks for versioning" authors = [{ name = "Scott Stevenson", email = "scott@stevenson.io" }] readme = "README.md" license = "ISC" license-files = ["LICENSE"] requires-python = ">=3.10" keywords = ["jupyter", "notebook", "clean", "filter", "git"] classifiers = [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Science/Research", "Natural Language :: English", ] dependencies = ["nbformat>=5.9.2"] [project.urls] Homepage = "https://github.com/srstevenson/nb-clean" Repository = "https://github.com/srstevenson/nb-clean" Issues = "https://github.com/srstevenson/nb-clean/issues" [project.scripts] nb-clean = "nb_clean.cli:main" [dependency-groups] dev = [ "coverage>=7.6.10", "pytest>=7.2.1", "pytest-mock>=3.11.1", "ruff>=0.1.6", "ty>=0.0.19", "typing-extensions>=4.14.1", ] [build-system] requires = ["uv_build>=0.7.19,<0.12"] build-backend = "uv_build" [tool.coverage.report] exclude_also = ["if __name__ == .__main__.:", "if TYPE_CHECKING:"] [tool.ruff] 
target-version = "py310" [tool.ruff.format] docstring-code-format = true skip-magic-trailing-comma = true [tool.ruff.lint] select = ["ALL"] ignore = [ "COM812", # Trailing comma missing "C901", # Function is too complex "E501", # Line too long "PLR0912", # Too many branches "PLR0913", # Too many arguments in function definition "PLR2004", # Magic value used in comparison "S603", # subprocess call: check for execution of untrusted input "S607", # Starting a process with a partial executable path "T201", # print found ] [tool.ruff.lint.flake8-tidy-imports] ban-relative-imports = "all" [tool.ruff.lint.isort] split-on-trailing-comma = false [tool.ruff.lint.per-file-ignores] "tests/**.py" = [ "D", # pydocstyle "INP001", # Implicit namespace package "S101", # Use of assert detected ] [tool.ruff.lint.pydocstyle] convention = "google" [tool.ty.rules] all = "error" ================================================ FILE: src/nb_clean/__init__.py ================================================ """Clean Jupyter notebooks of execution counts, metadata, and outputs.""" from __future__ import annotations import contextlib import subprocess from pathlib import Path from typing import TYPE_CHECKING, Any, Final, cast if TYPE_CHECKING: from collections.abc import Collection import nbformat from typing_extensions import Self VERSION: Final = "4.0.1" GIT_ATTRIBUTES_LINE: Final = "*.ipynb filter=nb-clean" class GitProcessError(Exception): """Exception for errors executing Git.""" def __init__(self: Self, message: str, return_code: int) -> None: """Exception for errors executing Git. Args: message: Error message. return_code: Return code. """ super().__init__(message) self.message: str = message self.return_code: int = return_code def git(*args: str) -> str: """Execute a Git subcommand with the provided arguments. Args: *args: Git subcommand and arguments to execute. Returns: Standard output from the Git command, stripped of whitespace. 
Raises: GitProcessError: If the Git command fails with a non-zero exit code. Examples: >>> git("rev-parse", "--git-dir") '.git' """ try: process = subprocess.run(["git", *list(args)], capture_output=True, check=True) except subprocess.CalledProcessError as exc: raise GitProcessError(exc.stderr.decode(), exc.returncode) from exc return process.stdout.decode().strip() def git_attributes_path() -> Path: """Get path to the attributes file in the current Git repository. Returns: Path to the attributes file. Examples: >>> git_attributes_path() PosixPath('.git/info/attributes') """ git_dir = git("rev-parse", "--git-dir") return Path(git_dir, "info", "attributes") def add_git_filter( *, remove_empty_cells: bool = False, remove_all_notebook_metadata: bool = False, preserve_cell_metadata: Collection[str] | None = None, preserve_cell_outputs: bool = False, preserve_execution_counts: bool = False, preserve_notebook_metadata: bool = False, ) -> None: """Configure and add a Git filter to automatically clean Jupyter notebooks. This function sets up a Git filter that will automatically clean notebooks when they are staged for commit, removing execution counts, outputs, and metadata according to the specified options. Args: remove_empty_cells: If True, remove empty cells. Defaults to False. remove_all_notebook_metadata: If True, remove all notebook metadata. Defaults to False. preserve_cell_metadata: Controls cell metadata handling. If None, clean all cell metadata. If [], preserve all cell metadata. (This corresponds to the `-m` CLI option without specifying any fields.) If list of str, these are the cell metadata fields to preserve. Defaults to None. preserve_cell_outputs: If True, preserve cell outputs. Defaults to False. preserve_execution_counts: If True, preserve cell execution counts. Defaults to False. preserve_notebook_metadata: If True, preserve notebook metadata such as language version. Defaults to False. 
Raises: ValueError: If both preserve_notebook_metadata and remove_all_notebook_metadata are True. """ if preserve_notebook_metadata and remove_all_notebook_metadata: msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`" raise ValueError(msg) command = ["nb-clean", "clean"] if remove_empty_cells: command.append("--remove-empty-cells") if preserve_cell_metadata is not None: if len(preserve_cell_metadata) > 0: command.append( f"--preserve-cell-metadata {' '.join(preserve_cell_metadata)}" ) else: command.append("--preserve-cell-metadata") if preserve_cell_outputs: command.append("--preserve-cell-outputs") if preserve_execution_counts: command.append("--preserve-execution-counts") if preserve_notebook_metadata: command.append("--preserve-notebook-metadata") if remove_all_notebook_metadata: command.append("--remove-all-notebook-metadata") git("config", "filter.nb-clean.clean", " ".join(command)) attributes_path = git_attributes_path() if attributes_path.is_file() and GIT_ATTRIBUTES_LINE in attributes_path.read_text( encoding="UTF-8" ): return with attributes_path.open("a", encoding="UTF-8") as file: file.write(f"\n{GIT_ATTRIBUTES_LINE}\n") def remove_git_filter() -> None: """Remove the nb-clean filter from the current Git repository. This function removes the nb-clean filter configuration from the Git repository and cleans up the attributes file by removing the filter directive. Raises: GitProcessError: If Git command execution fails. 
""" attributes_path = git_attributes_path() if attributes_path.is_file(): original_contents = attributes_path.read_text(encoding="UTF-8").split("\n") revised_contents = [ line for line in original_contents if line != GIT_ATTRIBUTES_LINE ] attributes_path.write_text("\n".join(revised_contents), encoding="UTF-8") git("config", "--remove-section", "filter.nb-clean") def check_notebook( notebook: nbformat.NotebookNode, *, remove_empty_cells: bool = False, remove_all_notebook_metadata: bool = False, preserve_cell_metadata: Collection[str] | None = None, preserve_cell_outputs: bool = False, preserve_execution_counts: bool = False, preserve_notebook_metadata: bool = False, filename: str = "notebook", ) -> bool: """Check notebook is clean of execution counts, metadata, and outputs. Args: notebook: The notebook to check. remove_empty_cells: If True, also check for the presence of empty cells. Defaults to False. remove_all_notebook_metadata: If True, also check for the presence of any notebook metadata. Defaults to False. preserve_cell_metadata: If None, check for all cell metadata. If [], don't check for any cell metadata. (This corresponds to the `-m` CLI option without specifying any fields.) If list of str, these are the cell metadata fields to ignore. Defaults to None. preserve_cell_outputs: If True, don't check for cell outputs. Defaults to False. preserve_execution_counts: If True, don't check for cell execution counts. Defaults to False. preserve_notebook_metadata: If True, preserve notebook metadata such as language version. Defaults to False. filename: Notebook filename to use in log messages. Defaults to "notebook". Returns: True if the notebook is clean, False otherwise. 
""" if preserve_notebook_metadata and remove_all_notebook_metadata: msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`" raise ValueError(msg) is_clean = True for index, cell in enumerate(notebook.cells): prefix = f"{filename} cell {index}" if remove_empty_cells and not cell["source"]: print(f"{prefix}: empty cell") is_clean = False if preserve_cell_metadata is None: if cell["metadata"]: print(f"{prefix}: metadata") is_clean = False elif len(preserve_cell_metadata) > 0: for field in cell["metadata"]: if field not in preserve_cell_metadata: print(f"{prefix}: metadata {field}") is_clean = False if cell["cell_type"] == "code": if not preserve_execution_counts and cell["execution_count"]: print(f"{prefix}: execution count") is_clean = False if preserve_cell_outputs: if not preserve_execution_counts: for output in cell["outputs"]: if output.get("execution_count") is not None: print(f"{prefix}: output execution count") is_clean = False elif cell["outputs"]: print(f"{prefix}: outputs") is_clean = False if remove_all_notebook_metadata and cast("dict[str, Any]", notebook.metadata): print(f"{filename}: metadata") is_clean = False if not preserve_notebook_metadata: with contextlib.suppress(KeyError): notebook["metadata"]["language_info"]["version"] print(f"{filename} metadata: language_info.version") is_clean = False return is_clean def clean_notebook( notebook: nbformat.NotebookNode, *, remove_empty_cells: bool = False, remove_all_notebook_metadata: bool = False, preserve_cell_metadata: Collection[str] | None = None, preserve_cell_outputs: bool = False, preserve_execution_counts: bool = False, preserve_notebook_metadata: bool = False, ) -> nbformat.NotebookNode: """Clean notebook of execution counts, metadata, and outputs. Args: notebook: The notebook to clean. remove_empty_cells: If True, remove empty cells. Defaults to False. remove_all_notebook_metadata: If True, remove all notebook metadata. Defaults to False. 
preserve_cell_metadata: If None, clean all cell metadata. If [], preserve all cell metadata. (This corresponds to the `-m` CLI option without specifying any fields.) If list of str, these are the cell metadata fields to preserve. Defaults to None. preserve_cell_outputs: If True, preserve cell outputs. Defaults to False. preserve_execution_counts: If True, preserve cell execution counts. Defaults to False. preserve_notebook_metadata: If True, preserve notebook metadata such as language version. Defaults to False. Returns: The cleaned notebook. """ if preserve_notebook_metadata and remove_all_notebook_metadata: msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`" raise ValueError(msg) if remove_empty_cells: notebook.cells = [cell for cell in notebook.cells if cell["source"]] for cell in notebook.cells: if preserve_cell_metadata is None: cell["metadata"] = {} elif len(preserve_cell_metadata) > 0: cell["metadata"] = { field: value for field, value in cell["metadata"].items() if field in preserve_cell_metadata } if cell["cell_type"] == "code": if not preserve_execution_counts: cell["execution_count"] = None if preserve_cell_outputs: if not preserve_execution_counts: for output in cell["outputs"]: if "execution_count" in output: output["execution_count"] = None else: cell["outputs"] = [] if remove_all_notebook_metadata: notebook.metadata = {} elif not preserve_notebook_metadata: with contextlib.suppress(KeyError): del notebook["metadata"]["language_info"]["version"] return notebook ================================================ FILE: src/nb_clean/__main__.py ================================================ """Top-level script to run nb-clean.""" from nb_clean.cli import main if __name__ == "__main__": main() ================================================ FILE: src/nb_clean/cli.py ================================================ """Command line interface to nb-clean.""" from __future__ import annotations import argparse import 
os import sys from dataclasses import dataclass, field from pathlib import Path from typing import TYPE_CHECKING, NoReturn, TextIO, cast import nbformat import nb_clean if TYPE_CHECKING: from collections.abc import Collection, Iterable, Sequence @dataclass class Args(argparse.Namespace): """Arguments parsed from the command-line.""" subcommand: str = "" inputs: list[Path] = field(default_factory=list) remove_empty_cells: bool = False remove_all_notebook_metadata: bool = False preserve_cell_metadata: list[str] | None = None preserve_cell_outputs: bool = False preserve_execution_counts: bool = False preserve_notebook_metadata: bool = False def expand_directories(paths: Iterable[Path]) -> list[Path]: """Expand paths to directories into paths to notebooks contained within. Args: paths: Paths to expand, including directories. Returns: Paths with directories expanded into notebooks contained within. """ expanded: set[Path] = set() for path in paths: if path.is_dir(): expanded.update(path.rglob("*.ipynb")) else: expanded.add(path) return list(expanded) def exit_with_error(message: str, return_code: int) -> NoReturn: """Print an error message to standard error and exit. Args: message: Error message to print to standard error. return_code: Return code with which to exit. """ print(f"nb-clean: error: {message}", file=sys.stderr) sys.exit(return_code) def add_filter( *, remove_empty_cells: bool, remove_all_notebook_metadata: bool, preserve_cell_metadata: Collection[str] | None, preserve_cell_outputs: bool, preserve_execution_counts: bool, preserve_notebook_metadata: bool, ) -> None: """Add the nb-clean filter to the current Git repository. Args: remove_empty_cells: Configure the filter to remove empty cells. remove_all_notebook_metadata: Configure the filter to remove all notebook metadata. preserve_cell_metadata: Configure the filter to preserve cell metadata. preserve_cell_outputs: Configure the filter to preserve cell outputs. 
preserve_execution_counts: Configure the filter to preserve cell execution counts. preserve_notebook_metadata: Configure the filter to preserve notebook metadata such as language version. """ try: nb_clean.add_git_filter( remove_empty_cells=remove_empty_cells, remove_all_notebook_metadata=remove_all_notebook_metadata, preserve_cell_metadata=preserve_cell_metadata, preserve_cell_outputs=preserve_cell_outputs, preserve_execution_counts=preserve_execution_counts, preserve_notebook_metadata=preserve_notebook_metadata, ) except nb_clean.GitProcessError as exc: exit_with_error(exc.message, exc.return_code) def remove_filter() -> None: """Remove the nb-clean filter from the current Git repository. This function removes the nb-clean filter configuration and cleans up the Git attributes file. If Git command execution fails, the program will exit with an appropriate error code. """ try: nb_clean.remove_git_filter() except nb_clean.GitProcessError as exc: exit_with_error(exc.message, exc.return_code) def check( inputs: Iterable[Path], *, remove_empty_cells: bool, remove_all_notebook_metadata: bool, preserve_cell_metadata: Collection[str] | None, preserve_cell_outputs: bool, preserve_execution_counts: bool, preserve_notebook_metadata: bool, ) -> None: """Check notebooks are clean of execution counts, metadata, and outputs. Args: inputs: Input notebook paths to check, empty list for stdin. remove_empty_cells: Check for the presence of empty cells. remove_all_notebook_metadata: Check for any notebook metadata. preserve_cell_metadata: Don't check for cell metadata. preserve_cell_outputs: Don't check for cell outputs. preserve_execution_counts: Don't check for cell execution counts. preserve_notebook_metadata: Don't check for notebook metadata such as language version.
""" if inputs: processed_inputs: list[Path] | list[TextIO] = expand_directories(inputs) else: processed_inputs = [sys.stdin] all_clean = True for input_ in processed_inputs: name = "stdin" if input_ is sys.stdin else os.fspath(cast("Path", input_)) notebook = cast( "nbformat.NotebookNode", nbformat.read(input_, as_version=nbformat.NO_CONVERT), ) is_clean = nb_clean.check_notebook( notebook, remove_empty_cells=remove_empty_cells, remove_all_notebook_metadata=remove_all_notebook_metadata, preserve_cell_metadata=preserve_cell_metadata, preserve_cell_outputs=preserve_cell_outputs, preserve_execution_counts=preserve_execution_counts, preserve_notebook_metadata=preserve_notebook_metadata, filename=name, ) all_clean &= is_clean if not all_clean: sys.exit(1) def clean( inputs: Iterable[Path], *, remove_empty_cells: bool, remove_all_notebook_metadata: bool, preserve_cell_metadata: Collection[str] | None, preserve_cell_outputs: bool, preserve_execution_counts: bool, preserve_notebook_metadata: bool, ) -> None: """Clean notebooks of execution counts, metadata, and outputs. Args: inputs: Input notebook paths to clean, empty list for stdin. remove_empty_cells: Remove empty cells. remove_all_notebook_metadata: Remove all notebook metadata. preserve_cell_metadata: Don't clean cell metadata. preserve_cell_outputs: Don't clean cell outputs. preserve_execution_counts: Don't clean cell execution counts. preserve_notebook_metadata: Don't clean notebook metadata such as language version.
""" if inputs: processed_inputs: list[Path] | list[TextIO] = expand_directories(inputs) outputs = processed_inputs else: processed_inputs = [sys.stdin] outputs = [sys.stdout] for input_, output in zip(processed_inputs, outputs, strict=True): notebook = cast( "nbformat.NotebookNode", nbformat.read(input_, as_version=nbformat.NO_CONVERT), ) notebook = nb_clean.clean_notebook( notebook, remove_empty_cells=remove_empty_cells, remove_all_notebook_metadata=remove_all_notebook_metadata, preserve_cell_metadata=preserve_cell_metadata, preserve_cell_outputs=preserve_cell_outputs, preserve_execution_counts=preserve_execution_counts, preserve_notebook_metadata=preserve_notebook_metadata, ) nbformat.write(notebook, output) def parse_args(args: Sequence[str]) -> Args: """Parse command line arguments. Args: args: Command line arguments to parse. Returns: Parsed command line arguments. """ parser = argparse.ArgumentParser(description=__doc__) subparsers = parser.add_subparsers(dest="subcommand", required=True) subparsers.add_parser("version", help="print version number") add_filter_parser = subparsers.add_parser( "add-filter", help="add Git filter to clean notebooks before staging" ) add_filter_parser.add_argument( "-e", "--remove-empty-cells", action="store_true", help="remove empty cells" ) add_filter_parser.add_argument( "-M", "--remove-all-notebook-metadata", action="store_true", help="remove all notebook metadata", ) add_filter_parser.add_argument( "-m", "--preserve-cell-metadata", default=None, nargs="*", help="preserve cell metadata, all unless fields are specified", ) add_filter_parser.add_argument( "-o", "--preserve-cell-outputs", action="store_true", help="preserve cell outputs", ) add_filter_parser.add_argument( "-c", "--preserve-execution-counts", action="store_true", help="preserve cell execution counts", ) add_filter_parser.add_argument( "-n", "--preserve-notebook-metadata", action="store_true", help="preserve notebook metadata", )
subparsers.add_parser( "remove-filter", help="remove Git filter that cleans notebooks before staging" ) check_parser = subparsers.add_parser( "check", help=( "check a notebook is clean of cell execution counts, metadata, and outputs" ), ) check_parser.add_argument( "inputs", nargs="*", metavar="PATH", type=Path, help="input path" ) check_parser.add_argument( "-e", "--remove-empty-cells", action="store_true", help="check for empty cells" ) check_parser.add_argument( "-M", "--remove-all-notebook-metadata", action="store_true", help="check for any notebook metadata", ) check_parser.add_argument( "-m", "--preserve-cell-metadata", default=None, nargs="*", help="preserve cell metadata, all unless fields are specified", ) check_parser.add_argument( "-o", "--preserve-cell-outputs", action="store_true", help="preserve cell outputs", ) check_parser.add_argument( "-c", "--preserve-execution-counts", action="store_true", help="preserve cell execution counts", ) check_parser.add_argument( "-n", "--preserve-notebook-metadata", action="store_true", help="preserve notebook metadata", ) clean_parser = subparsers.add_parser( "clean", help="clean notebook of cell execution counts, metadata, and outputs" ) clean_parser.add_argument( "inputs", nargs="*", metavar="PATH", type=Path, help="input path" ) clean_parser.add_argument( "-e", "--remove-empty-cells", action="store_true", help="remove empty cells" ) clean_parser.add_argument( "-M", "--remove-all-notebook-metadata", action="store_true", help="remove all notebook metadata", ) clean_parser.add_argument( "-m", "--preserve-cell-metadata", default=None, nargs="*", help="preserve cell metadata, all unless fields are specified", ) clean_parser.add_argument( "-o", "--preserve-cell-outputs", action="store_true", help="preserve cell outputs", ) clean_parser.add_argument( "-c", "--preserve-execution-counts", action="store_true", help="preserve cell execution counts", ) clean_parser.add_argument( "-n", "--preserve-notebook-metadata",
action="store_true", help="preserve notebook metadata", ) return parser.parse_args(args, namespace=Args()) def main() -> None: # pragma: no cover """Command line entrypoint for nb-clean. Parses command line arguments and dispatches to the appropriate subcommand handler (version, add-filter, remove-filter, check, or clean). """ args = parse_args(sys.argv[1:]) if args.subcommand == "version": print(f"nb-clean {nb_clean.VERSION}") elif args.subcommand == "add-filter": add_filter( remove_empty_cells=args.remove_empty_cells, remove_all_notebook_metadata=args.remove_all_notebook_metadata, preserve_cell_metadata=args.preserve_cell_metadata, preserve_cell_outputs=args.preserve_cell_outputs, preserve_execution_counts=args.preserve_execution_counts, preserve_notebook_metadata=args.preserve_notebook_metadata, ) elif args.subcommand == "remove-filter": remove_filter() elif args.subcommand == "check": check( args.inputs, remove_empty_cells=args.remove_empty_cells, remove_all_notebook_metadata=args.remove_all_notebook_metadata, preserve_cell_metadata=args.preserve_cell_metadata, preserve_cell_outputs=args.preserve_cell_outputs, preserve_execution_counts=args.preserve_execution_counts, preserve_notebook_metadata=args.preserve_notebook_metadata, ) elif args.subcommand == "clean": clean( args.inputs, remove_empty_cells=args.remove_empty_cells, remove_all_notebook_metadata=args.remove_all_notebook_metadata, preserve_cell_metadata=args.preserve_cell_metadata, preserve_cell_outputs=args.preserve_cell_outputs, preserve_execution_counts=args.preserve_execution_counts, preserve_notebook_metadata=args.preserve_notebook_metadata, ) else: # This should never happen due to argparse validation, but be defensive exit_with_error(f"Unknown subcommand: {args.subcommand}", 1) ================================================ FILE: src/nb_clean/py.typed ================================================ ================================================ FILE: tests/conftest.py 
================================================ from pathlib import Path from typing import Final, cast import nbformat import pytest NOTEBOOKS_DIR: Final = Path(__file__).parent / "notebooks" def _read_notebook(filename: str) -> nbformat.NotebookNode: return cast( "nbformat.NotebookNode", nbformat.read(NOTEBOOKS_DIR / filename, as_version=nbformat.NO_CONVERT), ) @pytest.fixture def dirty_notebook() -> nbformat.NotebookNode: return _read_notebook("dirty.ipynb") @pytest.fixture def dirty_notebook_with_version() -> nbformat.NotebookNode: return _read_notebook("dirty_with_version.ipynb") @pytest.fixture def clean_notebook() -> nbformat.NotebookNode: return _read_notebook("clean.ipynb") @pytest.fixture def clean_notebook_with_notebook_metadata() -> nbformat.NotebookNode: return _read_notebook("clean_with_notebook_metadata.ipynb") @pytest.fixture def clean_notebook_without_empty_cells() -> nbformat.NotebookNode: return _read_notebook("clean_without_empty_cells.ipynb") @pytest.fixture def clean_notebook_with_empty_cells() -> nbformat.NotebookNode: return _read_notebook("clean_with_empty_cells.ipynb") @pytest.fixture def clean_notebook_with_counts() -> nbformat.NotebookNode: return _read_notebook("clean_with_counts.ipynb") @pytest.fixture def clean_notebook_with_cell_metadata() -> nbformat.NotebookNode: return _read_notebook("clean_with_cell_metadata.ipynb") @pytest.fixture def clean_notebook_with_tags_metadata() -> nbformat.NotebookNode: return _read_notebook("clean_with_tags_metadata.ipynb") @pytest.fixture def clean_notebook_with_tags_special_metadata() -> nbformat.NotebookNode: return _read_notebook("clean_with_tags_special_metadata.ipynb") @pytest.fixture def clean_notebook_with_outputs() -> nbformat.NotebookNode: return _read_notebook("clean_with_outputs.ipynb") @pytest.fixture def clean_notebook_with_outputs_with_counts() -> nbformat.NotebookNode: return _read_notebook("clean_with_outputs_with_counts.ipynb") @pytest.fixture def 
clean_notebook_without_notebook_metadata() -> nbformat.NotebookNode: return _read_notebook("clean_without_notebook_metadata.ipynb") ================================================ FILE: tests/notebooks/clean.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_cell_metadata.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "nbclean": "test", "special": "my special metadata", "tags": [ "before-import", "answer" ] }, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } 
================================================ FILE: tests/notebooks/clean_with_counts.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_empty_cells.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_notebook_metadata.ipynb 
================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_outputs.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello, world'" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world\n" ] } ], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_outputs_with_counts.ipynb 
================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello, world'" ] }, "execution_count": 0, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world\n" ] } ], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_with_tags_metadata.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "before-import", "answer" ] }, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: 
tests/notebooks/clean_with_tags_special_metadata.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "special": "my special metadata", "tags": [ "before-import", "answer" ] }, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_without_empty_cells.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/clean_without_notebook_metadata.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"Hello, world\"\n", 
"text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/dirty.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 3, "metadata": { "nbclean": "test", "tags": [ "before-import", "answer" ], "special": "my special metadata" }, "outputs": [ { "data": { "text/plain": [ "'Hello, world'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world\n" ] } ], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/notebooks/dirty_empty_octave.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "id": "10cfba24-bab5-47a0-9ab8-5d1fc01f1f58", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Octave", "language": "octave", "name": "octave" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: tests/notebooks/dirty_with_version.ipynb 
================================================ { "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "nbclean": "test" }, "outputs": [], "source": [ "text = \"Hello, world\"\n", "text" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbclean": "test", "tags": [ "example-tag", "another-tag" ], "special": "my special metadata" }, "outputs": [], "source": [ "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:Python3] *", "language": "python", "name": "conda-env-Python3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: tests/test_check_notebook.py ================================================ from __future__ import annotations from typing import TYPE_CHECKING, cast import pytest import nb_clean if TYPE_CHECKING: from collections.abc import Collection import nbformat @pytest.mark.parametrize( ("notebook_name", "is_clean"), [ ("clean_notebook", True), ("dirty_notebook", False), ("dirty_notebook_with_version", False), ], ) def test_check_notebook( notebook_name: str, *, is_clean: bool, request: pytest.FixtureRequest ) -> None: notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name)) assert nb_clean.check_notebook(notebook) is is_clean @pytest.mark.parametrize("preserve_notebook_metadata", [True, False]) def test_check_notebook_preserve_notebook_metadata( clean_notebook_with_notebook_metadata: nbformat.NotebookNode, *, preserve_notebook_metadata: bool, ) -> None: assert ( nb_clean.check_notebook( clean_notebook_with_notebook_metadata, preserve_notebook_metadata=preserve_notebook_metadata, ) is 
preserve_notebook_metadata ) @pytest.mark.parametrize("remove_empty_cells", [True, False]) def test_check_notebook_remove_empty_cells( clean_notebook_with_empty_cells: nbformat.NotebookNode, *, remove_empty_cells: bool ) -> None: output = nb_clean.check_notebook( clean_notebook_with_empty_cells, remove_empty_cells=remove_empty_cells ) assert output is not remove_empty_cells @pytest.mark.parametrize( "preserve_cell_metadata", [ [], ["tags"], ["other"], ["tags", "special"], ["nbformat", "tags", "special"], None, ], ) def test_check_notebook_preserve_cell_metadata( clean_notebook_with_cell_metadata: nbformat.NotebookNode, preserve_cell_metadata: Collection[str] | None, ) -> None: expected = (preserve_cell_metadata is not None) and ( preserve_cell_metadata == [] or {"tags", "special", "nbclean"}.issubset(preserve_cell_metadata) ) output = nb_clean.check_notebook( clean_notebook_with_cell_metadata, preserve_cell_metadata=preserve_cell_metadata ) assert output is expected @pytest.mark.parametrize( "preserve_cell_metadata", [ [], ["tags"], ["other"], ["tags", "special"], ["nbformat", "tags", "special"], None, ], ) def test_check_notebook_preserve_cell_metadata_tags( clean_notebook_with_tags_metadata: nbformat.NotebookNode, preserve_cell_metadata: Collection[str] | None, ) -> None: expected = (preserve_cell_metadata is not None) and ( preserve_cell_metadata == [] or {"tags"}.issubset(preserve_cell_metadata) ) output = nb_clean.check_notebook( clean_notebook_with_tags_metadata, preserve_cell_metadata=preserve_cell_metadata ) assert output is expected @pytest.mark.parametrize( "preserve_cell_metadata", [ [], ["tags"], ["other"], ["tags", "special"], ["nbformat", "tags", "special"], None, ], ) def test_check_notebook_preserve_cell_metadata_tags_special( clean_notebook_with_tags_special_metadata: nbformat.NotebookNode, preserve_cell_metadata: Collection[str] | None, ) -> None: expected = (preserve_cell_metadata is not None) and ( preserve_cell_metadata == [] or {"tags", 
"special"}.issubset(preserve_cell_metadata) ) output = nb_clean.check_notebook( clean_notebook_with_tags_special_metadata, preserve_cell_metadata=preserve_cell_metadata, ) assert output is expected @pytest.mark.parametrize( ("notebook_name", "preserve_cell_outputs", "is_clean"), [ ("clean_notebook_with_outputs", True, True), ("clean_notebook_with_outputs", False, False), ("clean_notebook_with_outputs_with_counts", True, False), ], ) def test_check_notebook_preserve_outputs( notebook_name: str, *, preserve_cell_outputs: bool, is_clean: bool, request: pytest.FixtureRequest, ) -> None: notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name)) output = nb_clean.check_notebook( notebook, preserve_cell_outputs=preserve_cell_outputs ) assert output is is_clean @pytest.mark.parametrize( ("notebook_name", "preserve_execution_counts", "is_clean"), [ ("clean_notebook_with_counts", True, True), ("clean_notebook_with_counts", False, False), ], ) def test_check_notebook_preserve_execution_counts( notebook_name: str, *, preserve_execution_counts: bool, is_clean: bool, request: pytest.FixtureRequest, ) -> None: notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name)) output = nb_clean.check_notebook( notebook, preserve_execution_counts=preserve_execution_counts ) assert output is is_clean @pytest.mark.parametrize( ("notebook_name", "remove_all_notebook_metadata", "is_clean"), [ ("clean_notebook_with_notebook_metadata", True, False), ("clean_notebook_with_notebook_metadata", False, False), ("clean_notebook_without_notebook_metadata", True, True), ("clean_notebook_without_notebook_metadata", False, True), ("clean_notebook", True, False), ("clean_notebook", False, True), ], ) def test_check_notebook_remove_all_notebook_metadata( notebook_name: str, *, remove_all_notebook_metadata: bool, is_clean: bool, request: pytest.FixtureRequest, ) -> None: # The test with `("clean_notebook_with_notebook_metadata", False, True)` # is False due to 
`clean_notebook_with_notebook_metadata` containing # `language_info.version` detected when `preserve_notebook_metadata=False`. notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name)) assert ( nb_clean.check_notebook( notebook, remove_all_notebook_metadata=remove_all_notebook_metadata ) == is_clean ) def test_check_notebook_exclusive_arguments( dirty_notebook: nbformat.NotebookNode, ) -> None: with pytest.raises( ValueError, match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`", ): nb_clean.check_notebook( dirty_notebook, remove_all_notebook_metadata=True, preserve_notebook_metadata=True, ) ================================================ FILE: tests/test_clean_notebook.py ================================================ from collections.abc import Collection from typing import cast import nbformat import pytest import nb_clean def test_clean_notebook( dirty_notebook: nbformat.NotebookNode, clean_notebook: nbformat.NotebookNode ) -> None: assert nb_clean.clean_notebook(dirty_notebook) == clean_notebook @pytest.mark.parametrize( ("preserve_notebook_metadata", "expected_output_name"), [(True, "clean_notebook_with_notebook_metadata"), (False, "clean_notebook")], ) def test_clean_notebook_with_notebook_metadata( clean_notebook_with_notebook_metadata: nbformat.NotebookNode, *, preserve_notebook_metadata: bool, expected_output_name: str, request: pytest.FixtureRequest, ) -> None: expected_output = cast( "nbformat.NotebookNode", request.getfixturevalue(expected_output_name) ) assert ( nb_clean.clean_notebook( clean_notebook_with_notebook_metadata, preserve_notebook_metadata=preserve_notebook_metadata, ) == expected_output ) def test_clean_notebook_remove_empty_cells( clean_notebook_with_empty_cells: nbformat.NotebookNode, clean_notebook_without_empty_cells: nbformat.NotebookNode, ) -> None: assert ( nb_clean.clean_notebook( clean_notebook_with_empty_cells, remove_empty_cells=True ) == 
        clean_notebook_without_empty_cells
    )


@pytest.mark.parametrize(
    "preserve_cell_metadata",
    [[], ["nbclean", "tags", "special"], ["nbclean", "tags", "special", "toomany"]],
)
def test_clean_notebook_preserve_cell_metadata(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_cell_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_cell_metadata
    )


@pytest.mark.parametrize("preserve_cell_metadata", [["tags"], ["tags", "toomany"]])
def test_clean_notebook_preserve_cell_metadata_tags(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_tags_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_tags_metadata
    )


@pytest.mark.parametrize(
    "preserve_cell_metadata", [["tags", "special"], ["tags", "special", "toomany"]]
)
def test_clean_notebook_preserve_cell_metadata_tags_special(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_tags_special_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_tags_special_metadata
    )


def test_clean_notebook_preserve_outputs(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_outputs: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, preserve_cell_outputs=True)
        == clean_notebook_with_outputs
    )


def test_clean_notebook_preserve_execution_counts(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_counts: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, preserve_execution_counts=True)
        == clean_notebook_with_counts
    )


def test_clean_notebook_remove_all_notebook_metadata(
    dirty_notebook:
        nbformat.NotebookNode,
    clean_notebook_without_notebook_metadata: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, remove_all_notebook_metadata=True)
        == clean_notebook_without_notebook_metadata
    )


def test_clean_notebook_exclusive_arguments(
    dirty_notebook: nbformat.NotebookNode,
) -> None:
    with pytest.raises(
        ValueError,
        match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`",
    ):
        nb_clean.clean_notebook(
            dirty_notebook,
            remove_all_notebook_metadata=True,
            preserve_notebook_metadata=True,
        )


================================================
FILE: tests/test_cli.py
================================================
from __future__ import annotations

import io
import os
import sys
from pathlib import Path
from typing import TYPE_CHECKING, cast

import nbformat
import pytest

import nb_clean
import nb_clean.cli

if TYPE_CHECKING:
    from collections.abc import Collection, Iterable

    from pytest import CaptureFixture  # noqa: PT013


def test_expand_directories_with_files() -> None:
    paths = [Path("tests/notebooks/dirty.ipynb")]
    assert nb_clean.cli.expand_directories(paths) == paths


def test_expand_directories_recursively() -> None:
    input_paths = [Path("tests")]
    expanded_paths = nb_clean.cli.expand_directories(input_paths)
    assert len(expanded_paths) > len(input_paths)
    assert all(path.is_file() and path.suffix == ".ipynb" for path in expanded_paths)


def test_exit_with_error(capsys: CaptureFixture[str]) -> None:
    with pytest.raises(SystemExit) as exc:
        nb_clean.cli.exit_with_error("error message", 42)
    assert exc.value.code == 42
    assert capsys.readouterr().err == "nb-clean: error: error message\n"


def test_add_filter_dispatch(monkeypatch: pytest.MonkeyPatch) -> None:
    captured: dict[str, object] = {}

    def fake_add_git_filter(**kwargs: object) -> None:
        captured.update(kwargs)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)
    argv = ["nb-clean", "add-filter", "-e", "-n"]
    monkeypatch.setattr(sys, "argv",
        argv)
    nb_clean.cli.main()
    assert captured == {
        "remove_empty_cells": True,
        "remove_all_notebook_metadata": False,
        "preserve_cell_metadata": None,
        "preserve_cell_outputs": False,
        "preserve_execution_counts": False,
        "preserve_notebook_metadata": True,
    }


def test_add_filter_remove_all_notebook_metadata_dispatch(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    captured: dict[str, object] = {}

    def fake_add_git_filter(**kwargs: object) -> None:
        captured.update(kwargs)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)
    argv = ["nb-clean", "add-filter", "-e", "-M"]
    monkeypatch.setattr(sys, "argv", argv)
    nb_clean.cli.main()
    assert captured == {
        "remove_empty_cells": True,
        "remove_all_notebook_metadata": True,
        "preserve_cell_metadata": None,
        "preserve_cell_outputs": False,
        "preserve_execution_counts": False,
        "preserve_notebook_metadata": False,
    }


def test_add_filter_failure_dispatch(
    capsys: CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    def fake_add_git_filter(**_kwargs: object) -> None:
        raise nb_clean.GitProcessError(message="error message", return_code=42)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)
    monkeypatch.setattr(sys, "argv", ["nb-clean", "add-filter", "-e", "-M"])
    with pytest.raises(SystemExit) as exc:
        nb_clean.cli.main()
    assert exc.value.code == 42
    assert capsys.readouterr().err == "nb-clean: error: error message\n"


def test_remove_filter_dispatch(monkeypatch: pytest.MonkeyPatch) -> None:
    called = {"value": False}

    def fake_remove_git_filter() -> None:
        called["value"] = True

    monkeypatch.setattr(nb_clean, "remove_git_filter", fake_remove_git_filter)
    monkeypatch.setattr(sys, "argv", ["nb-clean", "remove-filter"])
    nb_clean.cli.main()
    assert called["value"]


def test_remove_filter_failure_dispatch(
    capsys: CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    def fake_remove_git_filter() -> None:
        raise nb_clean.GitProcessError(message="error message", return_code=42)

    monkeypatch.setattr(nb_clean,
"remove_git_filter", fake_remove_git_filter) monkeypatch.setattr(sys, "argv", ["nb-clean", "remove-filter"]) with pytest.raises(SystemExit) as exc: nb_clean.cli.main() assert exc.value.code == 42 assert capsys.readouterr().err == "nb-clean: error: error message\n" @pytest.mark.parametrize( ("name", "expect_exit"), [("clean.ipynb", False), ("dirty.ipynb", True)] ) def test_check_file( tmp_path: Path, monkeypatch: pytest.MonkeyPatch, name: str, *, expect_exit: bool ) -> None: src = Path("tests/notebooks") / name dst = tmp_path / name dst.write_bytes(src.read_bytes()) monkeypatch.setattr(sys, "argv", ["nb-clean", "check", os.fspath(dst)]) if expect_exit: with pytest.raises(SystemExit) as exc: nb_clean.cli.main() assert exc.value.code == 1 else: nb_clean.cli.main() @pytest.mark.parametrize( ("notebook_name", "expect_exit"), [("clean_notebook", False), ("dirty_notebook", True)], ) def test_check_stdin( monkeypatch: pytest.MonkeyPatch, notebook_name: str, *, expect_exit: bool, request: pytest.FixtureRequest, ) -> None: notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name)) monkeypatch.setattr(sys, "argv", ["nb-clean", "check"]) content = cast("str", nbformat.writes(notebook)) monkeypatch.setattr(sys, "stdin", io.StringIO(content)) if expect_exit: with pytest.raises(SystemExit) as exc: nb_clean.cli.main() assert exc.value.code == 1 else: nb_clean.cli.main() def test_clean_file(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: src_dirty = Path("tests/notebooks/dirty.ipynb") dst_dirty = tmp_path / "dirty.ipynb" dst_dirty.write_bytes(src_dirty.read_bytes()) monkeypatch.setattr(sys, "argv", ["nb-clean", "clean", str(dst_dirty)]) nb_clean.cli.main() cleaned = cast( "nbformat.NotebookNode", nbformat.read(dst_dirty, as_version=nbformat.NO_CONVERT), ) expected = cast( "nbformat.NotebookNode", nbformat.read( Path("tests/notebooks/clean.ipynb"), as_version=nbformat.NO_CONVERT ), ) assert cleaned == expected def test_clean_stdin( capsys: 
        CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    dirty = cast(
        "nbformat.NotebookNode",
        nbformat.read(
            Path("tests/notebooks/dirty.ipynb"), as_version=nbformat.NO_CONVERT
        ),
    )
    expected = cast(
        "nbformat.NotebookNode",
        nbformat.read(
            Path("tests/notebooks/clean.ipynb"), as_version=nbformat.NO_CONVERT
        ),
    )
    monkeypatch.setattr(sys, "argv", ["nb-clean", "clean"])
    dirty_content = cast("str", nbformat.writes(dirty))
    monkeypatch.setattr(sys, "stdin", io.StringIO(dirty_content))
    nb_clean.cli.main()
    out = capsys.readouterr().out
    expected_text = cast("str", nbformat.writes(expected))
    assert out.strip() == expected_text.strip()


@pytest.mark.parametrize(
    (
        "argv",
        "inputs",
        "remove_empty_cells",
        "remove_all_notebook_metadata",
        "preserve_cell_metadata",
        "preserve_cell_outputs",
        "preserve_execution_counts",
        "preserve_notebook_metadata",
    ),
    [
        ("add-filter -e", [], True, False, None, False, False, False),
        (
            "check -m -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            [],
            True,
            False,
            False,
        ),
        (
            "check -m tags -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            ["tags"],
            True,
            False,
            False,
        ),
        (
            "check -m tags special -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            ["tags", "special"],
            True,
            False,
            False,
        ),
        ("clean -e -o a.ipynb", ["a.ipynb"], True, False, None, True, False, False),
        ("clean -e -c -o a.ipynb", ["a.ipynb"], True, False, None, True, True, False),
    ],
)
def test_parse_args(
    argv: str,
    inputs: Iterable[str],
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
) -> None:
    args = nb_clean.cli.parse_args(argv.split())
    if inputs:
        assert args.inputs == [Path(path) for path in inputs]
    assert args.remove_empty_cells is remove_empty_cells
    assert args.remove_all_notebook_metadata is remove_all_notebook_metadata
    assert args.preserve_cell_metadata == preserve_cell_metadata
    assert args.preserve_cell_outputs is preserve_cell_outputs
    assert args.preserve_execution_counts is preserve_execution_counts
    assert args.preserve_notebook_metadata is preserve_notebook_metadata


================================================
FILE: tests/test_git_integration.py
================================================
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import TYPE_CHECKING
from unittest.mock import Mock

import pytest

import nb_clean

if TYPE_CHECKING:
    from collections.abc import Collection

    from pytest_mock import MockerFixture


def test_git(mocker: MockerFixture) -> None:
    mock_process = Mock()
    mock_process.stdout = b" output string "
    mock_run = mocker.patch("nb_clean.subprocess.run", return_value=mock_process)
    output = nb_clean.git("command", "--flag")
    mock_run.assert_called_once_with(
        ["git", "command", "--flag"], capture_output=True, check=True
    )
    assert output == "output string"


def test_git_failure(mocker: MockerFixture) -> None:
    mocker.patch(
        "nb_clean.subprocess.run",
        side_effect=subprocess.CalledProcessError(
            returncode=42, cmd="command", stderr=b"standard error"
        ),
    )
    with pytest.raises(nb_clean.GitProcessError) as exc:
        nb_clean.git("command", "--flag")
    assert exc.value.message == "standard error"
    assert exc.value.return_code == 42


def test_git_attributes_path(mocker: MockerFixture) -> None:
    mocker.patch("nb_clean.git", return_value="dir/.git")
    assert nb_clean.git_attributes_path() == Path("dir", ".git", "info", "attributes")


@pytest.mark.parametrize(
    (
        "remove_empty_cells",
        "remove_all_notebook_metadata",
        "preserve_cell_metadata",
        "preserve_cell_outputs",
        "preserve_execution_counts",
        "preserve_notebook_metadata",
        "filter_command",
    ),
    [
        (False, False, None, False, False, False, "nb-clean clean"),
        (True, False, None, False, False, False, "nb-clean clean --remove-empty-cells"),
        (
            False,
            False,
            [],
            False,
            False,
            False,
            "nb-clean clean --preserve-cell-metadata",
        ),
        (
            False,
            False,
            ["tags"],
            False,
            False,
False, "nb-clean clean --preserve-cell-metadata tags", ), ( False, False, ["tags", "special"], False, False, False, "nb-clean clean --preserve-cell-metadata tags special", ), ( False, False, None, True, False, False, "nb-clean clean --preserve-cell-outputs", ), ( True, False, [], True, False, False, "nb-clean clean --remove-empty-cells --preserve-cell-metadata --preserve-cell-outputs", ), ( False, False, None, False, True, True, "nb-clean clean --preserve-execution-counts --preserve-notebook-metadata", ), ( False, True, None, False, False, False, "nb-clean clean --remove-all-notebook-metadata", ), ], ) def test_add_git_filter( mocker: MockerFixture, tmp_path: Path, *, remove_empty_cells: bool, remove_all_notebook_metadata: bool, preserve_cell_metadata: Collection[str] | None, preserve_cell_outputs: bool, preserve_execution_counts: bool, preserve_notebook_metadata: bool, filter_command: str, ) -> None: mock_git = mocker.patch("nb_clean.git") mock_git_attributes_path = mocker.patch( "nb_clean.git_attributes_path", return_value=tmp_path / "attributes" ) nb_clean.add_git_filter( remove_empty_cells=remove_empty_cells, remove_all_notebook_metadata=remove_all_notebook_metadata, preserve_cell_metadata=preserve_cell_metadata, preserve_cell_outputs=preserve_cell_outputs, preserve_execution_counts=preserve_execution_counts, preserve_notebook_metadata=preserve_notebook_metadata, ) mock_git.assert_called_once_with("config", "filter.nb-clean.clean", filter_command) mock_git_attributes_path.assert_called_once() assert nb_clean.GIT_ATTRIBUTES_LINE in (tmp_path / "attributes").read_text() def test_add_git_filter_exclusive_arguments() -> None: with pytest.raises( ValueError, match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`", ): nb_clean.add_git_filter( remove_all_notebook_metadata=True, preserve_notebook_metadata=True ) def test_add_git_filter_idempotent(mocker: MockerFixture, tmp_path: Path) -> None: mocker.patch("nb_clean.git") (tmp_path 
/ "attributes").write_text(nb_clean.GIT_ATTRIBUTES_LINE) mock_git_attributes_path = mocker.patch( "nb_clean.git_attributes_path", return_value=tmp_path / "attributes" ) nb_clean.add_git_filter() mock_git_attributes_path.assert_called_once() assert (tmp_path / "attributes").read_text() == nb_clean.GIT_ATTRIBUTES_LINE @pytest.mark.parametrize("filter_exists", [True, False]) def test_remove_git_filter( mocker: MockerFixture, tmp_path: Path, *, filter_exists: bool ) -> None: mock_git = mocker.patch("nb_clean.git") mock_git_attributes_path = mocker.patch( "nb_clean.git_attributes_path", return_value=tmp_path / "attributes" ) (tmp_path / "attributes").touch() if filter_exists: (tmp_path / "attributes").write_text(nb_clean.GIT_ATTRIBUTES_LINE) nb_clean.remove_git_filter() mock_git_attributes_path.assert_called_once() mock_git.assert_called_once_with("config", "--remove-section", "filter.nb-clean") if filter_exists: assert nb_clean.GIT_ATTRIBUTES_LINE not in (tmp_path / "attributes").read_text()