Repository: srstevenson/nb-clean
Branch: main
Commit: 1e21d56623ba
Files: 35
Total size: 86.7 KB

Directory structure:
gitextract_9zcbq9pw/

├── .github/
│   ├── CODEOWNERS
│   ├── CONTRIBUTING.md
│   ├── dependabot.yml
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── .pre-commit-hooks.yaml
├── .prettierrc.toml
├── .python-version
├── LICENSE
├── README.md
├── justfile
├── pyproject.toml
├── src/
│   └── nb_clean/
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       └── py.typed
└── tests/
    ├── conftest.py
    ├── notebooks/
    │   ├── clean.ipynb
    │   ├── clean_with_cell_metadata.ipynb
    │   ├── clean_with_counts.ipynb
    │   ├── clean_with_empty_cells.ipynb
    │   ├── clean_with_notebook_metadata.ipynb
    │   ├── clean_with_outputs.ipynb
    │   ├── clean_with_outputs_with_counts.ipynb
    │   ├── clean_with_tags_metadata.ipynb
    │   ├── clean_with_tags_special_metadata.ipynb
    │   ├── clean_without_empty_cells.ipynb
    │   ├── clean_without_notebook_metadata.ipynb
    │   ├── dirty.ipynb
    │   ├── dirty_empty_octave.ipynb
    │   └── dirty_with_version.ipynb
    ├── test_check_notebook.py
    ├── test_clean_notebook.py
    ├── test_cli.py
    └── test_git_integration.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/CODEOWNERS
================================================
* @srstevenson


================================================
FILE: .github/CONTRIBUTING.md
================================================
# Contributing

Thanks for considering contributing! The following is a set of guidelines for
doing so. They're guidelines rather than rules, so follow your best judgement,
but reading them will help make the contribution process easier and more
effective for both you and the maintainers.

## Reporting issues

GitHub issues are used for managing bug reports and feature requests, except
security vulnerabilities: these should be emailed to the maintainers instead.

Search for existing issues before creating a new one, to ensure your problem
hasn't already been reported. If it has, you're welcome to comment on the
existing issue with extra information that might help reproduce and fix the
problem, or to share why a feature would be useful, but refrain from "+1" type
comments. Duplicate issues will be closed with a reference to the existing
issue.

In your report, describe what you did, what you expected to happen, and what
happened instead. Provide a [minimal reproducible example][mre] that the
maintainers can run. Include as much detail as you can, such as the version of
the project you're using, details of your operating system and environment, and
anything else that might help diagnose the problem, such as what you've already
tried to fix it.

## Contributing changes

### Planning

When you contribute a new change, the responsibility for maintenance is (by
default) transferred to the existing project maintainers. The benefit of the
contribution must be weighed against the cost of maintaining it.

If you're considering contributing a non-trivial bugfix or feature, discuss the
changes you plan to make before you start coding by opening an issue. This makes
it more likely that your proposed change will be accepted, and gives the
maintainers the opportunity to help you.

### Implementation

Changes are managed using GitHub pull requests. If you're new to pull requests,
read the [documentation][pr docs] to learn how they work.

[uv] is used for managing dependencies and packaging, and you will need it
installed. If you're not familiar with uv, we suggest reading its documentation
before you begin.

After cloning the repository, you can implement your changes as follows:

1. Install the project and its dependencies into an isolated virtual environment
   with `uv sync`.
2. Before making your changes, run the tests with `just test`, and ensure they
   pass. This checks your development environment is correctly configured, and
   there aren't outstanding issues before you start coding. If they don't pass,
   you can open a GitHub issue for help debugging.
3. Check out a new branch for your changes, branching from `main`, with a
   sensible name for your changes.
4. Implement your changes.
5. If you introduced new functionality or fixed a bug, add appropriate automated
   tests to prevent future regressions.
6. Ensure you've updated any docstrings or documentation files (including
   `README.md`) which are affected by your change.
7. Run the formatter, linter and type checker with `just fmt lint`, and tests
   with `just test`, and fix any problems.
8. Commit your changes, following [these guidelines][commit guidelines] for your
   commit messages.
9. Fork the base repository on GitHub, push your branch to your fork, and open a
   pull request against the base repository. Make sure your pull request has a
   clear title and description. The easier your changes are to understand, the
   easier it is for the maintainers to approve and merge them.
10. Your pull request will be reviewed by the maintainers and either merged, or
    feedback will be provided on changes that are required.

[commit guidelines]:
  https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
[mre]: https://stackoverflow.com/help/minimal-reproducible-example
[pr docs]: https://docs.github.com/en/github/collaborating-with-pull-requests
[uv]: https://docs.astral.sh/uv/


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "monthly"
    cooldown:
      default-days: 7
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
    cooldown:
      default-days: 7


================================================
FILE: .github/workflows/ci.yml
================================================
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  checks:
    name: Run checks
    runs-on: ubuntu-slim
    strategy:
      matrix:
        python:
          - "3.10"
          - "3.11"
          - "3.12"
          - "3.13"
          - "3.14"
    env:
      UV_PYTHON: ${{ matrix.python }}
    steps:
      - uses: actions/checkout@v6

      - name: Setup uv
        uses: astral-sh/setup-uv@v7

      - name: Setup Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}

      - name: Install dependencies
        run: uv sync --dev

      - name: Run formatter
        run: uv run ruff format --check .

      - name: Run linter
        run: uv run ruff check .

      - name: Run type checker
        run: uv run ty check .

      - name: Run tests
        run: uv run coverage run -m pytest

      - name: Print test coverage report
        run: uv run coverage report


================================================
FILE: .gitignore
================================================
*.egg-info/
.ipynb_checkpoints/
/.coverage
/build/
/coverage.xml
/dist/
__pycache__/


================================================
FILE: .pre-commit-hooks.yaml
================================================
- id: nb-clean
  name: nb-clean
  entry: nb-clean clean
  language: python
  types_or: [jupyter]
  minimum_pre_commit_version: 2.9.2


================================================
FILE: .prettierrc.toml
================================================
proseWrap = "always"


================================================
FILE: .python-version
================================================
3.10


================================================
FILE: LICENSE
================================================
Copyright © Scott Stevenson <scott@stevenson.io>

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.


================================================
FILE: README.md
================================================
<p align="center"><img src="images/nb-clean.png" width=300></p>

[![License](https://img.shields.io/github/license/srstevenson/nb-clean?label=License&color=blue)](https://github.com/srstevenson/nb-clean/blob/main/LICENSE)
[![GitHub release](https://img.shields.io/github/v/release/srstevenson/nb-clean?label=GitHub)](https://github.com/srstevenson/nb-clean)
[![PyPI version](https://img.shields.io/pypi/v/nb-clean?label=PyPI)](https://pypi.org/project/nb-clean/)
[![Python versions](https://img.shields.io/pypi/pyversions/nb-clean?label=Python)](https://pypi.org/project/nb-clean/)
[![CI status](https://github.com/srstevenson/nb-clean/workflows/CI/badge.svg)](https://github.com/srstevenson/nb-clean/actions)

nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs,
and (optionally) empty cells, preparing them for committing to version control.
It provides both a Git filter and pre-commit hook to automatically clean
notebooks before they're staged, and can also be used with other version control
systems, as a command line tool, and as a Python library. It can determine if a
notebook is clean or not, which can be used as a check in your continuous
integration pipelines.

Jupyter notebooks contain execution metadata that changes every time you run a
cell, including execution counts, timestamps, and output data. When committed to
version control, these elements create unnecessary diff noise, make meaningful
code review difficult, and can accidentally expose sensitive information in cell
outputs. By cleaning notebooks before committing, you preserve only the
essential code and markdown content, leading to cleaner diffs, more focused
reviews, and better collaboration.

For a detailed discussion of the challenges notebooks present for version
control and collaborative development, see my [PyCon UK 2017 talk][pycon talk]
and accompanying [blog post][blog post].

> [!NOTE]
>
> nb-clean 2.0.0 introduced a new command line interface to make cleaning
> notebooks in place easier. If you upgrade from a previous release, you'll need
> to migrate to the new interface as described under
> [Migrating to nb-clean 2](#migrating-to-nb-clean-2).

## Installation

nb-clean requires Python 3.10 or later. To run the latest release of nb-clean in
an ephemeral virtual environment, use [uv]:

```bash
uvx nb-clean
```

To add nb-clean as a dependency to a Python project managed with uv, use:

```bash
uv add --dev nb-clean
```

## Command line usage

### Understanding notebook metadata

Jupyter notebooks contain several types of metadata that nb-clean can handle:

**Cell metadata** includes information attached to individual cells, such as
tags, slideshow settings, and execution timing. Cell metadata fields like
`collapsed`, `scrolled`, `deletable`, and `editable` control notebook interface
behaviour, whilst `tags` and custom fields support workflow automation.

**Notebook metadata** contains document-level information including the kernel
specification, language version, and notebook format version. The language
version information (`metadata.language_info.version`) frequently changes
between Python versions and creates unnecessary version control noise.

**Execution metadata** encompasses execution counts for code cells and their
outputs, along with execution timestamps and output data. This metadata changes
every time you run cells, regardless of whether the actual code has changed.
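
As a rough illustration of where these three kinds of metadata live, the sketch
below builds a minimal notebook as a plain dictionary following the nbformat
version 4 JSON layout and reports what it contains. The `summarise_metadata`
helper and all sample values are hypothetical and are not part of nb-clean.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout (sample values are made up).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python", "version": "3.12.1"},
    },
    "cells": [
        {
            "cell_type": "code",
            "source": "1 + 1",
            "metadata": {"tags": ["example"], "collapsed": False},
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": "2"},
                    "metadata": {},
                }
            ],
        },
        {"cell_type": "markdown", "source": "Some notes", "metadata": {}},
    ],
}


def summarise_metadata(nb: dict) -> dict:
    """Hypothetical helper: report which kinds of metadata a notebook holds."""
    code_cells = [cell for cell in nb["cells"] if cell["cell_type"] == "code"]
    return {
        # Cell metadata: per-cell fields such as `tags` and `collapsed`.
        "cell_metadata_fields": sorted(
            {key for cell in nb["cells"] for key in cell["metadata"]}
        ),
        # Notebook metadata: the language version lives under `language_info`.
        "language_version": nb["metadata"].get("language_info", {}).get("version"),
        # Execution metadata: counts and outputs on code cells.
        "execution_counts": [cell["execution_count"] for cell in code_cells],
        "cells_with_outputs": sum(1 for cell in code_cells if cell["outputs"]),
    }


summary = summarise_metadata(notebook)
print(json.dumps(summary, indent=2))
```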

### Checking

You can check if a notebook is clean with:

```bash
nb-clean check notebook.ipynb
```

You can also process notebooks through standard input and output streams, which
is useful for integrating with shell pipelines or processing notebooks without
writing to disk:

```bash
nb-clean check < notebook.ipynb
```

When reading from standard input, nb-clean processes the notebook content
directly without accessing the filesystem. This approach is particularly useful
for automated workflows, continuous integration pipelines, or when you want to
check notebooks without creating temporary files.

The check can be run with the following flags:

- To check for empty cells use `--remove-empty-cells` or the short form `-e`.
- To ignore cell metadata use `--preserve-cell-metadata` or the short form `-m`.
  This will ignore all metadata fields. You can also pass a list of fields to
  ignore with `--preserve-cell-metadata field1 field2` or `-m field1 field2`.
  Note that when _not_ passing a list of fields, either the `-m` or
  `--preserve-cell-metadata` flag must be passed _after_ the notebook paths to
  process, or the notebook paths should be preceded with `--` so they are not
  interpreted as metadata fields.
- To ignore cell outputs use `--preserve-cell-outputs` or the short form `-o`.
- To ignore cell execution counts use `--preserve-execution-counts` or the short
  form `-c`.
- To ignore language version notebook metadata use
  `--preserve-notebook-metadata` or the short form `-n`.
- To check the notebook does not contain any notebook metadata use
  `--remove-all-notebook-metadata` or the short form `-M`.

For example, to check if a notebook is clean whilst ignoring notebook metadata:

```bash
nb-clean check --preserve-notebook-metadata notebook.ipynb
```

To check if a notebook is clean whilst ignoring all cell metadata:

```bash
nb-clean check --preserve-cell-metadata -- notebook.ipynb
```

To check if a notebook is clean whilst ignoring only the `tags` cell metadata
field:

```bash
nb-clean check --preserve-cell-metadata tags -- notebook.ipynb
```

nb-clean will exit with status code 0 if the notebook is clean, and status code
1 if it is not. nb-clean will also print details of cell execution counts,
metadata, outputs, and empty cells it finds.
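
The exit status makes the check easy to script. The sketch below wraps the
command in a small Python function that a continuous integration job could
call. The `notebooks_are_clean` helper is hypothetical, and it assumes the
`nb-clean` executable is available on `PATH`.

```python
import subprocess
from collections.abc import Sequence


def notebooks_are_clean(
    paths: Sequence[str], command: Sequence[str] = ("nb-clean", "check")
) -> bool:
    """Hypothetical helper: True if the check command exits with status 0."""
    # check=False because a non-zero status is an expected outcome here,
    # signalling a dirty notebook rather than a failure to run.
    result = subprocess.run([*command, *paths], check=False)
    return result.returncode == 0


# Example use in a CI script (requires nb-clean to be installed):
#     if not notebooks_are_clean(["notebooks/"]):
#         raise SystemExit(1)
```

The `command` parameter exists only so the wrapper can be exercised without
nb-clean installed, for example by substituting a command with a known exit
status.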

Note that `--preserve-notebook-metadata` and `--remove-all-notebook-metadata`
give contradictory instructions and cannot be used together.

### Cleaning (interactive)

You can clean a Jupyter notebook with:

```bash
nb-clean clean notebook.ipynb
```

This cleans the notebook in place. You can also pass the notebook content on
standard input, in which case the cleaned notebook is written to standard
output:

```bash
nb-clean clean < original.ipynb > cleaned.ipynb
```

The cleaning can be run with the following flags:

- To remove empty cells use `--remove-empty-cells` or the short form `-e`.
- To preserve cell metadata use `--preserve-cell-metadata` or the short form
  `-m`. This will preserve all metadata fields. You can also pass a list of
  fields to preserve with `--preserve-cell-metadata field1 field2` or
  `-m field1 field2`. Note that when _not_ passing a list of fields, either the
  `-m` or `--preserve-cell-metadata` flag must be passed _after_ the notebook
  paths to process, or the notebook paths should be preceded with `--` so they
  are not interpreted as metadata fields.
- To preserve cell outputs use `--preserve-cell-outputs` or the short form `-o`.
- To preserve cell execution counts use `--preserve-execution-counts` or the
  short form `-c`.
- To preserve notebook metadata (such as language version) use
  `--preserve-notebook-metadata` or the short form `-n`.
- To remove all notebook metadata use `--remove-all-notebook-metadata` or the
  short form `-M`.

For example, to clean a notebook whilst preserving notebook metadata:

```bash
nb-clean clean --preserve-notebook-metadata notebook.ipynb
```

To clean a notebook whilst preserving all cell metadata:

```bash
nb-clean clean --preserve-cell-metadata -- notebook.ipynb
```

To clean a notebook whilst preserving only the `tags` cell metadata field:

```bash
nb-clean clean --preserve-cell-metadata tags -- notebook.ipynb
```
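
The need for `--` in the examples above follows from how an option that accepts
a variable number of values consumes arguments. The sketch below reproduces the
effect with Python's standard `argparse` module. The parser shown is a
hypothetical stand-in for illustration, not nb-clean's actual command line
parser.

```python
import argparse

# Hypothetical parser for illustration only, not nb-clean's implementation.
parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("paths", nargs="*", default=[])
parser.add_argument("-m", "--preserve-cell-metadata", nargs="*", default=None)

# Without `--`, the notebook path is swallowed as a metadata field name.
greedy = parser.parse_args(["--preserve-cell-metadata", "notebook.ipynb"])

# `--` ends option parsing, so the path is read as a positional argument.
separated = parser.parse_args(["--preserve-cell-metadata", "--", "notebook.ipynb"])

# Passing the flag after the path avoids the ambiguity as well.
trailing = parser.parse_args(["notebook.ipynb", "--preserve-cell-metadata"])

# Explicit fields, then `--`, then the path.
fields = parser.parse_args(
    ["--preserve-cell-metadata", "tags", "--", "notebook.ipynb"]
)

for label, namespace in [
    ("greedy", greedy),
    ("separated", separated),
    ("trailing", trailing),
    ("fields", fields),
]:
    print(label, namespace.preserve_cell_metadata, namespace.paths)
```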

#### Directory processing

Both the `check` and `clean` commands can operate on directories as well as
individual notebook files. When you provide a directory path, nb-clean will
recursively find all `.ipynb` files within that directory and process them. For
example:

```bash
nb-clean check notebooks/
```

or

```bash
nb-clean clean experiments/
```

This is particularly useful for batch processing entire project directories or
ensuring all notebooks in a repository are clean.
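
A recursive search of this kind can be expressed with `pathlib`. The following
is a minimal sketch of the idea, assuming a simple recursive glob. The
`find_notebooks` helper is hypothetical, and nb-clean's own traversal may
differ in details such as ordering.

```python
import tempfile
from pathlib import Path


def find_notebooks(path: Path) -> list[Path]:
    """Hypothetical helper: expand a file or directory into notebook paths."""
    if path.is_dir():
        # rglob descends into every subdirectory of `path`.
        return sorted(path.rglob("*.ipynb"))
    return [path]


# Build a throwaway directory tree to demonstrate the search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "experiments" / "nested").mkdir(parents=True)
    (root / "experiments" / "a.ipynb").write_text("{}", encoding="UTF-8")
    (root / "experiments" / "nested" / "b.ipynb").write_text("{}", encoding="UTF-8")
    (root / "experiments" / "notes.txt").write_text("", encoding="UTF-8")

    found = [
        notebook.relative_to(root).as_posix()
        for notebook in find_notebooks(root / "experiments")
    ]

print(found)
```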

### Cleaning (Git filter)

To add a filter to an existing Git repository to automatically clean notebooks
when they're staged, run the following from the working tree:

```bash
nb-clean add-filter
```

This will configure a filter to remove cell execution counts, metadata, and
outputs. The same flags as described above for
[interactive cleaning](#cleaning-interactive) can be passed to customise the
behaviour.

The Git filter operates by configuring the `filter.nb-clean.clean` setting in
your repository's local Git configuration and adding the line
`*.ipynb filter=nb-clean` to `.git/info/attributes`. This ensures that all
notebook files are automatically processed through nb-clean when staged for
commit. The filter configuration is local to the repository and won't affect
your global or system Git settings.
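
The attributes step can be sketched as an idempotent append: the line is
written only if it is not already present, so adding the filter twice does not
duplicate it. The example below works on a temporary file rather than a real
repository. The `ensure_attributes_line` helper is hypothetical, and creating
the parent directory is an assumption added so the sketch runs standalone.

```python
import tempfile
from pathlib import Path

ATTRIBUTES_LINE = "*.ipynb filter=nb-clean"


def ensure_attributes_line(attributes_path: Path) -> bool:
    """Hypothetical helper: append the filter line unless already present.

    Returns True if the line was written, False if it was already there.
    """
    if attributes_path.is_file() and ATTRIBUTES_LINE in attributes_path.read_text(
        encoding="UTF-8"
    ):
        return False

    # Assumption for this sketch: create `info/` if it does not exist yet.
    attributes_path.parent.mkdir(parents=True, exist_ok=True)
    with attributes_path.open("a", encoding="UTF-8") as file:
        file.write(f"\n{ATTRIBUTES_LINE}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "info", "attributes")
    first = ensure_attributes_line(path)
    second = ensure_attributes_line(path)
    occurrences = path.read_text(encoding="UTF-8").count(ATTRIBUTES_LINE)

print(first, second, occurrences)
```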

To remove the filter, run:

```bash
nb-clean remove-filter
```

### Cleaning (Jujutsu)

nb-clean can be used to clean notebooks tracked with [Jujutsu] rather than Git.
Configure Jujutsu to use nb-clean as a fix tool by adding the following snippet
to `~/.config/jj/config.toml`:

```toml
[fix.tools.nb-clean]
command = ["nb-clean", "clean"]
patterns = ["glob:'**/*.ipynb'"]
```

The same flags as described above for
[interactive cleaning](#cleaning-interactive) can be appended to the `command`
array to customise the behaviour.

Tracked notebooks can then be cleaned by running `jj fix`. See the [Jujutsu
documentation][jujutsu docs] for further details of how to invoke and configure
fix tools.

### Cleaning (pre-commit hook)

nb-clean can also be used as a [pre-commit] hook. You may prefer this to the Git
filter if your project already uses the pre-commit framework.

Note that the Git filter and pre-commit hook work differently, with different
effects on your working directory. The pre-commit hook operates on the notebook
on disk, cleaning the copy in your working directory. The Git filter cleans
notebooks as they are added to the index, leaving the copy in your working
directory dirty. This means cell outputs are still visible to you in your local
Jupyter instance when using the Git filter, but not when using the pre-commit
hook.

After installing [pre-commit], add the nb-clean hook by adding the following
snippet to `.pre-commit-config.yaml` in the root of your repository:

```yaml
repos:
  - repo: https://github.com/srstevenson/nb-clean
    rev: 4.0.1
    hooks:
      - id: nb-clean
```

You can pass additional arguments to nb-clean with an `args` array. The
following example shows how to preserve only two specific metadata fields. Note
that, in the example, the final item `--` in the arg list is mandatory. The
option `--preserve-cell-metadata` may take an arbitrary number of field
arguments, and the `--` argument is needed to separate them from notebook
filenames, which `pre-commit` will append to the list of arguments.

```yaml
repos:
  - repo: https://github.com/srstevenson/nb-clean
    rev: 4.0.1
    hooks:
      - id: nb-clean
        args:
          - --remove-empty-cells
          - --preserve-cell-metadata
          - tags
          - slideshow
          - --
```

Run `pre-commit install` to ensure the hook is installed, and
`pre-commit autoupdate` to update the hook to the latest release of nb-clean.

### Preserving all nbformat metadata

To ignore or preserve specifically the metadata defined in the
[`nbformat` documentation](https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata),
use the following options:
`--preserve-cell-metadata collapsed scrolled deletable editable format name tags jupyter execution`.

## Python library usage

nb-clean can be used programmatically as a Python library, allowing integration
into other tools.

```python
import nbformat

import nb_clean

# Load a notebook
with open("notebook.ipynb") as f:
    notebook = nbformat.read(f, as_version=nbformat.NO_CONVERT)

# Check if the notebook is clean
is_clean = nb_clean.check_notebook(
    notebook, preserve_cell_outputs=True, filename="notebook.ipynb"
)

# Clean the notebook
cleaned_notebook = nb_clean.clean_notebook(
    notebook, remove_empty_cells=True, preserve_cell_metadata=["tags", "slideshow"]
)
```

The library functions accept the same configuration options as the command-line
interface. The `check_notebook()` function returns a boolean indicating whether
the notebook is clean, whilst `clean_notebook()` returns a cleaned copy of the
notebook.
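
The `preserve_cell_metadata` argument has three distinct states in the library:
`None` removes all cell metadata, an empty collection preserves all of it
(matching `-m` with no fields), and a non-empty collection preserves only the
named fields. The sketch below illustrates those semantics on a plain
dictionary. The `filter_cell_metadata` helper is hypothetical and is not how
nb-clean is necessarily implemented.

```python
from collections.abc import Collection
from typing import Optional


def filter_cell_metadata(
    metadata: dict, preserve_cell_metadata: Optional[Collection[str]]
) -> dict:
    """Hypothetical helper mirroring the documented three-state semantics."""
    if preserve_cell_metadata is None:
        # None: remove all cell metadata.
        return {}
    if len(preserve_cell_metadata) == 0:
        # Empty collection: preserve every field.
        return dict(metadata)
    # Non-empty collection: preserve only the listed fields.
    return {
        key: value for key, value in metadata.items() if key in preserve_cell_metadata
    }


cell_metadata = {"tags": ["keep"], "collapsed": True, "scrolled": False}

removed_all = filter_cell_metadata(cell_metadata, None)
kept_all = filter_cell_metadata(cell_metadata, [])
kept_tags = filter_cell_metadata(cell_metadata, ["tags"])

print(removed_all, kept_all, kept_tags)
```

Distinguishing `None` from an empty list is what lets a single argument express
both "clean everything" and "preserve everything".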

## Migrating to nb-clean 2

The following table maps from the command line interface of nb-clean 1.6.0 to
that of nb-clean >=2.0.0.

The examples in the table use long flags, but short flags can also be used
instead.

| Description                                 | nb-clean 1.6.0                                                   | nb-clean >=2.0.0                                            |
| ------------------------------------------- | ---------------------------------------------------------------- | ----------------------------------------------------------- |
| Clean notebook                              | `nb-clean clean --input notebook.ipynb \| sponge notebook.ipynb` | `nb-clean clean notebook.ipynb`                             |
| Clean notebook (remove empty cells)         | `nb-clean clean --input notebook.ipynb --remove-empty`           | `nb-clean clean --remove-empty-cells notebook.ipynb`        |
| Clean notebook (preserve all cell metadata) | `nb-clean clean --input notebook.ipynb --preserve-metadata`      | `nb-clean clean --preserve-cell-metadata -- notebook.ipynb` |
| Check notebook                              | `nb-clean check --input notebook.ipynb`                          | `nb-clean check notebook.ipynb`                             |
| Check notebook (check for empty cells)      | `nb-clean check --input notebook.ipynb --remove-empty`           | `nb-clean check --remove-empty-cells notebook.ipynb`        |
| Check notebook (ignore all cell metadata)   | `nb-clean check --input notebook.ipynb --preserve-metadata`      | `nb-clean check --preserve-cell-metadata -- notebook.ipynb` |
| Add Git filter to clean notebooks           | `nb-clean configure-git`                                         | `nb-clean add-filter`                                       |
| Remove Git filter                           | `nb-clean unconfigure-git`                                       | `nb-clean remove-filter`                                    |

## Copyright

Copyright © Scott Stevenson.

nb-clean is distributed under the terms of the [ISC license].

[blog post]: https://srstevenson.com/posts/jupyter-notebooks-and-collaboration/
[isc license]: https://opensource.org/licenses/ISC
[jujutsu docs]: https://jj-vcs.github.io/jj/latest/cli-reference/#jj-fix
[jujutsu]: https://jj-vcs.github.io/jj/
[pre-commit]: https://pre-commit.com/
[pycon talk]: https://www.youtube.com/watch?v=J3k3HkVnd2c
[uv]: https://docs.astral.sh/uv/


================================================
FILE: justfile
================================================
# show this help message (default)
help:
    @just -l

# format with ruff
fmt:
    uv run ruff check --fix
    uv run ruff format

# lint with ruff and type-check with ty
lint:
    uv run ruff check
    uv run ruff format --check
    uv run ty check

# run tests with pytest and report coverage
test:
    uv run coverage run -m pytest
    uv run coverage report


================================================
FILE: pyproject.toml
================================================
[project]
name = "nb-clean"
version = "4.0.1"
description = "Clean Jupyter notebooks for versioning"
authors = [{ name = "Scott Stevenson", email = "scott@stevenson.io" }]
readme = "README.md"
license = "ISC"
license-files = ["LICENSE"]
requires-python = ">=3.10"
keywords = ["jupyter", "notebook", "clean", "filter", "git"]
classifiers = [
  "Development Status :: 5 - Production/Stable",
  "Intended Audience :: Science/Research",
  "Natural Language :: English",
]
dependencies = ["nbformat>=5.9.2"]

[project.urls]
Homepage = "https://github.com/srstevenson/nb-clean"
Repository = "https://github.com/srstevenson/nb-clean"
Issues = "https://github.com/srstevenson/nb-clean/issues"

[project.scripts]
nb-clean = "nb_clean.cli:main"

[dependency-groups]
dev = [
  "coverage>=7.6.10",
  "pytest>=7.2.1",
  "pytest-mock>=3.11.1",
  "ruff>=0.1.6",
  "ty>=0.0.19",
  "typing-extensions>=4.14.1",
]

[build-system]
requires = ["uv_build>=0.7.19,<0.12"]
build-backend = "uv_build"

[tool.coverage.report]
exclude_also = ["if __name__ == .__main__.:", "if TYPE_CHECKING:"]

[tool.ruff]
target-version = "py310"

[tool.ruff.format]
docstring-code-format = true
skip-magic-trailing-comma = true

[tool.ruff.lint]
select = ["ALL"]
ignore = [
  "COM812",  # Trailing comma missing
  "C901",    # Function is too complex
  "E501",    # Line too long
  "PLR0912", # Too many branches
  "PLR0913", # Too many arguments in function definition
  "PLR2004", # Magic value used in comparison
  "S603",    # subprocess call: check for execution of untrusted input
  "S607",    # Starting a process with a partial executable path
  "T201",    # print found
]

[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.ruff.lint.isort]
split-on-trailing-comma = false

[tool.ruff.lint.per-file-ignores]
"tests/**.py" = [
  "D",      # pydocstyle
  "INP001", # Implicit namespace package
  "S101",   # Use of assert detected
]

[tool.ruff.lint.pydocstyle]
convention = "google"

[tool.ty.rules]
all = "error"


================================================
FILE: src/nb_clean/__init__.py
================================================
"""Clean Jupyter notebooks of execution counts, metadata, and outputs."""

from __future__ import annotations

import contextlib
import subprocess
from pathlib import Path
from typing import TYPE_CHECKING, Any, Final, cast

if TYPE_CHECKING:
    from collections.abc import Collection

    import nbformat
    from typing_extensions import Self

VERSION: Final = "4.0.1"
GIT_ATTRIBUTES_LINE: Final = "*.ipynb filter=nb-clean"


class GitProcessError(Exception):
    """Exception for errors executing Git."""

    def __init__(self: Self, message: str, return_code: int) -> None:
        """Exception for errors executing Git.

        Args:
            message: Error message.
            return_code: Return code.
        """
        super().__init__(message)
        self.message: str = message
        self.return_code: int = return_code


def git(*args: str) -> str:
    """Execute a Git subcommand with the provided arguments.

    Args:
        *args: Git subcommand and arguments to execute.

    Returns:
        Standard output from the Git command, stripped of whitespace.

    Raises:
        GitProcessError: If the Git command fails with a non-zero exit code.

    Examples:
        >>> git("rev-parse", "--git-dir")
        '.git'
    """
    try:
        process = subprocess.run(["git", *args], capture_output=True, check=True)
    except subprocess.CalledProcessError as exc:
        raise GitProcessError(exc.stderr.decode(), exc.returncode) from exc

    return process.stdout.decode().strip()


def git_attributes_path() -> Path:
    """Get path to the attributes file in the current Git repository.

    Returns:
        Path to the attributes file.

    Examples:
        >>> git_attributes_path()
        PosixPath('.git/info/attributes')
    """
    git_dir = git("rev-parse", "--git-dir")
    return Path(git_dir, "info", "attributes")


def add_git_filter(
    *,
    remove_empty_cells: bool = False,
    remove_all_notebook_metadata: bool = False,
    preserve_cell_metadata: Collection[str] | None = None,
    preserve_cell_outputs: bool = False,
    preserve_execution_counts: bool = False,
    preserve_notebook_metadata: bool = False,
) -> None:
    """Configure and add a Git filter to automatically clean Jupyter notebooks.

    This function sets up a Git filter that will automatically clean notebooks
    when they are staged for commit, removing execution counts, outputs, and
    metadata according to the specified options.

    Args:
        remove_empty_cells: If True, remove empty cells. Defaults to False.
        remove_all_notebook_metadata: If True, remove all notebook metadata. Defaults to False.
        preserve_cell_metadata: Controls cell metadata handling. If None, clean all cell metadata.
            If [], preserve all cell metadata.
            (This corresponds to the `-m` CLI option without specifying any fields.)
            If list of str, these are the cell metadata fields to preserve.
            Defaults to None.
        preserve_cell_outputs: If True, preserve cell outputs. Defaults to False.
        preserve_execution_counts: If True, preserve cell execution counts. Defaults to False.
        preserve_notebook_metadata: If True, preserve notebook metadata such as language version.
            Defaults to False.

    Raises:
        ValueError: If both preserve_notebook_metadata and remove_all_notebook_metadata are True.
    """
    if preserve_notebook_metadata and remove_all_notebook_metadata:
        msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`"
        raise ValueError(msg)

    command = ["nb-clean", "clean"]

    if remove_empty_cells:
        command.append("--remove-empty-cells")

    if preserve_cell_metadata is not None:
        if len(preserve_cell_metadata) > 0:
            command.append(
                f"--preserve-cell-metadata {' '.join(preserve_cell_metadata)}"
            )
        else:
            command.append("--preserve-cell-metadata")

    if preserve_cell_outputs:
        command.append("--preserve-cell-outputs")

    if preserve_execution_counts:
        command.append("--preserve-execution-counts")

    if preserve_notebook_metadata:
        command.append("--preserve-notebook-metadata")

    if remove_all_notebook_metadata:
        command.append("--remove-all-notebook-metadata")

    git("config", "filter.nb-clean.clean", " ".join(command))

    attributes_path = git_attributes_path()

    if attributes_path.is_file() and GIT_ATTRIBUTES_LINE in attributes_path.read_text(
        encoding="UTF-8"
    ):
        return

    with attributes_path.open("a", encoding="UTF-8") as file:
        file.write(f"\n{GIT_ATTRIBUTES_LINE}\n")


def remove_git_filter() -> None:
    """Remove the nb-clean filter from the current Git repository.

    This function removes the nb-clean filter configuration from the Git repository
    and cleans up the attributes file by removing the filter directive.

    Raises:
        GitProcessError: If Git command execution fails.
    """
    attributes_path = git_attributes_path()

    if attributes_path.is_file():
        original_contents = attributes_path.read_text(encoding="UTF-8").split("\n")
        revised_contents = [
            line for line in original_contents if line != GIT_ATTRIBUTES_LINE
        ]
        attributes_path.write_text("\n".join(revised_contents), encoding="UTF-8")

    git("config", "--remove-section", "filter.nb-clean")


def check_notebook(
    notebook: nbformat.NotebookNode,
    *,
    remove_empty_cells: bool = False,
    remove_all_notebook_metadata: bool = False,
    preserve_cell_metadata: Collection[str] | None = None,
    preserve_cell_outputs: bool = False,
    preserve_execution_counts: bool = False,
    preserve_notebook_metadata: bool = False,
    filename: str = "notebook",
) -> bool:
    """Check notebook is clean of execution counts, metadata, and outputs.

    Args:
        notebook: The notebook to check.
        remove_empty_cells: If True, also check for the presence of empty cells. Defaults to False.
        remove_all_notebook_metadata: If True, also check for the presence of any notebook metadata.
            Defaults to False.
        preserve_cell_metadata: If None, check for all cell metadata.
            If [], don't check for any cell metadata.
            (This corresponds to the `-m` CLI option without specifying any fields.)
            If list of str, these are the cell metadata fields to ignore.
            Defaults to None.
        preserve_cell_outputs: If True, don't check for cell outputs. Defaults to False.
        preserve_execution_counts: If True, don't check for cell execution counts. Defaults to False.
        preserve_notebook_metadata: If True, don't check for notebook metadata such as language version.
            Defaults to False.
        filename: Notebook filename to use in log messages. Defaults to "notebook".

    Returns:
        True if the notebook is clean, False otherwise.

    Raises:
        ValueError: If both `preserve_notebook_metadata` and `remove_all_notebook_metadata` are True.
    """
    if preserve_notebook_metadata and remove_all_notebook_metadata:
        msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`"
        raise ValueError(msg)

    is_clean = True

    for index, cell in enumerate(notebook.cells):
        prefix = f"{filename} cell {index}"

        if remove_empty_cells and not cell["source"]:
            print(f"{prefix}: empty cell")
            is_clean = False

        if preserve_cell_metadata is None:
            if cell["metadata"]:
                print(f"{prefix}: metadata")
                is_clean = False
        elif len(preserve_cell_metadata) > 0:
            for field in cell["metadata"]:
                if field not in preserve_cell_metadata:
                    print(f"{prefix}: metadata {field}")
                    is_clean = False

        if cell["cell_type"] == "code":
            if not preserve_execution_counts and cell["execution_count"]:
                print(f"{prefix}: execution count")
                is_clean = False

            if preserve_cell_outputs:
                if not preserve_execution_counts:
                    for output in cell["outputs"]:
                        if output.get("execution_count") is not None:
                            print(f"{prefix}: output execution count")
                            is_clean = False
            elif cell["outputs"]:
                print(f"{prefix}: outputs")
                is_clean = False

    if remove_all_notebook_metadata and cast("dict[str, Any]", notebook.metadata):
        print(f"{filename}: metadata")
        is_clean = False

    if not preserve_notebook_metadata:
        with contextlib.suppress(KeyError):
            notebook["metadata"]["language_info"]["version"]
            print(f"{filename} metadata: language_info.version")
            is_clean = False

    return is_clean


def clean_notebook(
    notebook: nbformat.NotebookNode,
    *,
    remove_empty_cells: bool = False,
    remove_all_notebook_metadata: bool = False,
    preserve_cell_metadata: Collection[str] | None = None,
    preserve_cell_outputs: bool = False,
    preserve_execution_counts: bool = False,
    preserve_notebook_metadata: bool = False,
) -> nbformat.NotebookNode:
    """Clean notebook of execution counts, metadata, and outputs.

    Args:
        notebook: The notebook to clean.
        remove_empty_cells: If True, remove empty cells. Defaults to False.
        remove_all_notebook_metadata: If True, remove all notebook metadata. Defaults to False.
        preserve_cell_metadata: If None, clean all cell metadata.
            If [], preserve all cell metadata.
            (This corresponds to the `-m` CLI option without specifying any fields.)
            If list of str, these are the cell metadata fields to preserve.
            Defaults to None.
        preserve_cell_outputs: If True, preserve cell outputs. Defaults to False.
        preserve_execution_counts: If True, preserve cell execution counts. Defaults to False.
        preserve_notebook_metadata: If True, preserve notebook metadata such as language version.
            Defaults to False.

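    Returns:
        The cleaned notebook. The input notebook is modified in place and returned.

    Raises:
        ValueError: If both `preserve_notebook_metadata` and `remove_all_notebook_metadata` are True.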
    """
    if preserve_notebook_metadata and remove_all_notebook_metadata:
        msg = "`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`"
        raise ValueError(msg)

    if remove_empty_cells:
        notebook.cells = [cell for cell in notebook.cells if cell["source"]]

    for cell in notebook.cells:
        if preserve_cell_metadata is None:
            cell["metadata"] = {}
        elif len(preserve_cell_metadata) > 0:
            cell["metadata"] = {
                field: value
                for field, value in cell["metadata"].items()
                if field in preserve_cell_metadata
            }
        if cell["cell_type"] == "code":
            if not preserve_execution_counts:
                cell["execution_count"] = None
            if preserve_cell_outputs:
                if not preserve_execution_counts:
                    for output in cell["outputs"]:
                        if "execution_count" in output:
                            output["execution_count"] = None
            else:
                cell["outputs"] = []

    if remove_all_notebook_metadata:
        notebook.metadata = {}
    elif not preserve_notebook_metadata:
        with contextlib.suppress(KeyError):
            del notebook["metadata"]["language_info"]["version"]

    return notebook
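

# Illustrative sketch, not part of upstream nb-clean: the default notebook
# metadata handling in `clean_notebook` reduced to plain dictionaries. The
# helper name `_demo_drop_language_version` is hypothetical. It shows that
# `contextlib.suppress(KeyError)` makes the deletion a no-op when either
# `language_info` or `version` is missing, and that the nested
# `codemirror_mode.version` key is left untouched.
def _demo_drop_language_version(metadata: dict) -> dict:
    import contextlib  # local import keeps the sketch self-contained

    with contextlib.suppress(KeyError):
        del metadata["language_info"]["version"]
    return metadata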


================================================
FILE: src/nb_clean/__main__.py
================================================
"""Top-level script to run nb-clean."""

from nb_clean.cli import main

if __name__ == "__main__":
    main()


================================================
FILE: src/nb_clean/cli.py
================================================
"""Command line interface to nb-clean."""

from __future__ import annotations

import argparse
import os
import sys
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, NoReturn, TextIO, cast

import nbformat

import nb_clean

if TYPE_CHECKING:
    from collections.abc import Collection, Iterable, Sequence


@dataclass
class Args(argparse.Namespace):
    """Arguments parsed from the command-line."""

    subcommand: str = ""
    inputs: list[Path] = field(default_factory=list)
    remove_empty_cells: bool = False
    remove_all_notebook_metadata: bool = False
    preserve_cell_metadata: list[str] | None = None
    preserve_cell_outputs: bool = False
    preserve_execution_counts: bool = False
    preserve_notebook_metadata: bool = False


def expand_directories(paths: Iterable[Path]) -> list[Path]:
    """Expand paths to directories into paths to notebooks contained within.

    Args:
        paths: Paths to expand, including directories.

    Returns:
        Deduplicated paths with directories expanded into the notebooks contained within,
            in no guaranteed order.
    """
    expanded: set[Path] = set()
    for path in paths:
        if path.is_dir():
            expanded.update(path.rglob("*.ipynb"))
        else:
            expanded.add(path)
    return list(expanded)


def exit_with_error(message: str, return_code: int) -> NoReturn:
    """Print an error message to standard error and exit.

    Args:
        message: Error message to print to standard error.
        return_code: Return code with which to exit.
    """
    print(f"nb-clean: error: {message}", file=sys.stderr)
    sys.exit(return_code)


def add_filter(
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
) -> None:
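    """Add the nb-clean filter to the current Git repository.

    If Git command execution fails, the program exits with the return code
    carried by the raised `GitProcessError`.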

    Args:
        remove_empty_cells: Configure the filter to remove empty cells.
        remove_all_notebook_metadata: Configure the filter to remove all notebook metadata.
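        preserve_cell_metadata: Configure the filter to preserve cell metadata. None preserves no
            fields, an empty collection preserves all fields, otherwise only the named fields.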
        preserve_cell_outputs: Configure the filter to preserve cell outputs.
        preserve_execution_counts: Configure the filter to preserve cell execution counts.
        preserve_notebook_metadata: Configure the filter to preserve notebook metadata such as language version.
    """
    try:
        nb_clean.add_git_filter(
            remove_empty_cells=remove_empty_cells,
            remove_all_notebook_metadata=remove_all_notebook_metadata,
            preserve_cell_metadata=preserve_cell_metadata,
            preserve_cell_outputs=preserve_cell_outputs,
            preserve_execution_counts=preserve_execution_counts,
            preserve_notebook_metadata=preserve_notebook_metadata,
        )
    except nb_clean.GitProcessError as exc:
        exit_with_error(exc.message, exc.return_code)


def remove_filter() -> None:
    """Remove the nb-clean filter from the current Git repository.

    This function removes the nb-clean filter configuration and cleans up
    the Git attributes file. If Git command execution fails, the program
    will exit with an appropriate error code.
    """
    try:
        nb_clean.remove_git_filter()
    except nb_clean.GitProcessError as exc:
        exit_with_error(exc.message, exc.return_code)


def check(
    inputs: Iterable[Path],
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
) -> None:
    """Check notebooks are clean of execution counts, metadata, and outputs.

    Exits with status 1 if any notebook is not clean.

    Args:
        inputs: Input notebook paths to check, empty list for stdin.
        remove_empty_cells: Check for the presence of empty cells.
        remove_all_notebook_metadata: Check for any notebook metadata.
        preserve_cell_metadata: Cell metadata fields to ignore. None checks all fields, an empty
            collection ignores all fields.
        preserve_cell_outputs: Don't check for cell outputs.
        preserve_execution_counts: Don't check for cell execution counts.
        preserve_notebook_metadata: Don't check for notebook metadata such as language version.
    """
    if inputs:
        processed_inputs: list[Path] | list[TextIO] = expand_directories(inputs)
    else:
        processed_inputs = [sys.stdin]

    all_clean = True
    for input_ in processed_inputs:
        name = "stdin" if input_ is sys.stdin else os.fspath(cast("Path", input_))

        notebook = cast(
            "nbformat.NotebookNode",
            nbformat.read(input_, as_version=nbformat.NO_CONVERT),
        )
        is_clean = nb_clean.check_notebook(
            notebook,
            remove_empty_cells=remove_empty_cells,
            remove_all_notebook_metadata=remove_all_notebook_metadata,
            preserve_cell_metadata=preserve_cell_metadata,
            preserve_cell_outputs=preserve_cell_outputs,
            preserve_execution_counts=preserve_execution_counts,
            preserve_notebook_metadata=preserve_notebook_metadata,
            filename=name,
        )
        all_clean &= is_clean

    if not all_clean:
        sys.exit(1)


def clean(
    inputs: Iterable[Path],
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
) -> None:
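    """Clean notebooks of execution counts, metadata, and outputs.

    Notebooks given as paths are overwritten in place. With no inputs, the
    notebook is read from standard input and written to standard output.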

    Args:
        inputs: Input notebook paths to clean, empty list for stdin.
        remove_empty_cells: Remove empty cells.
        remove_all_notebook_metadata: Remove all notebook metadata.
        preserve_cell_metadata: Cell metadata fields to preserve. None cleans all fields, an empty
            collection preserves all fields.
        preserve_cell_outputs: Don't clean cell outputs.
        preserve_execution_counts: Don't clean cell execution counts.
        preserve_notebook_metadata: Don't clean notebook metadata such as language version.
    """
    if inputs:
        processed_inputs: list[Path] | list[TextIO] = expand_directories(inputs)
        outputs = processed_inputs
    else:
        processed_inputs = [sys.stdin]
        outputs = [sys.stdout]

    for input_, output in zip(processed_inputs, outputs, strict=True):
        notebook = cast(
            "nbformat.NotebookNode",
            nbformat.read(input_, as_version=nbformat.NO_CONVERT),
        )

        notebook = nb_clean.clean_notebook(
            notebook,
            remove_empty_cells=remove_empty_cells,
            remove_all_notebook_metadata=remove_all_notebook_metadata,
            preserve_cell_metadata=preserve_cell_metadata,
            preserve_cell_outputs=preserve_cell_outputs,
            preserve_execution_counts=preserve_execution_counts,
            preserve_notebook_metadata=preserve_notebook_metadata,
        )
        nbformat.write(notebook, output)


def parse_args(args: Sequence[str]) -> Args:
    """Parse command line arguments.

    Args:
        args: Command line arguments to parse.

    Returns:
        Parsed command line arguments.
    """
    parser = argparse.ArgumentParser(description=__doc__)
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    subparsers.add_parser("version", help="print version number")

    add_filter_parser = subparsers.add_parser(
        "add-filter", help="add Git filter to clean notebooks before staging"
    )
    add_filter_parser.add_argument(
        "-e", "--remove-empty-cells", action="store_true", help="remove empty cells"
    )
    add_filter_parser.add_argument(
        "-M",
        "--remove-all-notebook-metadata",
        action="store_true",
        help="remove all notebook metadata",
    )
    add_filter_parser.add_argument(
        "-m",
        "--preserve-cell-metadata",
        default=None,
        nargs="*",
        help="preserve cell metadata, all unless fields are specified",
    )
    add_filter_parser.add_argument(
        "-o",
        "--preserve-cell-outputs",
        action="store_true",
        help="preserve cell outputs",
    )
    add_filter_parser.add_argument(
        "-c",
        "--preserve-execution-counts",
        action="store_true",
        help="preserve cell execution counts",
    )
    add_filter_parser.add_argument(
        "-n",
        "--preserve-notebook-metadata",
        action="store_true",
        help="preserve notebook metadata",
    )

    subparsers.add_parser(
        "remove-filter", help="remove Git filter that cleans notebooks before staging"
    )

    check_parser = subparsers.add_parser(
        "check",
        help=(
            "check a notebook is clean of cell execution counts, metadata, and outputs"
        ),
    )
    check_parser.add_argument(
        "inputs", nargs="*", metavar="PATH", type=Path, help="input path"
    )
    check_parser.add_argument(
        "-e", "--remove-empty-cells", action="store_true", help="check for empty cells"
    )
    check_parser.add_argument(
        "-M",
        "--remove-all-notebook-metadata",
        action="store_true",
        help="check for any notebook metadata",
    )
    check_parser.add_argument(
        "-m",
        "--preserve-cell-metadata",
        default=None,
        nargs="*",
        help="preserve cell metadata, all unless fields are specified",
    )
    check_parser.add_argument(
        "-o",
        "--preserve-cell-outputs",
        action="store_true",
        help="preserve cell outputs",
    )
    check_parser.add_argument(
        "-c",
        "--preserve-execution-counts",
        action="store_true",
        help="preserve cell execution counts",
    )
    check_parser.add_argument(
        "-n",
        "--preserve-notebook-metadata",
        action="store_true",
        help="preserve notebook metadata",
    )

    clean_parser = subparsers.add_parser(
        "clean", help="clean notebook of cell execution counts, metadata, and outputs"
    )
    clean_parser.add_argument(
        "inputs", nargs="*", metavar="PATH", type=Path, help="input path"
    )
    clean_parser.add_argument(
        "-e", "--remove-empty-cells", action="store_true", help="remove empty cells"
    )
    clean_parser.add_argument(
        "-M",
        "--remove-all-notebook-metadata",
        action="store_true",
        help="remove all notebook metadata",
    )
    clean_parser.add_argument(
        "-m",
        "--preserve-cell-metadata",
        default=None,
        nargs="*",
        help="preserve cell metadata, all unless fields are specified",
    )
    clean_parser.add_argument(
        "-o",
        "--preserve-cell-outputs",
        action="store_true",
        help="preserve cell outputs",
    )
    clean_parser.add_argument(
        "-c",
        "--preserve-execution-counts",
        action="store_true",
        help="preserve cell execution counts",
    )
    clean_parser.add_argument(
        "-n",
        "--preserve-notebook-metadata",
        action="store_true",
        help="preserve notebook metadata",
    )

    return parser.parse_args(args, namespace=Args())
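

# Illustrative sketch, not part of upstream nb-clean: how `nargs="*"` combined
# with `default=None` produces the three states of `--preserve-cell-metadata`
# that `check_notebook` and `clean_notebook` distinguish. The helper name
# `_demo_preserve_cell_metadata_states` and the `demo` parser are hypothetical.
def _demo_preserve_cell_metadata_states() -> list:
    import argparse  # local import keeps the sketch self-contained

    demo = argparse.ArgumentParser(prog="demo")
    demo.add_argument("-m", "--preserve-cell-metadata", default=None, nargs="*")
    return [
        # Option absent: all cell metadata is cleaned or checked.
        demo.parse_args([]).preserve_cell_metadata,
        # Option given without fields: all cell metadata is preserved.
        demo.parse_args(["-m"]).preserve_cell_metadata,
        # Option given with fields: only the named fields are preserved.
        demo.parse_args(["-m", "tags", "special"]).preserve_cell_metadata,
    ]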


def main() -> None:  # pragma: no cover
    """Command line entrypoint for nb-clean.

    Parses command line arguments and dispatches to the appropriate
    subcommand handler (version, add-filter, remove-filter, check, or clean).
    """
    args = parse_args(sys.argv[1:])

    if args.subcommand == "version":
        print(f"nb-clean {nb_clean.VERSION}")
    elif args.subcommand == "add-filter":
        add_filter(
            remove_empty_cells=args.remove_empty_cells,
            remove_all_notebook_metadata=args.remove_all_notebook_metadata,
            preserve_cell_metadata=args.preserve_cell_metadata,
            preserve_cell_outputs=args.preserve_cell_outputs,
            preserve_execution_counts=args.preserve_execution_counts,
            preserve_notebook_metadata=args.preserve_notebook_metadata,
        )
    elif args.subcommand == "remove-filter":
        remove_filter()
    elif args.subcommand == "check":
        check(
            args.inputs,
            remove_empty_cells=args.remove_empty_cells,
            remove_all_notebook_metadata=args.remove_all_notebook_metadata,
            preserve_cell_metadata=args.preserve_cell_metadata,
            preserve_cell_outputs=args.preserve_cell_outputs,
            preserve_execution_counts=args.preserve_execution_counts,
            preserve_notebook_metadata=args.preserve_notebook_metadata,
        )
    elif args.subcommand == "clean":
        clean(
            args.inputs,
            remove_empty_cells=args.remove_empty_cells,
            remove_all_notebook_metadata=args.remove_all_notebook_metadata,
            preserve_cell_metadata=args.preserve_cell_metadata,
            preserve_cell_outputs=args.preserve_cell_outputs,
            preserve_execution_counts=args.preserve_execution_counts,
            preserve_notebook_metadata=args.preserve_notebook_metadata,
        )
    else:
        # This should never happen due to argparse validation, but be defensive
        exit_with_error(f"Unknown subcommand: {args.subcommand}", 1)


================================================
FILE: src/nb_clean/py.typed
================================================


================================================
FILE: tests/conftest.py
================================================
from pathlib import Path
from typing import Final, cast

import nbformat
import pytest

NOTEBOOKS_DIR: Final = Path(__file__).parent / "notebooks"


def _read_notebook(filename: str) -> nbformat.NotebookNode:
    return cast(
        "nbformat.NotebookNode",
        nbformat.read(NOTEBOOKS_DIR / filename, as_version=nbformat.NO_CONVERT),
    )


@pytest.fixture
def dirty_notebook() -> nbformat.NotebookNode:
    return _read_notebook("dirty.ipynb")


@pytest.fixture
def dirty_notebook_with_version() -> nbformat.NotebookNode:
    return _read_notebook("dirty_with_version.ipynb")


@pytest.fixture
def clean_notebook() -> nbformat.NotebookNode:
    return _read_notebook("clean.ipynb")


@pytest.fixture
def clean_notebook_with_notebook_metadata() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_notebook_metadata.ipynb")


@pytest.fixture
def clean_notebook_without_empty_cells() -> nbformat.NotebookNode:
    return _read_notebook("clean_without_empty_cells.ipynb")


@pytest.fixture
def clean_notebook_with_empty_cells() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_empty_cells.ipynb")


@pytest.fixture
def clean_notebook_with_counts() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_counts.ipynb")


@pytest.fixture
def clean_notebook_with_cell_metadata() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_cell_metadata.ipynb")


@pytest.fixture
def clean_notebook_with_tags_metadata() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_tags_metadata.ipynb")


@pytest.fixture
def clean_notebook_with_tags_special_metadata() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_tags_special_metadata.ipynb")


@pytest.fixture
def clean_notebook_with_outputs() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_outputs.ipynb")


@pytest.fixture
def clean_notebook_with_outputs_with_counts() -> nbformat.NotebookNode:
    return _read_notebook("clean_with_outputs_with_counts.ipynb")


@pytest.fixture
def clean_notebook_without_notebook_metadata() -> nbformat.NotebookNode:
    return _read_notebook("clean_without_notebook_metadata.ipynb")


================================================
FILE: tests/notebooks/clean.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_cell_metadata.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbclean": "test",
    "special": "my special metadata",
    "tags": [
     "before-import",
     "answer"
    ]
   },
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_counts.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_empty_cells.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_notebook_metadata.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_outputs.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello, world'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world\n"
     ]
    }
   ],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_outputs_with_counts.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello, world'"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world\n"
     ]
    }
   ],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_tags_metadata.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "before-import",
     "answer"
    ]
   },
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_with_tags_special_metadata.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "special": "my special metadata",
    "tags": [
     "before-import",
     "answer"
    ]
   },
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_without_empty_cells.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/clean_without_notebook_metadata.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/dirty.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "nbclean": "test",
    "tags": [
     "before-import",
     "answer"
    ],
    "special": "my special metadata"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello, world'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world\n"
     ]
    }
   ],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/notebooks/dirty_empty_octave.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10cfba24-bab5-47a0-9ab8-5d1fc01f1f58",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Octave",
   "language": "octave",
   "name": "octave"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: tests/notebooks/dirty_with_version.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbclean": "test"
   },
   "outputs": [],
   "source": [
    "text = \"Hello, world\"\n",
    "text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbclean": "test",
    "tags": [
     "example-tag",
     "another-tag"
    ],
    "special": "my special metadata"
   },
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: tests/test_check_notebook.py
================================================
from __future__ import annotations

from typing import TYPE_CHECKING, cast

import pytest

import nb_clean

if TYPE_CHECKING:
    from collections.abc import Collection

    import nbformat


@pytest.mark.parametrize(
    ("notebook_name", "is_clean"),
    [
        ("clean_notebook", True),
        ("dirty_notebook", False),
        ("dirty_notebook_with_version", False),
    ],
)
def test_check_notebook(
    notebook_name: str, *, is_clean: bool, request: pytest.FixtureRequest
) -> None:
    notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name))
    assert nb_clean.check_notebook(notebook) is is_clean


@pytest.mark.parametrize("preserve_notebook_metadata", [True, False])
def test_check_notebook_preserve_notebook_metadata(
    clean_notebook_with_notebook_metadata: nbformat.NotebookNode,
    *,
    preserve_notebook_metadata: bool,
) -> None:
    assert (
        nb_clean.check_notebook(
            clean_notebook_with_notebook_metadata,
            preserve_notebook_metadata=preserve_notebook_metadata,
        )
        is preserve_notebook_metadata
    )


@pytest.mark.parametrize("remove_empty_cells", [True, False])
def test_check_notebook_remove_empty_cells(
    clean_notebook_with_empty_cells: nbformat.NotebookNode, *, remove_empty_cells: bool
) -> None:
    output = nb_clean.check_notebook(
        clean_notebook_with_empty_cells, remove_empty_cells=remove_empty_cells
    )
    assert output is not remove_empty_cells


@pytest.mark.parametrize(
    "preserve_cell_metadata",
    [
        [],
        ["tags"],
        ["other"],
        ["tags", "special"],
        ["nbformat", "tags", "special"],
        None,
    ],
)
def test_check_notebook_preserve_cell_metadata(
    clean_notebook_with_cell_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str] | None,
) -> None:
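    # `None` strips all cell metadata and `[]` preserves all of it; otherwise the
    # notebook is clean only if every metadata key it contains is listed.
    expected = (preserve_cell_metadata is not None) and (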
        preserve_cell_metadata == []
        or {"tags", "special", "nbclean"}.issubset(preserve_cell_metadata)
    )
    output = nb_clean.check_notebook(
        clean_notebook_with_cell_metadata, preserve_cell_metadata=preserve_cell_metadata
    )
    assert output is expected


@pytest.mark.parametrize(
    "preserve_cell_metadata",
    [
        [],
        ["tags"],
        ["other"],
        ["tags", "special"],
        ["nbformat", "tags", "special"],
        None,
    ],
)
def test_check_notebook_preserve_cell_metadata_tags(
    clean_notebook_with_tags_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str] | None,
) -> None:
    expected = (preserve_cell_metadata is not None) and (
        preserve_cell_metadata == [] or {"tags"}.issubset(preserve_cell_metadata)
    )
    output = nb_clean.check_notebook(
        clean_notebook_with_tags_metadata, preserve_cell_metadata=preserve_cell_metadata
    )
    assert output is expected


@pytest.mark.parametrize(
    "preserve_cell_metadata",
    [
        [],
        ["tags"],
        ["other"],
        ["tags", "special"],
        ["nbformat", "tags", "special"],
        None,
    ],
)
def test_check_notebook_preserve_cell_metadata_tags_special(
    clean_notebook_with_tags_special_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str] | None,
) -> None:
    expected = (preserve_cell_metadata is not None) and (
        preserve_cell_metadata == []
        or {"tags", "special"}.issubset(preserve_cell_metadata)
    )
    output = nb_clean.check_notebook(
        clean_notebook_with_tags_special_metadata,
        preserve_cell_metadata=preserve_cell_metadata,
    )
    assert output is expected


@pytest.mark.parametrize(
    ("notebook_name", "preserve_cell_outputs", "is_clean"),
    [
        ("clean_notebook_with_outputs", True, True),
        ("clean_notebook_with_outputs", False, False),
        ("clean_notebook_with_outputs_with_counts", True, False),
    ],
)
def test_check_notebook_preserve_outputs(
    notebook_name: str,
    *,
    preserve_cell_outputs: bool,
    is_clean: bool,
    request: pytest.FixtureRequest,
) -> None:
    notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name))
    output = nb_clean.check_notebook(
        notebook, preserve_cell_outputs=preserve_cell_outputs
    )
    assert output is is_clean


@pytest.mark.parametrize(
    ("notebook_name", "preserve_execution_counts", "is_clean"),
    [
        ("clean_notebook_with_counts", True, True),
        ("clean_notebook_with_counts", False, False),
    ],
)
def test_check_notebook_preserve_execution_counts(
    notebook_name: str,
    *,
    preserve_execution_counts: bool,
    is_clean: bool,
    request: pytest.FixtureRequest,
) -> None:
    notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name))
    output = nb_clean.check_notebook(
        notebook, preserve_execution_counts=preserve_execution_counts
    )
    assert output is is_clean


@pytest.mark.parametrize(
    ("notebook_name", "remove_all_notebook_metadata", "is_clean"),
    [
        ("clean_notebook_with_notebook_metadata", True, False),
        ("clean_notebook_with_notebook_metadata", False, False),
        ("clean_notebook_without_notebook_metadata", True, True),
        ("clean_notebook_without_notebook_metadata", False, True),
        ("clean_notebook", True, False),
        ("clean_notebook", False, True),
    ],
)
def test_check_notebook_remove_all_notebook_metadata(
    notebook_name: str,
    *,
    remove_all_notebook_metadata: bool,
    is_clean: bool,
    request: pytest.FixtureRequest,
) -> None:
    # `clean_notebook_with_notebook_metadata` is dirty for both values of
    # `remove_all_notebook_metadata`: it contains `language_info.version`, which
    # is detected when `preserve_notebook_metadata=False`.
    notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name))
    assert (
        nb_clean.check_notebook(
            notebook, remove_all_notebook_metadata=remove_all_notebook_metadata
        )
        is is_clean
    )


def test_check_notebook_exclusive_arguments(
    dirty_notebook: nbformat.NotebookNode,
) -> None:
    with pytest.raises(
        ValueError,
        match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`",
    ):
        nb_clean.check_notebook(
            dirty_notebook,
            remove_all_notebook_metadata=True,
            preserve_notebook_metadata=True,
        )


================================================
FILE: tests/test_clean_notebook.py
================================================
from collections.abc import Collection
from typing import cast

import nbformat
import pytest

import nb_clean


def test_clean_notebook(
    dirty_notebook: nbformat.NotebookNode, clean_notebook: nbformat.NotebookNode
) -> None:
    assert nb_clean.clean_notebook(dirty_notebook) == clean_notebook


@pytest.mark.parametrize(
    ("preserve_notebook_metadata", "expected_output_name"),
    [(True, "clean_notebook_with_notebook_metadata"), (False, "clean_notebook")],
)
def test_clean_notebook_with_notebook_metadata(
    clean_notebook_with_notebook_metadata: nbformat.NotebookNode,
    *,
    preserve_notebook_metadata: bool,
    expected_output_name: str,
    request: pytest.FixtureRequest,
) -> None:
    expected_output = cast(
        "nbformat.NotebookNode", request.getfixturevalue(expected_output_name)
    )
    assert (
        nb_clean.clean_notebook(
            clean_notebook_with_notebook_metadata,
            preserve_notebook_metadata=preserve_notebook_metadata,
        )
        == expected_output
    )


def test_clean_notebook_remove_empty_cells(
    clean_notebook_with_empty_cells: nbformat.NotebookNode,
    clean_notebook_without_empty_cells: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(
            clean_notebook_with_empty_cells, remove_empty_cells=True
        )
        == clean_notebook_without_empty_cells
    )


@pytest.mark.parametrize(
    "preserve_cell_metadata",
    [[], ["nbclean", "tags", "special"], ["nbclean", "tags", "special", "toomany"]],
)
def test_clean_notebook_preserve_cell_metadata(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_cell_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_cell_metadata
    )


@pytest.mark.parametrize("preserve_cell_metadata", [["tags"], ["tags", "toomany"]])
def test_clean_notebook_preserve_cell_metadata_tags(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_tags_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_tags_metadata
    )


@pytest.mark.parametrize(
    "preserve_cell_metadata", [["tags", "special"], ["tags", "special", "toomany"]]
)
def test_clean_notebook_preserve_cell_metadata_tags_special(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_tags_special_metadata: nbformat.NotebookNode,
    preserve_cell_metadata: Collection[str],
) -> None:
    assert (
        nb_clean.clean_notebook(
            dirty_notebook, preserve_cell_metadata=preserve_cell_metadata
        )
        == clean_notebook_with_tags_special_metadata
    )


def test_clean_notebook_preserve_outputs(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_outputs: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, preserve_cell_outputs=True)
        == clean_notebook_with_outputs
    )


def test_clean_notebook_preserve_execution_counts(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_with_counts: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, preserve_execution_counts=True)
        == clean_notebook_with_counts
    )


def test_clean_notebook_remove_all_notebook_metadata(
    dirty_notebook: nbformat.NotebookNode,
    clean_notebook_without_notebook_metadata: nbformat.NotebookNode,
) -> None:
    assert (
        nb_clean.clean_notebook(dirty_notebook, remove_all_notebook_metadata=True)
        == clean_notebook_without_notebook_metadata
    )


def test_clean_notebook_exclusive_arguments(
    dirty_notebook: nbformat.NotebookNode,
) -> None:
    with pytest.raises(
        ValueError,
        match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`",
    ):
        nb_clean.clean_notebook(
            dirty_notebook,
            remove_all_notebook_metadata=True,
            preserve_notebook_metadata=True,
        )


================================================
FILE: tests/test_cli.py
================================================
from __future__ import annotations

import io
import os
import sys
from pathlib import Path
from typing import TYPE_CHECKING, cast

import nbformat
import pytest

import nb_clean
import nb_clean.cli

if TYPE_CHECKING:
    from collections.abc import Collection, Iterable

    from pytest import CaptureFixture  # noqa: PT013


def test_expand_directories_with_files() -> None:
    paths = [Path("tests/notebooks/dirty.ipynb")]
    assert nb_clean.cli.expand_directories(paths) == paths


def test_expand_directories_recursively() -> None:
    input_paths = [Path("tests")]
    expanded_paths = nb_clean.cli.expand_directories(input_paths)
    assert len(expanded_paths) > len(input_paths)
    assert all(path.is_file() and path.suffix == ".ipynb" for path in expanded_paths)


def test_exit_with_error(capsys: CaptureFixture[str]) -> None:
    with pytest.raises(SystemExit) as exc:
        nb_clean.cli.exit_with_error("error message", 42)
    assert exc.value.code == 42
    assert capsys.readouterr().err == "nb-clean: error: error message\n"


def test_add_filter_dispatch(monkeypatch: pytest.MonkeyPatch) -> None:
    captured: dict[str, object] = {}

    def fake_add_git_filter(**kwargs: object) -> None:
        captured.update(kwargs)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)

    argv = ["nb-clean", "add-filter", "-e", "-n"]
    monkeypatch.setattr(sys, "argv", argv)
    nb_clean.cli.main()

    assert captured == {
        "remove_empty_cells": True,
        "remove_all_notebook_metadata": False,
        "preserve_cell_metadata": None,
        "preserve_cell_outputs": False,
        "preserve_execution_counts": False,
        "preserve_notebook_metadata": True,
    }


def test_add_filter_remove_all_notebook_metadata_dispatch(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    captured: dict[str, object] = {}

    def fake_add_git_filter(**kwargs: object) -> None:
        captured.update(kwargs)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)

    argv = ["nb-clean", "add-filter", "-e", "-M"]
    monkeypatch.setattr(sys, "argv", argv)
    nb_clean.cli.main()

    assert captured == {
        "remove_empty_cells": True,
        "remove_all_notebook_metadata": True,
        "preserve_cell_metadata": None,
        "preserve_cell_outputs": False,
        "preserve_execution_counts": False,
        "preserve_notebook_metadata": False,
    }


def test_add_filter_failure_dispatch(
    capsys: CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    def fake_add_git_filter(**_kwargs: object) -> None:
        raise nb_clean.GitProcessError(message="error message", return_code=42)

    monkeypatch.setattr(nb_clean, "add_git_filter", fake_add_git_filter)
    monkeypatch.setattr(sys, "argv", ["nb-clean", "add-filter", "-e", "-M"])

    with pytest.raises(SystemExit) as exc:
        nb_clean.cli.main()
    assert exc.value.code == 42
    assert capsys.readouterr().err == "nb-clean: error: error message\n"


def test_remove_filter_dispatch(monkeypatch: pytest.MonkeyPatch) -> None:
    called = {"value": False}

    def fake_remove_git_filter() -> None:
        called["value"] = True

    monkeypatch.setattr(nb_clean, "remove_git_filter", fake_remove_git_filter)
    monkeypatch.setattr(sys, "argv", ["nb-clean", "remove-filter"])
    nb_clean.cli.main()
    assert called["value"]


def test_remove_filter_failure_dispatch(
    capsys: CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    def fake_remove_git_filter() -> None:
        raise nb_clean.GitProcessError(message="error message", return_code=42)

    monkeypatch.setattr(nb_clean, "remove_git_filter", fake_remove_git_filter)
    monkeypatch.setattr(sys, "argv", ["nb-clean", "remove-filter"])

    with pytest.raises(SystemExit) as exc:
        nb_clean.cli.main()
    assert exc.value.code == 42
    assert capsys.readouterr().err == "nb-clean: error: error message\n"


@pytest.mark.parametrize(
    ("name", "expect_exit"), [("clean.ipynb", False), ("dirty.ipynb", True)]
)
def test_check_file(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch, name: str, *, expect_exit: bool
) -> None:
    src = Path("tests/notebooks") / name
    dst = tmp_path / name
    dst.write_bytes(src.read_bytes())

    monkeypatch.setattr(sys, "argv", ["nb-clean", "check", os.fspath(dst)])

    if expect_exit:
        with pytest.raises(SystemExit) as exc:
            nb_clean.cli.main()
        assert exc.value.code == 1
    else:
        nb_clean.cli.main()


@pytest.mark.parametrize(
    ("notebook_name", "expect_exit"),
    [("clean_notebook", False), ("dirty_notebook", True)],
)
def test_check_stdin(
    monkeypatch: pytest.MonkeyPatch,
    notebook_name: str,
    *,
    expect_exit: bool,
    request: pytest.FixtureRequest,
) -> None:
    notebook = cast("nbformat.NotebookNode", request.getfixturevalue(notebook_name))
    monkeypatch.setattr(sys, "argv", ["nb-clean", "check"])
    content = cast("str", nbformat.writes(notebook))
    monkeypatch.setattr(sys, "stdin", io.StringIO(content))
    if expect_exit:
        with pytest.raises(SystemExit) as exc:
            nb_clean.cli.main()
        assert exc.value.code == 1
    else:
        nb_clean.cli.main()


def test_clean_file(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
    src_dirty = Path("tests/notebooks/dirty.ipynb")
    dst_dirty = tmp_path / "dirty.ipynb"
    dst_dirty.write_bytes(src_dirty.read_bytes())

    monkeypatch.setattr(sys, "argv", ["nb-clean", "clean", str(dst_dirty)])
    nb_clean.cli.main()

    cleaned = cast(
        "nbformat.NotebookNode",
        nbformat.read(dst_dirty, as_version=nbformat.NO_CONVERT),
    )
    expected = cast(
        "nbformat.NotebookNode",
        nbformat.read(
            Path("tests/notebooks/clean.ipynb"), as_version=nbformat.NO_CONVERT
        ),
    )
    assert cleaned == expected


def test_clean_stdin(
    capsys: CaptureFixture[str], monkeypatch: pytest.MonkeyPatch
) -> None:
    dirty = cast(
        "nbformat.NotebookNode",
        nbformat.read(
            Path("tests/notebooks/dirty.ipynb"), as_version=nbformat.NO_CONVERT
        ),
    )
    expected = cast(
        "nbformat.NotebookNode",
        nbformat.read(
            Path("tests/notebooks/clean.ipynb"), as_version=nbformat.NO_CONVERT
        ),
    )

    monkeypatch.setattr(sys, "argv", ["nb-clean", "clean"])
    dirty_content = cast("str", nbformat.writes(dirty))
    monkeypatch.setattr(sys, "stdin", io.StringIO(dirty_content))

    nb_clean.cli.main()

    out = capsys.readouterr().out
    expected_text = cast("str", nbformat.writes(expected))
    assert out.strip() == expected_text.strip()


@pytest.mark.parametrize(
    (
        "argv",
        "inputs",
        "remove_empty_cells",
        "remove_all_notebook_metadata",
        "preserve_cell_metadata",
        "preserve_cell_outputs",
        "preserve_execution_counts",
        "preserve_notebook_metadata",
    ),
    [
        ("add-filter -e", [], True, False, None, False, False, False),
        (
            "check -m -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            [],
            True,
            False,
            False,
        ),
        (
            "check -m tags -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            ["tags"],
            True,
            False,
            False,
        ),
        (
            "check -m tags special -o a.ipynb b.ipynb",
            ["a.ipynb", "b.ipynb"],
            False,
            False,
            ["tags", "special"],
            True,
            False,
            False,
        ),
        ("clean -e -o a.ipynb", ["a.ipynb"], True, False, None, True, False, False),
        ("clean -e -c -o a.ipynb", ["a.ipynb"], True, False, None, True, True, False),
    ],
)
def test_parse_args(
    argv: str,
    inputs: Iterable[str],
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
) -> None:
    args = nb_clean.cli.parse_args(argv.split())
    if inputs:
        assert args.inputs == [Path(path) for path in inputs]
    assert args.remove_empty_cells is remove_empty_cells
    assert args.remove_all_notebook_metadata is remove_all_notebook_metadata
    assert args.preserve_cell_metadata == preserve_cell_metadata
    assert args.preserve_cell_outputs is preserve_cell_outputs
    assert args.preserve_execution_counts is preserve_execution_counts
    assert args.preserve_notebook_metadata is preserve_notebook_metadata


================================================
FILE: tests/test_git_integration.py
================================================
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import TYPE_CHECKING
from unittest.mock import Mock

import pytest

import nb_clean

if TYPE_CHECKING:
    from collections.abc import Collection

    from pytest_mock import MockerFixture


def test_git(mocker: MockerFixture) -> None:
    mock_process = Mock()
    mock_process.stdout = b" output string "
    mock_run = mocker.patch("nb_clean.subprocess.run", return_value=mock_process)
    output = nb_clean.git("command", "--flag")
    mock_run.assert_called_once_with(
        ["git", "command", "--flag"], capture_output=True, check=True
    )
    assert output == "output string"


def test_git_failure(mocker: MockerFixture) -> None:
    mocker.patch(
        "nb_clean.subprocess.run",
        side_effect=subprocess.CalledProcessError(
            returncode=42, cmd="command", stderr=b"standard error"
        ),
    )
    with pytest.raises(nb_clean.GitProcessError) as exc:
        nb_clean.git("command", "--flag")
    assert exc.value.message == "standard error"
    assert exc.value.return_code == 42


def test_git_attributes_path(mocker: MockerFixture) -> None:
    mocker.patch("nb_clean.git", return_value="dir/.git")
    assert nb_clean.git_attributes_path() == Path("dir", ".git", "info", "attributes")


@pytest.mark.parametrize(
    (
        "remove_empty_cells",
        "remove_all_notebook_metadata",
        "preserve_cell_metadata",
        "preserve_cell_outputs",
        "preserve_execution_counts",
        "preserve_notebook_metadata",
        "filter_command",
    ),
    [
        (False, False, None, False, False, False, "nb-clean clean"),
        (True, False, None, False, False, False, "nb-clean clean --remove-empty-cells"),
        (
            False,
            False,
            [],
            False,
            False,
            False,
            "nb-clean clean --preserve-cell-metadata",
        ),
        (
            False,
            False,
            ["tags"],
            False,
            False,
            False,
            "nb-clean clean --preserve-cell-metadata tags",
        ),
        (
            False,
            False,
            ["tags", "special"],
            False,
            False,
            False,
            "nb-clean clean --preserve-cell-metadata tags special",
        ),
        (
            False,
            False,
            None,
            True,
            False,
            False,
            "nb-clean clean --preserve-cell-outputs",
        ),
        (
            True,
            False,
            [],
            True,
            False,
            False,
            "nb-clean clean --remove-empty-cells --preserve-cell-metadata --preserve-cell-outputs",
        ),
        (
            False,
            False,
            None,
            False,
            True,
            True,
            "nb-clean clean --preserve-execution-counts --preserve-notebook-metadata",
        ),
        (
            False,
            True,
            None,
            False,
            False,
            False,
            "nb-clean clean --remove-all-notebook-metadata",
        ),
    ],
)
def test_add_git_filter(
    mocker: MockerFixture,
    tmp_path: Path,
    *,
    remove_empty_cells: bool,
    remove_all_notebook_metadata: bool,
    preserve_cell_metadata: Collection[str] | None,
    preserve_cell_outputs: bool,
    preserve_execution_counts: bool,
    preserve_notebook_metadata: bool,
    filter_command: str,
) -> None:
    mock_git = mocker.patch("nb_clean.git")
    mock_git_attributes_path = mocker.patch(
        "nb_clean.git_attributes_path", return_value=tmp_path / "attributes"
    )
    nb_clean.add_git_filter(
        remove_empty_cells=remove_empty_cells,
        remove_all_notebook_metadata=remove_all_notebook_metadata,
        preserve_cell_metadata=preserve_cell_metadata,
        preserve_cell_outputs=preserve_cell_outputs,
        preserve_execution_counts=preserve_execution_counts,
        preserve_notebook_metadata=preserve_notebook_metadata,
    )
    mock_git.assert_called_once_with("config", "filter.nb-clean.clean", filter_command)
    mock_git_attributes_path.assert_called_once()
    assert nb_clean.GIT_ATTRIBUTES_LINE in (tmp_path / "attributes").read_text()


def test_add_git_filter_exclusive_arguments() -> None:
    with pytest.raises(
        ValueError,
        match="`preserve_notebook_metadata` and `remove_all_notebook_metadata` cannot both be `True`",
    ):
        nb_clean.add_git_filter(
            remove_all_notebook_metadata=True, preserve_notebook_metadata=True
        )


def test_add_git_filter_idempotent(mocker: MockerFixture, tmp_path: Path) -> None:
    mocker.patch("nb_clean.git")
    (tmp_path / "attributes").write_text(nb_clean.GIT_ATTRIBUTES_LINE)
    mock_git_attributes_path = mocker.patch(
        "nb_clean.git_attributes_path", return_value=tmp_path / "attributes"
    )
    nb_clean.add_git_filter()
    mock_git_attributes_path.assert_called_once()
    assert (tmp_path / "attributes").read_text() == nb_clean.GIT_ATTRIBUTES_LINE


@pytest.mark.parametrize("filter_exists", [True, False])
def test_remove_git_filter(
    mocker: MockerFixture, tmp_path: Path, *, filter_exists: bool
) -> None:
    mock_git = mocker.patch("nb_clean.git")
    mock_git_attributes_path = mocker.patch(
        "nb_clean.git_attributes_path", return_value=tmp_path / "attributes"
    )
    (tmp_path / "attributes").touch()
    if filter_exists:
        (tmp_path / "attributes").write_text(nb_clean.GIT_ATTRIBUTES_LINE)
    nb_clean.remove_git_filter()
    mock_git_attributes_path.assert_called_once()
    mock_git.assert_called_once_with("config", "--remove-section", "filter.nb-clean")
    if filter_exists:
        assert nb_clean.GIT_ATTRIBUTES_LINE not in (tmp_path / "attributes").read_text()