Repository: jankrepl/mildlyoverfitted
Branch: master
Commit: 22f0ecc67cef
Files: 118
Total size: 314.6 KB

Directory structure:
gitextract_ixgqmhua/

├── .gitignore
├── LICENSE
├── README.md
├── github_adventures/
│   ├── automata/
│   │   ├── model.py
│   │   └── train.py
│   ├── diffaugment/
│   │   ├── README.MD
│   │   ├── script.py
│   │   └── utils.py
│   ├── dino/
│   │   ├── data/
│   │   │   ├── README.md
│   │   │   └── imagenette_labels.json
│   │   ├── evaluation.py
│   │   ├── train.py
│   │   ├── utils.py
│   │   ├── visualize_attentions.ipynb
│   │   └── visualize_augmentations.ipynb
│   ├── gpt/
│   │   ├── README.md
│   │   ├── copy_and_generate.py
│   │   ├── distribution_visualizations.ipynb
│   │   ├── ipython_code.py
│   │   ├── model.py
│   │   ├── requirements.txt
│   │   └── utils.py
│   ├── integer/
│   │   ├── README.md
│   │   ├── bert.py
│   │   ├── experiments.sh
│   │   ├── fetch_data.py
│   │   ├── glove.py
│   │   ├── lstm.py
│   │   ├── requirements.txt
│   │   └── utils.py
│   ├── lottery/
│   │   ├── README.md
│   │   ├── data.py
│   │   ├── main.py
│   │   ├── parallel_launch.sh
│   │   ├── requirements.txt
│   │   └── utils.py
│   ├── mixer/
│   │   ├── README.md
│   │   ├── official.py
│   │   ├── ours.py
│   │   └── test_compare.py
│   ├── mixup/
│   │   ├── launch_experiments.sh
│   │   ├── train.py
│   │   └── utils.py
│   ├── ner_evaluation/
│   │   ├── README.md
│   │   ├── ours.py
│   │   ├── test_ours.py
│   │   └── try.py
│   ├── neuron/
│   │   ├── README.md
│   │   ├── evaluate_noise.py
│   │   ├── evaluate_shuffling.py
│   │   ├── evaluate_video.py
│   │   ├── launch.sh
│   │   ├── pretrained/
│   │   │   ├── MLP.pkl
│   │   │   ├── MLP_augment.pkl
│   │   │   ├── invariant_official.pkl
│   │   │   ├── invariant_ours.pkl
│   │   │   ├── linear.pkl
│   │   │   └── linear_augment.pkl
│   │   ├── requirements.txt
│   │   ├── solutions.py
│   │   ├── tasks.py
│   │   ├── torch_utils.py
│   │   └── trainer.py
│   ├── pondernet/
│   │   ├── experiment_1.sh
│   │   ├── experiment_2.sh
│   │   ├── requirements.txt
│   │   ├── train.py
│   │   └── utils.py
│   ├── product_quantization/
│   │   ├── README.md
│   │   ├── convert.py
│   │   ├── custom.py
│   │   ├── faiss_101_ipython.py
│   │   ├── generate_index.py
│   │   ├── parse.py
│   │   ├── requirements.txt
│   │   ├── run_all.sh
│   │   └── run_gradio.py
│   ├── siren/
│   │   ├── activations.py
│   │   ├── core.py
│   │   └── train.py
│   └── vision_transformer/
│       ├── classes.txt
│       ├── custom.py
│       ├── forward.py
│       └── verify.py
└── mini_tutorials/
    ├── bentoml/
    │   ├── README.md
    │   ├── bentofile.yaml
    │   ├── create_model.py
    │   ├── requirements.txt
    │   └── service.py
    ├── custom_optimizer_in_pytorch/
    │   ├── custom.py
    │   └── src.py
    ├── deploying_on_kubernetes/
    │   ├── Dockerfile
    │   ├── DockerfileConda
    │   └── README.md
    ├── embedding/
    │   ├── README.md
    │   ├── Visualize.ipynb
    │   └── src.py
    ├── fewshot_text_classification/
    │   ├── classify.py
    │   └── template.jinja2
    ├── gradient_wrt_input/
    │   ├── explain.py
    │   ├── fool.py
    │   └── utils.py
    ├── haiku_basics/
    │   ├── buffers_in_torch.py
    │   ├── parameter.py
    │   ├── reallife.py
    │   ├── requirements.txt
    │   └── state.py
    ├── httpx_rate_limiting/
    │   └── script.py
    ├── mocking_neural_networks/
    │   ├── app.py
    │   └── test.py
    ├── numpy_equality_testing/
    │   └── test.py
    ├── openai_function_calling/
    │   └── example.py
    ├── rag_with_reranking/
    │   ├── README.md
    │   ├── answer.py
    │   ├── input.txt
    │   ├── postman_collection.json
    │   └── upload_data.py
    └── visualizing_activations_with_forward_hooks/
        └── src.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2024 Jan Krepl

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# mildlyoverfitted

Code for https://www.youtube.com/c/mildlyoverfitted.


### Overview
| Name                                                                           | Video                                | Code                                                                                                                       |
|--------------------------------------------------------------------------------|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| Asynchronous requests and rate limiting                                        | [link](https://youtu.be/luWsr9exlE4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/httpx_rate_limiting)                |
| BentoML Sagemaker deployment                                                   | [link](https://youtu.be/Zci_D4az9FU) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml)                |
| Custom optimizer in PyTorch                                                    | [link](https://youtu.be/zvp8K4iX2Cs) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/custom_optimizer_in_pytorch)                |
| Deploying machine learning models on Kubernetes                                | [link](https://youtu.be/DQRNt8Diyw4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/deploying_on_kubernetes)                             |
| Differentiable augmentation for GANs (using Kornia)                            | [link](https://youtu.be/J97EM3Clyys) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/diffaugment)                             |
| DINO in PyTorch                                                                | [link](https://youtu.be/psmMEWKk4Uk) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/dino)                                    |
| Few-shot text classification with prompts                                      | [link](https://youtu.be/AhqgDXcBU2M) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification)                                    |
| GPT in PyTorch                                                                 | [link](https://youtu.be/d7IRM40VMYM) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/gpt)                                    |
| Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients) | [link](https://youtu.be/5lFiZTSsp40) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/gradient_wrt_input)                         |
| Growing neural cellular automata in PyTorch                                    | [link](https://youtu.be/21ACbWoF2Oo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/automata)                                |
| Haiku basics                                                                   | [link](https://youtu.be/yXCKS-ZoYTY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics)                            |
| Integer embeddings in PyTorch                                                  | [link](https://youtu.be/bybuSBVzOdg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/integer)                                 |
| Mixup in PyTorch                                                               | [link](https://youtu.be/hGAKHKqmXdY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/mixup)                                   |
| MLP-Mixer in Flax and PyTorch                                                  | [link](https://youtu.be/HqytB2GUbHA) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/mixer)                                   |
| Mocking neural networks: unit testing in deep learning                         | [link](https://youtu.be/_KVV9jXSzvo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/mocking_neural_networks)                    |
| NER model evaluation                                                           | [link](https://youtu.be/70YAUYP3hrw) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation)                                    |
| NumPy equality testing                                                         | [link](https://youtu.be/sai1g5fjyb8) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/numpy_equality_testing)                     |
| OpenAI function calling                                                        | [link](https://youtu.be/_B7F_6nTVEg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/openai_function_calling)                     |
| PonderNet in PyTorch                                                           | [link](https://youtu.be/JLFz1dU5HR4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/pondernet)                               |
| Product quantization in Faiss and from scratch                                 | [link](https://youtu.be/PNVJvZEkuXo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/product_quantization)                               |
| Retrieval augmented generation with OpenSearch and reranking                   | [link](https://youtu.be/OsE7YcDcPz0) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/rag_with_reranking)                               |
| SIREN in PyTorch                                                               | [link](https://youtu.be/s4iFEoNlYhM) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/siren)                                   |
| The Lottery Ticket Hypothesis and pruning in PyTorch                           | [link](https://youtu.be/bQt0CLXXAqg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/lottery)                                  |
| The Sensory Neuron as a Transformer in PyTorch                                 | [link](https://youtu.be/mi_mzlhBGAU) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/neuron)                                  |
| `torch.nn.Embedding` explained (+ Character-level language model)              | [link](https://youtu.be/euwN5DHfLEo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/embedding)                                  |
| Vision Transformer in PyTorch                                                  | [link](https://youtu.be/ovB0ddFtzzA) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/vision_transformer)                      |
| Visualizing activations with forward hooks (PyTorch)                           | [link](https://youtu.be/1ZbLA7ofasY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/visualizing_activations_with_forward_hooks) |


================================================
FILE: github_adventures/automata/model.py
================================================
import torch
import torch.nn as nn


class CAModel(nn.Module):
    """Cell automata model.

    Parameters
    ----------
    n_channels : int
        Number of channels of the grid.

    hidden_channels : int
        Number of hidden channels of the pixelwise 1x1 convolution inside
        of the update module.

    fire_rate : float
        Number between 0 and 1. The lower it is the more likely it is for
        cells to be set to zero during the `stochastic_update` process.

    device : torch.device
        Determines on what device we perform all the computations.

    Attributes
    ----------
    update_module : nn.Sequential
        The only part of the network containing trainable parameters. Composed
        of a 1x1 convolution, a ReLU and a 1x1 convolution.

    filters : torch.Tensor
        Constant tensor of shape `(3 * n_channels, 1, 3, 3)`.
    """
    def __init__(self, n_channels=16, hidden_channels=128, fire_rate=0.5, device=None):
        super().__init__()


        self.fire_rate = fire_rate
        self.n_channels = n_channels
        self.device = device or torch.device("cpu")

        # Perceive step
        sobel_filter_ = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
        scalar = 8.0

        sobel_filter_x = sobel_filter_ / scalar
        sobel_filter_y = sobel_filter_.t() / scalar
        identity_filter = torch.tensor(
                [
                    [0, 0, 0],
                    [0, 1, 0],
                    [0, 0, 0],
                ],
                dtype=torch.float32,
        )
        filters = torch.stack(
                [identity_filter, sobel_filter_x, sobel_filter_y]
        )  # (3, 3, 3)
        filters = filters.repeat((n_channels, 1, 1))  # (3 * n_channels, 3, 3)
        self.filters = filters[:, None, ...].to(
                self.device
        )  # (3 * n_channels, 1, 3, 3)

        # Update step
        self.update_module = nn.Sequential(
                nn.Conv2d(
                    3 * n_channels,
                    hidden_channels,
                    kernel_size=1,  # (1, 1)
                ),
                nn.ReLU(),
                nn.Conv2d(
                    hidden_channels,
                    n_channels,
                    kernel_size=1,
                    bias=False,
                ),
        )

        with torch.no_grad():
            self.update_module[2].weight.zero_()

        self.to(self.device)

    def perceive(self, x):
        """Approximate channelwise gradient and combine with the input.

        This is the only place where we include information on the
        neighboring cells. However, we are not using any learnable
        parameters here.

        Parameters
        ----------
        x : torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.

        Returns
        -------
        torch.Tensor
            Shape `(n_samples, 3 * n_channels, grid_size, grid_size)`.
        """
        return nn.functional.conv2d(x, self.filters, padding=1, groups=self.n_channels)

    def update(self, x):
        """Perform update.

        Note that this is the only part of the forward pass that uses
        trainable parameters.

        Parameters
        ----------
        x : torch.Tensor
            Shape `(n_samples, 3 * n_channels, grid_size, grid_size)`.

        Returns
        -------
        torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.
        """
        return self.update_module(x)

    @staticmethod
    def stochastic_update(x, fire_rate):
        """Run pixel-wise dropout.

        Unlike dropout there is no scaling taking place.

        Parameters
        ----------
        x : torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.

        fire_rate : float
            Number between 0 and 1. The higher the more likely a given cell
            updates.

        Returns
        -------
        torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.
        """
        device = x.device

        mask = (torch.rand(x[:, :1, :, :].shape) <= fire_rate).to(device, torch.float32)
        return x * mask  # broadcasted over all channels

    @staticmethod
    def get_living_mask(x):
        """Identify living cells.

        Parameters
        ----------
        x : torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.

        Returns
        -------
        torch.Tensor
            Shape `(n_samples, 1, grid_size, grid_size)` and the
            dtype is bool.
        """
        return (
            nn.functional.max_pool2d(
                x[:, 3:4, :, :], kernel_size=3, stride=1, padding=1
            )
            > 0.1
        )

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.

        Returns
        -------
        torch.Tensor
            Shape `(n_samples, n_channels, grid_size, grid_size)`.
        """
        pre_life_mask = self.get_living_mask(x)

        y = self.perceive(x)
        dx = self.update(y)
        dx = self.stochastic_update(dx, fire_rate=self.fire_rate)

        x = x + dx

        post_life_mask = self.get_living_mask(x)
        life_mask = (pre_life_mask & post_life_mask).to(torch.float32)

        return x * life_mask


================================================
FILE: github_adventures/automata/train.py
================================================
import argparse
import pathlib

import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm

from model import CAModel


def load_image(path, size=40):
    """Load an image.

    Parameters
    ----------
    path : pathlib.Path
        Path to where the image is located. Note that the image needs to be
        RGBA.

    size : int
        The image will be resized to a square with a side length of `size`.

    Returns
    -------
    torch.Tensor
        4D float image of shape `(1, 4, size, size)`. The RGB channels
        are premultiplied by the alpha channel.
    """
    img = Image.open(path)
    img = img.resize((size, size), Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10
    img = np.float32(img) / 255.0
    img[..., :3] *= img[..., 3:]

    return torch.from_numpy(img).permute(2, 0, 1)[None, ...]


def to_rgb(img_rgba):
    """Convert RGBA image to RGB image.

    Parameters
    ----------
    img_rgba : torch.Tensor
        4D tensor of shape `(1, 4, size, size)` where the RGB channels
        were already multiplied by the alpha.

    Returns
    -------
    img_rgb : torch.Tensor
        4D tensor of shape `(1, 3, size, size)`.
    """
    rgb, a = img_rgba[:, :3, ...], torch.clamp(img_rgba[:, 3:, ...], 0, 1)
    return torch.clamp(1.0 - a + rgb, 0, 1)


def make_seed(size, n_channels):
    """Create a starting tensor for training.

    The only active pixels are going to be in the middle.

    Parameters
    ----------
    size : int
        The height and the width of the tensor.

    n_channels : int
        Overall number of channels. Note that it needs to be at least 4
        since the first 4 channels represent RGBA.

    Returns
    -------
    torch.Tensor
        4D float tensor of shape `(1, n_channels, size, size)`.
    """
    x = torch.zeros((1, n_channels, size, size), dtype=torch.float32)
    x[:, 3:, size // 2, size // 2] = 1
    return x


def main(argv=None):
    parser = argparse.ArgumentParser(
            description="Training script for the cellular automata"
    )
    parser.add_argument("img", type=str, help="Path to the image we want to reproduce")

    parser.add_argument(
            "-b",
            "--batch-size",
            type=int,
            default=8,
            help="Batch size. Samples will always be taken randomly from the pool."
    )
    parser.add_argument(
            "-d",
            "--device",
            type=str,
            default="cpu",
            help="Device to use",
            choices=("cpu", "cuda"),
    )
    parser.add_argument(
            "-e",
            "--eval-frequency",
            type=int,
            default=500,
            help="Evaluation frequency.",
    )
    parser.add_argument(
            "-i",
            "--eval-iterations",
            type=int,
            default=300,
            help="Number of iterations when evaluating.",
    )
    parser.add_argument(
            "-n",
            "--n-batches",
            type=int,
            default=5000,
            help="Number of batches to train for.",
    )
    parser.add_argument(
            "-c",
            "--n-channels",
            type=int,
            default=16,
            help="Number of channels of the input tensor",
    )
    parser.add_argument(
            "-l",
            "--logdir",
            type=str,
            default="logs",
            help="Folder where all the logs and outputs are saved.",
    )
    parser.add_argument(
            "-p",
            "--padding",
            type=int,
            default=16,
            help="Padding. The shape after padding is (h + 2 * p, w + 2 * p).",
    )
    parser.add_argument(
            "--pool-size",
            type=int,
            default=1024,
            help="Size of the training pool",
    )
    parser.add_argument(
            "-s",
            "--size",
            type=int,
            default=40,
            help="Image size",
    )
    # Parse arguments
    args = parser.parse_args()
    print(vars(args))

    # Misc
    device = torch.device(args.device)

    log_path = pathlib.Path(args.logdir)
    log_path.mkdir(parents=True, exist_ok=True)
    writer = SummaryWriter(log_path)

    # Target image
    target_img_ = load_image(args.img, size=args.size)
    p = args.padding
    target_img_ = nn.functional.pad(target_img_, (p, p, p, p), "constant", 0)
    target_img = target_img_.to(device)
    target_img = target_img.repeat(args.batch_size, 1, 1, 1)

    writer.add_image("ground truth", to_rgb(target_img_)[0])

    # Model and optimizer
    model = CAModel(n_channels=args.n_channels, device=device)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

    # Pool initialization
    seed = make_seed(args.size, args.n_channels).to(device)
    seed = nn.functional.pad(seed, (p, p, p, p), "constant", 0)
    pool = seed.clone().repeat(args.pool_size, 1, 1, 1)

    for it in tqdm(range(args.n_batches)):
        batch_ixs = np.random.choice(
                args.pool_size, args.batch_size, replace=False
        ).tolist()

        x = pool[batch_ixs]
        for i in range(np.random.randint(64, 96)):
            x = model(x)

        loss_batch = ((target_img - x[:, :4, ...]) ** 2).mean(dim=[1, 2, 3])
        loss = loss_batch.mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        writer.add_scalar("train/loss", loss, it)

        argmax_batch = loss_batch.argmax().item()
        argmax_pool = batch_ixs[argmax_batch]
        remaining_batch = [i for i in range(args.batch_size) if i != argmax_batch]
        remaining_pool = [i for i in batch_ixs if i != argmax_pool]

        pool[argmax_pool] = seed.clone()
        pool[remaining_pool] = x[remaining_batch].detach()

        if it % args.eval_frequency == 0:
            x_eval = seed.clone()  # (1, n_channels, size, size)

            eval_video = torch.empty(1, args.eval_iterations, 3, *x_eval.shape[2:])

            for it_eval in range(args.eval_iterations):
                x_eval = model(x_eval)
                x_eval_out = to_rgb(x_eval[:, :4].detach().cpu())
                eval_video[0, it_eval] = x_eval_out

            writer.add_video("eval", eval_video, it, fps=60)


if __name__ == "__main__":
    main()
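
The pool mechanics in the training loop above (sample a batch from the pool, reseed the worst-performing sample, write the rest back) can be isolated in a small self-contained sketch; `toy_step` and the quadratic loss are hypothetical stand-ins for the CA model and the image reconstruction loss:

```python
import numpy as np

rng = np.random.default_rng(0)
pool_size, batch_size, state_dim = 32, 4, 8

seed = np.zeros(state_dim, dtype=np.float32)
seed[0] = 1.0
pool = np.tile(seed, (pool_size, 1))  # every pool entry starts as the seed

def toy_step(x):
    # Stand-in for iterating the CA model; just drifts the state.
    return x + 0.1 * rng.standard_normal(x.shape).astype(np.float32)

for _ in range(100):
    ixs = rng.choice(pool_size, batch_size, replace=False)
    x = toy_step(pool[ixs])
    loss_batch = (x ** 2).sum(axis=1)   # per-sample loss (hypothetical)

    worst = int(loss_batch.argmax())    # hardest sample in the batch...
    pool[ixs[worst]] = seed             # ...is restarted from the seed
    keep = [i for i in range(batch_size) if i != worst]
    pool[ixs[keep]] = x[keep]           # the rest persist in the pool
```

Reseeding the worst sample keeps the pool anchored to the initial state, so the model keeps seeing trajectories that start from scratch instead of only ever refining already-grown patterns.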


================================================
FILE: github_adventures/diffaugment/README.MD
================================================
# Data
https://hanlab.mit.edu/projects/data-efficient-gans/datasets/100-shot-grumpy_cat.zip

Just unzip it into `data/` and the code should work out of the box.


================================================
FILE: github_adventures/diffaugment/script.py
================================================
import argparse
import pathlib
import pprint
from datetime import datetime

import kornia.augmentation as K
import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid
from tqdm import tqdm

from utils import DatasetImages, Discriminator, Generator, init_weights_


def main(argv=None):
    # CLI
    parser = argparse.ArgumentParser()
    parser.add_argument("name", help="Name of the experiment")
    parser.add_argument(
        "-a",
        "--augment",
        action="store_true",
        help="If True, we apply augmentations",
    )
    parser.add_argument(
        "-b", "--batch-size", type=int, default=16, help="Batch size"
    )
    parser.add_argument(
        "--b1",
        type=float,
        default=0.5,
        help="Adam optimizer hyperparameter",
    )
    parser.add_argument(
        "--b2",
        type=float,
        default=0.999,
        help="Adam optimizer hyperparameter",
    )
    parser.add_argument(
        "-d",
        "--device",
        type=str,
        default="cpu",
        choices=["cpu", "cuda"],
        help="Device to use",
    )
    parser.add_argument(
        "--eval-frequency",
        type=int,
        default=400,
        help="Generate generator images every `eval_frequency` epochs",
    )
    parser.add_argument(
        "--latent-dim",
        type=int,
        default=100,
        help="Dimensionality of the random noise",
    )
    parser.add_argument(
        "--lr", type=float, default=0.0002, help="Learning rate"
    )
    parser.add_argument(
        "--ndf",
        type=int,
        default=32,
        help="Number of discriminator feature maps (after first convolution)",
    )
    parser.add_argument(
        "--ngf",
        type=int,
        default=32,
        help="Number of generator feature maps (before last transposed convolution)",
    )
    parser.add_argument(
        "-n",
        "--n-epochs",
        type=int,
        default=200,
        help="Number of training epochs",
    )
    parser.add_argument(
        "--mosaic-size",
        type=int,
        default=10,
        help="Side length of the square mosaic",
    )
    parser.add_argument(
        "-p",
        "--prob",
        type=float,
        default=0.9,
        help="Probability of applying an augmentation",
    )

    args = parser.parse_args(argv)
    args_d = vars(args)
    print(args)

    img_size = 128

    # Additional parameters
    device = torch.device(args.device)
    mosaic_kwargs = {"nrow": args.mosaic_size, "normalize": True}
    n_mosaic_cells = args.mosaic_size * args.mosaic_size
    sample_showcase_ix = 0  # this one will be used to demonstrate the augmentations

    augment_module = torch.nn.Sequential(
        K.RandomAffine(degrees=0, translate=(1 / 8, 1 / 8), p=args.prob),
        K.RandomErasing((0.0, 0.5), p=args.prob),
    )

    # Loss function
    adversarial_loss = torch.nn.BCELoss()

    # Initialize generator and discriminator
    generator = Generator(latent_dim=args.latent_dim, ngf=args.ngf)
    discriminator = Discriminator(
        ndf=args.ndf, augment_module=augment_module if args.augment else None
    )

    generator.to(device)
    discriminator.to(device)

    # Initialize weights
    generator.apply(init_weights_)
    discriminator.apply(init_weights_)

    # Configure data loader
    data_path = pathlib.Path("data")
    tform = transforms.Compose(
        [
            transforms.Resize(img_size),
            transforms.ToTensor(),
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
        ]
    )
    dataset = DatasetImages(
        data_path,
        transform=tform,
    )
    dataloader = DataLoader(
        dataset,
        batch_size=args.batch_size,
        shuffle=True,
    )

    # Optimizers
    optimizer_G = torch.optim.Adam(
        generator.parameters(), lr=args.lr, betas=(args.b1, args.b2)
    )
    optimizer_D = torch.optim.Adam(
        discriminator.parameters(), lr=args.lr, betas=(args.b1, args.b2)
    )

    # Output path and metadata
    output_path = pathlib.Path("outputs") / args.name
    output_path.mkdir(exist_ok=True, parents=True)

    # Add other parameters (not included in CLI)
    args_d["time"] = datetime.now()
    args_d["kornia"] = str(augment_module)

    # Prepare tensorboard writer
    writer = SummaryWriter(output_path)

    # Log hyperparameters as text
    writer.add_text(
        "hyperparameter",
        pprint.pformat(args_d).replace(
            "\n", "  \n"
        ),  # markdown needs 2 spaces before newline
        0,
    )
    # Log true data
    writer.add_image(
        "true_data",
        make_grid(
            torch.stack([dataset[i] for i in range(n_mosaic_cells)]),
            **mosaic_kwargs
        ),
        0,
    )
    # Log augmented data (only possible when an augment module was provided)
    if discriminator.augment_module is not None:
        batch_showcase = dataset[sample_showcase_ix][None, ...].repeat(
            n_mosaic_cells, 1, 1, 1
        )
        batch_showcase_aug = discriminator.augment_module(batch_showcase)
        writer.add_image(
            "augmentations", make_grid(batch_showcase_aug, **mosaic_kwargs), 0
        )

    # Prepare evaluation noise
    z_eval = torch.randn(n_mosaic_cells, args.latent_dim).to(device)

    for epoch in tqdm(range(args.n_epochs)):
        for i, imgs in enumerate(dataloader):
            n_samples, *_ = imgs.shape
            batches_done = epoch * len(dataloader) + i

            # Adversarial ground truths (0.9 = one-sided label smoothing)
            valid = 0.9 * torch.ones(
                n_samples, 1, device=device, dtype=torch.float32
            )
            fake = torch.zeros(n_samples, 1, device=device, dtype=torch.float32)

            # D preparation
            optimizer_D.zero_grad()

            # D loss on reals
            real_imgs = imgs.to(device)
            d_x = discriminator(real_imgs)
            real_loss = adversarial_loss(d_x, valid)
            real_loss.backward()

            # D loss on fakes
            z = torch.randn(n_samples, args.latent_dim).to(device)
            gen_imgs = generator(z)
            d_g_z1 = discriminator(gen_imgs.detach())

            fake_loss = adversarial_loss(d_g_z1, fake)
            fake_loss.backward()

            optimizer_D.step()  # we called backward twice, the result is a sum

            # G preparation
            optimizer_G.zero_grad()

            # G loss
            d_g_z2 = discriminator(gen_imgs)
            g_loss = adversarial_loss(d_g_z2, valid)

            g_loss.backward()
            optimizer_G.step()

            # Logging
            if batches_done % 50 == 0:
                writer.add_scalar("d_x", d_x.mean().item(), batches_done)
                writer.add_scalar("d_g_z1", d_g_z1.mean().item(), batches_done)
                writer.add_scalar("d_g_z2", d_g_z2.mean().item(), batches_done)
                writer.add_scalar(
                    "D_loss", (real_loss + fake_loss).item(), batches_done
                )
                writer.add_scalar("G_loss", g_loss.item(), batches_done)

            if epoch % args.eval_frequency == 0 and i == 0:
                generator.eval()
                discriminator.eval()

                # Generate fake images
                gen_imgs_eval = generator(z_eval)

                # Generate nice mosaic
                writer.add_image(
                    "fake",
                    make_grid(gen_imgs_eval.data, **mosaic_kwargs),
                    batches_done,
                )

                # Save checkpoint (and potentially overwrite an existing one)
                torch.save(generator, output_path / "model.pt")

                # Put the generator and discriminator back in training mode
                generator.train()
                discriminator.train()


if __name__ == "__main__":
    main()
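The `valid` targets in the training loop above use one-sided label smoothing (0.9 instead of 1.0), a common GAN stabilization trick. A minimal pure-Python sketch of why this matters for `BCELoss`, using made-up probabilities (not part of the repository):

```python
import math

def bce(p, y):
    # Binary cross-entropy for a single prediction, mirroring torch.nn.BCELoss
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With a smoothed target of 0.9 the loss is minimized at D(x) = 0.9, not 1.0,
# so the discriminator is discouraged from becoming overconfident on reals.
assert bce(0.9, 0.9) < bce(0.99, 0.9)
```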


================================================
FILE: github_adventures/diffaugment/utils.py
================================================
import torch.nn as nn
from PIL import Image
from torch.utils.data import Dataset


class DatasetImages(Dataset):
    """Dataset loading photos on the hard drive.

    Parameters
    ----------
    path : pathlib.Path
        Path to the folder containing all the images.

    transform : None or callable
        The transform to be applied when yielding the image.

    Attributes
    ----------
    all_paths : list
        List of all paths to the `.jpg` images.
    """
    def __init__(self, path, transform=None):
        super().__init__()

        self.all_paths = sorted([p for p in path.iterdir() if p.suffix == ".jpg"])
        self.transform = transform

    def __len__(self):
        """Compute length of the dataset."""
        return len(self.all_paths)

    def __getitem__(self, ix):
        """Get a single item."""
        img = Image.open(self.all_paths[ix])

        if self.transform is not None:
            img = self.transform(img)

        return img



class Generator(nn.Module):
    """Generator network.

    Parameters
    ----------
    latent_dim : int
        The dimensionality of the input noise.

    ngf : int
        Number of generator filters. The actual number of filters in each
        block is a multiple of this number and is halved in each consecutive
        block of the network.

    Attributes
    ----------
    main : nn.Sequential
        The actual network that is composed of `ConvTranspose2d`, `BatchNorm2d`
        and `ReLU` blocks.
    """

    def __init__(self, latent_dim, ngf=64):
        super().__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # (ngf * 16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # (ngf * 8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # (ngf * 4) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # (ngf * 2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # ngf x 64 x 64
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
            # 3 x 128 x 128
        )

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input noise of shape `(n_samples, latent_dim)`.

        Returns
        -------
        torch.Tensor
            Generated images of shape `(n_samples, 3, 128, 128)`.
        """
        x = x.reshape(*x.shape, 1, 1)  # (n_samples, latent_dim, 1, 1)
        return self.main(x)


class Discriminator(nn.Module):
    """Discriminator netowrk.

    Parameters
    ----------
    ndf : int
        Number of discriminator filters. It represents the number of filters
        after the first convolution block. Each consecutive block will double
        the number.

    augment_module : nn.Module or None
        If provided it represents the Kornia module that performs
        differentiable augmentation of the images.

    Attributes
    ----------
    augment_module : nn.Module
        If the `augment_module` parameter was provided, this is that module.
        Otherwise it is an identity mapping.
    """
    def __init__(self, ndf=16, augment_module=None):
        super().__init__()
        self.main = nn.Sequential(
            # 3 x 128 x 128
            nn.Conv2d(3, ndf, 4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # ndf x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf * 2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf * 4) x 16 x 16
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf * 8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf * 16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # 1 x 1 x 1
        )
        if augment_module is not None:
            self.augment_module = augment_module
        else:
            self.augment_module = nn.Identity()


    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input images of shape `(n_samples, 3, 128, 128)`.

        Returns
        -------
        torch.Tensor
            Classification outputs of shape `(n_samples, 1)`.
        """
        if self.training:
            x = self.augment_module(x)

        x = self.main(x)  # (n_samples, 1, 1, 1)
        x = x.reshape(len(x), -1)  # (n_samples, 1)
        return x


def init_weights_(module):
    """Initialize weights by sampling from a normal distribution.

    Note that this operation is modifying the weights in place.

    Parameters
    ----------
    module : nn.Module
        Module with trainable weights.
    """
    cls_name = module.__class__.__name__

    if cls_name in {"Conv2d", "ConvTranspose2d"}:
        nn.init.normal_(module.weight.data, 0.0, 0.02)

    elif cls_name == "BatchNorm2d":
        nn.init.normal_(module.weight.data, 1.0, 0.02)
        nn.init.constant_(module.bias.data, 0.0)
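The spatial sizes annotated in the comments of `Generator.main` follow from standard transposed-convolution arithmetic. A quick sanity check of the upsampling path, assuming the usual output-size formula with default `output_padding` and `dilation` (plain arithmetic, not part of the repository):

```python
def conv_transpose_out(size, kernel, stride, padding):
    # Output size of nn.ConvTranspose2d (output_padding and dilation at defaults)
    return (size - 1) * stride - 2 * padding + kernel

# Generator: 1x1 latent "image" -> 4 -> 8 -> 16 -> 32 -> 64 -> 128
size = 1
size = conv_transpose_out(size, 4, 1, 0)   # first block: kernel 4, stride 1, pad 0
for _ in range(5):                         # five kernel-4, stride-2, pad-1 blocks
    size = conv_transpose_out(size, 4, 2, 1)
assert size == 128
```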


================================================
FILE: github_adventures/dino/data/README.md
================================================
The `Imagenette` dataset was used. You can find it here: https://github.com/fastai/imagenette (320 px version). 


================================================
FILE: github_adventures/dino/data/imagenette_labels.json
================================================
{"n01440764": "tench", "n02102040": "english_springer", "n02979186": "cassette_player", "n03000684": "chain_saw", "n03028079": "church", "n03394916": "french_horn", "n03417042": "garbage_truck", "n03425413": "gas_pump", "n03445777": "golf_ball", "n03888257": "parachute"}

================================================
FILE: github_adventures/dino/evaluation.py
================================================
import numpy as np
import torch
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def compute_knn(backbone, data_loader_train, data_loader_val):
    """Get CLS embeddings and use KNN classifier on them.

    We load all embeddings into memory and use scikit-learn, which
    should be feasible for a dataset of this size.

    Parameters
    ----------
    backbone : timm.models.vision_transformer.VisionTransformer
        Vision transformer whose head is just an identity
        mapping.

    data_loader_train, data_loader_val : torch.utils.data.DataLoader
        Training and validation dataloader that does not apply any
        augmentations. Just casting to tensor and then normalizing.

    Returns
    -------
    val_accuracy : float
        Validation accuracy.
    """
    device = next(backbone.parameters()).device

    data_loaders = {
        "train": data_loader_train,
        "val": data_loader_val,
    }
    lists = {
        "X_train": [],
        "y_train": [],
        "X_val": [],
        "y_val": [],
    }

    for name, data_loader in data_loaders.items():
        for imgs, y in data_loader:
            imgs = imgs.to(device)
            lists[f"X_{name}"].append(backbone(imgs).detach().cpu().numpy())
            lists[f"y_{name}"].append(y.detach().cpu().numpy())

    arrays = {k: np.concatenate(l) for k, l in lists.items()}

    estimator = KNeighborsClassifier()
    estimator.fit(arrays["X_train"], arrays["y_train"])
    y_val_pred = estimator.predict(arrays["X_val"])

    acc = accuracy_score(arrays["y_val"], y_val_pred)

    return acc

def compute_embedding(backbone, data_loader):
    """Compute CLS embedding and prepare for TensorBoard.

    Parameters
    ----------
    backbone : timm.models.vision_transformer.VisionTransformer
        Vision transformer. The head should be an identity mapping.

    data_loader : torch.utils.data.DataLoader
        Validation dataloader that does not apply any augmentations. Just
        casting to tensor and then normalizing.

    Returns
    -------
    embs : torch.Tensor
        Embeddings of shape `(n_samples, out_dim)`.

    imgs : torch.Tensor
        Images of shape `(n_samples, 3, height, width)`.

    labels : list
        List of strings representing the classes.
    """
    device = next(backbone.parameters()).device

    embs_l = []
    imgs_l = []
    labels = []

    for img, y in data_loader:
        img = img.to(device)
        embs_l.append(backbone(img).detach().cpu())
        imgs_l.append(((img * 0.224) + 0.45).cpu())  # undo norm
        labels.extend([data_loader.dataset.classes[i] for i in y.tolist()])

    embs = torch.cat(embs_l, dim=0)
    imgs = torch.cat(imgs_l, dim=0)

    return embs, imgs, labels
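`compute_knn` above delegates the classification to scikit-learn's `KNeighborsClassifier` (with its default `n_neighbors=5`). As an illustration of what happens under the hood, here is a minimal 1-nearest-neighbour sketch in NumPy with made-up toy embeddings:

```python
import numpy as np

def one_nn_accuracy(X_train, y_train, X_val, y_val):
    # Squared Euclidean distance between every validation and training embedding
    d2 = ((X_val[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # Each validation point inherits the label of its closest training point
    y_pred = y_train[d2.argmin(axis=1)]
    return float((y_pred == y_val).mean())

X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
y_train = np.array([0, 1])
X_val = np.array([[1.0, 0.0], [9.0, 10.0]])
y_val = np.array([0, 1])
assert one_nn_accuracy(X_train, y_train, X_val, y_val) == 1.0
```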


================================================
FILE: github_adventures/dino/train.py
================================================
import argparse
import json
import pathlib

import timm
import torch
import torchvision.transforms as transforms
import tqdm
from torch.utils.data import DataLoader, SubsetRandomSampler
from torch.utils.tensorboard import SummaryWriter
from torchvision.datasets import ImageFolder

from evaluation import compute_embedding, compute_knn
from utils import DataAugmentation, Head, Loss, MultiCropWrapper, clip_gradients


def main():
    parser = argparse.ArgumentParser(
        "DINO training CLI",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-b", "--batch-size", type=int, default=32)
    parser.add_argument(
        "-d", "--device", type=str, choices=("cpu", "cuda"), default="cpu"
    )
    parser.add_argument("-l", "--logging-freq", type=int, default=200)
    parser.add_argument("--momentum-teacher", type=int, default=0.9995)
    parser.add_argument("-c", "--n-crops", type=int, default=4)
    parser.add_argument("-e", "--n-epochs", type=int, default=100)
    parser.add_argument("-o", "--out-dim", type=int, default=1024)
    parser.add_argument("-t", "--tensorboard-dir", type=str, default="logs")
    parser.add_argument("--clip-grad", type=float, default=2.0)
    parser.add_argument("--norm-last-layer", action="store_true")
    parser.add_argument("--batch-size-eval", type=int, default=64)
    parser.add_argument("--teacher-temp", type=float, default=0.04)
    parser.add_argument("--student-temp", type=float, default=0.1)
    parser.add_argument("--pretrained", action="store_true")
    parser.add_argument("-w", "--weight-decay", type=float, default=0.4)

    args = parser.parse_args()
    print(vars(args))
    # Parameters
    vit_name, dim = "vit_deit_small_patch16_224", 384
    path_dataset_train = pathlib.Path("data/imagenette2-320/train")
    path_dataset_val = pathlib.Path("data/imagenette2-320/val")
    path_labels = pathlib.Path("data/imagenette_labels.json")

    logging_path = pathlib.Path(args.tensorboard_dir)
    device = torch.device(args.device)

    n_workers = 4

    # Data related
    with path_labels.open("r") as f:
        label_mapping = json.load(f)

    transform_aug = DataAugmentation(size=224, n_local_crops=args.n_crops - 2)
    transform_plain = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
            transforms.Resize((224, 224)),
        ]
    )

    dataset_train_aug = ImageFolder(path_dataset_train, transform=transform_aug)
    dataset_train_plain = ImageFolder(path_dataset_train, transform=transform_plain)
    dataset_val_plain = ImageFolder(path_dataset_val, transform=transform_plain)

    if dataset_train_plain.classes != dataset_val_plain.classes:
        raise ValueError("Inconsistent classes")

    data_loader_train_aug = DataLoader(
        dataset_train_aug,
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=True,
        num_workers=n_workers,
        pin_memory=True,
    )
    data_loader_train_plain = DataLoader(
        dataset_train_plain,
        batch_size=args.batch_size_eval,
        drop_last=False,
        num_workers=n_workers,
    )
    data_loader_val_plain = DataLoader(
        dataset_val_plain,
        batch_size=args.batch_size_eval,
        drop_last=False,
        num_workers=n_workers,
    )
    data_loader_val_plain_subset = DataLoader(
        dataset_val_plain,
        batch_size=args.batch_size_eval,
        drop_last=False,
        sampler=SubsetRandomSampler(list(range(0, len(dataset_val_plain), 50))),
        num_workers=n_workers,
    )

    # Logging
    writer = SummaryWriter(logging_path)
    writer.add_text("arguments", json.dumps(vars(args)))

    # Neural network related
    student_vit = timm.create_model(vit_name, pretrained=args.pretrained)
    teacher_vit = timm.create_model(vit_name, pretrained=args.pretrained)

    student = MultiCropWrapper(
        student_vit,
        Head(
            dim,
            args.out_dim,
            norm_last_layer=args.norm_last_layer,
        ),
    )
    teacher = MultiCropWrapper(teacher_vit, Head(dim, args.out_dim))
    student, teacher = student.to(device), teacher.to(device)

    teacher.load_state_dict(student.state_dict())

    for p in teacher.parameters():
        p.requires_grad = False

    # Loss related
    loss_inst = Loss(
        args.out_dim,
        teacher_temp=args.teacher_temp,
        student_temp=args.student_temp,
    ).to(device)
    lr = 0.0005 * args.batch_size / 256
    optimizer = torch.optim.AdamW(
        student.parameters(),
        lr=lr,
        weight_decay=args.weight_decay,
    )

    # Training loop
    n_batches = len(dataset_train_aug) // args.batch_size
    best_acc = 0
    n_steps = 0

    for e in range(args.n_epochs):
        for i, (images, _) in tqdm.tqdm(
            enumerate(data_loader_train_aug), total=n_batches
        ):
            if n_steps % args.logging_freq == 0:
                student.eval()

                # Embedding
                embs, imgs, labels_ = compute_embedding(
                    student.backbone,
                    data_loader_val_plain_subset,
                )
                writer.add_embedding(
                    embs,
                    metadata=[label_mapping[l] for l in labels_],
                    label_img=imgs,
                    global_step=n_steps,
                    tag="embeddings",
                )

                # KNN
                current_acc = compute_knn(
                    student.backbone,
                    data_loader_train_plain,
                    data_loader_val_plain,
                )
                writer.add_scalar("knn-accuracy", current_acc, n_steps)
                if current_acc > best_acc:
                    torch.save(student, logging_path / "best_model.pth")
                    best_acc = current_acc

                student.train()

            images = [img.to(device) for img in images]

            teacher_output = teacher(images[:2])
            student_output = student(images)

            loss = loss_inst(student_output, teacher_output)

            optimizer.zero_grad()
            loss.backward()
            clip_gradients(student, args.clip_grad)
            optimizer.step()

            with torch.no_grad():
                for student_ps, teacher_ps in zip(
                    student.parameters(), teacher.parameters()
                ):
                    teacher_ps.data.mul_(args.momentum_teacher)
                    teacher_ps.data.add_(
                        (1 - args.momentum_teacher) * student_ps.detach().data
                    )

            writer.add_scalar("train_loss", loss, n_steps)

            n_steps += 1


if __name__ == "__main__":
    main()
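The teacher parameters in the training loop above are an exponential moving average (EMA) of the student's. A scalar sketch of the update rule (the momentum is the CLI default; the weight values are made up):

```python
def ema_update(teacher_w, student_w, momentum=0.9995):
    # teacher <- m * teacher + (1 - m) * student, as in the no_grad block above
    return momentum * teacher_w + (1 - momentum) * student_w

# The teacher drifts slowly towards the student instead of copying it
w = 0.0
for _ in range(1000):
    w = ema_update(w, 1.0)
assert 0.0 < w < 1.0
```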


================================================
FILE: github_adventures/dino/utils.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image


class DataAugmentation:
    """Create crops of an input image together with additional augmentation.

    It generates 2 global crops and `n_local_crops` local crops.

    Parameters
    ----------
    global_crops_scale : tuple
        Range of sizes for the global crops.

    local_crops_scale : tuple
        Range of sizes for the local crops.

    n_local_crops : int
        Number of local crops to create.

    size : int
        The size of the final image.

    Attributes
    ----------
    global_1, global_2 : transforms.Compose
        Two global transforms.

    local : transforms.Compose
        Local transform. Note that the augmentation is stochastic so one
        instance is enough and will lead to different crops.
    """
    def __init__(
        self,
        global_crops_scale=(0.4, 1),
        local_crops_scale=(0.05, 0.4),
        n_local_crops=8,
        size=224,
    ):
        self.n_local_crops = n_local_crops
        RandomGaussianBlur = lambda p: transforms.RandomApply(  # noqa
            [transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2))],
            p=p,
        )

        flip_and_jitter = transforms.Compose(
            [
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.RandomApply(
                    [
                        transforms.ColorJitter(
                            brightness=0.4,
                            contrast=0.4,
                            saturation=0.2,
                            hue=0.1,
                        ),
                    ]
                ),
                transforms.RandomGrayscale(p=0.2),
            ]
        )

        normalize = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
            ]
        )

        self.global_1 = transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    size,
                    scale=global_crops_scale,
                    interpolation=Image.BICUBIC,
                ),
                flip_and_jitter,
                RandomGaussianBlur(1.0),  # always apply
                normalize,
            ],
        )

        self.global_2 = transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    size,
                    scale=global_crops_scale,
                    interpolation=Image.BICUBIC,
                ),
                flip_and_jitter,
                RandomGaussianBlur(0.1),
                transforms.RandomSolarize(170, p=0.2),
                normalize,
            ],
        )

        self.local = transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    size,
                    scale=local_crops_scale,
                    interpolation=Image.BICUBIC,
                ),
                flip_and_jitter,
                RandomGaussianBlur(0.5),
                normalize,
            ],
        )

    def __call__(self, img):
        """Apply transformation.

        Parameters
        ----------
        img : PIL.Image
            Input image.

        Returns
        -------
        all_crops : list
            List of `torch.Tensor` representing different views of
            the input `img`.
        """
        all_crops = []
        all_crops.append(self.global_1(img))
        all_crops.append(self.global_2(img))

        all_crops.extend([self.local(img) for _ in range(self.n_local_crops)])

        return all_crops


class Head(nn.Module):
    """Network hooked up to the CLS token embedding.

    Just a MLP with the last layer being normalized in a particular way.

    Parameters
    ----------
    in_dim : int
        The dimensionality of the token embedding.

    out_dim : int
        The dimensionality of the final layer (we compute the softmax over).

    hidden_dim : int
        Dimensionality of the hidden layers.

    bottleneck_dim : int
        Dimensionality of the second last layer.

    n_layers : int
        The number of layers.

    norm_last_layer : bool
        If True, then we freeze the norm of the weight of the last linear layer
        to 1.

    Attributes
    ----------
    mlp : nn.Sequential
        Vanilla multi-layer perceptron.

    last_layer : nn.Linear
        Reparametrized linear layer with weight normalization. That means
        that it will have `weight_g` and `weight_v` as learnable
        parameters instead of a single `weight`.
    """

    def __init__(
        self,
        in_dim,
        out_dim,
        hidden_dim=512,
        bottleneck_dim=256,
        n_layers=3,
        norm_last_layer=False,
    ):
        super().__init__()
        if n_layers == 1:
            self.mlp = nn.Linear(in_dim, bottleneck_dim)
        else:
            layers = [nn.Linear(in_dim, hidden_dim)]
            layers.append(nn.GELU())
            for _ in range(n_layers - 2):
                layers.append(nn.Linear(hidden_dim, hidden_dim))
                layers.append(nn.GELU())
            layers.append(nn.Linear(hidden_dim, bottleneck_dim))
            self.mlp = nn.Sequential(*layers)

        self.apply(self._init_weights)

        self.last_layer = nn.utils.weight_norm(
            nn.Linear(bottleneck_dim, out_dim, bias=False)
        )
        self.last_layer.weight_g.data.fill_(1)
        if norm_last_layer:
            self.last_layer.weight_g.requires_grad = False

    def _init_weights(self, m):
        """Initialize learnable parameters."""
        if isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, std=0.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        """Run forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Of shape `(n_samples, in_dim)`.

        Returns
        -------
        torch.Tensor
            Of shape `(n_samples, out_dim)`.
        """
        x = self.mlp(x)  # (n_samples, bottleneck_dim)
        x = nn.functional.normalize(x, dim=-1, p=2)  # (n_samples, bottleneck_dim)
        x = self.last_layer(x)  # (n_samples, out_dim)

        return x
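`nn.utils.weight_norm` in `Head.__init__` reparametrizes each weight row as `w = g * v / ||v||`, so filling `weight_g` with 1 (and optionally freezing it) fixes the norm of every output row. A pure-Python sketch of that reparametrization for a single made-up row:

```python
import math

def weight_norm_row(v, g=1.0):
    # w = g * v / ||v||, per output row, as in nn.utils.weight_norm (dim=0)
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

row = weight_norm_row([3.0, 4.0])
# With g == 1 the row is rescaled to unit norm, whatever v is
assert abs(math.sqrt(sum(x * x for x in row)) - 1.0) < 1e-12
```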


class MultiCropWrapper(nn.Module):
    """Convenience class for forward pass of multiple crops.

    Parameters
    ----------
    backbone : timm.models.vision_transformer.VisionTransformer
        Instantiated Vision Transformer. Note that we will take the `head`
        attribute and replace it with `nn.Identity`.

    new_head : Head
        New head that is going to be put on top of the `backbone`.
    """
    def __init__(self, backbone, new_head):
        super().__init__()
        backbone.head = nn.Identity()  # deactivate original head
        self.backbone = backbone
        self.new_head = new_head

    def forward(self, x):
        """Run the forward pass.

        The different crops are concatenated along the batch dimension
        and then a single forward pass is run. The resulting tensor
        is then chunked back to per crop tensors.

        Parameters
        ----------
        x : list
            List of `torch.Tensor` each of shape `(n_samples, 3, size, size)`.

        Returns
        -------
        tuple
            Tuple of `torch.Tensor` each of shape `(n_samples, out_dim)` where
            `out_dim` is determined by `Head`.
        """
        n_crops = len(x)
        concatenated = torch.cat(x, dim=0)  # (n_samples * n_crops, 3, size, size)
        cls_embedding = self.backbone(concatenated)  # (n_samples * n_crops, in_dim)
        logits = self.new_head(cls_embedding)  # (n_samples * n_crops, out_dim)
        chunks = logits.chunk(n_crops)  # n_crops * (n_samples, out_dim)

        return chunks


class Loss(nn.Module):
    """The loss function.

    We subclass the `nn.Module` because we want to create a buffer for the
    logits center of the teacher.

    Parameters
    ----------
    out_dim : int
        The dimensionality of the final layer (we compute the softmax over).

    teacher_temp, student_temp : float
        Softmax temperature of the teacher resp. student.

    center_momentum : float
        Hyperparameter for the exponential moving average that determines
        the center logits. The higher the more the running average matters.
    """
    def __init__(
        self, out_dim, teacher_temp=0.04, student_temp=0.1, center_momentum=0.9
    ):
        super().__init__()
        self.student_temp = student_temp
        self.teacher_temp = teacher_temp
        self.center_momentum = center_momentum
        self.register_buffer("center", torch.zeros(1, out_dim))

    def forward(self, student_output, teacher_output):
        """Evaluate loss.

        Parameters
        ----------
        student_output, teacher_output : tuple
            Tuple of tensors of shape `(n_samples, out_dim)` representing
            logits. The length is equal to the number of crops.
            Note that the student processed all crops and that the first
            two crops are the global ones.

        Returns
        -------
        loss : torch.Tensor
            Scalar representing the average loss.
        """
        student_temp = [s / self.student_temp for s in student_output]
        teacher_temp = [(t - self.center) / self.teacher_temp for t in teacher_output]

        student_sm = [F.log_softmax(s, dim=-1) for s in student_temp]
        teacher_sm = [F.softmax(t, dim=-1).detach() for t in teacher_temp]

        total_loss = 0
        n_loss_terms = 0

        for t_ix, t in enumerate(teacher_sm):
            for s_ix, s in enumerate(student_sm):
                if t_ix == s_ix:
                    continue

                loss = torch.sum(-t * s, dim=-1)  # (n_samples,)
                total_loss += loss.mean()  # scalar
                n_loss_terms += 1

        total_loss /= n_loss_terms
        self.update_center(teacher_output)

        return total_loss

    @torch.no_grad()
    def update_center(self, teacher_output):
        """Update center used for teacher output.

        Compute the exponential moving average.

        Parameters
        ----------
        teacher_output : tuple
            Tuple of tensors of shape `(n_samples, out_dim)` where each
            tensor represents a different crop.
        """
        batch_center = torch.cat(teacher_output).mean(
            dim=0, keepdim=True
        )  # (1, out_dim)
        self.center = self.center * self.center_momentum + batch_center * (
            1 - self.center_momentum
        )
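The double loop in `Loss.forward` skips the diagonal (`t_ix == s_ix`), so with 2 teacher (global) crops and `n_crops` student crops there are `2 * n_crops - 2` loss terms. A quick check of that pairing logic:

```python
def count_loss_pairs(n_teacher_crops, n_student_crops):
    # Mirrors the t_ix != s_ix double loop in Loss.forward
    return sum(
        1
        for t_ix in range(n_teacher_crops)
        for s_ix in range(n_student_crops)
        if t_ix != s_ix
    )

assert count_loss_pairs(2, 4) == 2 * 4 - 2  # CLI default: --n-crops 4
```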

def clip_gradients(model, clip=2.0):
    """Rescale norm of computed gradients.

    Parameters
    ----------
    model : nn.Module
        Module.

    clip : float
        Maximum norm.
    """
    for p in model.parameters():
        if p.grad is not None:
            param_norm = p.grad.data.norm(2)
            clip_coef = clip / (param_norm + 1e-6)
            if clip_coef < 1:
                p.grad.data.mul_(clip_coef)
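Note that `clip_gradients` rescales each parameter's gradient independently (per-parameter clipping, unlike `torch.nn.utils.clip_grad_norm_`, which uses the global norm). The scalar effect of the rescaling, with made-up gradient norms:

```python
def clipped_norm(norm, clip=2.0, eps=1e-6):
    # Norm of a gradient after the rescaling performed in clip_gradients
    coef = clip / (norm + eps)
    return norm * min(coef, 1.0)

assert clipped_norm(1.0) == 1.0             # small gradients are untouched
assert abs(clipped_norm(4.0) - 2.0) < 1e-4  # large ones are scaled down to ~clip
```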


================================================
FILE: github_adventures/dino/visualize_attentions.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a3bd5ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ipywidgets\n",
    "import matplotlib.pyplot as plt\n",
    "import timm\n",
    "import torch\n",
    "from torchvision.datasets import ImageFolder\n",
    "import torchvision.transforms as transforms\n",
    "from torchvision.utils import make_grid\n",
    "import torch.nn.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6eaa0ef",
   "metadata": {},
   "source": [
    "# Helpers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c0b2e7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_last_attention(backbone, x):\n",
    "    \"\"\"Get the attention weights of CLS from the last self-attention layer.\n",
    "\n",
    "    Very hacky!\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    backbone : timm.models.vision_transformer.VisionTransformer\n",
    "        Instantiated Vision Transformer. Note that we will in-place\n",
    "        take the `head` attribute and replace it with `nn.Identity`.\n",
    "\n",
    "    x : torch.Tensor\n",
    "        Batch of images of shape `(n_samples, 3, size, size)`.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    torch.Tensor\n",
    "        Attention weights `(n_samples, n_heads, n_patches)`.\n",
    "    \"\"\"\n",
    "    attn_module = backbone.blocks[-1].attn\n",
    "    n_heads = attn_module.num_heads\n",
    "\n",
    "    # define hook\n",
    "    inp = None\n",
    "    def fprehook(self, inputs):\n",
    "        nonlocal inp\n",
    "        inp = inputs[0]\n",
    "\n",
    "    # Register a hook\n",
    "    handle = attn_module.register_forward_pre_hook(fprehook)\n",
    "\n",
    "    # Run forward pass\n",
    "    _ = backbone(x)\n",
    "    handle.remove()\n",
    "\n",
    "    B, N, C = inp.shape\n",
    "    qkv = attn_module.qkv(inp).reshape(B, N, 3, n_heads, C // n_heads).permute(2, 0, 3, 1, 4)\n",
    "    q, k, v = qkv[0], qkv[1], qkv[2]\n",
    "\n",
    "    attn = (q @ k.transpose(-2, -1)) * attn_module.scale\n",
    "    attn = attn.softmax(dim=-1)\n",
    "\n",
    "    return attn[:, :, 0, 1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57b72b84",
   "metadata": {},
   "outputs": [],
   "source": [
    "def threshold(attn, k=30):\n",
    "    n_heads = len(attn)\n",
    "    indices = attn.argsort(dim=1, descending=True)[:, k:]\n",
    "\n",
    "    for head in range(n_heads):\n",
    "        attn[head, indices[head]] = 0\n",
    "\n",
    "    attn /= attn.sum(dim=1, keepdim=True)\n",
    "\n",
    "    return attn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59e9009d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def visualize_attention(img, backbone, k=30):\n",
    "    \"\"\"Create attention image.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    img : PIL.Image\n",
    "        RGB image.\n",
    "\n",
    "    backbone : timm.models.vision_transformer.VisionTransformer\n",
    "        The vision transformer.\n",
    "\n",
    "    k : int\n",
    "        Number of highest-attention patches to keep per head.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    attn : torch.Tensor\n",
    "        Attention maps of shape (n_heads, height, width).\n",
    "    \"\"\"\n",
    "    # infer parameters from the backbone\n",
    "\n",
    "    patch_size = backbone.patch_embed.proj.kernel_size[0]\n",
    "\n",
    "    transform = transforms.Compose([\n",
    "        transforms.Resize((224, 224)),\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\n",
    "    ])\n",
    "\n",
    "    device = next(backbone.parameters()).device\n",
    "    x = transform(img)[None, ...].to(device)\n",
    "    attn = get_last_attention(backbone, x)[0]  # (n_heads, n_patches)\n",
    "    attn = attn / attn.sum(dim=1, keepdim=True)  # (n_heads, n_patches)\n",
    "    attn = threshold(attn, k)\n",
    "    attn = attn.reshape(-1, 14, 14)  # (n_heads, 14, 14)\n",
    "    attn = F.interpolate(attn.unsqueeze(0),\n",
    "        scale_factor=patch_size,\n",
    "        mode=\"nearest\"\n",
    "        )[0]\n",
    "\n",
    "    return attn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df0972ec",
   "metadata": {},
   "source": [
    "# Preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6e0d987",
   "metadata": {},
   "outputs": [],
   "source": [
    "models = {\n",
    "    \"supervised\": timm.create_model(\"vit_deit_small_patch16_224\", pretrained=True),\n",
    "    \"selfsupervised\": torch.load(\"best_model.pth\", map_location=\"cpu\").backbone,\n",
    "}\n",
    "dataset = ImageFolder(\"data/imagenette2-320/val\")\n",
    "\n",
    "colors = [\"yellow\", \"red\", \"green\", \"blue\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "690e3a1f",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "@ipywidgets.interact\n",
    "def _(\n",
    "    i=ipywidgets.IntSlider(min=0, max=len(dataset) - 1, continuous_update=False),\n",
    "    k=ipywidgets.IntSlider(min=0, max=195, value=10, continuous_update=False),\n",
    "    model=ipywidgets.Dropdown(options=[\"supervised\", \"selfsupervised\"]),\n",
    "):\n",
    "    img = dataset[i][0]\n",
    "    attns = visualize_attention(img, models[model], k=k).detach().permute(1, 2, 0).numpy()\n",
    "\n",
    "    tform = transforms.Compose([\n",
    "        transforms.Resize((224, 224)),\n",
    "    ])\n",
    "    # original image\n",
    "    plt.imshow(tform(img))\n",
    "    plt.axis(\"off\")\n",
    "    plt.show()\n",
    "\n",
    "    kwargs = {\"vmin\": 0, \"vmax\": 0.24}\n",
    "    # Attentions\n",
    "    n_heads = 6\n",
    "\n",
    "    fig, axs = plt.subplots(2, 3, figsize=(10, 7))\n",
    "    \n",
    "    for i in range(n_heads):\n",
    "        ax = axs[i // 3, i % 3]\n",
    "        ax.imshow(attns[..., i], **kwargs)\n",
    "        ax.axis(\"off\")\n",
    "        \n",
    "    plt.tight_layout()\n",
    "        \n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d83eae10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3244, 1942, 3482, 688, 1509, 3709"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: github_adventures/dino/visualize_augmentations.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5801191a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "import ipywidgets\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import torch\n",
    "from PIL import Image\n",
    "from torchvision.datasets import ImageFolder\n",
    "\n",
    "from utils import DataAugmentation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad4f7f91",
   "metadata": {},
   "outputs": [],
   "source": [
    "def to_numpy(t):\n",
    "    array = torch.clip((t * 0.224) + 0.45, 0, 1).permute(1, 2, 0).numpy()\n",
    "    return array\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db09874a",
   "metadata": {},
   "outputs": [],
   "source": [
    "transform = DataAugmentation(n_local_crops=2)\n",
    "dataset = ImageFolder(\"data/imagenette2-320/train/\", transform=transform)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48738037",
   "metadata": {},
   "outputs": [],
   "source": [
    "@ipywidgets.interact\n",
    "def _(\n",
    "    i=ipywidgets.IntSlider(min=0, max=len(dataset) - 1, continuous_update=False),\n",
    "    seed=ipywidgets.IntSlider(min=0, max=50, continuous_update=False),\n",
    "):\n",
    "    torch.manual_seed(seed)\n",
    "    all_crops, _ = dataset[i]\n",
    "    titles = [\"Global 1\", \"Global 2\", \"Local 1\", \"Local 2\"]\n",
    "    \n",
    "    original_img = np.array(Image.open(dataset.samples[i][0]))\n",
    "    _, ax_orig = plt.subplots(figsize=(15, 5))\n",
    "    ax_orig.imshow(original_img)\n",
    "    ax_orig.set_title(\"Original\")\n",
    "    ax_orig.axis(\"off\")\n",
    "    \n",
    "    \n",
    "    fig, axs = plt.subplots(2, 2, figsize=(10, 10))\n",
    "    \n",
    "    for i, title in enumerate(titles):\n",
    "        ax = axs[i // 2, i % 2]\n",
    "        ax.imshow(to_numpy(all_crops[i]))\n",
    "        ax.set_title(title)\n",
    "        ax.axis(\"off\")\n",
    "    fig.tight_layout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: github_adventures/gpt/README.md
================================================
# GPT-2 custom implementation
## Installation

```shell
pip install -r requirements.txt
```

## Launching script
To copy the weights of an official model and generate some text, use the script
`copy_and_generate.py`:

```shell
(gpt) gpt$ python copy_and_generate.py --help
usage: Copy weights of a HF model and generate text. [-h] [--sample] [-s STEPS] [-r RANDOM_STATE]
                                                     [-t TEMPERATURE] [-k TOP_K] [-v]
                                                     {gpt2,gpt2-medium,gpt2-large,distilgpt2}
                                                     initial_text

positional arguments:
  {gpt2,gpt2-medium,gpt2-large,distilgpt2}
                        Pretrained model to use
  initial_text          Initial text

optional arguments:
  -h, --help            show this help message and exit
  --sample              If True sample randomly otherwise take the most probable token (default: False)
  -s STEPS, --steps STEPS
                        Number of new tokens to generate (default: 30)
  -r RANDOM_STATE, --random-state RANDOM_STATE
                        Random state (default: None)
  -t TEMPERATURE, --temperature TEMPERATURE
                        Softmax logits temperature (default: 1)
  -k TOP_K, --top-k TOP_K
                        If specified, then selecting k most probable tokens (default: None)
  -v, --verbose         If True, then verbose (default: False)

```


================================================
FILE: github_adventures/gpt/copy_and_generate.py
================================================
import argparse
import logging

import torch

from model import GPT
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils import copy_model, generate_token

logging.basicConfig(format="[%(levelname)s] %(asctime)s %(message)s")
logger = logging.getLogger(__file__)


def main(argv=None):
    """Copy weights and generate some text."""
    parser = argparse.ArgumentParser(
        "Copy weights of a HF model and generate text.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "model_name",
        type=str,
        choices=("gpt2", "gpt2-medium", "gpt2-large", "distilgpt2"),
        help="Pretrained model to use",
    )
    parser.add_argument(
        "initial_text",
        type=str,
        help="Initial text",
    )
    parser.add_argument(
        "--sample",
        action="store_true",
        help="If True sample randomly otherwise take the most probable token",
    )
    parser.add_argument(
        "-s",
        "--steps",
        default=30,
        type=int,
        help="Number of new tokens to generate",
    )
    parser.add_argument("-r", "--random-state", type=int, help="Random state")
    parser.add_argument(
        "-t",
        "--temperature",
        default=1,
        type=float,
        help="Softmax logits temperature",
    )
    parser.add_argument(
        "-k",
        "--top-k",
        type=int,
        help="If specified, then selecting k most probable tokens",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="If True, then verbose"
    )

    args = parser.parse_args(argv)

    # Setup logging
    if args.verbose:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.WARNING)

    logger.info(f"CLI parameters: {vars(args)}")
    tokenizer = AutoTokenizer.from_pretrained(args.model_name)

    model_official = AutoModelForCausalLM.from_pretrained(args.model_name)
    config_official = model_official.config

    our_params = [
        "vocab_size",
        "n_layer",
        "n_embd",
        "n_head",
        "n_positions",
        "attn_pdrop",
        "embd_pdrop",
        "resid_pdrop",
        "layer_norm_epsilon",
    ]

    config_ours = {k: getattr(config_official, k) for k in our_params}
    logger.info(f"Model hyperparameters: {config_ours}")

    model_ours = GPT(**config_ours)
    model_ours.eval()

    copy_model(model_official, model_ours)

    token_ixs = tokenizer(args.initial_text)["input_ids"]

    if args.random_state is not None:
        torch.manual_seed(args.random_state)

    # Sample
    for step in range(args.steps):
        new_token_ix = generate_token(
            model_ours,
            token_ixs,
            sample=args.sample,
            top_k=args.top_k,
            temperature=args.temperature,
        )
        token_ixs.append(new_token_ix)
        logger.info(f"Step {step} done")

    text = tokenizer.decode(token_ixs)
    print(text)


if __name__ == "__main__":
    main()


================================================
FILE: github_adventures/gpt/distribution_visualizations.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "896ffe86",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ipywidgets\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import torch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09b6e1f4",
   "metadata": {},
   "source": [
    "# <center> Applying temperature + keeping only top K values</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c7442cf",
   "metadata": {},
   "source": [
    "$T=\\mbox{temperature}$ $$\\large P_i=\\frac{e^{\\frac{y_i}T}}{\\sum_{k=1}^n e^{\\frac{y_k}T}}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95833de6",
   "metadata": {},
   "outputs": [],
   "source": [
    "@ipywidgets.interact\n",
    "def _(\n",
    "    n_tokens=ipywidgets.IntSlider(min=4, max=30, value=8, continuous_update=False),\n",
    "    random_state=ipywidgets.IntSlider(min=0, max=10, value=2, continuous_update=False),\n",
    "    temperature=ipywidgets.FloatSlider(min=0, max=10, value=1, continuous_update=False),\n",
    "    top_k=ipywidgets.IntSlider(min=1, max=20, value=8, continuous_update=False),\n",
    "    ):\n",
    "    # Preparations\n",
    "    top_k = min(top_k, n_tokens)\n",
    "    torch.manual_seed(random_state)\n",
    "    logits = 10 * torch.rand(n_tokens,)\n",
    "\n",
    "\n",
    "    # Generate original\n",
    "    probs_orig = torch.nn.functional.softmax(logits, dim=0).numpy()\n",
    "    \n",
    "    # Generate new\n",
    "    logits = logits / temperature\n",
    "    top_values, _ = torch.topk(logits, top_k)  # (top_k,)\n",
    "    logits[logits < top_values.min()] = -torch.inf\n",
    "    probs_new = torch.nn.functional.softmax(logits, dim=0).numpy()\n",
    "\n",
    "    # Plotting\n",
    "    fig, (ax_orig, ax_new) = plt.subplots(1, 2, sharey=True, figsize=(10, 2), dpi=100)\n",
    "    x = range(n_tokens)\n",
    "\n",
    "    ax_orig.bar(x, probs_orig)\n",
    "    ax_orig.set_ylim((0, 1))\n",
    "    ax_orig.set_title(\"Original\")\n",
    "    \n",
    "    ax_new.bar(x, probs_new)\n",
    "    ax_new.set_title(\"Temperature + top K\")\n",
    "    \n",
    "    plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: github_adventures/gpt/ipython_code.py
================================================
>>> import torch
>>> from model import GPT
>>> from transformers import AutoModelForCausalLM
>>> hparams_names = [
...     "vocab_size",
...     "n_layer",
...     "n_embd",
...     "n_head",
...     "n_positions",
...     "attn_pdrop",
...     "embd_pdrop",
...     "resid_pdrop",
...     "layer_norm_epsilon",
...     ]
...
>>> model_name = "gpt2"
>>> model_official = AutoModelForCausalLM.from_pretrained(model_name, tie_word_embeddings=False)
>>> config_official = model_official.config
>>> config_official
>>> config_ours = {name: getattr(config_official, name) for name in hparams_names}
>>> config_ours
>>> model_ours = GPT(**config_ours)
>>> sum(p.numel() for p in model_ours.parameters())
>>> sum(p.numel() for p in model_official.parameters())
>>> _ = model_official.eval()
>>> _ = model_ours.eval()
>>> idx = torch.tensor([[1, 123, 52, 28]], dtype=torch.long)
>>> logits_official = model_official(idx).logits
>>> logits_ours = model_ours(idx)
>>> logits_official.shape
>>> logits_ours.shape
>>> torch.allclose(logits_ours, logits_official, rtol=0, atol=1e-3)
>>> (logits_ours - logits_official).abs().max()
>>> from utils import copy_model
>>> copy_model(model_official, model_ours)
>>> logits_official = model_official(idx).logits
>>> logits_ours = model_ours(idx)
>>> torch.allclose(logits_ours, logits_official, rtol=0, atol=1e-3)
>>> (logits_ours - logits_official).abs().max()


================================================
FILE: github_adventures/gpt/model.py
================================================
import torch
import torch.nn as nn

from transformers.activations import gelu_new


class CustomGELU(nn.Module):
    """GELU implementation taken from the `transformers`."""

    def forward(self, x):
        """Run forward pass."""
        return gelu_new(x)


class Block(nn.Module):
    """Decoder block.

    Parameters
    ----------
    n_embd : int
        Dimensionality of the embeddings.

    n_head : int
        Number of attention heads.

    n_positions : int
        Maximum number of tokens.

    attn_pdrop : float
        Probability of dropout on attention weights.

    resid_pdrop : float
        Probability of dropout after applying the MLP.

    layer_norm_epsilon : float
        Hyperparameter of layer normalization.

    Attributes
    ----------
    ln_1, ln_2 : nn.LayerNorm
        Layer norms.

    attention : nn.MultiheadAttention
        Attention module.

    mlp : nn.Sequential
        Multilayer perceptron.

    """

    def __init__(
        self,
        *,
        n_embd,
        n_head,
        n_positions,
        attn_pdrop,
        resid_pdrop,
        layer_norm_epsilon,
    ):
        super().__init__()

        self.ln_1 = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)
        self.ln_2 = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)

        self.attention = nn.MultiheadAttention(
            embed_dim=n_embd,
            num_heads=n_head,
            dropout=attn_pdrop,
            bias=True,
            batch_first=True,
        )
        self.register_buffer(
            "mask",
            (1 - torch.tril(torch.ones(n_positions, n_positions))).to(
                dtype=torch.bool
            ),
        )

        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            CustomGELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(resid_pdrop),
        )

    def forward(self, x):
        """Run forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input tensor of shape `(batch_size, n_tokens, n_embd)`.

        Returns
        -------
        torch.Tensor
            Output tensor of shape `(batch_size, n_tokens, n_embd)`.
        """
        batch_size, n_tokens, n_embd = x.shape

        x_ = self.ln_1(x)  # (batch_size, n_tokens, n_embd)

        mask = self.mask[:n_tokens, :n_tokens]  # (n_tokens, n_tokens)

        attn_out, _ = self.attention(
            x_, x_, x_, attn_mask=mask, need_weights=False
        )  # (batch_size, n_tokens, n_embd)
        x = x + attn_out  # (batch_size, n_tokens, n_embd)
        x = x + self.mlp(self.ln_2(x))  # (batch_size, n_tokens, n_embd)

        return x
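The `mask` buffer registered above implements causal attention: `True` marks (query, key) pairs where the key lies in the future and must be ignored. A minimal sketch of its construction:

```python
import torch

n_positions = 5
# Strictly upper-triangular booleans: True = forbidden future position.
mask = (1 - torch.tril(torch.ones(n_positions, n_positions))).to(
    dtype=torch.bool
)
# Row i (query) may attend to columns 0..i (keys) only.
```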


class GPT(nn.Module):
    """Entire GPT model.

    Parameters
    ----------
    vocab_size : int
        Number of tokens in the vocabulary.

    n_layer : int
        Number of decoder blocks to include.

    n_embd : int
        Dimensionality of the embeddings.

    n_head : int
        Number of attention heads.

    n_positions : int
        Maximum number of tokens.

    attn_pdrop : float
        Probability of dropout on attention weights.

    embd_pdrop : float
        Probability of dropout on the sum of embeddings.

    resid_pdrop : float
        Probability of dropout after applying the MLP.

    layer_norm_epsilon : float
        Hyperparameter of layer normalization.

    Attributes
    ----------
    token_emb : nn.Embedding
        Token embeddings.

    pos_emb : nn.Embedding
        Positional embedding.

    drop : nn.Dropout
        Dropout module to be applied on the sum of embeddings.

    blocks : nn.Sequential
        List of decoder blocks.

    ln : nn.LayerNorm
        Layer norm applied before applying `head`.

    head : nn.Linear
        Final linear layer.
    """

    def __init__(
        self,
        *,
        vocab_size,
        n_layer,
        n_embd,
        n_head,
        n_positions,
        attn_pdrop,
        embd_pdrop,
        resid_pdrop,
        layer_norm_epsilon,
    ):
        super().__init__()
        self.n_positions = n_positions
        self.token_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(n_positions, n_embd)

        self.drop = nn.Dropout(embd_pdrop)

        self.blocks = nn.Sequential(
            *[
                Block(
                    n_embd=n_embd,
                    n_head=n_head,
                    n_positions=n_positions,
                    attn_pdrop=attn_pdrop,
                    resid_pdrop=resid_pdrop,
                    layer_norm_epsilon=layer_norm_epsilon,
                )
                for _ in range(n_layer)
            ]
        )
        self.ln = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        """Run forward pass.

        Parameters
        ----------
        idx : torch.Tensor
            Integer tensor of shape `(batch_size, n_tokens)` where each
            element is in the range `[0, vocab_size)`.

        Returns
        -------
        logits : torch.Tensor
            Tensor of shape `(batch_size, n_tokens, vocab_size)`.
        """
        batch_size, n_tokens = idx.shape
        device = idx.device

        if n_tokens > self.n_positions:
            raise ValueError("There are too many tokens in the input")

        positions = torch.arange(n_tokens, device=device)  # (n_tokens,)

        token_emb = self.token_emb(idx)  # (batch_size, n_tokens, n_embd)
        pos_emb = self.pos_emb(positions)[None, ...]  # (1, n_tokens, n_embd)
        x = self.drop(token_emb + pos_emb)  # (batch_size, n_tokens, n_embd)
        x = self.blocks(x)  # (batch_size, n_tokens, n_embd)
        x = self.ln(x)  # (batch_size, n_tokens, n_embd)
        logits = self.head(x)  # (batch_size, n_tokens, vocab_size)

        return logits
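The embedding sum in `forward` relies on broadcasting: positional embeddings of shape `(1, n_tokens, n_embd)` are added to every sample in the batch. A toy-sized illustration (the dimensions below are made up):

```python
import torch
import torch.nn as nn

vocab_size, n_positions, n_embd = 50, 8, 4
token_emb = nn.Embedding(vocab_size, n_embd)
pos_emb = nn.Embedding(n_positions, n_embd)

idx = torch.tensor([[1, 7, 3]])  # (batch_size=1, n_tokens=3)
positions = torch.arange(idx.shape[1])  # (3,)

x = token_emb(idx) + pos_emb(positions)[None, ...]  # broadcasts to (1, 3, 4)
```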


================================================
FILE: github_adventures/gpt/requirements.txt
================================================
ipython==7.30.1
ipywidgets==7.6.5
jupyter==1.0.0
matplotlib==3.5.1
numpy==1.21.5
torch==1.10.1
-e git+https://github.com/huggingface/transformers.git@05fa1a7ac17bb7aa07b9e0c1e138ecb31a28bbfe#egg=transformers


================================================
FILE: github_adventures/gpt/utils.py
================================================
import torch


def copy_parameter(param_official, param_ours):
    """Copy values of one tensor to another tensor.

    Parameters
    ----------
    param_official : torch.Tensor
        The value of this tensor will be copied.

    param_ours : torch.Tensor
        This tensor will be overwritten in-place with the values from
        `param_official`.
    """
    if param_official.shape != param_ours.shape:
        raise ValueError("The shapes of the provided tensors are different")

    with torch.no_grad():
        param_ours.copy_(param_official)
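A minimal demonstration of the helper above, restated so the snippet runs on its own: the in-place copy works even on tensors that require gradients, because it happens inside `torch.no_grad()`.

```python
import torch

def copy_parameter(param_official, param_ours):
    # Same logic as above.
    if param_official.shape != param_ours.shape:
        raise ValueError("The shapes of the provided tensors are different")
    with torch.no_grad():
        param_ours.copy_(param_official)

src = torch.arange(6, dtype=torch.float32).reshape(2, 3)
dst = torch.zeros(2, 3, requires_grad=True)
copy_parameter(src, dst)
# dst now holds the values of src.
```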


def copy_block(block_official, block_ours):
    """Copy all parameters within a transformer block.

    Parameters
    ----------
    block_official : transformers.models.gpt2.modeling_gpt2.GPT2Block
        Block coming from the huggingface code.

    block_ours : Block
        Our block.
    """
    b_a = block_official
    b_b = block_ours

    # LN 1
    copy_parameter(b_a.ln_1.weight, b_b.ln_1.weight)
    copy_parameter(b_a.ln_1.bias, b_b.ln_1.bias)

    # Attention
    copy_parameter(b_a.attn.c_attn.weight.T, b_b.attention.in_proj_weight)
    copy_parameter(b_a.attn.c_attn.bias, b_b.attention.in_proj_bias)

    copy_parameter(b_a.attn.c_proj.weight.T, b_b.attention.out_proj.weight)
    copy_parameter(b_a.attn.c_proj.bias, b_b.attention.out_proj.bias)

    # LN 2
    copy_parameter(b_a.ln_2.weight, b_b.ln_2.weight)
    copy_parameter(b_a.ln_2.bias, b_b.ln_2.bias)

    # MLP
    copy_parameter(b_a.mlp.c_fc.weight.T, b_b.mlp[0].weight)
    copy_parameter(b_a.mlp.c_fc.bias, b_b.mlp[0].bias)

    copy_parameter(b_a.mlp.c_proj.weight.T, b_b.mlp[2].weight)
    copy_parameter(b_a.mlp.c_proj.bias, b_b.mlp[2].bias)


def copy_model(model_official, model_ours):
    """Copy all trainable weights.

    Parameters
    ----------
    model_official : transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel
        Huggingface model.

    model_ours : GPT
        Our model.
    """
    m_a = model_official
    m_b = model_ours

    # Token and positional embeddings
    copy_parameter(m_a.transformer.wpe.weight, m_b.pos_emb.weight)
    copy_parameter(m_a.transformer.wte.weight, m_b.token_emb.weight)

    # Blocks
    for block_official, block_ours in zip(m_a.transformer.h, m_b.blocks):
        copy_block(block_official, block_ours)

    # Head
    copy_parameter(m_a.transformer.ln_f.weight, m_b.ln.weight)
    copy_parameter(m_a.transformer.ln_f.bias, m_b.ln.bias)
    copy_parameter(m_a.lm_head.weight, m_b.head.weight)


@torch.no_grad()
def generate_token(
    model, token_ixs, temperature=1.0, sample=False, top_k=None
):
    """Generate a single token given previous tokens.

    Parameters
    ----------
    model : GPT
        Our GPT model.

    token_ixs : list
        List of conditional input token ids.

    temperature : float
        The higher the more variability and vice versa.

    sample : bool
        If True, we sample from the distribution (=there is randomness). If
        False, we just take the argmax (=there is no randomness).

    top_k : int or None
        If not None then we modify the distribution to only contain the `top_k`
        most probable outcomes.

    Returns
    -------
    new_token_ix : int
        Index of the new token.
    """
    context_token_ixs = token_ixs[-model.n_positions :]
    ixs = torch.tensor(context_token_ixs).to(dtype=torch.long)[
        None, :
    ]  # (1, n_tokens)

    logits_all = model(ixs)  # (1, n_tokens, vocab_size)
    logits = logits_all[0, -1, :]  # (vocab_size,)
    logits = logits / temperature  # (vocab_size,)

    if top_k is not None:
        # Find the top k biggest elements, set the remaining elements to -inf
        top_values, _ = torch.topk(logits, top_k)  # (top_k,)
        logits[logits < top_values.min()] = -torch.inf

    probs = torch.nn.functional.softmax(logits, dim=0)  # (vocab_size,)

    if sample:
        new_token_ix = torch.multinomial(probs, num_samples=1)
    else:
        new_token_ix = probs.argmax()

    return new_token_ix.item()
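The temperature and top-k filtering inside `generate_token` can be traced on hand-made logits (the numbers below are arbitrary):

```python
import torch

logits = torch.tensor([1.0, 3.0, 2.0, 0.5])
temperature, top_k = 0.5, 2

logits = logits / temperature  # sharpen: [2.0, 6.0, 4.0, 1.0]
top_values, _ = torch.topk(logits, top_k)  # [6.0, 4.0]
logits[logits < top_values.min()] = float("-inf")
probs = torch.nn.functional.softmax(logits, dim=0)
# Only the two most probable tokens keep non-zero probability.
```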


================================================
FILE: github_adventures/integer/README.md
================================================
# On-Line Encyclopedia of Integer Sequences
You can use `fetch_data.py` to download the sequences through the OEIS API.
However, I found out (after filming the video) that you can simply download
all the sequences in compressed form here:
https://oeis.org/wiki/Welcome#Compressed_Versions

So you should probably do that and spare their API.

# The GloVe embeddings
The ones that I used in the video are located here:
https://nlp.stanford.edu/data/glove.6B.zip


================================================
FILE: github_adventures/integer/bert.py
================================================
import argparse

import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
from transformers import BertModel, BertTokenizer

from utils import create_classification_targets, train_classifier


def main(argv=None):
    parser = argparse.ArgumentParser("Evaluating BERT integer embeddings")

    parser.add_argument(
        "log_folder",
        type=str,
        help="Folder where to log results",
    )
    parser.add_argument(
        "--max-value-eval",
        type=int,
        default=500,
        help="Number of integers to run the evaluation on",
    )
    args = parser.parse_args(argv)
    model_name = "bert-base-uncased"

    # Create writer
    writer = SummaryWriter(args.log_folder)

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)

    # Retrieve embeddings
    to_find = list(map(str, range(args.max_value_eval)))
    positions = np.array(tokenizer.convert_tokens_to_ids(to_find))
    unk_token_position = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)
    is_valid = positions != unk_token_position

    print(
        "The following numbers are missing",
        [i for i, x in enumerate(is_valid) if not x],
    )

    arange = np.arange(args.max_value_eval)
    numbers = arange[is_valid]
    embeddings = (
        model.embeddings.word_embeddings(torch.from_numpy(positions[is_valid]))
        .detach()
        .numpy()
    )

    ys_clf = create_classification_targets(numbers)

    keys = sorted(ys_clf.keys())
    metadata = np.array([numbers] + [ys_clf[k] for k in keys]).T.tolist()
    metadata_header = ["value"] + keys

    for name, y in ys_clf.items():
        metrics = train_classifier(embeddings, y)
        for metric_name, value in metrics.items():
            writer.add_scalar(
                f"{name}/{metric_name}",
                value,
            )

    writer.add_embedding(
        embeddings,
        metadata=metadata,
        metadata_header=metadata_header,
    )


if __name__ == "__main__":
    main()


================================================
FILE: github_adventures/integer/experiments.sh
================================================
set -x

OUTPUT_PATH=results
GLOVE_PATH=glove.6B.300d.txt
SEQUENCES_PATH=raw_data.pkl
MAX_VALUE_EVAL=500

python glove.py --max-value-eval $MAX_VALUE_EVAL $GLOVE_PATH $OUTPUT_PATH/glove
python bert.py --max-value-eval $MAX_VALUE_EVAL $OUTPUT_PATH/BERT
python lstm.py \
    $SEQUENCES_PATH \
    $OUTPUT_PATH/LSTM \
    --batch-size 128 \
    --device cuda \
    --embedding-dim 128 \
    --hidden-dim 256 \
    --max-value-eval $MAX_VALUE_EVAL \
    --max-value 20000 \
    --n-epochs 20000 \
    --sequence-len 100


================================================
FILE: github_adventures/integer/fetch_data.py
================================================
import pathlib
import pickle

import requests

from joblib import Parallel, delayed, parallel_backend


def get_sequence(sequence_id):
    """Get an integer sequence from the online OEIS.

    Parameters
    ----------
    sequence_id : int
        Unique identifier for the desired sequence.

    Returns
    -------
    sequence : list
        List of integers.

    Raises
    ------
    HTTPError
        Raised when it was not possible to retrieve the given sequence.
    """
    url = f"https://oeis.org/search?fmt=json&q=id:A{sequence_id:07}"
    print(sequence_id)
    response = requests.get(url)

    response.raise_for_status()

    data_str = response.json()["results"][0]["data"]
    sequence = [int(x) for x in data_str.split(",")]

    return sequence
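The `data` field of the OEIS JSON response is a single comma-separated string, so the parsing step above can be exercised offline. A minimal sketch (the helper name and the payload below are made up, mirroring only the shape of the response that `get_sequence` consumes):

```python
def parse_data_field(payload):
    """Extract the integer sequence from an OEIS-style result dict."""
    data_str = payload["results"][0]["data"]
    return [int(x) for x in data_str.split(",")]


# Made-up payload with the same nesting as the real OEIS response.
fake_payload = {"results": [{"data": "1,1,2,3,5,8"}]}
print(parse_data_field(fake_payload))  # [1, 1, 2, 3, 5, 8]
```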


if __name__ == "__main__":
    # Parameters
    n_sequences = 5000
    start_id = 1  # seems like 1 - 340_000 are valid sequences
    n_jobs = 64
    backend = "threading"  # "threading" or "loky"

    # Preparation
    end_id = start_id + n_sequences
    output_folder = pathlib.Path("data/")
    output_folder.mkdir(exist_ok=True, parents=True)
    output_path = output_folder / f"{start_id}_{end_id - 1}.pkl"

    with parallel_backend(backend, n_jobs=n_jobs):
        res = Parallel()(delayed(get_sequence)(i) for i in range(start_id, end_id))

    with output_path.open("wb") as f:
        pickle.dump(res, f)



================================================
FILE: github_adventures/integer/glove.py
================================================
import argparse

import numpy as np
from torch.utils.tensorboard import SummaryWriter

from utils import create_classification_targets, train_classifier


def main(argv=None):
    parser = argparse.ArgumentParser("Evaluating GloVe integer embeddings")

    parser.add_argument(
        "glove_path",
        type=str,
        help="Path to a txt file holding the GloVe embeddings",
    )
    parser.add_argument(
        "log_folder",
        type=str,
        help="Folder where to log results",
    )
    parser.add_argument(
        "--max-value-eval",
        type=int,
        default=500,
        help="Number of integers to run the evaluation on",
    )
    parser.add_argument(
        "--dim",
        type=int,
        default=300,
        help="Dimensionality of the embeddings",
    )
    args = parser.parse_args()

    # Create writer
    writer = SummaryWriter(args.log_folder)

    # Retrieve embeddings
    to_find = set(map(str, range(args.max_value_eval)))
    embeddings = np.empty((args.max_value_eval, args.dim))

    with open(args.glove_path) as f:
        for line in f:
            token, *vector_ = line.split(" ")

            if token in to_find:
                embeddings[int(token)] = list(map(float, vector_))
                to_find.remove(token)

    assert not to_find

    arange = np.arange(args.max_value_eval)
    ys_clf = create_classification_targets(arange)

    keys = sorted(ys_clf.keys())
    metadata = np.array([arange] + [ys_clf[k] for k in keys]).T.tolist()
    metadata_header = ["value"] + keys

    for name, y in ys_clf.items():
        metrics = train_classifier(embeddings, y)
        for metric_name, value in metrics.items():
            writer.add_scalar(
                f"{name}/{metric_name}",
                value,
            )

    writer.add_embedding(
        embeddings,
        metadata=metadata,
        metadata_header=metadata_header,
    )


if __name__ == "__main__":
    main()
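The file-reading loop above relies on the standard GloVe text layout: one token per line followed by its space-separated float components. A small sketch of that parsing, run on two made-up lines instead of a real embeddings file (the helper name is ours):

```python
def parse_glove_lines(lines, to_find):
    """Map each wanted token to its float vector, GloVe text-format style."""
    found = {}
    for line in lines:
        token, *vector_ = line.split(" ")
        if token in to_find:
            found[token] = [float(v) for v in vector_]
    return found


# Made-up lines in the "token v1 v2 ..." format.
fake_lines = ["0 0.1 -0.2 0.3", "the 1.0 2.0 3.0", "1 0.4 0.5 0.6"]
vectors = parse_glove_lines(fake_lines, to_find={"0", "1"})
print(sorted(vectors))  # ['0', '1']
```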


================================================
FILE: github_adventures/integer/lstm.py
================================================
import argparse
import json
import pathlib
import pickle

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import tqdm
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

from utils import (
    CustomDataset,
    Network,
    create_classification_targets,
    train_classifier,
)


def main(argv=None):
    parser = argparse.ArgumentParser("Embedding integers using LSTM")

    parser.add_argument(
        "data_path", type=str, help="Path to the pickled sequences"
    )

    parser.add_argument(
        "log_folder", type=str, help="Folder where to log results"
    )

    parser.add_argument(
        "-b", "--batch-size", type=int, default=128, help="Batch size"
    )

    parser.add_argument(
        "-d", "--dense-dim", type=int, default=256, help="Dense dimension"
    )

    parser.add_argument("--device", type=str, default="cpu", help="Device")

    parser.add_argument(
        "-e",
        "--embedding-dim",
        type=int,
        default=128,
        help="Embedding dimension",
    )

    parser.add_argument(
        "--hidden-dim", type=int, default=256, help="Hidden dimension"
    )
    parser.add_argument(
        "--max-value-eval",
        type=int,
        default=500,
        help="Evaluation limit",
    )

    parser.add_argument(
        "-m",
        "--max-value",
        type=int,
        default=20000,
        help="The maximum allowed value (non inclusive)",
    )

    parser.add_argument(
        "-n", "--n-epochs", type=int, default=100, help="Number of epochs"
    )

    parser.add_argument(
        "-l",
        "--sequence-len",
        type=int,
        default=100,
        help="The maximum length of a sequence",
    )

    args = parser.parse_args(argv)

    # Preparations
    device = torch.device(args.device)
    eval_frequency = 500

    log_folder = pathlib.Path(args.log_folder)
    model_path = log_folder / "checkpoint.pth"

    writer = SummaryWriter(log_folder)
    writer.add_text("parameters", json.dumps(vars(args)))

    # Dataset related
    data_path = pathlib.Path(args.data_path)
    with data_path.open("rb") as f:
        raw_sequences = pickle.load(f)

    dataset = CustomDataset(
        raw_sequences,
        max_value=args.max_value,
        sequence_len=args.sequence_len,
    )

    fig, ax = plt.subplots()
    ax.hist(dataset.normalized_sequences.ravel(), bins=100)
    ax.set_title(
        f"Number distribution (numbers={dataset.normalized_sequences.shape})"
    )
    writer.add_figure("number distribution", fig)

    dataloader = DataLoader(
        dataset,
        shuffle=True,
        batch_size=args.batch_size,
        pin_memory=True,
    )

    # Network, loss and the optimizer
    net = Network(
        max_value=args.max_value,
        hidden_dim=args.hidden_dim,
        embedding_dim=args.embedding_dim,
        dense_dim=args.dense_dim,
    )

    net.to(device)

    loss_inst = nn.CrossEntropyLoss(
        ignore_index=args.max_value,
    )

    optimizer = torch.optim.Adam(net.parameters())

    # Validation preparation
    max_value_eval = args.max_value_eval or args.max_value
    arange = np.arange(max_value_eval)
    ys_clf = create_classification_targets(arange)

    keys = sorted(ys_clf.keys())
    metadata = np.array([arange] + [ys_clf[k] for k in keys]).T.tolist()
    metadata_header = ["value"] + keys

    step = 0
    for _ in range(args.n_epochs):
        for x in tqdm.tqdm(dataloader):
            x = x.to(device)
            logits_ = net(x)  # (batch_size, sequence_len, max_value)

            logits = logits_[:, :-1].permute(
                0, 2, 1
            )  # (batch_size, max_value, sequence_len - 1)
            target = x[:, 1:]  # (batch_size, sequence_len - 1)
            loss = loss_inst(logits, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            writer.add_scalar("loss", loss, step)

            if step % eval_frequency == 0:
                X = (
                    net.embedding.weight.detach()
                    .cpu()
                    .numpy()[:max_value_eval]
                )

                writer.add_embedding(
                    X,
                    global_step=step,
                    tag="Integer embeddings",
                    metadata=metadata,
                    metadata_header=metadata_header,
                )

                for name, y in ys_clf.items():
                    metrics = train_classifier(X, y)
                    for metric_name, value in metrics.items():
                        writer.add_scalar(
                            f"{name}/{metric_name}",
                            value,
                            step,
                        )
                torch.save(net, model_path)

            step += 1



if __name__ == "__main__":
    main()
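The training loop above does next-element prediction: the logits at positions `0..L-2` (`logits_[:, :-1]`) are compared against the inputs shifted by one (`x[:, 1:]`). The same alignment can be sketched with a plain list (hypothetical helper name):

```python
def next_token_pairs(sequence):
    """(input, target) pairs used for next-element prediction."""
    inputs = sequence[:-1]   # what the model conditions on, cf. logits_[:, :-1]
    targets = sequence[1:]   # what it must predict, cf. x[:, 1:]
    return list(zip(inputs, targets))


print(next_token_pairs([3, 1, 4, 1, 5]))
# [(3, 1), (1, 4), (4, 1), (1, 5)]
```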


================================================
FILE: github_adventures/integer/requirements.txt
================================================
joblib
matplotlib
numpy
requests
scikit-learn
sympy
tensorboard
torch
transformers


================================================
FILE: github_adventures/integer/utils.py
================================================
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sympy.ntheory import isprime
from torch.utils.data import Dataset


class CustomDataset(Dataset):
    """Dataset containing integer sequences.

    Parameters
    ----------
    raw_sequences : list of list of str
        Containing the original raw sequences. Note
        that their length differs.

    sequence_len : int
        The length of the sequence. If the original sequence is shorter,
        we just pad it with `max_value`. If the original sequence is longer,
        we simply cut it off.

    max_value : int
        The maximum allowed value (non inclusive). We will only consider
        sequences that had the first `sequence_len` elements in
        the range `[0, max_value)`.

    Attributes
    ----------
    normalized_sequences : np.ndarray
        2D array of shape `(n_sequences, sequence_len)`. It only contains
        sequences that had the first `sequence_len` elements in
        the range `[0, max_value)`.
    """

    def __init__(
        self,
        raw_sequences,
        sequence_len=80,
        max_value=2000,
    ):
        filtered_sequences = list(
            filter(
                lambda seq: all(
                    0 <= x < max_value for x in seq[:sequence_len]
                ),
                raw_sequences,
            )
        )

        n_sequences = len(filtered_sequences)

        self.normalized_sequences = max_value * np.ones(
            (n_sequences, sequence_len),
            dtype=np.int64,
        )

        for i, seq in enumerate(filtered_sequences):
            actual_len = min(len(seq), sequence_len)
            self.normalized_sequences[i, :actual_len] = seq[:actual_len]

    def __len__(self):
        """Get the length of the dataset."""
        return len(self.normalized_sequences)

    def __getitem__(self, ix):
        """Get a single sample of the dataset."""
        return self.normalized_sequences[ix]
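`CustomDataset` keeps only sequences whose first `sequence_len` values fall inside `[0, max_value)` and pads short ones with `max_value` (the padding index). A list-based sketch of the same filter-and-pad logic, without numpy (the helper name is ours):

```python
def normalize_sequences(raw_sequences, sequence_len=4, max_value=10):
    """Filter and pad sequences the way CustomDataset does (list sketch)."""
    kept = [
        seq for seq in raw_sequences
        if all(0 <= x < max_value for x in seq[:sequence_len])
    ]
    # Truncate to sequence_len and pad the tail with max_value.
    return [
        (seq[:sequence_len] + [max_value] * sequence_len)[:sequence_len]
        for seq in kept
    ]


print(normalize_sequences([[1, 2], [3, 99, 1], [0, 1, 2, 3, 4]]))
# [[1, 2, 10, 10], [0, 1, 2, 3]] -- [3, 99, 1] is dropped (99 >= 10)
```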


class Network(nn.Module):
    """Network predicting next number in the sequence.

    Parameters
    ----------
    max_value : int
        Maximum integer value allowed inside of the sequence. We will
        generate an embedding for each of the numbers in `[0, max_value]`.

    embedding_dim : int
        Dimensionality of the integer embeddings.

    n_layers : int
        Number of layers inside of the LSTM.

    hidden_dim : int
        Dimensionality of the hidden state (LSTM).

    dense_dim : int
        Dimensionality of the dense layer.

    Attributes
    ----------
    embedding : torch.nn.Embedding
        Embeddings of all the integers.

    lstm : torch.nn.LSTM
        LSTM subnetwork. Inputs integer embeddings and outputs
        new hidden states.

    linear : torch.nn.Linear
        Inputs hidden states and transforms them.

    classifier : torch.nn.Linear
        Inputs outputs of the `linear` and outputs the logits
        over all possible integers.
    """

    def __init__(
        self,
        max_value=2000,
        embedding_dim=100,
        n_layers=2,
        hidden_dim=64,
        dense_dim=256,
    ):
        super().__init__()

        self.embedding = nn.Embedding(
            num_embeddings=max_value + 1,
            embedding_dim=embedding_dim,
            padding_idx=max_value,
        )

        self.lstm = nn.LSTM(
            input_size=embedding_dim,
            hidden_size=hidden_dim,
            num_layers=n_layers,
            batch_first=True,
        )

        self.linear = nn.Linear(
            hidden_dim,
            dense_dim,
        )

        self.classifier = nn.Linear(
            dense_dim,
            max_value,
        )

    def forward(self, x):
        """Run forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input tensor of shape `(batch_size, sequence_len)` and has
            dtype `torch.long`.

        Returns
        -------
        logits : torch.Tensor
            Logits over all possible integers of shape
            `(batch_size, sequence_len, max_value)`.
        """
        emb = self.embedding(x)  # (batch_size, sequence_len, embedding_dim)
        h, *_ = self.lstm(emb)  # (batch_size, sequence_len, hidden_dim)
        dense = torch.relu(
            self.linear(h)
        )  # (batch_size, sequence_len, dense_dim)
        logits = self.classifier(
            dense
        )  # (batch_size, sequence_len, max_value)

        return logits


def train_classifier(X, y, random_state=2):
    """Cross-validate classification problem using logistic regression.

    Parameters
    ----------
    X : np.ndarray
        2D array holding the features of shape `(n_samples, n_features)`.

    y : np.ndarray
        1D array holding the classification targets of shape `(n_samples,)`.

    random_state : int
        Guaranteeing reproducibility.

    Returns
    -------
    metrics : dict
        Holds train and validation accuracy averaged over all the folds.
    """
    cv = StratifiedKFold(
        n_splits=5,
        random_state=random_state,
        shuffle=True,
    )

    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(
            max_iter=2000,
            random_state=random_state,
        ),
    )

    res = cross_validate(
        clf,
        X,
        y,
        return_train_score=True,
        cv=cv,
    )

    metrics = {
        "train_acc": res["train_score"].mean(),
        "test_acc": res["test_score"].mean(),
    }

    return metrics


def create_classification_targets(indices):
    """Create multiple classification targets.

    They represent common properties of integers.

    Parameters
    ----------
    indices : np.ndarray
        1D array holding the integers for which we want to compute
        the targets.

    Returns
    -------
    targets : dict
        Keys are property names and the values are arrays of the same shape
        as `indices` representing whether a given integer does / does not
        have a given property.
    """

    targets = {
        "divisibility_2": (indices % 2 == 0).astype(float),
        "divisibility_3": (indices % 3 == 0).astype(float),
        "divisibility_4": (indices % 4 == 0).astype(float),
        "divisibility_5": (indices % 5 == 0).astype(float),
        "divisibility_10": (indices % 10 == 0).astype(float),
        "prime": np.vectorize(isprime)(indices).astype(float),
    }

    return targets
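The targets above can be reproduced without numpy or sympy; below is a pure-Python sketch with a trial-division primality check standing in for `sympy.ntheory.isprime` and plain lists instead of arrays (both helper names are ours):

```python
def is_prime(n):
    """Trial-division primality check (stand-in for sympy's isprime)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def classification_targets(values):
    """Binary targets for the same integer properties as in utils.py."""
    targets = {
        f"divisibility_{k}": [float(v % k == 0) for v in values]
        for k in (2, 3, 4, 5, 10)
    }
    targets["prime"] = [float(is_prime(v)) for v in values]
    return targets


targets = classification_targets(range(6))
print(targets["divisibility_2"])  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(targets["prime"])           # [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
```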


================================================
FILE: github_adventures/lottery/README.md
================================================
# The Lottery Ticket Hypothesis
## Installation
```bash
pip install -r requirements.txt
```

## Running experiments
The training logic is implemented inside of the script `main.py`. To
get more information about the CLI run

```bash
python main.py --help
```

If you want to run an entire grid search over different hyperparameters,
you can use the `parallel_launch.sh` script. Note that it depends on GNU
`parallel` ([more info](https://www.gnu.org/software/parallel/)) and that it
supports dry runs (the default behavior) and progress bars.

```bash
./parallel_launch.sh
```


================================================
FILE: github_adventures/lottery/data.py
================================================
from torch.utils.data import Dataset
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, Lambda, ToTensor


class MNISTDataset(Dataset):
    """MNIST dataset.

    Feature images are automatically flattened.

    Parameters
    ----------
    root : str
        Directory where the actual data is located (or downloaded to).

    train : bool
        If True the training set is returned (60_000 samples). Otherwise
        the validation set is returned (10_000 samples).

    Attributes
    ----------
    tv_dataset : MNIST
        Instance of the torchvision `MNIST` dataset class.
    """

    def __init__(self, root, train=True, download=True):
        transform = Compose(
            [
                ToTensor(),
                Lambda(lambda x: x.ravel()),
            ]
        )

        self.tv_dataset = MNIST(
            root,
            train=train,
            download=download,
            transform=transform,
        )

    def __len__(self):
        """Get the length of the dataset."""
        return len(self.tv_dataset)

    def __getitem__(self, ix):
        """Get a selected sample.

        Parameters
        ----------
        ix : int
            Index of the sample to get.

        Returns
        -------
        x : torch.Tensor
            Flattened feature tensor of shape `(784,)`.

        y : torch.Tensor
            Scalar representing the ground truth label. Number between 0 and 9.
        """
        return self.tv_dataset[ix]


================================================
FILE: github_adventures/lottery/main.py
================================================
import argparse

import torch
import torch.nn as nn
import tqdm
from torch.utils.data import DataLoader

import wandb
from data import MNISTDataset
from utils import MLP, compute_stats, copy_weights_mlp, prune_mlp, reinit_mlp


def loop_dataloader(dataloader):
    """Loop infinitely over a dataloader.

    Parameters
    ----------
    dataloader : DataLoader
        DataLoader streaming batches of samples.

    Yields
    ------
    X_batch : torch.Tensor
        Batch of features.

    y_batch : torch.Tensor
        Batch of targets.
    """
    while True:
        for x in iter(dataloader):
            yield x
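`loop_dataloader` restarts iteration whenever the dataloader is exhausted, so training length can be measured in steps rather than epochs. The same pattern with a plain list standing in for the `DataLoader` (the helper name is ours):

```python
from itertools import islice


def loop_forever(iterable_factory):
    """Yield items forever, restarting a fresh iterator at exhaustion."""
    while True:
        for item in iterable_factory():
            yield item


batches = [1, 2, 3]  # stand-in for batches produced by a DataLoader
stream = loop_forever(lambda: iter(batches))
print(list(islice(stream, 7)))  # [1, 2, 3, 1, 2, 3, 1]
```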


def train(
    model,
    dataloader_train,
    loss_inst,
    optimizer,
    max_iter=10_000,
    dataloader_val=None,
    val_freq=500,
):
    """Run the training loop.

    Parameters
    ----------
    model : nn.Module
        Neural network (in our case MLP).

    dataloader_train : DataLoader
        Dataloader yielding training samples.

    loss_inst : callable
        Computes the loss when called.

    optimizer : torch.optim.Optimizer
        Instance of an optimizer.

    max_iter : int
        The number of iterations we run the training for
        (= number of gradient descent steps).

    dataloader_val : None or DataLoader
        Dataloader yielding validation samples. If provided, it also
        signals that we want to track metrics.

    val_freq : int
        How often the evaluation is run (in iterations).
    """
    iterable = loop_dataloader(dataloader_train)
    iterable = tqdm.tqdm(iterable, total=max_iter)

    it = 0
    for X_batch, y_batch in iterable:
        if it == max_iter:
            break

        logit_batch = model(X_batch)

        loss = loss_inst(logit_batch, y_batch)
        if dataloader_val is not None:
            wandb.log({"loss": loss}, step=it)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if it % val_freq == 0 and dataloader_val is not None:
            is_equal = []

            for X_batch_val, y_batch_val in dataloader_val:
                is_equal.append(
                    model(X_batch_val).argmax(dim=-1) == y_batch_val
                )

            is_equal_t = torch.cat(is_equal)
            acc = is_equal_t.sum() / len(is_equal_t)
            wandb.log({"accuracy_val": acc}, step=it)

        it += 1


def main(argv=None):
    """Create CLI and run experiments."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument(
        "-i",
        "--max-iter",
        help="Number of iterations",
        type=int,
        default=50000,
    )
    parser.add_argument(
        "-b",
        "--batch-size",
        help="Batch size",
        type=int,
        default=60,
    )
    parser.add_argument(
        "--prune-iter",
        help="Number of prune iterations",
        type=int,
        default=1,
    )
    parser.add_argument(
        "-m",
        "--prune-method",
        help="Pruning method to employ",
        type=str,
        choices=("l1", "random"),
        default="l1",
    )
    parser.add_argument(
        "-p",
        "--prune-ratio",
        help="Percentage of weights to remove",
        type=float,
        default=0.2,
    )
    parser.add_argument(
        "--val-freq",
        help="How often to compute the validation accuracy",
        type=int,
        default=250,
    )
    parser.add_argument(
        "-r",
        "--reinitialize",
        help="If true, reinitializes randomly all weights after pruning",
        type=str,
        choices=("true", "false"),  # easy for hyperparameter search
        default="false",
    )
    parser.add_argument(
        "-s",
        "--random-state",
        help="Random state",
        type=int,
    )
    parser.add_argument(
        "--wandb-entity",
        help="W&B entity",
        type=str,
        default="mildlyoverfitted",
    )
    parser.add_argument(
        "--wandb-project",
        help="W&B project",
        type=str,
    )
    args = parser.parse_args(argv)

    wandb.init(
        project=args.wandb_project,
        entity=args.wandb_entity,
        config=vars(args),
    )
    wandb.define_metric("accuracy_val", summary="max")

    dataset_train = MNISTDataset(
        "data",
        train=True,
        download=True,
    )
    dataset_val = MNISTDataset(
        "data",
        train=False,
        download=True,
    )

    if args.random_state is not None:
        torch.manual_seed(args.random_state)

    dataloader_train = DataLoader(
        dataset_train, batch_size=args.batch_size, shuffle=True
    )
    dataloader_val = DataLoader(
        dataset_val, batch_size=args.batch_size, shuffle=True
    )

    kwargs = dict(
        n_features=28 * 28,
        hidden_layer_sizes=(300, 100),
        n_targets=10,
    )

    mlp = MLP(**kwargs)

    mlp_copy = MLP(**kwargs)
    mlp_copy.load_state_dict(mlp.state_dict())

    loss_inst = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(mlp.parameters(), lr=1.2 * 1e-3)

    # Train and prune loop
    if args.prune_ratio > 0:
        per_round_prune_ratio = 1 - (1 - args.prune_ratio) ** (
            1 / args.prune_iter
        )

        per_round_prune_ratios = [per_round_prune_ratio] * len(mlp.module_list)
        per_round_prune_ratios[-1] /= 2

        per_round_max_iter = int(args.max_iter / args.prune_iter)

        for prune_it in range(args.prune_iter):
            train(
                mlp,
                dataloader_train,
                loss_inst,
                optimizer,
                max_iter=per_round_max_iter,
            )
            prune_mlp(mlp, per_round_prune_ratios, method=args.prune_method)

            copy_weights_mlp(mlp_copy, mlp)

            stats = compute_stats(mlp)
            for name, stat in stats.items():
                summary_name = f"{name}_pruneiter={prune_it}"
                wandb.run.summary[summary_name] = stat

    if args.reinitialize == "true":
        reinit_mlp(mlp)

    # Run actual training with a final pruned network
    train(
        mlp,
        dataloader_train,
        loss_inst,
        optimizer,
        max_iter=args.max_iter,
        dataloader_val=dataloader_val,
        val_freq=args.val_freq,
    )


if __name__ == "__main__":
    main()
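The per-round ratio computed in `main()` is chosen so that `prune_iter` rounds of pruning compound to the overall `prune_ratio`: each round removes that fraction of the weights that survived the previous rounds. A pure-Python check of the arithmetic (the helper name is ours):

```python
def per_round_ratio(total_ratio, n_rounds):
    """Per-round prune fraction whose n-fold compounding gives total_ratio."""
    return 1 - (1 - total_ratio) ** (1 / n_rounds)


total, rounds = 0.8, 5
p = per_round_ratio(total, rounds)
surviving = (1 - p) ** rounds    # fraction of weights left after all rounds
print(round(1 - surviving, 10))  # 0.8
```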


================================================
FILE: github_adventures/lottery/parallel_launch.sh
================================================
# Parallel parameters
N_JOBS=4
ARGS="-P$N_JOBS --header :" # arguments for parallel
# ARGS="--bar "$ARGS
ARGS="--dry-run "$ARGS

# Experiment parameters
ENTITY='mildlyoverfitted'
PROJECT='lottery_parallel_2'  # it should already exist to avoid issues

MAX_ITERS=(15000)
PRUNE_ITERS=(1 5)
PRUNE_METHODS=('l1' 'random')
PRUNE_RATIOS=(0 0.1 0.25 0.5 0.8 0.9 0.93 0.97)
REINITIALIZES=('true' 'false')
RANDOM_STATES=(1 2 3 4 5)

parallel $ARGS \
    python main.py \
        --max-iter={max_iter} \
        --prune-iter={prune_iter} \
        --prune-method={prune_method} \
        --prune-ratio={prune_ratio} \
        --random-state={random_state} \
        --reinitialize={reinitialize} \
        --wandb-entity=$ENTITY \
        --wandb-project=$PROJECT \
            ::: max_iter "${MAX_ITERS[@]}" \
            ::: prune_iter "${PRUNE_ITERS[@]}" \
            ::: prune_method "${PRUNE_METHODS[@]}" \
            ::: prune_ratio "${PRUNE_RATIOS[@]}" \
            ::: random_state "${RANDOM_STATES[@]}" \
            ::: reinitialize "${REINITIALIZES[@]}" \


================================================
FILE: github_adventures/lottery/requirements.txt
================================================
numpy
pillow
six
torch
torchvision
tqdm
wandb


================================================
FILE: github_adventures/lottery/utils.py
================================================
import math

import torch
import torch.nn as nn
from torch.nn.utils.prune import l1_unstructured, random_unstructured


class MLP(nn.Module):
    """Multilayer perceptron.

    The bias is included in all linear layers.

    Parameters
    ----------
    n_features : int
        Number of input features (pixels inside of MNIST images).

    hidden_layer_sizes : tuple
        Tuple of ints representing sizes of the hidden layers.

    n_targets : int
        Number of target classes (10 for MNIST).

    Attributes
    ----------
    module_list : nn.ModuleList
        List holding all the linear layers in the right order.
    """

    def __init__(self, n_features, hidden_layer_sizes, n_targets):
        super().__init__()

        layer_sizes = (n_features,) + hidden_layer_sizes + (n_targets,)
        layer_list = []

        for i in range(len(layer_sizes) - 1):
            layer_list.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))

        self.module_list = nn.ModuleList(layer_list)

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Batch of features of shape `(batch_size, n_features)`.

        Returns
        -------
        torch.Tensor
            Batch of predictions (logits) of shape `(batch_size, n_targets)`.
        """
        n_layers = len(self.module_list)

        for i, layer in enumerate(self.module_list):
            x = layer(x)

            if i < n_layers - 1:
                x = nn.functional.relu(x)

        return x


def prune_linear(linear, prune_ratio=0.3, method="l1"):
    """Prune a linear layer.

    Modifies the module in-place. We make an assumption that the bias
    is included.

    Parameters
    ----------
    linear : nn.Linear
        Linear module containing a bias.

    prune_ratio : float
        Number between 0 and 1 representing the percentage of weights
        to prune.

    method : str, {"l1", "random"}
        Pruning method to use.
    """
    if method == "l1":
        prune_func = l1_unstructured
    elif method == "random":
        prune_func = random_unstructured
    else:
        raise ValueError

    prune_func(linear, "weight", prune_ratio)
    prune_func(linear, "bias", prune_ratio)


def prune_mlp(mlp, prune_ratio=0.3, method="l1"):
    """Prune each layer of the multilayer perceptron.

    Modifies the module in-place. We make an assumption that each
    linear layer has the bias included.

    Parameters
    ----------
    mlp : MLP
        Multilayer perceptron instance.

    prune_ratio : float or list
        Number between 0 and 1 representing the percentage of weights
        to prune. If `list` then different ratio for each
        layer.

    method : str, {"l1", "random"}
        Pruning method to use.
    """
    if isinstance(prune_ratio, float):
        prune_ratios = [prune_ratio] * len(mlp.module_list)
    elif isinstance(prune_ratio, list):
        if len(prune_ratio) != len(mlp.module_list):
            raise ValueError("Incompatible number of prune ratios provided")

        prune_ratios = prune_ratio
    else:
        raise TypeError

    for prune_ratio, linear in zip(prune_ratios, mlp.module_list):
        prune_linear(linear, prune_ratio=prune_ratio, method=method)


def check_pruned_linear(linear):
    """Check if a Linear module was pruned.

    We require both the bias and the weight to be pruned.

    Parameters
    ----------
    linear : nn.Linear
        Linear module containing a bias.

    Returns
    -------
    bool
        True if the module has been pruned.
    """
    params = {param_name for param_name, _ in linear.named_parameters()}
    expected_params = {"weight_orig", "bias_orig"}

    return params == expected_params


def reinit_linear(linear):
    """Reinitialize a linear layer.

    This is an in-place operation.
    If the module has some pruning logic we are not going to remove it
    and we only initialize the underlying tensors - `weight_orig` and
    `bias_orig`.

    Parameters
    ----------
    linear : nn.Linear
        Linear module containing a bias.
    """
    is_pruned = check_pruned_linear(linear)

    # Get parameters of interest
    if is_pruned:
        weight = linear.weight_orig
        bias = linear.bias_orig
    else:
        weight = linear.weight
        bias = linear.bias

    # Initialize weight
    nn.init.kaiming_uniform_(weight, a=math.sqrt(5))

    # Initialize bias
    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(weight)
    bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
    nn.init.uniform_(bias, -bound, bound)


def reinit_mlp(mlp):
    """Reinitialize all layers of the MLP.

    Parameters
    ----------
    mlp : MLP
        Multi-layer perceptron.
    """
    for linear in mlp.module_list:
        reinit_linear(linear)


def copy_weights_linear(linear_unpruned, linear_pruned):
    """Copy weights from an unpruned model to a pruned model.

    Modifies `linear_pruned` in place.

    Parameters
    ----------
    linear_unpruned : nn.Linear
        Linear module with a bias that was not pruned.

    linear_pruned : nn.Linear
        Linear module with a bias that was pruned.
    """
    assert check_pruned_linear(linear_pruned)
    assert not check_pruned_linear(linear_unpruned)

    with torch.no_grad():
        linear_pruned.weight_orig.copy_(linear_unpruned.weight)
        linear_pruned.bias_orig.copy_(linear_unpruned.bias)


def copy_weights_mlp(mlp_unpruned, mlp_pruned):
    """Copy weights of an unpruned network to a pruned network.

    Modifies `mlp_pruned` in place.

    Parameters
    ----------
    mlp_unpruned : MLP
        MLP model that was not pruned.

    mlp_pruned : MLP
        MLP model that was pruned.
    """
    zipped = zip(mlp_unpruned.module_list, mlp_pruned.module_list)

    for linear_unpruned, linear_pruned in zipped:
        copy_weights_linear(linear_unpruned, linear_pruned)


def compute_stats(mlp):
    """Compute important statistics related to pruning.

    Parameters
    ----------
    mlp : MLP
        Multilayer perceptron.

    Returns
    -------
    dict
        Statistics.
    """
    stats = {}
    total_params = 0
    total_pruned_params = 0

    for layer_ix, linear in enumerate(mlp.module_list):
        assert check_pruned_linear(linear)

        weight_mask = linear.weight_mask
        bias_mask = linear.bias_mask

        params = weight_mask.numel() + bias_mask.numel()
        pruned_params = (weight_mask == 0).sum() + (bias_mask == 0).sum()

        total_params += params
        total_pruned_params += pruned_params

        stats[f"layer{layer_ix}_total_params"] = params
        stats[f"layer{layer_ix}_pruned_params"] = pruned_params
        stats[f"layer{layer_ix}_actual_prune_ratio"] = pruned_params / params

    stats["total_params"] = total_params
    stats["total_pruned_params"] = total_pruned_params
    stats["actual_prune_ratio"] = total_pruned_params / total_params

    return stats

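The helpers above assume layers pruned with `torch.nn.utils.prune`, which replaces `weight` with a buffer and registers `weight_orig` and `weight_mask`. A minimal standalone sketch of that workflow on a single hypothetical `nn.Linear` (not part of this repo), assuming torch is installed:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

linear = nn.Linear(4, 3)
original_weight = linear.weight.detach().clone()

# Pruning replaces `weight` with a recomputed buffer and registers
# `weight_orig` (trainable values) plus `weight_mask` (0/1 entries).
prune.l1_unstructured(linear, name="weight", amount=0.5)
prune.l1_unstructured(linear, name="bias", amount=0.0)
assert hasattr(linear, "weight_orig") and hasattr(linear, "weight_mask")

# "Rewind" to the original initialization, mirroring what
# `copy_weights_linear` does between an unpruned and a pruned model.
with torch.no_grad():
    linear.weight_orig.copy_(original_weight)

# The effective weight is `weight_orig * weight_mask`.
pruned_ratio = (linear.weight_mask == 0).float().mean().item()
assert pruned_ratio == 0.5  # 6 of the 12 weights were masked out
```

This is the same masking convention `compute_stats` relies on when it counts zeros in `weight_mask` and `bias_mask`.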

================================================
FILE: github_adventures/mixer/README.md
================================================
Note that `official.py` is just a copy of the code provided in the paper
(`https://arxiv.org/abs/2105.01601`) and likely also available at
`https://github.com/google-research/vision_transformer`. Please refer to those
sources for licensing information.


================================================
FILE: github_adventures/mixer/official.py
================================================
import einops
import flax.linen as nn
import jax.numpy as jnp


class MlpBlock(nn.Module):
    mlp_dim: int

    @nn.compact
    def __call__(self, x):
        y = nn.Dense(self.mlp_dim)(x)
        y = nn.gelu(y)
        return nn.Dense(x.shape[-1])(y)


class MixerBlock(nn.Module):
    tokens_mlp_dim: int
    channels_mlp_dim: int

    @nn.compact
    def __call__(self, x):
        y = nn.LayerNorm()(x)  # (n_samples, n_patches, hidden_dim)
        y = jnp.swapaxes(y, 1, 2)
        y = MlpBlock(self.tokens_mlp_dim, name="token_mixing")(y)
        y = jnp.swapaxes(y, 1, 2)
        x = x + y
        y = nn.LayerNorm()(x)
        return x + MlpBlock(self.channels_mlp_dim, name="channel_mixing")(y)


class MlpMixer(nn.Module):
    num_classes: int
    num_blocks: int
    patch_size: int
    hidden_dim: int
    tokens_mlp_dim: int
    channels_mlp_dim: int

    @nn.compact
    def __call__(self, x):
        s = self.patch_size
        x = nn.Conv(self.hidden_dim, (s, s), strides=(s, s), name="stem")(x)
        x = einops.rearrange(x, "n h w c -> n (h w) c")
        for _ in range(self.num_blocks):
            x = MixerBlock(self.tokens_mlp_dim, self.channels_mlp_dim)(x)
        x = nn.LayerNorm(name="pre_head_layer_norm")(x)
        x = jnp.mean(x, axis=1)
        return nn.Dense(
            self.num_classes, name="head", kernel_init=nn.initializers.zeros
        )(x)


================================================
FILE: github_adventures/mixer/ours.py
================================================
import einops
import torch.nn as nn


class MlpBlock(nn.Module):
    """Multilayer perceptron.

    Parameters
    ----------
    dim : int
        Input and output dimension of the entire block. Inside the mixer
        it is equal to either `n_patches` or `hidden_dim`.

    mlp_dim : int
        Dimension of the hidden layer.

    Attributes
    ----------
    linear_1, linear_2 : nn.Linear
        Linear layers.

    activation : nn.GELU
        Activation.
    """

    def __init__(self, dim, mlp_dim=None):
        super().__init__()

        mlp_dim = dim if mlp_dim is None else mlp_dim
        self.linear_1 = nn.Linear(dim, mlp_dim)
        self.activation = nn.GELU()
        self.linear_2 = nn.Linear(mlp_dim, dim)

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input tensor of shape `(n_samples, n_channels, n_patches)` or
            `(n_samples, n_patches, n_channels)`.

        Returns
        -------
        torch.Tensor
            Output tensor that has exactly the same shape as the input `x`.
        """
        x = self.linear_1(x)  # (n_samples, *, mlp_dim)
        x = self.activation(x)  # (n_samples, *, mlp_dim)
        x = self.linear_2(x)  # (n_samples, *, dim)
        return x


class MixerBlock(nn.Module):
    """Mixer block that contains two `MlpBlock`s and two `LayerNorm`s.

    Parameters
    ----------
    n_patches : int
        Number of patches the image is split up into.

    hidden_dim : int
        Dimensionality of patch embeddings.

    tokens_mlp_dim : int
        Hidden dimension for the `MlpBlock` when doing token mixing.

    channels_mlp_dim : int
        Hidden dimension for the `MlpBlock` when doing channel mixing.

    Attributes
    ----------
    norm_1, norm_2 : nn.LayerNorm
        Layer normalization.

    token_mlp_block : MlpBlock
        Token mixing MLP.

    channel_mlp_block : MlpBlock
        Channel mixing MLP.
    """

    def __init__(
        self, *, n_patches, hidden_dim, tokens_mlp_dim, channels_mlp_dim
    ):
        super().__init__()

        self.norm_1 = nn.LayerNorm(hidden_dim)
        self.norm_2 = nn.LayerNorm(hidden_dim)

        self.token_mlp_block = MlpBlock(n_patches, tokens_mlp_dim)
        self.channel_mlp_block = MlpBlock(hidden_dim, channels_mlp_dim)

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Tensor of shape `(n_samples, n_patches, hidden_dim)`.

        Returns
        -------
        torch.Tensor
            Tensor of the same shape as `x`, i.e.
            `(n_samples, n_patches, hidden_dim)`.
        """
        y = self.norm_1(x)  # (n_samples, n_patches, hidden_dim)
        y = y.permute(0, 2, 1)  # (n_samples, hidden_dim, n_patches)
        y = self.token_mlp_block(y)  # (n_samples, hidden_dim, n_patches)
        y = y.permute(0, 2, 1)  # (n_samples, n_patches, hidden_dim)
        x = x + y  # (n_samples, n_patches, hidden_dim)
        y = self.norm_2(x)  # (n_samples, n_patches, hidden_dim)
        res = x + self.channel_mlp_block(
            y
        )  # (n_samples, n_patches, hidden_dim)
        return res


class MlpMixer(nn.Module):
    """Entire network.

    Parameters
    ----------
    image_size : int
        Height and width (assuming it is a square) of the input image.

    patch_size : int
        Height and width (assuming it is a square) of the patches. Note
        that we assume that `image_size % patch_size == 0`.

    tokens_mlp_dim : int
        Hidden dimension for the `MlpBlock` when doing the token mixing.

    channels_mlp_dim : int
        Hidden dimension for the `MlpBlock` when doing the channel mixing.

    n_classes : int
        Number of classes for classification.

    hidden_dim : int
        Dimensionality of patch embeddings.

    n_blocks : int
        The number of `MixerBlock`s in the architecture.

    Attributes
    ----------
    patch_embedder : nn.Conv2d
        Splits the image up into multiple patches and then embeds each of them
        (using shared weights).

    blocks : nn.ModuleList
        List of `MixerBlock` instances.

    pre_head_norm : nn.LayerNorm
        Layer normalization applied just before the classification head.

    head_classifier : nn.Linear
        The classification head.
    """
    def __init__(
        self,
        *,
        image_size,
        patch_size,
        tokens_mlp_dim,
        channels_mlp_dim,
        n_classes,
        hidden_dim,
        n_blocks,
    ):
        super().__init__()
        n_patches = (image_size // patch_size) ** 2  # assumes divisibility

        self.patch_embedder = nn.Conv2d(
            3,
            hidden_dim,
            kernel_size=patch_size,
            stride=patch_size,
        )
        self.blocks = nn.ModuleList(
            [
                MixerBlock(
                    n_patches=n_patches,
                    hidden_dim=hidden_dim,
                    tokens_mlp_dim=tokens_mlp_dim,
                    channels_mlp_dim=channels_mlp_dim,
                )
                for _ in range(n_blocks)
            ]
        )

        self.pre_head_norm = nn.LayerNorm(hidden_dim)
        self.head_classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        """Run the forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input batch of square images of shape
            `(n_samples, n_channels, image_size, image_size)`.

        Returns
        -------
        torch.Tensor
            Class logits of shape `(n_samples, n_classes)`.
        """
        x = self.patch_embedder(
            x
        )  # (n_samples, hidden_dim, n_patches ** (1/2), n_patches ** (1/2))
        x = einops.rearrange(
            x, "n c h w -> n (h w) c"
        )  # (n_samples, n_patches, hidden_dim)
        for mixer_block in self.blocks:
            x = mixer_block(x)  # (n_samples, n_patches, hidden_dim)

        x = self.pre_head_norm(x)  # (n_samples, n_patches, hidden_dim)
        x = x.mean(dim=1)  # (n_samples, hidden_dim)
        y = self.head_classifier(x)  # (n_samples, n_classes)

        return y

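The `patch_embedder` above exploits the fact that a `Conv2d` with `kernel_size == stride == patch_size` acts as a shared linear embedding applied to each non-overlapping patch. A quick standalone shape check (hypothetical sizes, torch assumed available):

```python
import torch
import torch.nn as nn

image_size, patch_size, hidden_dim = 12, 3, 5
conv = nn.Conv2d(3, hidden_dim, kernel_size=patch_size, stride=patch_size)

x = torch.rand(2, 3, image_size, image_size)
out = conv(x)  # one embedding per patch on a 4x4 spatial grid
assert out.shape == (2, hidden_dim, 4, 4)

# Flattening the grid yields the (n_samples, n_patches, hidden_dim) layout
# that the mixer blocks operate on.
tokens = out.flatten(2).transpose(1, 2)
assert tokens.shape == (2, (image_size // patch_size) ** 2, hidden_dim)
```

`MlpMixer.forward` performs the same flattening with `einops.rearrange(x, "n c h w -> n (h w) c")`.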

================================================
FILE: github_adventures/mixer/test_compare.py
================================================
import jax
import numpy as np
import pytest
import torch

from official import MlpMixer as OfficialMixer
from ours import MlpMixer as OurMixer


@pytest.mark.parametrize("image_size", [6, 12])
@pytest.mark.parametrize("patch_size", [2, 3])
@pytest.mark.parametrize("hidden_dim", [4, 5])
@pytest.mark.parametrize("n_blocks", [1, 2])
@pytest.mark.parametrize("n_classes", [4, 8])
@pytest.mark.parametrize("tokens_mlp_dim", [2, 4])
@pytest.mark.parametrize("channels_mlp_dim", [3, 6])
def test_compare(
    image_size,
    patch_size,
    hidden_dim,
    n_blocks,
    n_classes,
    tokens_mlp_dim,
    channels_mlp_dim,
):
    # Create Flax model
    model_flax = OfficialMixer(
        num_classes=n_classes,
        num_blocks=n_blocks,
        patch_size=patch_size,
        hidden_dim=hidden_dim,
        tokens_mlp_dim=tokens_mlp_dim,
        channels_mlp_dim=channels_mlp_dim,
    )
    key1, key2 = jax.random.split(jax.random.PRNGKey(0))
    x = jax.random.normal(key1, (11, image_size, image_size, 3))  # Dummy input
    params = model_flax.init(key2, x)  # initialization call

    n_params_flax = sum(
        jax.tree_leaves(jax.tree_map(lambda x: np.prod(x.shape), params))
    )
    shape_flax = model_flax.apply(params, x).shape

    # Create Torch model
    model_torch = OurMixer(
        image_size=image_size,
        patch_size=patch_size,
        hidden_dim=hidden_dim,
        n_blocks=n_blocks,
        n_classes=n_classes,
        tokens_mlp_dim=tokens_mlp_dim,
        channels_mlp_dim=channels_mlp_dim,
    )

    n_params_torch = sum(
        p.numel() for p in model_torch.parameters() if p.requires_grad
    )
    shape_torch = model_torch(torch.rand(11, 3, image_size, image_size)).shape

    assert n_params_flax == n_params_torch
    assert shape_flax == shape_torch == (11, n_classes)


================================================
FILE: github_adventures/mixup/launch_experiments.sh
================================================
set -x

N_EPOCHS=100000
N_SAMPLES=1000
SEED=123
TBOARD_DIR=tb_results/$SEED

python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/no_regularization
python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/weight_decay --weight-decay 0.6
python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/dropout -p 0.2 
python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/mixup --mixup 
python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/input_mixup -k 0 1 --mixup
python train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/hidden_layers_mixup -k 1 4 --mixup 


================================================
FILE: github_adventures/mixup/train.py
================================================
import argparse
import json

import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

from utils import (
    CustomDataset,
    MLPClassifierMixup,
    generate_prediction_img,
    generate_spirals,
)


def main(argv=None):
    parser = argparse.ArgumentParser("Training")

    # Parameters
    parser.add_argument(
        "logpath",
        type=str,
    )
    parser.add_argument(
        "-b",
        "--batch-size",
        type=int,
        default=32,
        help="Batch size",
    )
    parser.add_argument(
        "--mixup",
        action="store_true",
    )
    parser.add_argument(
        "-p",
        "--dropout-probability",
        type=float,
        default=0,
        help="The probability of dropout",
    )
    parser.add_argument(
        "--hidden-dims",
        nargs="+",
        type=int,
        default=(32, 32, 32),
        help="Hidden dimensions of the MLP",
    )
    parser.add_argument(
        "-c",
        "--n-cycles",
        type=float,
        default=2,
        help="Number of cycles when creating the spiral dataset",
    )
    parser.add_argument(
        "-n",
        "--n-epochs",
        type=int,
        default=100,
        help="Number of epochs",
    )
    parser.add_argument(
        "-k",
        "--mixing-layer",
        type=int,
        nargs=2,
        default=(None, None),
        help="The range of k to sample from",
    )
    parser.add_argument(
        "-s",
        "--n-samples",
        type=int,
        default=1000,
        help="Number of samples",
    )
    parser.add_argument(
        "-r",
        "--random-state",
        type=int,
        default=5,
        help="Random state",
    )
    parser.add_argument(
        "--weight-decay",
        type=float,
        default=0.0,
        help="Weight decay",
    )

    args = parser.parse_args(argv)

    device = torch.device("cpu")
    dtype = torch.float32

    np.random.seed(args.random_state)
    torch.manual_seed(args.random_state)

    # Dataset preparation
    X, y = generate_spirals(
        args.n_samples,
        noise_std=0,
        n_cycles=args.n_cycles,
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.9,
        shuffle=True,
        stratify=y,
    )

    X_test_t = torch.from_numpy(X_test).to(device, dtype)

    dataset_train = CustomDataset(X_train, y_train)

    dataloader_train = DataLoader(
        dataset_train,
        batch_size=2 * args.batch_size,
        drop_last=True,
        shuffle=True,
    )

    # Model and loss definition
    model = MLPClassifierMixup(
        n_features=2,
        hidden_dims=tuple(args.hidden_dims),
        p=args.dropout_probability,
    )
    model.to(device, dtype)

    optimizer = torch.optim.AdamW(
        model.parameters(),
        weight_decay=args.weight_decay,
    )

    loss_fn = torch.nn.BCEWithLogitsLoss()

    # Summary
    writer = SummaryWriter(args.logpath)
    writer.add_text("hparams", json.dumps(vars(args)))

    # Training + evaluation loop
    bs = args.batch_size
    n_steps = 0
    for e in range(args.n_epochs):
        for X_batch, y_batch in dataloader_train:
            X_batch, y_batch = X_batch.to(device, dtype), y_batch.to(
                device, dtype
            )
            if args.mixup:
                k_min, k_max = args.mixing_layer
                k_min = k_min or 0
                k_max = k_max or model.n_hidden + 1

                k = np.random.randint(k_min, k_max)
                lam = np.random.beta(2, 2)
                writer.add_scalar("k", k, n_steps)
                writer.add_scalar("lambda", lam, n_steps)

                h = model(X_batch, start=0, end=k)  # (2 * batch_size, *)

                h_mixed = lam * h[:bs] + (1 - lam) * h[bs:]  # (batch_size, *)
                y_mixed = lam * y_batch[:bs] + (1 - lam) * y_batch[bs:]  # (batch_size,)

                logits = model(h_mixed, start=k, end=None)  # (batch_size, 1)
                loss = loss_fn(logits.squeeze(), y_mixed)

            else:
                logits = model(X_batch[:bs])  # (batch_size, 1)
                loss = loss_fn(logits.squeeze(), y_batch[:bs])

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Logging
            writer.add_scalar("loss_train", loss, n_steps)

            if n_steps % 2500 == 0:
                model.eval()
                fig_gen = generate_prediction_img(
                    model,
                    X_train,
                    X_test,
                    y_train,
                    y_test,
                )
                writer.add_figure("test", next(fig_gen))
                writer.add_figure("contour", next(fig_gen), n_steps)
                writer.add_figure("contour_train", next(fig_gen), n_steps)

                with torch.no_grad():
                    logits_test = model(X_test_t).squeeze().detach().cpu()

                acc_test = (
                    torch.sigmoid(logits_test).round().numpy() == y_test
                ).sum() / len(y_test)
                loss_test = loss_fn(logits_test, torch.from_numpy(y_test))

                writer.add_scalar("loss_test", loss_test, n_steps)
                writer.add_scalar("accuracy_test", acc_test, n_steps)

                model.train()

            n_steps += 1


if __name__ == "__main__":
    main()

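The mixup branch in the training loop above draws `lam ~ Beta(2, 2)`, splits a double-sized batch in half, and takes a convex combination of both the (possibly hidden) representations and the targets. A standalone numpy sketch of that interpolation step (hypothetical names, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
bs = 4  # half of the double-sized batch

h = rng.normal(size=(2 * bs, 8))            # activations for 2 * bs samples
y = rng.integers(0, 2, size=2 * bs).astype(float)  # binary targets

lam = rng.beta(2, 2)                        # mixing coefficient in (0, 1)

# Convex combination of the two halves of the batch.
h_mixed = lam * h[:bs] + (1 - lam) * h[bs:]  # (bs, 8)
y_mixed = lam * y[:bs] + (1 - lam) * y[bs:]  # (bs,)

assert h_mixed.shape == (bs, 8)
assert ((0.0 <= y_mixed) & (y_mixed <= 1.0)).all()
```

In `train.py` the mixing layer `k` is sampled per step, so the same interpolation is applied either to the raw inputs (`k = 0`) or to hidden activations (manifold mixup).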

================================================
FILE: github_adventures/mixup/utils.py
================================================
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from matplotlib.colors import ListedColormap
from torch.utils.data import Dataset


class MLPClassifierMixup(nn.Module):
    """Multilayer perceptron with inbuilt mixup logic.

    Assuming binary classification.

    Parameters
    ----------
    n_features : int
        Number of features.

    hidden_dims : tuple
        The sizes of the hidden layers.

    p : float
        Dropout probability.

    Attributes
    ----------
    hidden_layers : nn.ModuleList
        List of hidden layers that are each composed of a `Linear`,
        `LeakyReLU` and `Dropout` modules.

    n_hidden : int
        Number of hidden layers.

    clf : nn.Linear
        The classifier at the end of the pipeline.
    """

    def __init__(self, n_features, hidden_dims, p=0):
        super().__init__()
        dims = (n_features,) + hidden_dims

        self.n_hidden = len(hidden_dims)
        self.hidden_layers = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(dims[i], dims[i + 1]),
                    nn.LeakyReLU(0.2),
                    nn.Dropout(p),
                )
                for i in range(self.n_hidden)
            ]
        )
        self.clf = nn.Linear(dims[-1], 1)

    def forward(self, x, start=0, end=None):
        """Run forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Input of shape `(n_samples, dim)`. Note that the dim
            will depend on `start`.

        start : int
            The hidden layer where the forward pass starts (inclusive). We
            use the convention that `start=0` together with `end=0` is a
            no-op: the input tensor is returned unchanged. Useful for
            implementing input mixing.

        end : int or None
            The ending hidden layer (exclusive). If None, then always run until
            the last hidden layer and then we also apply the classifier.
        """
        for module in self.hidden_layers[start:end]:
            x = module(x)

        if end is None:
            x = self.clf(x)

        return x


class CustomDataset(Dataset):
    """Custom classification dataset assuming we have X and y loaded in memory.

    Parameters
    ----------
    X : np.ndarray
        Features of shape `(n_samples, n_features)`.

    y : np.ndarray
        Targets of shape `(n_samples,)`.
    """

    def __init__(self, X, y):
        if len(X) != len(y):
            raise ValueError("Inconsistent number of samples")

        classes = np.unique(y)
        if not np.array_equal(np.sort(classes), np.array([0, 1])):
            raise ValueError

        self.X = X
        self.y = y

    def __len__(self):
        """Compute the length of the dataset."""
        return len(self.X)

    def __getitem__(self, ix):
        """Return a single sample."""
        return self.X[ix], self.y[ix]


def generate_spirals(
    n_samples,
    noise_std=0.05,
    n_cycles=2,
    random_state=None,
):
    """Generate two spirals dataset.

    Parameters
    ----------
    n_samples : int
        Number of samples to generate. For simplicity, an even number
        is required. The targets (2 spirals) are perfectly balanced.

    noise_std : float
        Standard deviation of the noise added to the spirals.

    n_cycles : int
        Number of revolutions the spirals make.

    random_state : int or None
        Controls randomness.

    Returns
    -------
    X : np.ndarray
        Features of shape `(n_samples, n_features)`.

    y : np.ndarray
        Targets of shape `(n_samples,)`. There are two
        classes 0 and 1 representing the two spirals.
    """
    if n_samples % 2 != 0:
        raise ValueError("The number of samples needs to be even")

    n_samples_per_class = int(n_samples // 2)

    angle_1 = np.linspace(0, n_cycles * 2 * np.pi, n_samples_per_class)
    angle_2 = np.pi + angle_1
    radius = np.linspace(0.2, 2, n_samples_per_class)

    x_1 = radius * np.cos(angle_1)
    y_1 = radius * np.sin(angle_1)

    x_2 = radius * np.cos(angle_2)
    y_2 = radius * np.sin(angle_2)

    X = np.concatenate(
        [
            np.stack([x_1, y_1], axis=1),
            np.stack([x_2, y_2], axis=1),
        ],
        axis=0,
    )
    y = np.zeros((n_samples,))
    y[n_samples_per_class:] = 1.0

    if random_state is not None:
        np.random.seed(random_state)

    new_ixs = np.random.permutation(n_samples)

    X = X[new_ixs] + np.random.normal(
        loc=0, scale=noise_std, size=(n_samples, 2)
    )
    y = y[new_ixs]

    return X, y


def generate_prediction_img(
    model,
    X_train,
    X_test,
    y_train,
    y_test,
):
    """Generate contour and scatter plots with predictions.

    Parameters
    ----------
    model : MLPClassifierMixup
        Instance of a multilayer-perceptron.

    X_train, X_test : np.ndarray
        Train and test features of shape `(n_samples, n_features)`.

    y_train, y_test : np.ndarray
        Train and test targets of shape `(n_samples,)`.

    Yields
    ------
    matplotlib.Figure
        Different figures.
    """
    device = next(model.parameters()).device
    dtype = next(model.parameters()).dtype

    cm = plt.cm.RdBu
    cm_bright = ListedColormap(["#FF0000", "#0000FF"])

    delta = 0.5

    xlim = (X_test[:, 0].min() - delta, X_test[:, 0].max() + delta)
    ylim = (X_test[:, 1].min() - delta, X_test[:, 1].max() + delta)

    n = 50
    xx, yy = np.meshgrid(
        np.linspace(xlim[0], xlim[1], n),
        np.linspace(ylim[0], ylim[1], n),
    )
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)

    with torch.no_grad():
        logits = model(torch.from_numpy(grid).to(device, dtype))

    probs = torch.sigmoid(logits)[:, 0].detach().cpu().numpy()

    probs = probs.reshape(xx.shape)

    fig, ax = plt.subplots(1, 1, dpi=170)

    ax.scatter(
        X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k"
    )
    ax.set_title("Test data")

    yield fig
    ax.cla()

    ax.contourf(xx, yy, probs, cmap=cm, alpha=0.8)
    ax.set_title("Prediction contours")

    yield fig

    ax.scatter(
        X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k"
    )
    ax.set_title("Train data + prediction contours")

    yield fig

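The `generate_spirals` function above parameterizes each class as a noisy spiral, with the second class offset in angle by pi. A minimal standalone check of the construction (re-sketched here with numpy, not imported from the repo):

```python
import numpy as np

n_per_class, n_cycles = 100, 2
angle_1 = np.linspace(0, n_cycles * 2 * np.pi, n_per_class)
angle_2 = np.pi + angle_1           # second spiral rotated by 180 degrees
radius = np.linspace(0.2, 2, n_per_class)

X = np.concatenate(
    [
        np.stack([radius * np.cos(angle_1), radius * np.sin(angle_1)], axis=1),
        np.stack([radius * np.cos(angle_2), radius * np.sin(angle_2)], axis=1),
    ],
    axis=0,
)
y = np.repeat([0.0, 1.0], n_per_class)

assert X.shape == (2 * n_per_class, 2)
assert y.mean() == 0.5              # perfectly balanced classes
# cos(pi + a) = -cos(a) and sin(pi + a) = -sin(a), so corresponding points
# on the two spirals are reflections of each other through the origin.
assert np.allclose(X[:n_per_class], -X[n_per_class:])
```

The repo's version additionally shuffles the samples and adds Gaussian noise with `noise_std`.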

================================================
FILE: github_adventures/ner_evaluation/README.md
================================================
* https://github.com/huggingface/evaluate/blob/af3c30561d840b83e54fc5f7150ea58046d6af69/metrics/seqeval/seqeval.py#L120
* https://github.com/chakki-works/seqeval/blob/cd01b5210eaa65e691c22320aba56f2be9e9fc43/seqeval/metrics/sequence_labeling.py#L1

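`ours.py` in this directory reimplements strict IOB2 entity extraction and compares it against seqeval. The core idea — an entity starts at a `B-` tag and is extended only by `I-` tags of the same type — can be sketched in a few lines of standalone Python (simplified, no input validation, not the repo's implementation):

```python
def extract_entities(annots):
    """Strict IOB2: an entity starts at `B-X` and continues over `I-X`."""
    entities, start, etype = [], None, None
    for i, annot in enumerate(annots + ["O"]):  # sentinel closes a trailing entity
        if start is not None and annot != f"I-{etype}":
            entities.append({"start": start, "end": i - 1, "etype": etype})
            start = None
        if annot.startswith("B-"):
            start, etype = i, annot[2:]
    return entities

print(extract_entities(["O", "B-PERSON", "I-PERSON", "B-LOC"]))
# [{'start': 1, 'end': 2, 'etype': 'PERSON'}, {'start': 3, 'end': 3, 'etype': 'LOC'}]
```

`get_entities` in `ours.py` implements the same semantics via explicit start/end tag-pair patterns, and `get_report` then matches true and predicted spans on exact `(start, end)` agreement.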



================================================
FILE: github_adventures/ner_evaluation/ours.py
================================================
import re
import pandas as pd
from sklearn.metrics import classification_report


def check_valid(annots: list[str]) -> bool:
    allowed_pattern = re.compile(r"^(O$|B-.+$|I-.+$)")

    annots = ["O"] + annots
    n = len(annots)

    if any(allowed_pattern.match(annot) is None for annot in annots):
        return False

    for i in range(1, n):
        annot = annots[i]

        if annot.startswith("I-"):
            if annots[i - 1] == "O" or annots[i - 1][2:] != annot[2:]:
                return False

    return True


def get_etypes(annots: list[str]) -> list[None | str]:
    return [annot[2:] if annot != "O" else None for annot in annots]


def get_entities(annots: list[str]) -> list[dict[str, int | str]]:
    if not check_valid(annots):
        raise ValueError("Invalid input.")

    annots = ["O"] + annots + ["O"]
    etypes = get_etypes(annots)
    n = len(annots)

    start_patterns = {
        ("O", "B-"),  # ["O", "B-LOC"]
        ("B-", "B-"),  # ["B-PERSON", "B-LOC"]
        ("I-", "B-"),  # ["B-LOC", "I-LOC", "B-PERSON"]
    }

    end_patterns = {
        ("I-", "O"), # ["B-LOC", "I-LOC", "O"]
        ("B-", "O"), # ["B-LOC", "O"]
        ("B-", "B-"),  # ["B-PERSON", "B-LOC"]
        ("I-", "B-"),  # ["B-LOC", "I-LOC", "B-PERSON"]
    }

    entities: list[dict[str, int | str]] = []

    i = 1
    start = None

    while i < n:
        prev, curr = annots[i - 1], annots[i]
        pattern = (prev[:2], curr[:2])

        if pattern in end_patterns and start is not None:
            entities.append(
                {
                    "start": start - 1,
                    "end": i - 2,
                    "etype": etypes[i - 1],

                }
            )

            start = None

        if pattern in start_patterns:
            start = i

        i += 1

    return entities


def get_report(annots_true: list[str], annots_pred: list[str]) -> dict:
    if len(annots_true) != len(annots_pred):
        raise ValueError("Unequal lengths")

    entities_true = pd.DataFrame(get_entities(annots_true))
    entities_pred = pd.DataFrame(get_entities(annots_pred))

    entities_true = entities_true.rename(columns={"etype": "etype_true"})
    entities_pred = entities_pred.rename(columns={"etype": "etype_pred"})

    df_merge = entities_true.merge(entities_pred, on=["start", "end"], how="outer")
    df = df_merge.fillna("")

    labels = (set(df["etype_true"].tolist()) | set(df["etype_pred"].tolist())) - {""}

    report = classification_report(
        df["etype_true"],
        df["etype_pred"],
        output_dict=True,
        labels=list(labels),
    )
    return report


================================================
FILE: github_adventures/ner_evaluation/test_ours.py
================================================
import pytest
from seqeval.metrics import classification_report as cr
from seqeval.scheme import IOB2
from ours import check_valid, get_entities, get_etypes, get_report


@pytest.mark.parametrize(
    "inp,out",
    [
        ([], True),
        (["NONSENSE", "O"], False),
        (["O", "O", "O"], True),
        (["B-"], False),
        (["O", "I-ORG", "O"], False),
        (["O", "B-ORG", "I-PERSON"], False),
        (["O", "B-ORG", "B-PERSON"], True),
        (["O", "SOMETHING", "B-PERSON"], False),
        (["O-", "O", "O"], False),
        (["B-A", "O", "B-T"], True),
        (["I-a", "B-a", "B-a", "I-a", "I-a", "O"], False),
    ],
)
def test_check_valid(inp, out):
    assert check_valid(inp) == out


@pytest.mark.parametrize(
    "inp,out",
    [
        ([], []),
        (["O", "O", "O"], [None, None, None]),
        (["O", "B-ORG", "O"], [None, "ORG", None]),
        (["O", "B-ORG", "B-ORG"], [None, "ORG", "ORG"]),
        (["O", "B-PERSON", "I-PERSON"], [None, "PERSON", "PERSON"]),
        (["B-A", "O", "B-T"], ["A", None, "T"]),
    ],
)
def test_get_etypes(inp, out):
    assert get_etypes(inp) == out


@pytest.mark.parametrize(
    "inp,out",
    [
        (["O", "O", "O"], []),
        (["O", "B-ORG", "O"], [{"start": 1, "end": 1, "etype": "ORG"}]),
        (
            ["O", "B-ORG", "B-ORG"],
            [
                {"start": 1, "end": 1, "etype": "ORG"},
                {"start": 2, "end": 2, "etype": "ORG"},
            ],
        ),
        (["O", "B-PERSON", "I-PERSON"], [{"start": 1, "end": 2, "etype": "PERSON"}]),
        (
            ["B-A", "O", "B-T"],
            [
                {"start": 0, "end": 0, "etype": "A"},
                {"start": 2, "end": 2, "etype": "T"},
            ],
        ),
        (["B-LOC", "I-LOC", "I-LOC"], [{"start": 0, "end": 2, "etype": "LOC"}]),
        (
            ["B-A", "I-A", "B-T"],
            [
                {"start": 0, "end": 1, "etype": "A"},
                {"start": 2, "end": 2, "etype": "T"},
            ],
        ),
    ],
)
def test_get_entities(inp, out):
    assert get_entities(inp) == out


@pytest.mark.parametrize(
    "annots_true,annots_pred",
    [
        (
            ["O", "B-PERSON", "I-PERSON", "O"],
            ["O", "B-PERSON", "I-PERSON", "O"],
        ),
        (
            ["O", "B-PERSON", "I-PERSON", "B-LOC"],
            ["O", "B-PERSON", "I-PERSON", "O"],
        ),
        (
            ["O", "B-PERSON", "I-PERSON", "O"],
            ["O", "O", "B-PERSON", "O"],
        ),
        (
            ["O", "B-PERSON", "I-PERSON", "O"],
            ["O", "O", "B-PERSON", "O"],
        ),
        (
            ["B-PERSON", "B-LOC", "I-LOC", "B-DATE"],
            ["B-PERSON", "B-DATE", "B-PERSON", "B-DATE"],
        ),
        (
            ["B-PERSON", "I-PERSON", "I-PERSON", "O", "O", "B-LOC", "B-DATE"],
            ["B-PERSON", "I-PERSON", "I-PERSON", "O", "O", "B-LOC", "B-DATE"],
        ),
        (
            ["B-PERSON", "O", "O", "O", "B-LOC", "I-LOC", "O", "B-LOC"],
            ["B-PERSON", "O", "B-DATE", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC"],
        ),
        (
            ["B-PERSON", "I-PERSON", "O", "B-LOC", "I-LOC", "O", "B-PERSON", "B-PERSON", "B-LOC"],
            ["B-PERSON", "I-PERSON", "O", "B-LOC", "B-LOC", "O", "B-PERSON", "B-PERSON", "B-LOC"],
        ),
    ]
)
def test_get_report(annots_true, annots_pred):
    report = get_report(annots_true, annots_pred)
    seqeval_report = cr([annots_true], [annots_pred], scheme=IOB2, mode="strict", output_dict=True)

    keys_to_delete = {"accuracy", "micro avg"}

    for rep in (report, seqeval_report):
        for key in keys_to_delete:
            try:
                rep.pop(key)
            except KeyError:
                pass

    assert report == seqeval_report


================================================
FILE: github_adventures/ner_evaluation/try.py
================================================
import pprint
import evaluate


metric = evaluate.load("seqeval")


# Tom Cruise is great
annots_true = ["B-PERSON", "I-PERSON", "O", "O"]
# annots_pred = ["B-PERSON", "I-PERSON", "O", "O"]
# annots_pred = ["O", "O", "O", "O"]
# annots_pred = ["B-PERSON", "O", "O", "O"]
annots_pred = ["B-LOCATION", "I-LOCATION", "O", "O"]


result = metric.compute(references=[annots_true], predictions=[annots_pred])

pprint.pprint(result)


================================================
FILE: github_adventures/neuron/README.md
================================================
# Installation

```bash
pip install -r requirements.txt
```

# Running training
To run the same experiments as in the video, run

```bash
./launch.sh
```

Feel free to inspect `launch.sh` to see how each individual experiment is
launched.
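
For example, here are two of the single experiments from `launch.sh`, with the
`$OUTPUT_FOLDER` variable inlined as `log_dir`:

```bash
# Train the linear baseline for 1000 CMA-ES iterations
python trainer.py --max-iter 1000 linear log_dir/linear

# Same model, but with the features shuffled before each rollout
python trainer.py --max-iter 1000 --shuffle-on-reset linear log_dir/linear_augment
```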

# Evaluation and pretrained models
This repo contains multiple pretrained models inside `pretrained/`. They
are all `.pkl` files created by pickling `solutions.Solution` subclasses.
To load one in Python, run something along these lines:

```python
import pickle

solution_path = "pretrained/invariant_ours.pkl"  # you can change this

with open(solution_path, "rb") as f:
    solution = pickle.load(f)[0]

```

You can also run any of the scripts below to reproduce the results from
the end of the video.


```bash
EPISODES=30

python evaluate_shuffling.py -e $EPISODES
python evaluate_noise.py -e $EPISODES
python evaluate_video.py -e $EPISODES
```


================================================
FILE: github_adventures/neuron/evaluate_noise.py
================================================
"""Assumes you have already trained your model and you have a checkpoint."""
import argparse
import pathlib
import pickle

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from tasks import Task


def main(argv=None):
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-e",
        "--n-episodes",
        type=int,
        default=200,
    )
    args = parser.parse_args(argv)

    # Prepare solutions and tasks
    checkpoint_path = pathlib.Path("pretrained") / "invariant_official.pkl"
    assert checkpoint_path.exists()

    with checkpoint_path.open("rb") as f:
        obj = pickle.load(f)

        if len(obj) == 1:
            solution_inst = obj[0]
        elif len(obj) == 2:
            solver, solution_inst = obj
            solution_inst.set_params(solver.result.xfavorite)
        else:
            raise ValueError

    results = []

    for n_noise_features in range(0, 30, 5):
        for shuffle in [True, False]:
            print(f"{n_noise_features=}, {shuffle=}")
            task = Task(
                render=False,
                n_noise_features=n_noise_features,
                shuffle_on_reset=shuffle,
                env_seed=None,
                feature_seed=None,
            )
            for episode_ix in range(args.n_episodes):
                reward = task.rollout(solution_inst)
                results.append(
                    {
                        "n_noise_features": n_noise_features,
                        "shuffle": shuffle,
                        "episode_ix": episode_ix,
                        "reward": reward,
                    }
                )

    results_df = pd.DataFrame(results)
    fig, ax = plt.subplots(1, 1, figsize=(10, 5), dpi=300)

    sns.violinplot(
        data=results_df,
        x="n_noise_features",
        y="reward",
        hue="shuffle",
        split=True,
        inner="quart",
        linewidth=1,
        palette="muted",
        ax=ax,
        scale="count",
    )
    sns.despine(left=True)
    ax.set_ylim(0, 1000)
    ax.grid(True)

    fig.tight_layout()
    fig.savefig("invariant_model_noise.png")


if __name__ == "__main__":
    main()


================================================
FILE: github_adventures/neuron/evaluate_shuffling.py
================================================
"""Assumes you have already trained your model and you have a checkpoint."""
import argparse
import pathlib
import pickle

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from tasks import Task


def main(argv=None):
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-e",
        "--n-episodes",
        type=int,
        default=200,
    )
    args = parser.parse_args(argv)

    # Prepare solutions and tasks
    checkpoints = {}

    checkpoint_folder = pathlib.Path("pretrained")
    assert checkpoint_folder.exists()

    checkpoint_paths = [
        checkpoint_folder / "linear.pkl",
        checkpoint_folder / "linear_augment.pkl",
        checkpoint_folder / "MLP.pkl",
        checkpoint_folder / "MLP_augment.pkl",
        checkpoint_folder / "invariant_ours.pkl",
        checkpoint_folder / "invariant_official.pkl",
    ]

    for path in checkpoint_paths:
        with path.open("rb") as f:
            obj = pickle.load(f)

            if len(obj) == 1:
                solution_inst = obj[0]
            elif len(obj) == 2:
                solver, solution_inst = obj
                solution_inst.set_params(solver.result.xfavorite)
            else:
                raise ValueError

        checkpoints[path.stem] = solution_inst

    results = []

    for model_name, solution_inst in checkpoints.items():
        for shuffle in [True, False]:
            print(f"{model_name=}, {shuffle=}")
            task = Task(
                render=False,
                n_noise_features=0,
                shuffle_on_reset=shuffle,
                env_seed=None,
                feature_seed=None,
            )
            for episode_ix in range(args.n_episodes):
                reward = task.rollout(solution_inst)
                results.append(
                    {
                        "model": model_name,
                        "shuffle": shuffle,
                        "episode_ix": episode_ix,
                        "reward": reward,
                    }
                )

    results_df = pd.DataFrame(results)
    fig, ax = plt.subplots(1, 1, figsize=(10, 5), dpi=300)

    sns.violinplot(
        data=results_df,
        x="model",
        y="reward",
        hue="shuffle",
        split=True,
        inner="quart",
        linewidth=1,
        palette="muted",
        ax=ax,
        scale="count",
        order=sorted(checkpoints.keys()),
    )
    sns.despine(left=True)
    ax.set_ylim(0, 1000)
    ax.grid(True)

    fig.tight_layout()
    fig.savefig("all_models_shuffling.png")


if __name__ == "__main__":
    main()


================================================
FILE: github_adventures/neuron/evaluate_video.py
================================================
"""Assumes you have already trained your model and you have a checkpoint."""
import argparse
import pathlib
import pickle

from gym.wrappers import Monitor

from tasks import Task


def main(argv=None):
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-e",
        "--n-episodes",
        type=int,
        default=2,
    )
    args = parser.parse_args(argv)

    # Prepare solutions and tasks
    checkpoints = {}

    checkpoint_folder = pathlib.Path("pretrained")
    assert checkpoint_folder.exists()

    checkpoint_paths = [
        checkpoint_folder / "linear.pkl",
        checkpoint_folder / "linear_augment.pkl",
        checkpoint_folder / "MLP.pkl",
        checkpoint_folder / "MLP_augment.pkl",
        checkpoint_folder / "invariant_ours.pkl",
        checkpoint_folder / "invariant_official.pkl",
    ]

    for path in checkpoint_paths:
        with path.open("rb") as f:
            obj = pickle.load(f)

            if len(obj) == 1:
                solution_inst = obj[0]
            elif len(obj) == 2:
                solver, solution_inst = obj
                solution_inst.set_params(solver.result.xfavorite)
            else:
                raise ValueError

        checkpoints[path.stem] = solution_inst

    for model_name, solution_inst in checkpoints.items():
        for shuffle in [True, False]:
            for episode_ix in range(args.n_episodes):
                print(f"{model_name=}, {shuffle=}")
                task = Task(
                    render=False,
                    n_noise_features=0,
                    shuffle_on_reset=shuffle,
                    env_seed=None,
                    feature_seed=None,
                )

                task.env = Monitor(
                    task.env,
                    f"videos/{model_name}/{shuffle}/{episode_ix}/",
                )
                task.rollout(solution_inst)


if __name__ == "__main__":
    main()


================================================
FILE: github_adventures/neuron/launch.sh
================================================
OUTPUT_FOLDER=log_dir

python trainer.py --max-iter 1000 linear $OUTPUT_FOLDER/linear
python trainer.py --max-iter 1000 --shuffle-on-reset linear $OUTPUT_FOLDER/linear_augment
python trainer.py --max-iter 1000 MLP $OUTPUT_FOLDER/MLP
python trainer.py --max-iter 2000 --shuffle-on-reset MLP $OUTPUT_FOLDER/MLP_augment
python trainer.py --max-iter 14000 invariant $OUTPUT_FOLDER/invariant


================================================
FILE: github_adventures/neuron/requirements.txt
================================================
cma
gym
gym-cartpole-swingup
matplotlib
numpy
pandas
seaborn
tensorboard
torch
tqdm


================================================
FILE: github_adventures/neuron/solutions.py
================================================
import abc

import numpy as np
import torch

from torch_utils import PermutationInvariantNetwork, MLP


class Solution(abc.ABC):
    """Solution abstract class.

    Attributes
    ----------
    policy : torch.nn.Module
        Network that holds all the learnable parameters.
    """

    @abc.abstractmethod
    def clone(self):
        """Create a copy of the current solution without any links to self."""

    @abc.abstractmethod
    def get_action(self, obs):
        """Determine the next action given the observation array."""

    @abc.abstractmethod
    def get_n_features(self):
        """Get the number of features expected by the model.

        If None then the model can process variable-sized feature
        vectors.
        """

    @abc.abstractmethod
    def reset(self):
        """Reset solution.

        Will be called at the beginning of each rollout.

        Does not mean we will "reinitialize" the weights of `policy`.
        """

    def get_params(self):
        """Get learnable parameters of the solution.

        Returns
        -------
        params : np.ndarray
            1D array containing all parameters.
        """
        params_l = []

        for p in self.policy.parameters():
            params_l.append(p.numpy().ravel())

        params = np.concatenate(params_l)

        return params

    def set_params(self, params):
        """Set the learnable parameters.

        Parameters
        ----------
        params : np.ndarray
            1D array containing all parameters.

        Returns
        -------
        self : Solution
        """
        start_ix, end_ix = 0, 0

        for p in self.policy.parameters():
            end_ix = start_ix + np.prod(p.shape)
            p.data = torch.from_numpy(
                params[start_ix:end_ix].reshape(p.shape)
            ).float()
            start_ix = end_ix

        return self

    def get_n_params(self):
        return len(self.get_params())


class MLPSolution(Solution):
    """Multilayer perceptron solution.

    Parameters
    ----------
    n_features : int
        Number of input features.

    hidden_layer_sizes : tuple
        Tuple of int that defines the sizes of all hidden layers.

    Attributes
    ----------
    kwargs : dict
        All parameters necessary to instantiate the class.

    policy : MLP
        Policy network - multilayer perceptron.
    """

    def __init__(self, n_features=5, hidden_layer_sizes=(16,)):
        self.kwargs = {
            "n_features": n_features,
            "hidden_layer_sizes": hidden_layer_sizes,
        }
        self.dtype = torch.float32

        self.policy = MLP(n_features, hidden_layer_sizes)
        self.policy.to(self.dtype)
        self.policy.eval()

    def clone(self):
        old_policy = self.policy
        new_solution = self.__class__(**self.kwargs)

        new_solution.policy.load_state_dict(
            old_policy.state_dict(),
        )

        return new_solution

    def get_action(self, obs):
        y = self.policy(torch.from_numpy(obs).to(self.dtype))

        action = y.item()
        return action

    def get_n_features(self):
        return self.kwargs["n_features"]

    def reset(self):
        pass


class PermutationInvariantSolution(Solution):
    """Permutation invariant solution.

    Parameters
    ----------
    n_embeddings : int
        Number of rows in the Q tensor.

    proj_dim : int
        Size of the space to which we project the K and Q tensors.

    hidden_size : int
        Dimensionality of the Q and K tensors before linear projections.

    Attributes
    ----------
    kwargs : dict
        All parameters necessary to instantiate the class

    dtype : torch.dtype
        Dtype of both the network weights and input features.

    policy : PermutationInvariantNetwork
        Policy network.

    prev_action : float
        Stores the previous action. Automatically updated each time we call
        `get_action`.
    """

    def __init__(
        self,
        n_embeddings=16,
        proj_dim=32,
        hidden_size=8,
    ):
        self.kwargs = {
            "n_embeddings": n_embeddings,
            "proj_dim": proj_dim,
            "hidden_size": hidden_size,
        }
        self.policy = PermutationInvariantNetwork(
            n_embeddings=n_embeddings,
            proj_dim=proj_dim,
            hidden_size=hidden_size,
        )
        self.dtype = torch.float32

        self.policy.to(self.dtype)
        self.policy.eval()

        self.prev_action = 0  # will be continuously updated

    def clone(self):
        old_policy = self.policy
        new_solution = self.__class__(**self.kwargs)

        new_solution.policy.load_state_dict(
            old_policy.state_dict(),
        )

        return new_solution

    def get_action(self, obs):
        y = self.policy(torch.from_numpy(obs).to(self.dtype), self.prev_action)

        action = y.item()
        self.prev_action = action

        return action

    def reset(self):
        self.policy.attention_neuron.hx = None
        self.prev_action = 0

    def get_n_features(self):
        return None
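

if __name__ == "__main__":
    # Illustrative smoke test (not part of the original repo): a sketch of
    # the flat parameter vector round trip that `get_params` / `set_params`
    # rely on, using plain numpy and hypothetical weight/bias shapes instead
    # of an actual policy network.
    shapes = [(16, 5), (16,), (1, 16), (1,)]
    flat = np.arange(sum(int(np.prod(s)) for s in shapes), dtype=np.float32)

    # Unpack the flat 1D array back into per-tensor shapes.
    tensors, start_ix = [], 0
    for shape in shapes:
        end_ix = start_ix + int(np.prod(shape))
        tensors.append(flat[start_ix:end_ix].reshape(shape))
        start_ix = end_ix

    # The round trip is lossless: shapes match and re-flattening
    # reproduces the original vector.
    assert [t.shape for t in tensors] == shapes
    assert np.array_equal(np.concatenate([t.ravel() for t in tensors]), flat)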


================================================
FILE: github_adventures/neuron/tasks.py
================================================
import gym
import gym_cartpole_swingup  # noqa: F401 -- import registers the env as a side effect
import numpy as np

N_ORIGINAL_FEATURES = 5


class IncompatibleNFeatures(Exception):
    """Raised when observation and model number of features does not match."""


class Task:
    """Cartpoleswingup task.

    Parameters
    ----------
    render : bool
        If True, we render each step into a video frame.

    shuffle_on_reset : bool
        If True, the features are randomly shuffled before each rollout.

    n_noise_features : int
        Number of noise features added to the observation vector.

    env_seed : None or int
        Random state controlling the underlying `gym.Env`.

    feature_seed : None or int
        Random state controlling the shuffling and the noise features.

    max_episode_steps : int
        Maximum number of steps per episode (=rollout). After this number
        of steps, `done=True` automatically.

    Attributes
    ----------
    n_features : int
        Overall number of features (original + noise).

    perm_ix : np.ndarray
        1D array storing the permutation indices of the features.

    env : gym.Env
        Environment.

    rnd : RandomState
        Random state.
    """

    def __init__(
        self,
        render=False,
        shuffle_on_reset=False,
        n_noise_features=0,
        env_seed=None,
        feature_seed=None,
        max_episode_steps=1000,
    ):

        self.env = gym.make("CartPoleSwingUp-v1")
        self.env._max_episode_steps = max_episode_steps
        self.shuffle_on_reset = shuffle_on_reset
        self.render = render
        self.n_noise_features = n_noise_features

        self.n_features = N_ORIGINAL_FEATURES + n_noise_features

        self.perm_ix = np.arange(self.n_features)
        self.noise_std = 0.1

        # Set seeds
        self.env.seed(env_seed)
        self.rnd = np.random.RandomState(seed=feature_seed)

    def reset_for_rollout(self):
        """Generate a new permutation of the features.

        It is going to be called at the beginning of each episode.
        Note that the permutation stays constant throughout the episode.
        """
        self.perm_ix = np.arange(self.n_features)

        if self.shuffle_on_reset:
            self.rnd.shuffle(self.perm_ix)

    def modify_obs(self, obs):
        """Modify raw observations.

        Parameters
        ----------
        obs : np.ndarray
            Raw observation/feature array of shape `(5,)`.

        Returns
        -------
        obs_modified : np.ndarray
            Modified observation array of shape `(5 + n_noise_features,)`.
            If `shuffle_on_reset` then the order of the features is going
            to change.
        """
        noise = self.rnd.randn(self.n_noise_features) * self.noise_std
        obs_and_noise = np.concatenate([obs, noise], axis=0)
        obs_modified = obs_and_noise[self.perm_ix]

        return obs_modified

    def rollout(self, solution):
        """Run a single episode/rollout.

        Parameters
        ----------
        solution : solutions.Solution
            Instance of a solution that yields an action given an
            observation.

        Returns
        -------
        ep_reward : int
            Overall episode reward computed as a sum of per step rewards.
        """
        # sanity check
        n_features_solution = solution.get_n_features()
        n_features_task = self.n_features

        if (
            n_features_solution is not None
            and n_features_solution != n_features_task
        ):
            raise IncompatibleNFeatures

        self.reset_for_rollout()
        solution.reset()  # important for PermutationInvariantSolution

        obs = self.env.reset()
        if self.render:
            self.env.render()

        ep_reward = 0
        done = False

        while not done:
            obs_modified = self.modify_obs(obs)
            action = solution.get_action(obs_modified)
            obs, reward, done, _ = self.env.step(action)

            ep_reward += reward
            if self.render:
                self.env.render()

        return ep_reward
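

if __name__ == "__main__":
    # Illustrative sketch (not part of the original file) of what
    # `modify_obs` does: append Gaussian noise features to the raw
    # observation, then apply the permutation that is fixed once per
    # episode in `reset_for_rollout`.
    rnd = np.random.RandomState(0)

    obs = np.arange(5.0)  # stand-in for the 5 raw CartPoleSwingUp features
    n_noise_features = 3

    perm_ix = np.arange(5 + n_noise_features)
    rnd.shuffle(perm_ix)  # fixed for the whole episode

    noise = rnd.randn(n_noise_features) * 0.1  # drawn anew at every step
    obs_modified = np.concatenate([obs, noise], axis=0)[perm_ix]

    assert obs_modified.shape == (8,)
    # All original feature values are still present, just reordered.
    assert set(obs.tolist()) <= set(obs_modified.tolist())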


================================================
FILE: github_adventures/neuron/torch_utils.py
================================================
import numpy as np
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Multilayer perceptron policy network.

    Parameters
    ----------
    n_features : int
        Number of input features.

    hidden_layer_sizes : tuple
        Tuple of int that defines the sizes of all hidden layers.

    Attributes
    ----------
    net : nn.Sequential
        The actual network.
    """

    def __init__(self, n_features, hidden_layer_sizes):
        super().__init__()

        layer_sizes = (n_features,) + hidden_layer_sizes + (1,)

        layers = []

        for i in range(len(layer_sizes) - 1):
            in_features = layer_sizes[i]
            out_features = layer_sizes[i + 1]
            layers.extend(
                [
                    nn.Linear(in_features, out_features),
                    nn.Tanh(),
                ]
            )

        self.net = nn.Sequential(*layers)

        for p in self.parameters():
            p.requires_grad = False


    def forward(self, obs):
        """Run forward pass.

        Parameters
        ----------
        obs : torch.Tensor
            1D tensor representing the input observation of shape
            `(n_features,)`.

        Returns
        -------
        torch.Tensor
            Scalar between -1 and 1 representing the action.
        """

        return self.net(obs[None, :])[0]


def pos_table(n_embeddings, hidden_size):
    """Create a table of positional encodings.

    Parameters
    ----------
    n_embeddings : int
        Number of rows of the table.

    hidden_size : int
        Number of columns of the table.

    Returns
    -------
    tab : np.ndarray
        2D array holding the positional encodings.
    """

    def get_angle(x, h):
        return x / np.power(10000, 2 * (h // 2) / hidden_size)

    def get_angle_vec(x):
        return [get_angle(x, j) for j in range(hidden_size)]

    tab = np.array([get_angle_vec(i) for i in range(n_embeddings)]).astype(
        float
    )
    tab[:, 0::2] = np.sin(tab[:, 0::2])
    tab[:, 1::2] = np.cos(tab[:, 1::2])

    return tab


class AttentionMatrix(nn.Module):
    """Generates attention matrix using the key and query tensors.

    Parameters
    ----------
    proj_dim : int
        Size of the space to which we project the K and Q tensors.

    hidden_size : int
        Dimensionality of the Q and K tensors before linear projections.

    scale : bool
        If True, then the attention matrix will be divided by
        `proj_dim ** (1 / 2)` elementwise.

    Attributes
    ----------
    proj_q, proj_k : torch.nn.Linear
        Linear models projecting the Q and K tensors.

    scalar : float
        Number used for scaling the attention matrix elementwise.
    """

    def __init__(self, hidden_size, proj_dim, scale=True):
        super().__init__()

        self.proj_q = nn.Linear(
            in_features=hidden_size, out_features=proj_dim, bias=False
        )
        self.proj_k = nn.Linear(
            in_features=hidden_size, out_features=proj_dim, bias=False
        )
        if scale:
            self.scalar = np.sqrt(proj_dim)
        else:
            self.scalar = 1

    def forward(self, data_q, data_k):
        """Run the forward pass.

        Parameters
        ----------
        data_q : torch.Tensor
            Query tensor of shape `(n_embeddings, hidden_size)`.

        data_k : torch.Tensor
            Key tensor of shape `(n_features, hidden_size)`.

        Returns
        -------
        attention_weights : torch.Tensor
            Attention weights (don't sum up to 1 in general) of shape
            `(n_embeddings, n_features)`.
        """
        q = self.proj_q(data_q)  # (n_embeddings, proj_dim)
        k = self.proj_k(data_k)  # (n_features, proj_dim)
        dot = q @ k.T  # (n_embeddings, n_features)
        dot_scaled = torch.div(dot, self.scalar)  # (n_embeddings, n_features)
        attention_weights = torch.tanh(
            dot_scaled
        )  # (n_embeddings, n_features)

        return attention_weights


class AttentionNeuron(nn.Module):
    """Permutation invariant layer.

    Parameters
    ----------
    n_embeddings : int
        Number of rows in the Q tensor. In our case it is equal to the length
        of the latent code `m`.

    proj_dim : int
        Size of the space to which we project the K and Q tensors.

    hidden_size : int
        The dimensionality of the Q and K tensors before linear projections.

    Attributes
    ----------
    hx : tuple or None
        If not None then a tuple of 2 hidden state tensors (LSTM specific)

    lstm : nn.LSTMCell
        LSTM cell that inputs a hidden state and an observation and
        outputs a new hidden state.

    attention_matrix : AttentionMatrix
        Attention matrix (only needs Q and K tensors).

    Q : torch.Tensor
        Query tensor that is not learnable since it is populated with
        positional encodings.
    """

    def __init__(
        self,
        n_embeddings=16,
        proj_dim=32,
        hidden_size=8,
    ):
        super().__init__()
        self.n_embeddings = n_embeddings
        self.proj_dim = proj_dim
        self.hidden_size = hidden_size

        # Modules
        self.hx = None
        self.lstm = nn.LSTMCell(input_size=2, hidden_size=hidden_size)

        self.attention_matrix = AttentionMatrix(
            hidden_size=hidden_size,
            proj_dim=proj_dim,
            scale=False,
        )

        self.register_buffer(
            "Q",
            torch.from_numpy(
                pos_table(
                    n_embeddings,
                    hidden_size,
                )
            ).float(),
        )

    def forward(self, obs, prev_action):
        """Run forward pass.

        Parameters
        ----------
        obs : torch.Tensor
            1D tensor representing the input observations of shape
            `(n_features,)`.

        prev_action : float
            Number between -1 and 1 based on what the previous action was.

        Returns
        -------
        latent_code : torch.Tensor
            1D tensor representing the latent code of shape `(n_embeddings,)`.

        attn_weights : torch.Tensor
            2D tensor of shape `(n_embeddings, n_features)` representing
            attention weights.
        """
        n_features = len(obs)
        prev_action = float(prev_action)

        obs_and_act = torch.cat(
            [
                obs[:, None],
                torch.ones(n_features, 1) * prev_action,
            ],
            dim=-1,
        )  # (n_features, 2)

        if self.hx is None:
            self.hx = (
                torch.zeros(n_features, self.hidden_size),
                torch.zeros(n_features, self.hidden_size),
            )

        self.hx = self.lstm(
            obs_and_act, self.hx
        )  # Tuple[(n_features, hidden_size)]

        data_q = self.Q  # (n_embeddings, hidden_size)
        data_k = self.hx[0]  # (n_features, hidden_size)
        data_v = obs[:, None]  # (n_features, 1)

        attn_weights = self.attention_matrix(
            data_q=data_q, data_k=data_k
        )  # (n_embeddings, n_features)

        latent_code_ = torch.tanh(attn_weights @ data_v)  # (n_embeddings, 1)
        latent_code = latent_code_.squeeze()  # (n_embeddings,)

        return latent_code, attn_weights


class PermutationInvariantNetwork(nn.Module):
    """Permutation invariant policy network.

    Parameters
    ----------
    n_embeddings : int
        Number of rows in the Q tensor.

    proj_dim : int
        Size of the space to which we project the K and Q tensors.

    hidden_size : int
        Dimensionality of the Q and K matrices before linear projections.

    Attributes
    ----------
    attention_neuron : AttentionNeuron
        Permutation invariant layer that generates latent codes.

    linear : nn.Linear
        Maps the latent code into a single number.
    """

    def __init__(
        self,
        n_embeddings=16,
        proj_dim=32,
        hidden_size=8,
    ):
        super().__init__()

        self.attention_neuron = AttentionNeuron(
            n_embeddings=n_embeddings,
            proj_dim=proj_dim,
            hidden_size=hidden_size,
        )

        self.linear = nn.Linear(n_embeddings, 1)

        for p in self.parameters():
            p.requires_grad = False

    def forward(self, obs, prev_action):
        """Run forward pass.

        Parameters
        ----------
        obs : torch.Tensor
            1D tensor representing the input observations of shape
            `(n_features,)`.

        prev_action : float
            Number between -1 and 1 based on what the previous action was.

        Returns
        -------
        y : torch.Tensor
            Scalar tensor with a value in range (-1, 1) representing the
            next action.
        """

        latent_code, _ = self.attention_neuron(
            obs, prev_action
        )  # (n_embeddings,)

        y_ = torch.tanh(self.linear(latent_code[None, :]))  # (1, 1)
        y = y_[0]  # (1,)

        return y
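

if __name__ == "__main__":
    # Hedged illustration (not in the original file): the attention used in
    # `AttentionNeuron` is permutation invariant because permuting the rows
    # of K and V together only reorders the terms of the sum over features.
    # Sketched here with random stand-in tensors in plain numpy.
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((16, 8))  # fixed positional-encoding queries
    K = rng.standard_normal((5, 8))   # per-feature keys
    V = rng.standard_normal((5, 1))   # per-feature values (raw observations)

    out = np.tanh(Q @ K.T) @ V  # (n_embeddings, 1)

    perm = rng.permutation(5)
    out_perm = np.tanh(Q @ K[perm].T) @ V[perm]

    # Shuffling the features leaves the latent code unchanged.
    assert np.allclose(out, out_perm)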


================================================
FILE: github_adventures/neuron/trainer.py
================================================
import argparse
import json
import multiprocessing as mp
import pathlib
import pickle
from functools import partial

import cma
import numpy as np
import tqdm
from torch.utils.tensorboard import SummaryWriter

from solutions import (
    MLPSolution,
    PermutationInvariantSolution,
)
from tasks import Task, N_ORIGINAL_FEATURES


def save(folder, n_iter, solver, solution_inst):
    """Save checkpoint.

    Parameters
    ----------
    folder : str
        Output folder.

    n_iter : int
        Iteration that corresponds to the checkpoint.

    solver : cma.CMAEvolutionStrategy
        Solver instance.

    solution_inst : Solution
        Solution instance.
    """
    folder = pathlib.Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    path = folder / f"{n_iter}.pkl"

    with path.open("wb") as f:
        obj = (solver, solution_inst)
        pickle.dump(obj, f)


def get_fitness(
    solution_inst,
    *,
    shuffle_on_reset,
    n_episodes,
    n_noise_features,
    env_seed,
    feature_seed,
):
    """Get fitness function used by the CMA optimizer/solver.

    Can be run independently on a single worker.


    Returns
    -------
    fitness : list
        List of floats of length `n_episodes` holding the per episode reward.
    """
    task = Task(
        render=False,
        shuffle_on_reset=shuffle_on_reset,
        n_noise_features=n_noise_features,
        env_seed=env_seed,
        feature_seed=feature_seed,
    )
    fitness = [task.rollout(solution_inst) for _ in range(n_episodes)]

    return fitness


def main(argv=None):
    parser = argparse.ArgumentParser(
        "Training",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "solution",
        type=str,
        choices=(
            "linear",
            "MLP",
            "invariant",
        ),
    )
    parser.add_argument(
        "log_dir",
        type=str,
        help="Logging folder",
    )
    parser.add_argument(
        "--checkpoint",
        type=str,
        help="Pickled solver and solution",
    )
    parser.add_argument(
        "--env-seed",
        type=int,
    )
    parser.add_argument(
        "--eval-frequency",
        type=int,
        default=25,
    )
    parser.add_argument(
        "--feature-seed",
        type=int,
    )
    parser.add_argument(
        "-m",
        "--max-iter",
        type=int,
        default=10000,
        help="Maximum number of iterations",
    )
    parser.add_argument(
        "-e",
        "--n-episodes",
        type=int,
        default=16,
        help="Number of rollouts for fitness evaluation",
    )
    parser.add_argument(
        "-j",
        "--n-jobs",
        type=int,
        default=-1,
        help="Number of processes",
    )
    parser.add_argument(
        "-n",
        "--n-noise-features",
        type=int,
        default=0,
        help="Number of noise features",
    )
    parser.add_argument(
        "-p",
        "--population-size",
        type=int,
        default=256,
        help="Number of solutions per generation",
    )
    parser.add_argument(
        "-s",
        "--shuffle-on-reset",
        action="store_true",
        help="Shuffle features before each rollout",
    )

    args = parser.parse_args(argv)

    writer = SummaryWriter(args.log_dir)
    writer.add_text("parameters", json.dumps(vars(args)))

    # Solution map
    if args.solution == "linear":
        solution_inst = MLPSolution(
            n_features=N_ORIGINAL_FEATURES + args.n_noise_features,
            hidden_layer_sizes=tuple(),
        )

    elif args.solution == "MLP":
        solution_inst = MLPSolution(
            n_features=N_ORIGINAL_FEATURES + args.n_noise_features,
            hidden_layer_sizes=(16,),
        )

    elif args.solution == "invariant":
        solution_inst = PermutationInvariantSolution(
            n_embeddings=16,
            proj_dim=32,
            hidden_size=8,
        )

    else:
        raise ValueError

    # Prepare solver
    if args.checkpoint is None:
        x0 = np.zeros(solution_inst.get_n_params())
        solver = cma.CMAEvolutionStrategy(
            x0=x0,
            sigma0=0.1,
            inopts={
                "popsize": args.population_size,
                "seed": 42,
                "randn": np.random.randn,
            },
        )
    else:
        with open(args.checkpoint, "rb") as f:
            solver, solution_inst_ = pickle.load(f)

            assert isinstance(solution_inst, solution_inst_.__class__)

            solution_inst = solution_inst_

    get_fitness_partial = partial(
        get_fitness,
        n_episodes=args.n_episodes,
        shuffle_on_reset=args.shuffle_on_reset,
        n_noise_features=args.n_noise_features,
        env_seed=args.env_seed,
        feature_seed=args.feature_seed,
    )

    if args.n_jobs == -1:
        n_jobs = mp.cpu_count()
    else:
        n_jobs = args.n_jobs


    with mp.Pool(processes=n_jobs) as pool:
        for n_iter in tqdm.tqdm(range(args.max_iter)):
            try:
                params_set = solver.ask()
                iterable = [
                    solution_inst.clone().set_params(p) for p in params_set
                ]
                rewards = pool.map(get_fitness_partial, iterable)
                pos_fitnesses = [np.mean(r) for r in rewards]

                neg_fitnesses = [-x for x in pos_fitnesses]

                all_parameters = np.concatenate(params_set)
                metrics = {
                    "parameter_mean": all_parameters.mean(),
                    "parameter_std": all_parameters.std(),
                    "mean": np.mean(pos_fitnesses),
                    "max (generation)": np.max(pos_fitnesses),
                    "max (overall)": -solver.result.fbest,
                }

                for metric_name, metric in metrics.items():
                    writer.add_scalar(metric_name, metric, global_step=n_iter)

                if (n_iter % args.eval_frequency == 0) or (
                    n_iter == (args.max_iter - 1)
                ):
                    save(args.log_dir, n_iter, solver, solution_inst)

                solver.tell(params_set, neg_fitnesses)

            except KeyboardInterrupt:
                save(
                    args.log_dir,
                    n_iter,
                    solver,
                    solution_inst,
                )
                break


if __name__ == "__main__":
    main()
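The loop in `main` follows the standard ask/tell interface of `cma.CMAEvolutionStrategy`: `ask` samples a generation of parameter vectors, fitnesses are evaluated in a process pool, and `tell` receives the *negated* rewards because CMA-ES minimizes. A dependency-free sketch of the same pattern with a toy ask/tell solver (the `ToyEvolutionStrategy` class and `reward` function here are illustrative, not from the repo):

```python
import random


class ToyEvolutionStrategy:
    """Minimal ask/tell solver: sample around a mean, keep the best."""

    def __init__(self, x0, sigma=0.1, popsize=32):
        self.mean = list(x0)
        self.sigma = sigma
        self.popsize = popsize

    def ask(self):
        # Sample a generation of candidate parameter vectors.
        return [
            [m + random.gauss(0, self.sigma) for m in self.mean]
            for _ in range(self.popsize)
        ]

    def tell(self, solutions, losses):
        # Move the mean to the lowest-loss candidate of this generation.
        best = min(range(len(losses)), key=losses.__getitem__)
        self.mean = solutions[best]


def reward(params):
    # Toy "fitness": higher is better, with the peak at the origin.
    return -sum(p * p for p in params)


random.seed(0)
solver = ToyEvolutionStrategy(x0=[1.0, -1.0])
for _ in range(200):
    params_set = solver.ask()
    rewards = [reward(p) for p in params_set]
    # CMA-ES-style solvers minimize, so rewards are negated before `tell`,
    # exactly as `neg_fitnesses` is built in the training loop above.
    solver.tell(params_set, [-r for r in rewards])
```

After 200 generations the toy mean sits near the origin, mirroring how `-solver.result.fbest` in the real loop tracks the best (highest) reward seen so far.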


================================================
FILE: github_adventures/pondernet/experiment_1.sh
================================================
set -x 
SEED=$RANDOM
LAMBDAS=(0.1 0.3 0.5 0.7 0.9)

for lambda in ${LAMBDAS[@]}
do
	python train.py \
		--batch-size 128 \
		--beta 0.01 \
		--device cuda \
		--eval-frequency 4000 \
		--n-iter 100000 \
		--n-hidden 128 \
		--lambda-p $lambda \
		--n-elems 15 \
		results/experiment_a/$SEED/lambda_$lambda
done
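The `--lambda-p` values swept above parameterize a geometric prior over halting steps, p_k = λ(1-λ)^(k-1), which `RegularizationLoss` (in `pondernet/utils.py`) pushes the halting distribution toward. A small sketch of the truncated prior, renormalized over `max_steps` steps (the default of 20 is taken from `train.py`; whether the repo's implementation renormalizes the truncated tail is not shown here):

```python
def geometric_prior(lambda_p, max_steps=20):
    """Truncated geometric distribution over halting steps 1..max_steps."""
    p = [lambda_p * (1 - lambda_p) ** (k - 1) for k in range(1, max_steps + 1)]
    total = sum(p)
    return [x / total for x in p]  # renormalize the truncated tail away


for lam in (0.1, 0.3, 0.5, 0.7, 0.9):
    prior = geometric_prior(lam)
    # Smaller lambda -> more probability mass on later (longer) pondering.
    print(lam, round(prior[0], 3))
```

Intuitively, `--lambda-p 0.9` rewards halting almost immediately, while `--lambda-p 0.1` spreads mass over many pondering steps, which is what the sweep is probing.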


================================================
FILE: github_adventures/pondernet/experiment_2.sh
================================================
set -x 
SEED=$RANDOM

python train.py \
	--batch-size 128 \
	--beta 0.01 \
	--eval-frequency 4000 \
	--device cuda \
	--lambda-p 0.2 \
	--n-elems 30 \
	--n-iter 1500000 \
	--n-hidden 128 \
	--n-nonzero 1 25 \
	results/experiment_b/$SEED
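Here `--n-nonzero 1 25` restricts training samples to 1-25 nonzero elements out of `--n-elems 30`, so evaluation can probe extrapolation to harder inputs. Mirroring the range construction in `pondernet/train.py` (the `eval_ranges` helper name is illustrative):

```python
def eval_ranges(n_elems, n_nonzero=(None, None)):
    # Mirrors the easy/hard range construction in pondernet/train.py.
    if n_nonzero[0] is None and n_nonzero[1] is None:
        threshold = int(0.3 * n_elems)
        return (1, threshold), (n_elems - threshold, n_elems)
    return (1, n_nonzero[1]), (n_nonzero[1] + 1, n_elems)


# Settings from experiment_2.sh: --n-elems 30 --n-nonzero 1 25
easy, hard = eval_ranges(30, (1, 25))
print(easy, hard)  # easy covers the training regime, hard is held out
```

For this run the easy evaluation set draws 1-25 nonzero elements (the training regime) and the hard set draws 26-30, which the model never sees during training.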


================================================
FILE: github_adventures/pondernet/requirements.txt
================================================
matplotlib
numpy
tensorboard
torch
tqdm


================================================
FILE: github_adventures/pondernet/train.py
================================================
from argparse import ArgumentParser
import json
import pathlib

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm

from utils import (
    ParityDataset,
    PonderNet,
    ReconstructionLoss,
    RegularizationLoss,
)


@torch.no_grad()
def evaluate(dataloader, module):
    """Compute relevant metrics.

    Parameters
    ----------
    dataloader : DataLoader
        Dataloader that yields batches of `x` and `y`.

    module : PonderNet
        Our pondering network.

    Returns
    -------
    metrics_single : dict
        Scalar metrics. The keys are names and the values are `torch.Tensor`.
        These metrics are computed as mean values over the entire dataset.

    metrics_per_step : dict
        Per step metrics. The keys are names and the values are `torch.Tensor`
        of shape `(max_steps,)`. These metrics are computed as mean values over
        the entire dataset.

    """
    # Infer device and dtype from the module's parameters
    param = next(module.parameters())
    device, dtype = param.device, param.dtype

    metrics_single_ = {
        "accuracy_halted": [],
        "halting_step": [],
    }
    metrics_per_step_ = {
        "accuracy": [],
        "p": [],
    }

    for x_batch, y_true_batch in dataloader:
        x_batch = x_batch.to(device, dtype)  # (batch_size, n_elems)
        y_true_batch = y_true_batch.to(device, dtype)  # (batch_size,)

        y_pred_batch, p, halting_step = module(x_batch)
        y_halted_batch = y_pred_batch.gather(
            dim=0,
            index=halting_step[None, :] - 1,
        )[
            0
        ]  # (batch_size,)

        # Computing single metrics (mean over samples in the batch)
        accuracy_halted = (
            ((y_halted_batch > 0) == y_true_batch).to(torch.float32).mean()
        )

        metrics_single_["accuracy_halted"].append(accuracy_halted)
        metrics_single_["halting_step"].append(
            halting_step.to(torch.float).mean()
        )

        # Computing per step metrics (mean over samples in the batch)
        accuracy = (
            ((y_pred_batch > 0) == y_true_batch[None, :])
            .to(torch.float32)
            .mean(dim=1)
        )

        metrics_per_step_["accuracy"].append(accuracy)
        metrics_per_step_["p"].append(p.mean(dim=1))

    metrics_single = {
        name: torch.stack(values).mean(dim=0).cpu().numpy()
        for name, values in metrics_single_.items()
    }

    metrics_per_step = {
        name: torch.stack(values).mean(dim=0).cpu().numpy()
        for name, values in metrics_per_step_.items()
    }

    return metrics_single, metrics_per_step
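The `gather` call in `evaluate` selects, per sample, the prediction from the step at which the network halted: judging from the indexing, `y_pred_batch` has shape `(max_steps, batch_size)` and `halting_step` is 1-based. An equivalent numpy sketch (shapes inferred, values illustrative):

```python
import numpy as np

max_steps, batch_size = 4, 3
y_pred = np.arange(max_steps * batch_size).reshape(max_steps, batch_size)
halting_step = np.array([1, 3, 4])  # 1-based halting step per sample

# torch's y_pred.gather(dim=0, index=halting_step[None, :] - 1)[0]
# corresponds to indexing along the step axis per column:
y_halted = np.take_along_axis(y_pred, halting_step[None, :] - 1, axis=0)[0]
print(y_halted)  # one prediction per sample, taken at its own halting step
```

Sample 0 halts after step 1, sample 1 after step 3, sample 2 after step 4, so each column contributes the entry from a different row.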


def plot_distributions(target, predicted):
    """Create a barplot.

    Parameters
    ----------
    target, predicted : np.ndarray
        Arrays of shape `(max_steps,)` representing the target and predicted
        probability distributions.

    Returns
    -------
    matplotlib.Figure
    """
    support = list(range(1, len(target) + 1))

    fig, ax = plt.subplots(dpi=140)

    ax.bar(
        support,
        target,
        color="red",
        label=f"Target - Geometric({target[0].item():.2f})",
    )

    ax.bar(
        support,
        predicted,
        color="green",
        width=0.4,
        label="Predicted",
    )

    ax.set_ylim(0, 0.6)
    ax.set_xticks(support)
    ax.legend()
    ax.grid()

    return fig


def plot_accuracy(accuracy):
    """Create a barplot representing accuracy over different halting steps.

    Parameters
    ----------
    accuracy : np.ndarray
        1D array representing accuracy if we were to take the output after
        the corresponding step.

    Returns
    -------
    matplotlib.Figure
    """
    support = list(range(1, len(accuracy) + 1))

    fig, ax = plt.subplots(dpi=140)

    ax.bar(
        support,
        accuracy,
        label="Accuracy over different steps",
    )

    ax.set_ylim(0, 1)
    ax.set_xticks(support)
    ax.legend()
    ax.grid()

    return fig


def main(argv=None):
    """CLI for training."""
    parser = ArgumentParser()

    parser.add_argument(
        "log_folder",
        type=str,
        help="Folder where tensorboard logging is saved",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=128,
        help="Batch size",
    )
    parser.add_argument(
        "--beta",
        type=float,
        default=0.01,
        help="Regularization loss coefficient",
    )
    parser.add_argument(
        "-d",
        "--device",
        type=str,
        choices={"cpu", "cuda"},
        default="cpu",
        help="Device to use",
    )
    parser.add_argument(
        "--eval-frequency",
        type=int,
        default=10_000,
        help="Evaluation is run every `eval_frequency` steps",
    )
    parser.add_argument(
        "--lambda-p",
        type=float,
        default=0.4,
        help="True probability of success for a geometric distribution",
    )
    parser.add_argument(
        "--n-iter",
        type=int,
        default=1_000_000,
        help="Number of gradient steps",
    )
    parser.add_argument(
        "--n-elems",
        type=int,
        default=64,
        help="Number of elements",
    )
    parser.add_argument(
        "--n-hidden",
        type=int,
        default=64,
        help="Number of hidden elements in the recurrent cell",
    )
    parser.add_argument(
        "--n-nonzero",
        type=int,
        nargs=2,
        default=(None, None),
        help="Lower and upper bound on nonzero elements in the training set",
    )
    parser.add_argument(
        "--max-steps",
        type=int,
        default=20,
        help="Maximum number of pondering steps",
    )

    # Parameters
    args = parser.parse_args(argv)
    print(args)

    device = torch.device(args.device)
    dtype = torch.float32
    n_eval_samples = 1000
    batch_size_eval = 50

    if args.n_nonzero[0] is None and args.n_nonzero[1] is None:
        threshold = int(0.3 * args.n_elems)
        range_nonzero_easy = (1, threshold)
        range_nonzero_hard = (args.n_elems - threshold, args.n_elems)
    else:
        range_nonzero_easy = (1, args.n_nonzero[1])
        range_nonzero_hard = (args.n_nonzero[1] + 1, args.n_elems)

    # Tensorboard
    log_folder = pathlib.Path(args.log_folder)
    writer = SummaryWriter(log_folder)
    writer.add_text("parameters", json.dumps(vars(args)))

    # Prepare data
    dataloader_train = DataLoader(
        ParityDataset(
            n_samples=args.batch_size * args.n_iter,
            n_elems=args.n_elems,
            n_nonzero_min=args.n_nonzero[0],
            n_nonzero_max=args.n_nonzero[1],
        ),
        batch_size=args.batch_size,
    )  # consider specifying `num_workers` for speedups
    eval_dataloaders = {
        "test": DataLoader(
            ParityDataset(
                n_samples=n_eval_samples,
                n_elems=args.n_elems,
                n_nonzero_min=args.n_nonzero[0],
                n_nonzero_max=args.n_nonzero[1],
            ),
            batch_size=batch_size_eval,
        ),
        f"{range_nonzero_easy[0]}_{range_nonzero_easy[1]}": DataLoader(
            ParityDataset(
                n_samples=n_eval_samples,
                n_elems=args.n_elems,
                n_nonzero_min=range_nonzero_easy[0],
                n_nonzero_max=range_nonzero_easy[1],
            ),
            batch_size=batch_size_eval,
        ),
        f"{range_nonzero_hard[0]}_{range_nonzero_hard[1]}": DataLoader(
            ParityDataset(
                n_samples=n_eval_samples,
                n_elems=args.n_elems,
                n_nonzero_min=range_nonzero_hard[0],
                n_nonzero_max=range_nonzero_hard[1],
            ),
            batch_size=batch_size_eval,
        ),
    }

    # Model preparation
    module = PonderNet(
        n_elems=args.n_elems,
        n_hidden=args.n_hidden,
        max_steps=args.max_steps,
    )
    module = module.to(device, dtype)

    # Loss preparation
    loss_rec_inst = ReconstructionLoss(
        nn.BCEWithLogitsLoss(reduction="none")
    ).to(device, dtype)

    loss_reg_inst = RegularizationLoss(
        lambda_p=args.lambda_p,
        max_steps=args.max_steps,
    ).to(device, dtype)

    # Optimizer
    optimizer = torch.optim.Adam(
        module.parameters(),
        lr=0.0003,
    )

    # Training and evaluation loops
    itera
SYMBOL INDEX (274 symbols across 59 files)

FILE: github_adventures/automata/model.py
  class CAModel (line 5) | class CAModel(nn.Module):
    method __init__ (line 32) | def __init__(self, n_channels=16, hidden_channels=128, fire_rate=0.5, ...
    method perceive (line 83) | def perceive(self, x):
    method update (line 102) | def update(self, x):
    method stochastic_update (line 121) | def stochastic_update(x, fire_rate):
    method get_living_mask (line 146) | def get_living_mask(x):
    method forward (line 167) | def forward(self, x):

FILE: github_adventures/automata/train.py
  function load_image (line 14) | def load_image(path, size=40):
  function to_rgb (line 40) | def to_rgb(img_rgba):
  function make_seed (line 58) | def make_seed(size, n_channels):
  function main (line 82) | def main(argv=None):

FILE: github_adventures/diffaugment/script.py
  function main (line 17) | def main(argv=None):

FILE: github_adventures/diffaugment/utils.py
  class DatasetImages (line 6) | class DatasetImages(Dataset):
    method __init__ (line 22) | def __init__(self, path, transform=None):
    method __len__ (line 28) | def __len__(self):
    method __getitem__ (line 32) | def __getitem__(self, ix):
  class Generator (line 43) | class Generator(nn.Module):
    method __init__ (line 63) | def __init__(self, latent_dim, ngf=64):
    method forward (line 91) | def forward(self, x):
  class Discriminator (line 108) | class Discriminator(nn.Module):
    method __init__ (line 128) | def __init__(self, ndf=16, augment_module=None):
    method forward (line 161) | def forward(self, x):
  function init_weights_ (line 182) | def init_weights_(module):

FILE: github_adventures/dino/evaluation.py
  function compute_knn (line 7) | def compute_knn(backbone, data_loader_train, data_loader_val):
  function compute_embedding (line 57) | def compute_embedding(backbone, data_loader):

FILE: github_adventures/dino/train.py
  function main (line 17) | def main():

FILE: github_adventures/dino/utils.py
  class DataAugmentation (line 8) | class DataAugmentation:
    method __init__ (line 36) | def __init__(
    method __call__ (line 113) | def __call__(self, img):
  class Head (line 136) | class Head(nn.Module):
    method __init__ (line 173) | def __init__(
    method _init_weights (line 203) | def _init_weights(self, m):
    method forward (line 210) | def forward(self, x):
  class MultiCropWrapper (line 230) | class MultiCropWrapper(nn.Module):
    method __init__ (line 242) | def __init__(self, backbone, new_head):
    method forward (line 248) | def forward(self, x):
  class Loss (line 275) | class Loss(nn.Module):
    method __init__ (line 293) | def __init__(
    method forward (line 302) | def forward(self, student_output, teacher_output):
    method update_center (line 342) | def update_center(self, teacher_output):
  function clip_gradients (line 360) | def clip_gradients(model, clip=2.0):

FILE: github_adventures/gpt/copy_and_generate.py
  function main (line 14) | def main(argv=None):

FILE: github_adventures/gpt/model.py
  class CustomGELU (line 7) | class CustomGELU(nn.Module):
    method forward (line 10) | def forward(self, x):
  class Block (line 15) | class Block(nn.Module):
    method __init__ (line 51) | def __init__(
    method forward (line 87) | def forward(self, x):
  class GPT (line 115) | class GPT(nn.Module):
    method __init__ (line 168) | def __init__(
    method forward (line 204) | def forward(self, idx):

FILE: github_adventures/gpt/utils.py
  function copy_parameter (line 4) | def copy_parameter(param_official, param_ours):
  function copy_block (line 23) | def copy_block(block_official, block_ours):
  function copy_model (line 60) | def copy_model(model_official, model_ours):
  function generate_token (line 89) | def generate_token(

FILE: github_adventures/integer/bert.py
  function main (line 11) | def main(argv=None):

FILE: github_adventures/integer/fetch_data.py
  function get_sequence (line 9) | def get_sequence(sequence_id):

FILE: github_adventures/integer/glove.py
  function main (line 9) | def main(argv=None):

FILE: github_adventures/integer/lstm.py
  function main (line 22) | def main(argv=None):

FILE: github_adventures/integer/utils.py
  class CustomDataset (line 12) | class CustomDataset(Dataset):
    method __init__ (line 39) | def __init__(
    method __len__ (line 65) | def __len__(self):
    method __getitem__ (line 69) | def __getitem__(self, ix):
  class Network (line 74) | class Network(nn.Module):
    method __init__ (line 112) | def __init__(
    method forward (line 145) | def forward(self, x):
  function train_classifier (line 172) | def train_classifier(X, y, random_state=2):
  function create_classification_targets (line 221) | def create_classification_targets(indices):

FILE: github_adventures/lottery/data.py
  class MNISTDataset (line 6) | class MNISTDataset(Dataset):
    method __init__ (line 26) | def __init__(self, root, train=True, download=True):
    method __len__ (line 41) | def __len__(self):
    method __getitem__ (line 45) | def __getitem__(self, ix):

FILE: github_adventures/lottery/main.py
  function loop_dataloader (line 13) | def loop_dataloader(dataloader):
  function train (line 34) | def train(
  function main (line 103) | def main(argv=None):

FILE: github_adventures/lottery/utils.py
  class MLP (line 8) | class MLP(nn.Module):
    method __init__ (line 30) | def __init__(self, n_features, hidden_layer_sizes, n_targets):
    method forward (line 41) | def forward(self, x):
  function prune_linear (line 65) | def prune_linear(linear, prune_ratio=0.3, method="l1"):
  function prune_mlp (line 94) | def prune_mlp(mlp, prune_ratio=0.3, method="l1"):
  function check_pruned_linear (line 127) | def check_pruned_linear(linear):
  function reinit_linear (line 148) | def reinit_linear(linear):
  function reinit_mlp (line 180) | def reinit_mlp(mlp):
  function copy_weights_linear (line 192) | def copy_weights_linear(linear_unpruned, linear_pruned):
  function copy_weights_mlp (line 213) | def copy_weights_mlp(mlp_unpruned, mlp_pruned):
  function compute_stats (line 232) | def compute_stats(mlp):

FILE: github_adventures/mixer/official.py
  class MlpBlock (line 6) | class MlpBlock(nn.Module):
    method __call__ (line 10) | def __call__(self, x):
  class MixerBlock (line 16) | class MixerBlock(nn.Module):
    method __call__ (line 21) | def __call__(self, x):
  class MlpMixer (line 31) | class MlpMixer(nn.Module):
    method __call__ (line 40) | def __call__(self, x):

FILE: github_adventures/mixer/ours.py
  class MlpBlock (line 5) | class MlpBlock(nn.Module):
    method __init__ (line 26) | def __init__(self, dim, mlp_dim=None):
    method forward (line 34) | def forward(self, x):
  class MixerBlock (line 54) | class MixerBlock(nn.Module):
    method __init__ (line 83) | def __init__(
    method forward (line 94) | def forward(self, x):
  class MlpMixer (line 120) | class MlpMixer(nn.Module):
    method __init__ (line 162) | def __init__(
    method forward (line 197) | def forward(self, x):

FILE: github_adventures/mixer/test_compare.py
  function test_compare (line 17) | def test_compare(

FILE: github_adventures/mixup/train.py
  function main (line 18) | def main(argv=None):

FILE: github_adventures/mixup/utils.py
  class MLPClassifierMixup (line 9) | class MLPClassifierMixup(nn.Module):
    method __init__ (line 38) | def __init__(self, n_features, hidden_dims, p=0):
    method forward (line 55) | def forward(self, x, start=0, end=None):
  class CustomDataset (line 82) | class CustomDataset(Dataset):
    method __init__ (line 94) | def __init__(self, X, y):
    method __len__ (line 105) | def __len__(self):
    method __getitem__ (line 109) | def __getitem__(self, ix):
  function generate_spirals (line 114) | def generate_spirals(
  function generate_prediction_img (line 184) | def generate_prediction_img(

FILE: github_adventures/ner_evaluation/ours.py
  function check_valid (line 6) | def check_valid(annots: list[str]) -> bool:
  function get_etypes (line 25) | def get_etypes(annots: list[str]) -> list[None | str]:
  function get_entities (line 29) | def get_entities(annots: list[str]) -> list[dict[str, int | str]]:
  function get_report (line 81) | def get_report(annots_true: list[str], annots_pred: list[str]) -> dict:

FILE: github_adventures/ner_evaluation/test_ours.py
  function test_check_valid (line 23) | def test_check_valid(inp, out):
  function test_get_etypes (line 38) | def test_get_etypes(inp, out):
  function test_get_entities (line 72) | def test_get_entities(inp, out):
  function test_get_report (line 113) | def test_get_report(annots_true, annots_pred):

FILE: github_adventures/neuron/evaluate_noise.py
  function main (line 13) | def main(argv=None):

FILE: github_adventures/neuron/evaluate_shuffling.py
  function main (line 13) | def main(argv=None):

FILE: github_adventures/neuron/evaluate_video.py
  function main (line 12) | def main(argv=None):

FILE: github_adventures/neuron/solutions.py
  class Solution (line 9) | class Solution(abc.ABC):
    method clone (line 19) | def clone(self, obs):
    method get_action (line 23) | def get_action(self, obs):
    method get_n_features (line 27) | def get_n_features(self):
    method reset (line 35) | def reset(self):
    method get_params (line 43) | def get_params(self):
    method set_params (line 60) | def set_params(self, params):
    method get_n_params (line 83) | def get_n_params(self):
  class MLPSolution (line 87) | class MLPSolution(Solution):
    method __init__ (line 107) | def __init__(self, n_features=5, hidden_layer_sizes=(16,)):
    method clone (line 118) | def clone(self):
    method get_action (line 128) | def get_action(self, obs):
    method get_n_features (line 134) | def get_n_features(self):
    method reset (line 137) | def reset(self):
  class PermutationInvariantSolution (line 141) | class PermutationInvariantSolution(Solution):
    method __init__ (line 171) | def __init__(
    method clone (line 194) | def clone(self):
    method get_action (line 204) | def get_action(self, obs):
    method reset (line 212) | def reset(self):
    method get_n_features (line 216) | def get_n_features(self):

FILE: github_adventures/neuron/tasks.py
  class IncompatibleNFeatures (line 8) | class IncompatibleNFeatures(Exception):
  class Task (line 12) | class Task:
    method __init__ (line 51) | def __init__(
    method reset_for_rollout (line 76) | def reset_for_rollout(self):
    method modify_obs (line 87) | def modify_obs(self, obs):
    method rollout (line 108) | def rollout(self, solution):

FILE: github_adventures/neuron/torch_utils.py
  class MLP (line 6) | class MLP(nn.Module):
    method __init__ (line 23) | def __init__(self, n_features, hidden_layer_sizes):
    method forward (line 46) | def forward(self, obs):
  function pos_table (line 64) | def pos_table(n_embeddings, hidden_size):
  class AttentionMatrix (line 96) | class AttentionMatrix(nn.Module):
    method __init__ (line 120) | def __init__(self, hidden_size, proj_dim, scale=True):
    method forward (line 134) | def forward(self, data_q, data_k):
  class AttentionNeuron (line 162) | class AttentionNeuron(nn.Module):
    method __init__ (line 194) | def __init__(
    method forward (line 225) | def forward(self, obs, prev_action):
  class PermutationInvariantNetwork (line 281) | class PermutationInvariantNetwork(nn.Module):
    method __init__ (line 304) | def __init__(
    method forward (line 323) | def forward(self, obs, prev_action):

FILE: github_adventures/neuron/trainer.py
  function save (line 20) | def save(folder, n_iter, solver, solution_inst):
  function get_fitness (line 47) | def get_fitness(
  function main (line 78) | def main(argv=None):

FILE: github_adventures/pondernet/train.py
  function evaluate (line 21) | def evaluate(dataloader, module):
  function plot_distributions (line 102) | def plot_distributions(target, predicted):
  function plot_accuracy (line 142) | def plot_accuracy(accuracy):
  function main (line 173) | def main(argv=None):

FILE: github_adventures/pondernet/utils.py
  class ParityDataset (line 6) | class ParityDataset(Dataset):
    method __init__ (line 22) | def __init__(
    method __len__ (line 39) | def __len__(self):
    method __getitem__ (line 43) | def __getitem__(self, idx):
  class PonderNet (line 60) | class PonderNet(nn.Module):
    method __init__ (line 93) | def __init__(
    method forward (line 106) | def forward(self, x):
  class ReconstructionLoss (line 182) | class ReconstructionLoss(nn.Module):
    method __init__ (line 193) | def __init__(self, loss_func):
    method forward (line 198) | def forward(self, p, y_pred, y_true):
  class RegularizationLoss (line 230) | class RegularizationLoss(nn.Module):
    method __init__ (line 244) | def __init__(self, lambda_p, max_steps=20):
    method forward (line 257) | def forward(self, p):

FILE: github_adventures/product_quantization/convert.py
  function from_faiss (line 14) | def from_faiss(faiss_index: faiss.swigfaiss.IndexPQ) -> CustomIndexPQ:
  function main (line 45) | def main() -> int:

FILE: github_adventures/product_quantization/custom.py
  class CustomIndexPQ (line 16) | class CustomIndexPQ:
    method __init__ (line 36) | def __init__(
    method train (line 66) | def train(self, X: np.ndarray) -> None:
    method encode (line 88) | def encode(self, X: np.ndarray) -> np.ndarray:
    method add (line 111) | def add(self, X: np.ndarray) -> None:
    method compute_asymmetric_distances (line 123) | def compute_asymmetric_distances(self, X: np.ndarray) -> np.ndarray:
    method search (line 164) | def search(self, X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:

FILE: github_adventures/product_quantization/parse.py
  function get_embeddings (line 15) | def get_embeddings(path: str, maximum: int | None = None) -> tuple[list[...

FILE: github_adventures/product_quantization/run_gradio.py
  function run (line 40) | def run(

FILE: github_adventures/siren/activations.py
  function fh (line 22) | def fh(inst, inp, out, number=0):

FILE: github_adventures/siren/core.py
  function paper_init_ (line 8) | def paper_init_(weight, is_first=False, omega=1):
  class SineLayer (line 33) | class SineLayer(nn.Module):
    method __init__ (line 63) | def __init__(
    method forward (line 81) | def forward(self, x):
  class ImageSiren (line 96) | class ImageSiren(nn.Module):
    method __init__ (line 119) | def __init__(
    method forward (line 163) | def forward(self, x):
  function generate_coordinates (line 180) | def generate_coordinates(n):
  class PixelDataset (line 198) | class PixelDataset(Dataset):
    method __init__ (line 226) | def __init__(self, img):
    method __len__ (line 237) | def __len__(self):
    method __getitem__ (line 241) | def __getitem__(self, idx):
  class GradientUtils (line 258) | class GradientUtils:
    method gradient (line 260) | def gradient(target, coords):
    method divergence (line 281) | def divergence(grad, coords):
    method laplace (line 310) | def laplace(target, coords):

FILE: github_adventures/vision_transformer/custom.py
  class PatchEmbed (line 5) | class PatchEmbed(nn.Module):
    method __init__ (line 31) | def __init__(self, img_size, patch_size, in_chans=3, embed_dim=768):
    method forward (line 45) | def forward(self, x):
  class Attention (line 67) | class Attention(nn.Module):
    method __init__ (line 103) | def __init__(self, dim, n_heads=12, qkv_bias=True, attn_p=0., proj_p=0.):
    method forward (line 115) | def forward(self, x):
  class MLP (line 161) | class MLP(nn.Module):
    method __init__ (line 192) | def __init__(self, in_features, hidden_features, out_features, p=0.):
    method forward (line 199) | def forward(self, x):
  class Block (line 223) | class Block(nn.Module):
    method __init__ (line 255) | def __init__(self, dim, n_heads, mlp_ratio=4.0, qkv_bias=True, p=0., a...
    method forward (line 273) | def forward(self, x):
  class VisionTransformer (line 292) | class VisionTransformer(nn.Module):
    method __init__ (line 349) | def __init__(
    method forward (line 395) | def forward(self, x):

FILE: github_adventures/vision_transformer/verify.py
  function get_n_params (line 7) | def get_n_params(module):
  function assert_tensors_equal (line 10) | def assert_tensors_equal(t1, t2):
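
`verify.py` compares the custom ViT against a pretrained timm model using these two helpers. The real versions operate on torch modules and tensors; these hedged numpy stand-ins only show the intent:

```python
import numpy as np

def get_n_params(arrays):
    """Total number of elements across parameter arrays.

    The torch version presumably iterates module.parameters()
    and sums p.numel(); here plain arrays stand in for parameters.
    """
    return sum(a.size for a in arrays)

def assert_tensors_equal(t1, t2):
    """Fail loudly if two arrays differ beyond floating-point tolerance."""
    np.testing.assert_allclose(t1, t2, rtol=1e-5, atol=1e-8)

params = [np.zeros((3, 4)), np.zeros(10)]
print(get_n_params(params))  # 22
assert_tensors_equal(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
```

Matching parameter counts plus element-wise equality of outputs is a cheap but effective check that a re-implementation matches the original.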

FILE: mini_tutorials/bentoml/service.py
  class Request (line 13) | class Request(BaseModel):
  class Response (line 19) | class Response(BaseModel):
  function classify (line 24) | def classify(request: Request) -> Response:

FILE: mini_tutorials/custom_optimizer_in_pytorch/custom.py
  class WeirdDescent (line 5) | class WeirdDescent(Optimizer):
    method __init__ (line 10) | def __init__(self, parameters, lr=1e-3):
    method step (line 14) | def step(self, closure=None):

FILE: mini_tutorials/custom_optimizer_in_pytorch/src.py
  function rosenbrock (line 10) | def rosenbrock(xy):
  function run_optimization (line 27) | def run_optimization(xy_init, optimizer_class, n_iter, **optimizer_kwargs):
  function create_animation (line 67) | def create_animation(paths,
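
`src.py` minimizes the Rosenbrock function and animates the optimizer's path. A self-contained sketch of the same experiment with plain gradient descent (the repo uses torch optimizers and matplotlib; both are replaced here for brevity, and the hyperparameters are illustrative):

```python
def rosenbrock(xy, a=1.0, b=100.0):
    """f(x, y) = (a - x)^2 + b (y - x^2)^2, with its minimum at (a, a^2)."""
    x, y = xy
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(xy, a=1.0, b=100.0):
    x, y = xy
    dx = -2 * (a - x) - 4 * b * x * (y - x ** 2)
    dy = 2 * b * (y - x ** 2)
    return dx, dy

def run_optimization(xy_init, lr=1e-4, n_iter=5000):
    """Record the iterate path, like src.py's run_optimization minus the animation."""
    x, y = xy_init
    path = [(x, y)]
    for _ in range(n_iter):
        dx, dy = rosenbrock_grad((x, y))
        x, y = x - lr * dx, y - lr * dy
        path.append((x, y))
    return path

path = run_optimization((-0.5, 0.5))
print(rosenbrock(path[-1]) < rosenbrock(path[0]))  # True
```

The banana-shaped valley of this function is what makes it a standard stress test for optimizers, and why the repo animates the trajectory rather than just printing the final loss.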

FILE: mini_tutorials/embedding/src.py
  class CharacterDataset (line 12) | class CharacterDataset(Dataset):
    method __init__ (line 41) | def __init__(self, text, window_size=1, vocab_size=50):
    method __len__ (line 56) | def __len__(self):
    method __getitem__ (line 59) | def __getitem__(self, ix):
  class Network (line 68) | class Network(Module):
    method __init__ (line 92) | def __init__(
    method forward (line 117) | def forward(self, x, h=None, c=None):
  function compute_loss (line 149) | def compute_loss(cal, net, dataloader):
  function generate_text (line 160) | def generate_text(n_chars, net, dataset, initial_text="Hello", random_st...
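
`CharacterDataset` slices raw text into (window, next-character) training pairs. A dependency-free sketch of that indexing logic (the real class subclasses the torch `Dataset` and returns tensors; handling the vocabulary via most-common characters is an assumption based on the `vocab_size` parameter):

```python
from collections import Counter

class CharacterDataset:
    """Sliding-window character dataset: item i pairs the window of
    characters starting at i with the single character that follows it,
    both encoded as vocabulary indices (0 = out of vocabulary).
    """
    def __init__(self, text, window_size=1, vocab_size=50):
        self.text = text
        self.window_size = window_size
        most_common = Counter(text).most_common(vocab_size - 1)
        # index 0 is reserved for characters outside the vocabulary
        self.ch2ix = {ch: ix + 1 for ix, (ch, _) in enumerate(most_common)}

    def __len__(self):
        return len(self.text) - self.window_size

    def __getitem__(self, ix):
        window = [self.ch2ix.get(c, 0) for c in self.text[ix: ix + self.window_size]]
        target = self.ch2ix.get(self.text[ix + self.window_size], 0)
        return window, target

ds = CharacterDataset("hello world", window_size=3)
print(len(ds))  # 8
print(ds[0])    # indices for window "hel" and target "l"
```

Feeding these pairs to the indexed `Network` (embedding plus LSTM) is what lets `generate_text` sample one character at a time.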

FILE: mini_tutorials/gradient_wrt_input/explain.py
  function func (line 9) | def func(inp, net=None, target=None):
  function compute_integrated_gradients (line 33) | def compute_integrated_gradients(inp, baseline, net, target, n_steps=100):
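
`compute_integrated_gradients` in `explain.py` averages input-gradients along a straight path from a baseline to the input. The idea in one dimension, with an analytic derivative standing in for torch autograd (a sketch, not the repo's code):

```python
import numpy as np

def integrated_gradients_1d(f_grad, x, baseline=0.0, n_steps=100):
    """Approximate IG(x) = (x - baseline) * mean of f' at interpolated points.

    f_grad is the derivative of the scalar function being explained;
    the repo obtains it via autograd, here it is passed in analytically.
    """
    alphas = (np.arange(n_steps) + 0.5) / n_steps        # midpoint rule
    points = baseline + alphas * (x - baseline)
    return (x - baseline) * np.mean([f_grad(p) for p in points])

# For f(x) = x^2 with baseline 0, IG should equal f(x) - f(0) = x^2,
# the "completeness" property of integrated gradients.
ig = integrated_gradients_1d(lambda p: 2 * p, x=3.0)
print(ig)  # close to 9.0
```

Completeness is the usual sanity check for an integrated-gradients implementation: the attributions must sum to the change in the model's output.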

FILE: mini_tutorials/gradient_wrt_input/fool.py
  function func (line 8) | def func(inp, net=None, target=None):
  function attack (line 33) | def attack(tensor, net, eps=1e-3, n_iter=50):

FILE: mini_tutorials/gradient_wrt_input/utils.py
  function compute_gradient (line 6) | def compute_gradient(func, inp, **kwargs):
  function read_image (line 37) | def read_image(path):
  function to_array (line 64) | def to_array(tensor):
  function scale_grad (line 89) | def scale_grad(grad):
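
`compute_gradient` in `utils.py` backpropagates through a frozen network to get d(output)/d(input). The same quantity can be checked numerically with central differences; a numpy sketch (the repo's version uses torch autograd on image tensors):

```python
import numpy as np

def compute_gradient(func, inp, eps=1e-6, **kwargs):
    """Central-difference gradient of a scalar func w.r.t. each input entry."""
    inp = np.asarray(inp, dtype=np.float64)
    grad = np.zeros_like(inp)
    for ix in np.ndindex(inp.shape):
        bump = np.zeros_like(inp)
        bump[ix] = eps
        grad[ix] = (func(inp + bump, **kwargs) - func(inp - bump, **kwargs)) / (2 * eps)
    return grad

# Sanity check on f(x) = sum(x^2), whose gradient is 2x.
x = np.array([1.0, -2.0, 0.5])
g = compute_gradient(lambda a: np.sum(a ** 2), x)
print(g)  # approximately [2.0, -4.0, 1.0]
```

This finite-difference version is far too slow for real images, but it is the standard way to verify an autograd-based gradient on a few entries.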

FILE: mini_tutorials/haiku_basics/parameter.py
  function foo (line 8) | def foo(x: jnp.ndarray) -> jnp.ndarray:

FILE: mini_tutorials/haiku_basics/reallife.py
  function foo (line 7) | def foo(x: jnp.ndarray) -> jnp.ndarray:

FILE: mini_tutorials/haiku_basics/state.py
  function foo (line 8) | def foo(x: jnp.ndarray) -> jnp.ndarray:

FILE: mini_tutorials/httpx_rate_limiting/script.py
  function send_request (line 11) | async def send_request(client: httpx.AsyncClient, semaphore: asyncio.Sem...
  function main (line 21) | async def main() -> int:
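
`script.py` throttles concurrent httpx requests with an `asyncio.Semaphore`. The pattern, with the HTTP call replaced by a sleep so the sketch stays self-contained (the httpx client and endpoint are omitted):

```python
import asyncio
import time

MAX_CONCURRENT = 2

async def send_request(semaphore: asyncio.Semaphore, i: int) -> int:
    # The semaphore admits at most MAX_CONCURRENT coroutines at a time;
    # the real script awaits client.get(...) here instead of sleeping.
    async with semaphore:
        await asyncio.sleep(0.05)
        return i

async def main() -> list[int]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(send_request(semaphore, i) for i in range(6)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)          # [0, 1, 2, 3, 4, 5]
print(elapsed >= 0.15)  # 6 tasks, 2 at a time -> at least 3 batches
```

All six tasks are created up front, but the semaphore serializes them into batches, which is exactly how you stay under an API's rate limit without rewriting the request code.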

FILE: mini_tutorials/mocking_neural_networks/app.py
  function get_top_k (line 8) | def get_top_k(sequence, tokenizer, model, k=10):

FILE: mini_tutorials/mocking_neural_networks/test.py
  function test_with_real_objects (line 11) | def test_with_real_objects(k):
  function test_with_mock_objects (line 22) | def test_with_mock_objects(k):
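
`test.py` runs `get_top_k` twice, once with the real HuggingFace tokenizer and model and once with `Mock` objects. The mocked path needs no network or weights; a stdlib-only sketch of the pattern (the toy `classify` function below is a stand-in defined here so the example is self-contained, not the repo's `get_top_k`):

```python
from unittest.mock import Mock

def classify(text, model):
    """Toy stand-in for app.py's get_top_k: delegate to the model,
    then post-process its output."""
    scores = model.predict(text)
    return max(scores, key=scores.get)

# The mock replaces a potentially huge neural network: we script its
# return value and then assert on how it was called.
model = Mock()
model.predict.return_value = {"cat": 0.9, "dog": 0.1}

label = classify("a photo of a cat", model)
print(label)  # cat
model.predict.assert_called_once_with("a photo of a cat")
```

The test then exercises only the post-processing logic, which is usually where the bugs live, while the expensive model call is simulated.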

FILE: mini_tutorials/numpy_equality_testing/test.py
  function get_arrays (line 4) | def get_arrays():
  function test___eq__ (line 32) | def test___eq__():
  function test___eq__all (line 38) | def test___eq__all():
  function test_array_equal (line 46) | def test_array_equal():
  function test_allclose (line 56) | def test_allclose():
  function test_testing_array_equal (line 67) | def test_testing_array_equal():
  function test_testing_allclose (line 75) | def test_testing_allclose():
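
`test.py` contrasts the many ways of comparing numpy arrays. The key differences in one place (these behaviors follow from the numpy API itself, not from the repo's file):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9   # numerically close, but not bit-identical
c = a.copy()   # identical values

print((a == c).all())        # True  -- elementwise ==, then reduce
print(np.array_equal(a, b))  # False -- exact equality, no tolerance
print(np.allclose(a, b))     # True  -- within default rtol/atol

# The np.testing variant raises AssertionError with a readable diff
# on mismatch, which is why it is preferred inside pytest tests.
np.testing.assert_allclose(a, b)
```

The practical takeaway the tests encode: bare `==` on arrays returns an array (so `if a == b:` raises), and floating-point comparisons should always go through a tolerance-aware function.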

FILE: mini_tutorials/openai_function_calling/example.py
  function get_price (line 17) | def get_price(symbol: str, date: str) -> float:
  function calculate (line 27) | def calculate(a: float, b: float, op: str) -> float:
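
`example.py` exposes `calculate(a, b, op)` as an OpenAI function-calling target. The dispatch itself is plain Python; a sketch using an operator table (the repo's exact `op` strings are an assumption):

```python
import operator

# Map the op strings the model may send to real operations; unknown ops
# fail loudly rather than silently returning a wrong number.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}

def calculate(a: float, b: float, op: str) -> float:
    if op not in OPS:
        raise ValueError(f"unsupported op: {op!r}")
    return OPS[op](a, b)

print(calculate(3.0, 4.0, "mul"))  # 12.0
```

Validating the model-supplied `op` before dispatch matters here, because function-calling arguments arrive as untrusted JSON from the model.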

FILE: mini_tutorials/rag_with_reranking/answer.py
  function generate_prompt (line 9) | def generate_prompt(question: str, contexts: str):

FILE: mini_tutorials/visualizing_activations_with_forward_hooks/src.py
  class Network (line 8) | class Network(Module):
    method __init__ (line 9) | def __init__(self):
    method forward (line 17) | def forward(self, x):
  function activation_hook (line 33) | def activation_hook(inst, inp, out):
Condensed preview — 118 files, each showing path, character count, and a content snippet. The full structured content is about 350K characters.
[
  {
    "path": ".gitignore",
    "chars": 1799,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "LICENSE",
    "chars": 1066,
    "preview": "MIT License\n\nCopyright (c) 2024 Jan Krepl\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
  },
  {
    "path": "README.md",
    "chars": 7277,
    "preview": "# mildlyoverfitted\n\nCode for https://www.youtube.com/c/mildlyoverfitted.\n\n\n### Overview\n| Name                          "
  },
  {
    "path": "github_adventures/automata/model.py",
    "chars": 5380,
    "preview": "import torch\nimport torch.nn as nn\n\n\nclass CAModel(nn.Module):\n    \"\"\"Cell automata model.\n\n    Parameters\n    ---------"
  },
  {
    "path": "github_adventures/automata/train.py",
    "chars": 6307,
    "preview": "import argparse\nimport pathlib\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom PIL import Image\nfrom torch.u"
  },
  {
    "path": "github_adventures/diffaugment/README.MD",
    "chars": 161,
    "preview": "# Data\nhttps://hanlab.mit.edu/projects/data-efficient-gans/datasets/100-shot-grumpy_cat.zip\n\nJust unzip it into `data/` "
  },
  {
    "path": "github_adventures/diffaugment/script.py",
    "chars": 7867,
    "preview": "import argparse\nimport pathlib\nimport pprint\nfrom datetime import datetime\n\nimport kornia.augmentation as K\nimport torch"
  },
  {
    "path": "github_adventures/diffaugment/utils.py",
    "chars": 6034,
    "preview": "import torch.nn as nn\nfrom PIL import Image\nfrom torch.utils.data import Dataset\n\n\nclass DatasetImages(Dataset):\n    \"\"\""
  },
  {
    "path": "github_adventures/dino/data/README.md",
    "chars": 113,
    "preview": "The `Imagenette` dataset was used. You can find it here: https://github.com/fastai/imagenette (320 px version). \n"
  },
  {
    "path": "github_adventures/dino/data/imagenette_labels.json",
    "chars": 271,
    "preview": "{\"n01440764\": \"tench\", \"n02102040\": \"english_springer\", \"n02979186\": \"cassette_player\", \"n03000684\": \"chain_saw\", \"n0302"
  },
  {
    "path": "github_adventures/dino/evaluation.py",
    "chars": 2738,
    "preview": "import numpy as np\nimport torch\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.neighbors import KNeighborsClass"
  },
  {
    "path": "github_adventures/dino/train.py",
    "chars": 6811,
    "preview": "import argparse\nimport json\nimport pathlib\n\nimport timm\nimport torch\nimport torchvision.transforms as transforms\nimport "
  },
  {
    "path": "github_adventures/dino/utils.py",
    "chars": 11169,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\nfrom PIL "
  },
  {
    "path": "github_adventures/dino/visualize_attentions.ipynb",
    "chars": 6876,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1a3bd5ec\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "github_adventures/dino/visualize_augmentations.ipynb",
    "chars": 2604,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5801191a\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "github_adventures/gpt/README.md",
    "chars": 1434,
    "preview": "# GPT-2 custom implementation\n## Installation\n\n```python\npip install -r requirements.txt\n```\n\n## Launching script\nTo cop"
  },
  {
    "path": "github_adventures/gpt/copy_and_generate.py",
    "chars": 3011,
    "preview": "import argparse\nimport logging\n\nimport torch\n\nfrom model import GPT\nfrom transformers import AutoModelForCausalLM, AutoT"
  },
  {
    "path": "github_adventures/gpt/distribution_visualizations.ipynb",
    "chars": 2845,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"896ffe86\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "github_adventures/gpt/ipython_code.py",
    "chars": 1393,
    "preview": ">>> import torch\n>>> from model import GPT\n>>> from transformers import AutoModelForCausalLM\n>>> hparams_names = [\n...  "
  },
  {
    "path": "github_adventures/gpt/model.py",
    "chars": 5863,
    "preview": "import torch\nimport torch.nn as nn\n\nfrom transformers.activations import gelu_new\n\n\nclass CustomGELU(nn.Module):\n    \"\"\""
  },
  {
    "path": "github_adventures/gpt/requirements.txt",
    "chars": 208,
    "preview": "ipython==7.30.1\nipywidgets==7.6.5\njupyter==1.0.0\nmatplotlib==3.5.1\nnumpy==1.21.5\ntorch==1.10.1\n-e git+https://github.com"
  },
  {
    "path": "github_adventures/gpt/utils.py",
    "chars": 4036,
    "preview": "import torch\n\n\ndef copy_parameter(param_official, param_ours):\n    \"\"\"Copy values of one tensor to another tensor.\n\n    "
  },
  {
    "path": "github_adventures/integer/README.md",
    "chars": 437,
    "preview": "# On-line encyclopedia of integer sequences\nYou can use the `fetch_data.py` to download the sequences. However,\nI actual"
  },
  {
    "path": "github_adventures/integer/bert.py",
    "chars": 2050,
    "preview": "import argparse\n\nimport numpy as np\nimport torch\nfrom torch.utils.tensorboard import SummaryWriter\nfrom transformers imp"
  },
  {
    "path": "github_adventures/integer/experiments.sh",
    "chars": 515,
    "preview": "set -x\n\nOUTPUT_PATH=results\nGLOVE_PATH=glove.6B.300d.txt\nSEQUENCES_PATH=raw_data.pkl\nMAX_VALUE_EVAL=500\n\npython glove.py"
  },
  {
    "path": "github_adventures/integer/fetch_data.py",
    "chars": 1363,
    "preview": "import pathlib\nimport pickle\n\nimport requests\n\nfrom joblib import Parallel, delayed, parallel_backend\n\n\ndef get_sequence"
  },
  {
    "path": "github_adventures/integer/glove.py",
    "chars": 1954,
    "preview": "import argparse\n\nimport numpy as np\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom utils import create_classifi"
  },
  {
    "path": "github_adventures/integer/lstm.py",
    "chars": 4905,
    "preview": "import argparse\nimport json\nimport pathlib\nimport pickle\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torc"
  },
  {
    "path": "github_adventures/integer/requirements.txt",
    "chars": 83,
    "preview": "joblib\nmatplotlib\nnumpy\nrequests\nscikit-learn\nsympy\ntensorboard\ntorch\ntransformers\n"
  },
  {
    "path": "github_adventures/integer/utils.py",
    "chars": 6646,
    "preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.m"
  },
  {
    "path": "github_adventures/lottery/README.md",
    "chars": 594,
    "preview": "# The Lottery Ticket Hypothesis\n## Installation\n```bash\npip install -r requirements.txt\n```\n\n## Running experiments\nThe "
  },
  {
    "path": "github_adventures/lottery/data.py",
    "chars": 1508,
    "preview": "from torch.utils.data import Dataset\nfrom torchvision.datasets import MNIST\nfrom torchvision.transforms import Compose, "
  },
  {
    "path": "github_adventures/lottery/main.py",
    "chars": 6337,
    "preview": "import argparse\n\nimport torch\nimport torch.nn as nn\nimport tqdm\nfrom torch.utils.data import DataLoader\n\nimport wandb\nfr"
  },
  {
    "path": "github_adventures/lottery/parallel_launch.sh",
    "chars": 1060,
    "preview": "# Parallel parameters\nN_JOBS=4\nARGS=\"-P$N_JOBS --header :\" # arguments for parallel\n# ARGS=\"--bar \"$ARGS\nARGS=\"--dry-run"
  },
  {
    "path": "github_adventures/lottery/requirements.txt",
    "chars": 47,
    "preview": "numpy\npillow\nsix\ntorch\ntorch-vision\ntqdm\nwandb\n"
  },
  {
    "path": "github_adventures/lottery/utils.py",
    "chars": 7002,
    "preview": "import math\n\nimport torch\nimport torch.nn as nn\nfrom torch.nn.utils.prune import l1_unstructured, random_unstructured\n\n\n"
  },
  {
    "path": "github_adventures/mixer/README.md",
    "chars": 234,
    "preview": "Note that the `official.py` is just a copy of the\ncode provided in `https://arxiv.org/abs/2105.01601` and probably here\n"
  },
  {
    "path": "github_adventures/mixer/official.py",
    "chars": 1388,
    "preview": "import einops\nimport flax.linen as nn\nimport jax.numpy as jnp\n\n\nclass MlpBlock(nn.Module):\n    mlp_dim: int\n\n    @nn.com"
  },
  {
    "path": "github_adventures/mixer/ours.py",
    "chars": 6248,
    "preview": "import einops\nimport torch.nn as nn\n\n\nclass MlpBlock(nn.Module):\n    \"\"\"Multilayer perceptron.\n\n    Parameters\n    -----"
  },
  {
    "path": "github_adventures/mixer/test_compare.py",
    "chars": 1817,
    "preview": "import jax\nimport numpy as np\nimport pytest\nimport torch\n\nfrom official import MlpMixer as OfficialMixer\nfrom ours impor"
  },
  {
    "path": "github_adventures/mixup/launch_experiments.sh",
    "chars": 605,
    "preview": "set -x\n\nN_EPOCHS=100000\nN_SAMPLES=1000\nSEED=123\nTBOARD_DIR=tb_results/$SEED\n\npython train.py -r $SEED -n $N_EPOCHS -s $N"
  },
  {
    "path": "github_adventures/mixup/train.py",
    "chars": 5500,
    "preview": "import argparse\nimport json\n\nimport numpy as np\nimport torch\nfrom sklearn.model_selection import train_test_split\nfrom t"
  },
  {
    "path": "github_adventures/mixup/utils.py",
    "chars": 6311,
    "preview": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom matplotlib.colors import List"
  },
  {
    "path": "github_adventures/ner_evaluation/README.md",
    "chars": 250,
    "preview": "* https://github.com/huggingface/evaluate/blob/af3c30561d840b83e54fc5f7150ea58046d6af69/metrics/seqeval/seqeval.py#L120\n"
  },
  {
    "path": "github_adventures/ner_evaluation/ours.py",
    "chars": 2639,
    "preview": "import re\nimport pandas as pd\nfrom sklearn.metrics import classification_report\n\n\ndef check_valid(annots: list[str]) -> "
  },
  {
    "path": "github_adventures/ner_evaluation/test_ours.py",
    "chars": 3805,
    "preview": "import pytest\nfrom seqeval.metrics import classification_report as cr\nfrom seqeval.scheme import IOB2\nfrom ours import c"
  },
  {
    "path": "github_adventures/ner_evaluation/try.py",
    "chars": 426,
    "preview": "import pprint\nimport evaluate\n\n\nmetric = evaluate.load(\"seqeval\")\n\n\n# Tom Cruise is great\nannots_true = [\"B-PERSON\", \"I-"
  },
  {
    "path": "github_adventures/neuron/README.md",
    "chars": 913,
    "preview": "# Installation\n\n```bash\npip install -r requirements.txt\n```\n\n# Running training\nTo run the same experiments as in the vi"
  },
  {
    "path": "github_adventures/neuron/evaluate_noise.py",
    "chars": 2217,
    "preview": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickl"
  },
  {
    "path": "github_adventures/neuron/evaluate_shuffling.py",
    "chars": 2642,
    "preview": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickl"
  },
  {
    "path": "github_adventures/neuron/evaluate_video.py",
    "chars": 2042,
    "preview": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickl"
  },
  {
    "path": "github_adventures/neuron/launch.sh",
    "chars": 387,
    "preview": "OUTPUT_FOLDER=log_dir\n\npython trainer.py --max-iter 1000 linear $OUTPUT_FOLDER/linear\npython trainer.py --max-iter 1000 "
  },
  {
    "path": "github_adventures/neuron/requirements.txt",
    "chars": 84,
    "preview": "cma\ngym\ngym-cartpole-swingup\nmatplotlib\nnumpy\npandas\nseaborn\ntensorboard\ntorch\ntqdm\n"
  },
  {
    "path": "github_adventures/neuron/solutions.py",
    "chars": 5152,
    "preview": "import abc\n\nimport numpy as np\nimport torch\n\nfrom torch_utils import PermutationInvariantNetwork, MLP\n\n\nclass Solution(a"
  },
  {
    "path": "github_adventures/neuron/tasks.py",
    "chars": 4117,
    "preview": "import gym\nimport gym_cartpole_swingup  # noqa has a sideffect\nimport numpy as np\n\nN_ORIGINAL_FEATURES = 5\n\n\nclass Incom"
  },
  {
    "path": "github_adventures/neuron/torch_utils.py",
    "chars": 9111,
    "preview": "import numpy as np\nimport torch\nimport torch.nn as nn\n\n\nclass MLP(nn.Module):\n    \"\"\"Multilayer perceptron policy networ"
  },
  {
    "path": "github_adventures/neuron/trainer.py",
    "chars": 6501,
    "preview": "import argparse\nimport json\nimport multiprocessing as mp\nimport pathlib\nimport pickle\nfrom functools import partial\n\nimp"
  },
  {
    "path": "github_adventures/pondernet/experiment_1.sh",
    "chars": 311,
    "preview": "set -x \nSEED=$RANDOM\nLAMBDAS=(0.1 0.3 0.5 0.7 0.9)\n\nfor lambda in ${LAMBDAS[@]}\ndo\n\tpython train.py \\\n\t\t--batch-size 128"
  },
  {
    "path": "github_adventures/pondernet/experiment_2.sh",
    "chars": 237,
    "preview": "set -x \nSEED=$RANDOM\n\npython train.py \\\n\t--batch-size 128 \\\n\t--beta 0.01 \\\n\t--eval-frequency 4000 \\\n\t--device cuda \\\n\t--"
  },
  {
    "path": "github_adventures/pondernet/requirements.txt",
    "chars": 40,
    "preview": "matplotlib\nnumpy\ntensorboard\ntorch\ntqdm\n"
  },
  {
    "path": "github_adventures/pondernet/train.py",
    "chars": 10515,
    "preview": "from argparse import ArgumentParser\nimport json\nimport pathlib\n\nimport matplotlib.pyplot as plt\nimport torch\nimport torc"
  },
  {
    "path": "github_adventures/pondernet/utils.py",
    "chars": 7889,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch.utils.data import Dataset\n\n\nclass ParityDataset(Dataset):\n    \"\"\"Parity of"
  },
  {
    "path": "github_adventures/product_quantization/README.md",
    "chars": 788,
    "preview": "# Installation\n\nRun the following to get all the dependencies.\n```\npip install -r requirements.txt\n```\n\n# Faiss 101\nThe "
  },
  {
    "path": "github_adventures/product_quantization/convert.py",
    "chars": 1724,
    "preview": "import argparse\nimport logging\nimport pathlib\nimport pickle\n\nimport faiss\n\nfrom custom import CustomIndexPQ\n\nlogger = lo"
  },
  {
    "path": "github_adventures/product_quantization/custom.py",
    "chars": 5010,
    "preview": "from __future__ import annotations\n\nimport logging\n\nimport numpy as np\nfrom sklearn.cluster import KMeans\nfrom sklearn.m"
  },
  {
    "path": "github_adventures/product_quantization/faiss_101_ipython.py",
    "chars": 825,
    "preview": "import numpy as np\nimport faiss\n\n# Load fast text embeddings\nembs = np.load(\"parsed_fasttext/embs.npy\")  # change path i"
  },
  {
    "path": "github_adventures/product_quantization/generate_index.py",
    "chars": 1885,
    "preview": "from __future__ import annotations\n\nimport argparse\nimport logging\nimport pathlib\nimport pickle\n\nimport faiss\nimport num"
  },
  {
    "path": "github_adventures/product_quantization/parse.py",
    "chars": 1634,
    "preview": "from __future__ import annotations\n\nimport argparse\nimport io\nimport logging\nimport pathlib\nimport tqdm\n\nimport numpy as"
  },
  {
    "path": "github_adventures/product_quantization/requirements.txt",
    "chars": 80,
    "preview": "faiss-cpu==1.7.2\ngradio==3.0.17\nnumpy==1.22.4\npandas==1.4.2\nscikit-learn==1.1.1\n"
  },
  {
    "path": "github_adventures/product_quantization/run_all.sh",
    "chars": 2096,
    "preview": "set -ex\n\n# Parameters\nURL=https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz\nRAW_FASTTEXT=raw_fastte"
  },
  {
    "path": "github_adventures/product_quantization/run_gradio.py",
    "chars": 3196,
    "preview": "from __future__ import annotations\n\nimport argparse\nimport logging\nimport pathlib\nimport pickle\nimport time\nfrom functoo"
  },
  {
    "path": "github_adventures/siren/activations.py",
    "chars": 1086,
    "preview": "import pathlib\nfrom functools import partial\n\nimport torch\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom core "
  },
  {
    "path": "github_adventures/siren/core.py",
    "chars": 9061,
    "preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom scipy.ndimage import laplace, sobel\nfrom torch.utils.data imp"
  },
  {
    "path": "github_adventures/siren/train.py",
    "chars": 4175,
    "preview": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nfrom torch.nn import Linear, ReLU, Sequential\nfrom torch"
  },
  {
    "path": "github_adventures/vision_transformer/classes.txt",
    "chars": 21675,
    "preview": "tench, Tinca_tinca\ngoldfish, Carassius_auratus\ngreat_white_shark, white_shark, man-eater, man-eating_shark, Carcharodon_"
  },
  {
    "path": "github_adventures/vision_transformer/custom.py",
    "chars": 11100,
    "preview": "import torch\nimport torch.nn as nn\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\"Split image into patches and then embed them.\n\n"
  },
  {
    "path": "github_adventures/vision_transformer/forward.py",
    "chars": 610,
    "preview": "import numpy as np\nfrom PIL import Image\nimport torch\n\nk = 10\n\nimagenet_labels = dict(enumerate(open(\"classes.txt\")))\n\nm"
  },
  {
    "path": "github_adventures/vision_transformer/verify.py",
    "chars": 1272,
    "preview": "import numpy as np\nimport timm\nimport torch\nfrom custom import VisionTransformer\n\n# Helpers\ndef get_n_params(module):\n  "
  },
  {
    "path": "mini_tutorials/bentoml/README.md",
    "chars": 2569,
    "preview": "1. [Resources](#resources)\n2. [Installation](#installation)\n3. [Instructions](#instructions)\n    1. [`bentoml`](#bentoml"
  },
  {
    "path": "mini_tutorials/bentoml/bentofile.yaml",
    "chars": 123,
    "preview": "service: \"service:svc\"\ninclude:\n- \"service.py\"\npython:\n  packages:\n  - pydantic\n  - scikit-learn\nmodels:\n- iris_clf:late"
  },
  {
    "path": "mini_tutorials/bentoml/create_model.py",
    "chars": 250,
    "preview": "import bentoml\n\nfrom sklearn import datasets\nfrom sklearn import svm\n\niris = datasets.load_iris()\nX, y = iris.data, iris"
  },
  {
    "path": "mini_tutorials/bentoml/requirements.txt",
    "chars": 51,
    "preview": "bentoctl\nbentoml\nboto3\nnumpy\npydantic\nscikit-learn\n"
  },
  {
    "path": "mini_tutorials/bentoml/service.py",
    "chars": 877,
    "preview": "from typing import Literal\n\nimport bentoml\n\nfrom pydantic import BaseModel\nfrom bentoml.io import JSON\n\n\niris_clf_runner"
  },
  {
    "path": "mini_tutorials/custom_optimizer_in_pytorch/custom.py",
    "chars": 1129,
    "preview": "import numpy as np\nimport torch\nfrom torch.optim import Optimizer\n\nclass WeirdDescent(Optimizer):\n    \"\"\"Take a coordina"
  },
  {
    "path": "mini_tutorials/custom_optimizer_in_pytorch/src.py",
    "chars": 4240,
    "preview": "from matplotlib.animation import FuncAnimation\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nfrom torc"
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/Dockerfile",
    "chars": 418,
    "preview": "FROM huggingface/transformers-pytorch-gpu\n\nRUN python3 -c \"from transformers import AutoModel;AutoModel.from_pretrained("
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/DockerfileConda",
    "chars": 602,
    "preview": "FROM continuumio/miniconda3\n\nRUN conda install -c conda-forge pytorch-cpu\nRUN conda install -c conda-forge fastapi\nRUN c"
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/README.md",
    "chars": 1146,
    "preview": "# Relevant commands\n\n## Creating an API\n```bash\ntransformers-cli serve --task=fill-mask --model=bert-base-uncased\n```\n\n`"
  },
  {
    "path": "mini_tutorials/embedding/README.md",
    "chars": 128,
    "preview": "# Training data\nThe Dracula book can be found here: https://archive.org/stream/draculabr00stokuoft/draculabr00stokuoft_d"
  },
  {
    "path": "mini_tutorials/embedding/Visualize.ipynb",
    "chars": 2058,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"incredible-backup\",\n   \"metadata\": {},\n "
  },
  {
    "path": "mini_tutorials/embedding/src.py",
    "chars": 8190,
    "preview": "from collections import Counter, defaultdict\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom torch.nn import E"
  },
  {
    "path": "mini_tutorials/fewshot_text_classification/classify.py",
    "chars": 830,
    "preview": "import pathlib\n\nimport jinja2\nimport openai\n\n\npath = pathlib.Path(\"template.jinja2\")\n\nwith path.open() as f:\n    prompt_"
  },
  {
    "path": "mini_tutorials/fewshot_text_classification/template.jinja2",
    "chars": 427,
    "preview": "I want you to classify text for me.\nSee below all the possible labels and their description\n{% for item in labels %}\n\"\"\""
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/explain.py",
    "chars": 2541,
    "preview": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torchvision.models as models\n\nfrom utils import c"
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/fool.py",
    "chars": 2773,
    "preview": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torchvision.models as models\n\nfrom utils import c"
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/utils.py",
    "chars": 2775,
    "preview": "from PIL import Image\nimport torch\nfrom torchvision.transforms import (CenterCrop, Compose, Normalize, Resize,\n         "
  },
  {
    "path": "mini_tutorials/haiku_basics/buffers_in_torch.py",
    "chars": 205,
    "preview": "import torch\nbn = torch.nn.BatchNorm1d(5)\nbn.state_dict()\n\nfor name, p in bn.named_buffers():\n    print(name, p, p.requi"
  },
  {
    "path": "mini_tutorials/haiku_basics/parameter.py",
    "chars": 612,
    "preview": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\n\ndef foo(x: jnp.ndarray) -> j"
  },
  {
    "path": "mini_tutorials/haiku_basics/reallife.py",
    "chars": 450,
    "preview": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\ndef foo(x: jnp.ndarray) -> jn"
  },
  {
    "path": "mini_tutorials/haiku_basics/requirements.txt",
    "chars": 129,
    "preview": "-e git+ssh://git@github.com/deepmind/dm-haiku.git@386efc098fd52a5cf728e7d13442138ab25eb235#egg=dm_haiku\njax==0.3.5\njaxli"
  },
  {
    "path": "mini_tutorials/haiku_basics/state.py",
    "chars": 694,
    "preview": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\n\ndef foo(x: jnp.ndarray) -> j"
  },
  {
    "path": "mini_tutorials/httpx_rate_limiting/script.py",
    "chars": 949,
    "preview": "import asyncio\nimport logging\n\nimport httpx\n\nlogger = logging.getLogger()\nlogging.getLogger(\"httpx\").setLevel(logging.WA"
  },
  {
    "path": "mini_tutorials/mocking_neural_networks/app.py",
    "chars": 1578,
    "preview": "import logging\nimport sys\n\nimport numpy as np\nimport torch\nfrom transformers import AutoModelForMaskedLM, AutoTokenizer\n"
  },
  {
    "path": "mini_tutorials/mocking_neural_networks/test.py",
    "chars": 1250,
    "preview": "from unittest.mock import Mock\n\nimport pytest\nimport torch\nfrom transformers import (AutoTokenizer, AutoModelForMaskedLM"
  },
  {
    "path": "mini_tutorials/numpy_equality_testing/test.py",
    "chars": 2236,
    "preview": "import numpy as np\nimport pytest\n\ndef get_arrays():\n    \"\"\"Create 4 arrays that are all similar but different.\n\n    Retu"
  },
  {
    "path": "mini_tutorials/openai_function_calling/example.py",
    "chars": 3062,
    "preview": "import json\nimport logging\nimport operator\nimport sys\nimport datetime\nimport openai\nimport yfinance as yf\n\nTODAY = datet"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/README.md",
    "chars": 1721,
    "preview": "# Description\n## Installation\n\nRun the following command to deploy a simple OpenSearch DB locally.\n \n```bash\ndocker run "
  },
  {
    "path": "mini_tutorials/rag_with_reranking/answer.py",
    "chars": 1740,
    "preview": "import os\nimport sys\n\n\nimport cohere\nfrom opensearchpy import OpenSearch\n\n# Helper\ndef generate_prompt(question: str, co"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/input.txt",
    "chars": 618,
    "preview": "# AGE AND FAVOURITE FOOD - 'What is the favourite food of Charles?', 'Who prefers vegetables the most?'\nAdam is older th"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/postman_collection.json",
    "chars": 6586,
    "preview": "{\n\t\"info\": {\n\t\t\"name\": \"Retrieval augmented generation\",\n\t\t\"schema\": \"https://schema.getpostman.com/json/collection/v2.1"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/upload_data.py",
    "chars": 537,
    "preview": "from pathlib import Path\nfrom opensearchpy import OpenSearch\n\nINPUT_FILE = \"input.txt\"\nINDEX_NAME = \"cool_index\"\nFIELD_N"
  },
  {
    "path": "mini_tutorials/visualizing_activations_with_forward_hooks/src.py",
    "chars": 1292,
    "preview": "import pathlib\n\nimport torch\nimport torch.nn.functional as F\nfrom torch.nn import Linear, Module\nfrom torch.utils.tensor"
  }
]

// ... and 6 more files (truncated in this preview)

About this extraction

This page contains the full source code of the jankrepl/mildlyoverfitted GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction covers 118 files (314.6 KB, approximately 85.5k tokens) and includes a symbol index of 274 extracted functions, classes, methods, constants, and types. The output can be fed to OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input.

Extracted by GitExtract, a free GitHub repo-to-text converter for AI, built by Nikandr Surkov.
