[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\npip-wheel-metadata/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n.python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2024 Jan Krepl\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# mildlyoverfitted\n\nCode for https://www.youtube.com/c/mildlyoverfitted.\n\n\n### Overview\n| Name                                                                           | Video                                | Code                                                                                                                       |\n|--------------------------------------------------------------------------------|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------|\n| Asynchronous requests and rate limiting                                        | [link](https://youtu.be/luWsr9exlE4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/httpx_rate_limiting)                |\n| BentoML Sagemaker deployment                                                   | [link](https://youtu.be/Zci_D4az9FU) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml)                |\n| Custom optimizer in PyTorch                                                    | [link](https://youtu.be/zvp8K4iX2Cs) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/custom_optimizer_in_pytorch)                |\n| Deploying machine learning models on Kubernetes                                | [link](https://youtu.be/DQRNt8Diyw4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/deploying_on_kubernetes)                             |\n| Differentiable augmentation for GANs (using Kornia)                            | [link](https://youtu.be/J97EM3Clyys) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/diffaugment)                             |\n| DINO in PyTorch                                                                | [link](https://youtu.be/psmMEWKk4Uk) | 
[link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/dino)                                    |\n| Few-shot text classification with prompts                                      | [link](https://youtu.be/AhqgDXcBU2M) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification)                                    |\n| GPT in PyTorch                                                                 | [link](https://youtu.be/d7IRM40VMYM) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/gpt)                                    |\n| Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients) | [link](https://youtu.be/5lFiZTSsp40) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/gradient_wrt_input)                         |\n| Growing neural cellular automata in PyTorch                                    | [link](https://youtu.be/21ACbWoF2Oo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/automata)                                |\n| Haiku basics                                                                   | [link](https://youtu.be/yXCKS-ZoYTY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics)                            |\n| Integer embeddings in PyTorch                                                  | [link](https://youtu.be/bybuSBVzOdg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/integer)                                 |\n| Mixup in PyTorch                                                               | [link](https://youtu.be/hGAKHKqmXdY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/mixup)                                   |\n| MLP-Mixer in Flax and PyTorch                                                  | [link](https://youtu.be/HqytB2GUbHA) | 
[link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/mixer)                                   |\n| Mocking neural networks: unit testing in deep learning                         | [link](https://youtu.be/_KVV9jXSzvo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/mocking_neural_networks)                    |\n| NER model evaluation                                                           | [link](https://youtu.be/70YAUYP3hrw) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation)                                    |\n| NumPy equality testing                                                         | [link](https://youtu.be/sai1g5fjyb8) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/numpy_equality_testing)                     |\n| OpenAI function calling                                                        | [link](https://youtu.be/_B7F_6nTVEg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/openai_function_calling)                     |\n| PonderNet in PyTorch                                                           | [link](https://youtu.be/JLFz1dU5HR4) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/pondernet)                               |\n| Product quantization in Faiss and from scratch                                 | [link](https://youtu.be/PNVJvZEkuXo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/product_quantization)                               |\n| Retrieval augmented generation with OpenSearch and reranking                   | [link](https://youtu.be/OsE7YcDcPz0) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/rag_with_reranking)                               |\n| SIREN in PyTorch                                                               | 
[link](https://youtu.be/s4iFEoNlYhM) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/siren)                                   |\n| The Lottery Ticket Hypothesis and pruning in PyTorch                           | [link](https://youtu.be/bQt0CLXXAqg) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/lottery)                                  |\n| The Sensory Neuron as a Transformer in PyTorch                                 | [link](https://youtu.be/mi_mzlhBGAU) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/neuron)                                  |\n| `torch.nn.Embedding` explained (+ Character-level language model)              | [link](https://youtu.be/euwN5DHfLEo) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/embedding)                                  |\n| Vision Transformer in PyTorch                                                  | [link](https://youtu.be/ovB0ddFtzzA) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/vision_transformer)                      |\n| Visualizing activations with forward hooks (PyTorch)                           | [link](https://youtu.be/1ZbLA7ofasY) | [link](https://github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/visualizing_activations_with_forward_hooks) |\n"
  },
  {
    "path": "github_adventures/automata/model.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass CAModel(nn.Module):\n    \"\"\"Cellular automata model.\n\n    Parameters\n    ----------\n    n_channels : int\n        Number of channels of the grid.\n\n    hidden_channels : int\n        Hidden channels that are related to the pixelwise 1x1 convolution.\n\n    fire_rate : float\n        Number between 0 and 1. The lower it is the more likely it is for\n        cells to be set to zero during the `stochastic_update` process.\n\n    device : torch.device\n        Determines on what device we perform all the computations.\n\n    Attributes\n    ----------\n    update_module : nn.Sequential\n        The only part of the network containing trainable parameters. Composed\n        of 1x1 convolution, ReLU and 1x1 convolution.\n\n    filters : torch.Tensor\n        Constant tensor of shape `(3 * n_channels, 1, 3, 3)`.\n    \"\"\"\n    def __init__(self, n_channels=16, hidden_channels=128, fire_rate=0.5, device=None):\n        super().__init__()\n\n        self.fire_rate = fire_rate\n        self.n_channels = n_channels\n        self.device = device or torch.device(\"cpu\")\n\n        # Perceive step\n        sobel_filter_ = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])\n        scalar = 8.0\n\n        sobel_filter_x = sobel_filter_ / scalar\n        sobel_filter_y = sobel_filter_.t() / scalar\n        identity_filter = torch.tensor(\n                [\n                    [0, 0, 0],\n                    [0, 1, 0],\n                    [0, 0, 0],\n                ],\n                dtype=torch.float32,\n        )\n        filters = torch.stack(\n                [identity_filter, sobel_filter_x, sobel_filter_y]\n        )  # (3, 3, 3)\n        filters = filters.repeat((n_channels, 1, 1))  # (3 * n_channels, 3, 3)\n        self.filters = filters[:, None, ...].to(\n                self.device\n        )  # (3 * n_channels, 1, 3, 3)\n\n        # Update step\n        self.update_module = nn.Sequential(\n             
   nn.Conv2d(\n                    3 * n_channels,\n                    hidden_channels,\n                    kernel_size=1,  # (1, 1)\n                ),\n                nn.ReLU(),\n                nn.Conv2d(\n                    hidden_channels,\n                    n_channels,\n                    kernel_size=1,\n                    bias=False,\n                ),\n        )\n\n        with torch.no_grad():\n            self.update_module[2].weight.zero_()\n\n        self.to(self.device)\n\n    def perceive(self, x):\n        \"\"\"Approximate channelwise gradient and combine with the input.\n\n        This is the only place where we include information on the\n        neighboring cells. However, we are not using any learnable\n        parameters here.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, 3 * n_channels, grid_size, grid_size)`.\n        \"\"\"\n        return nn.functional.conv2d(x, self.filters, padding=1, groups=self.n_channels)\n\n    def update(self, x):\n        \"\"\"Perform update.\n\n        Note that this is the only part of the forward pass that uses\n        trainable parameters.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, 3 * n_channels, grid_size, grid_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n        \"\"\"\n        return self.update_module(x)\n\n    @staticmethod\n    def stochastic_update(x, fire_rate):\n        \"\"\"Run pixel-wise dropout.\n\n        Unlike dropout there is no scaling taking place.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n\n        fire_rate : float\n            Number between 0 and 1. 
The higher it is, the more likely a given cell\n            updates.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n        \"\"\"\n        device = x.device\n\n        mask = (torch.rand(x[:, :1, :, :].shape) <= fire_rate).to(device, torch.float32)\n        return x * mask  # broadcasted over all channels\n\n    @staticmethod\n    def get_living_mask(x):\n        \"\"\"Identify living cells.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, 1, grid_size, grid_size)` and the\n            dtype is bool.\n        \"\"\"\n        return (\n            nn.functional.max_pool2d(\n                x[:, 3:4, :, :], kernel_size=3, stride=1, padding=1\n            )\n            > 0.1\n        )\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_channels, grid_size, grid_size)`.\n        \"\"\"\n        pre_life_mask = self.get_living_mask(x)\n\n        y = self.perceive(x)\n        dx = self.update(y)\n        dx = self.stochastic_update(dx, fire_rate=self.fire_rate)\n\n        x = x + dx\n\n        post_life_mask = self.get_living_mask(x)\n        life_mask = (pre_life_mask & post_life_mask).to(torch.float32)\n\n        return x * life_mask\n"
  },
  {
    "path": "github_adventures/automata/train.py",
    "content": "import argparse\nimport pathlib\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom PIL import Image\nfrom torch.utils.tensorboard import SummaryWriter\nfrom tqdm import tqdm\n\nfrom model import CAModel\n\n\ndef load_image(path, size=40):\n    \"\"\"Load an image.\n\n    Parameters\n    ----------\n    path : pathlib.Path\n        Path to where the image is located. Note that the image needs to be\n        RGBA.\n\n    size : int\n        The image will be resized to a square with a side length of `size`.\n\n    Returns\n    -------\n    torch.Tensor\n        4D float image of shape `(1, 4, size, size)`. The RGB channels\n        are premultiplied by the alpha channel.\n    \"\"\"\n    img = Image.open(path)\n    img = img.resize((size, size), Image.LANCZOS)  # `Image.ANTIALIAS` was removed in Pillow 10\n    img = np.float32(img) / 255.0\n    img[..., :3] *= img[..., 3:]\n\n    return torch.from_numpy(img).permute(2, 0, 1)[None, ...]\n\n\ndef to_rgb(img_rgba):\n    \"\"\"Convert RGBA image to RGB image.\n\n    Parameters\n    ----------\n    img_rgba : torch.Tensor\n        4D tensor of shape `(1, 4, size, size)` where the RGB channels\n        were already multiplied by the alpha.\n\n    Returns\n    -------\n    img_rgb : torch.Tensor\n        4D tensor of shape `(1, 3, size, size)`.\n    \"\"\"\n    rgb, a = img_rgba[:, :3, ...], torch.clamp(img_rgba[:, 3:, ...], 0, 1)\n    return torch.clamp(1.0 - a + rgb, 0, 1)\n\n\ndef make_seed(size, n_channels):\n    \"\"\"Create a starting tensor for training.\n\n    The only active pixels are going to be in the middle.\n\n    Parameters\n    ----------\n    size : int\n        The height and the width of the tensor.\n\n    n_channels : int\n        Overall number of channels. 
Note that it needs to be higher than 4\n        since the first 4 channels represent RGBA.\n\n    Returns\n    -------\n    torch.Tensor\n        4D float tensor of shape `(1, n_channels, size, size)`.\n    \"\"\"\n    x = torch.zeros((1, n_channels, size, size), dtype=torch.float32)\n    x[:, 3:, size // 2, size // 2] = 1\n    return x\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\n            description=\"Training script for the Cellular Automata\"\n    )\n    parser.add_argument(\"img\", type=str, help=\"Path to the image we want to reproduce\")\n\n    parser.add_argument(\n            \"-b\",\n            \"--batch-size\",\n            type=int,\n            default=8,\n            help=\"Batch size. Samples will always be taken randomly from the pool.\"\n    )\n    parser.add_argument(\n            \"-d\",\n            \"--device\",\n            type=str,\n            default=\"cpu\",\n            help=\"Device to use\",\n            choices=(\"cpu\", \"cuda\"),\n    )\n    parser.add_argument(\n            \"-e\",\n            \"--eval-frequency\",\n            type=int,\n            default=500,\n            help=\"Evaluation frequency.\",\n    )\n    parser.add_argument(\n            \"-i\",\n            \"--eval-iterations\",\n            type=int,\n            default=300,\n            help=\"Number of iterations when evaluating.\",\n    )\n    parser.add_argument(\n            \"-n\",\n            \"--n-batches\",\n            type=int,\n            default=5000,\n            help=\"Number of batches to train for.\",\n    )\n    parser.add_argument(\n            \"-c\",\n            \"--n-channels\",\n            type=int,\n            default=16,\n            help=\"Number of channels of the input tensor\",\n    )\n    parser.add_argument(\n            \"-l\",\n            \"--logdir\",\n            type=str,\n            default=\"logs\",\n            help=\"Folder where all the logs and outputs are saved.\",\n    )\n    
parser.add_argument(\n            \"-p\",\n            \"--padding\",\n            type=int,\n            default=16,\n            help=\"Padding. The shape after padding is (h + 2 * p, w + 2 * p).\",\n    )\n    parser.add_argument(\n            \"--pool-size\",\n            type=int,\n            default=1024,\n            help=\"Size of the training pool\",\n    )\n    parser.add_argument(\n            \"-s\",\n            \"--size\",\n            type=int,\n            default=40,\n            help=\"Image size\",\n    )\n    # Parse arguments\n    args = parser.parse_args()\n    print(vars(args))\n\n    # Misc\n    device = torch.device(args.device)\n\n    log_path = pathlib.Path(args.logdir)\n    log_path.mkdir(parents=True, exist_ok=True)\n    writer = SummaryWriter(log_path)\n\n    # Target image\n    target_img_ = load_image(args.img, size=args.size)\n    p = args.padding\n    target_img_ = nn.functional.pad(target_img_, (p, p, p, p), \"constant\", 0)\n    target_img = target_img_.to(device)\n    target_img = target_img.repeat(args.batch_size, 1, 1, 1)\n\n    writer.add_image(\"ground truth\", to_rgb(target_img_)[0])\n\n    # Model and optimizer\n    model = CAModel(n_channels=args.n_channels, device=device)\n    optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)\n\n    # Pool initialization\n    seed = make_seed(args.size, args.n_channels).to(device)\n    seed = nn.functional.pad(seed, (p, p, p, p), \"constant\", 0)\n    pool = seed.clone().repeat(args.pool_size, 1, 1, 1)\n\n    for it in tqdm(range(args.n_batches)):\n        batch_ixs = np.random.choice(\n                args.pool_size, args.batch_size, replace=False\n        ).tolist()\n\n        x = pool[batch_ixs]\n        for i in range(np.random.randint(64, 96)):\n            x = model(x)\n\n        loss_batch = ((target_img - x[:, :4, ...]) ** 2).mean(dim=[1, 2, 3])\n        loss = loss_batch.mean()\n\n        optimizer.zero_grad()\n        loss.backward()\n        optimizer.step()\n        
writer.add_scalar(\"train/loss\", loss, it)\n\n        argmax_batch = loss_batch.argmax().item()\n        argmax_pool = batch_ixs[argmax_batch]\n        remaining_batch = [i for i in range(args.batch_size) if i != argmax_batch]\n        remaining_pool = [i for i in batch_ixs if i != argmax_pool]\n\n        pool[argmax_pool] = seed.clone()\n        pool[remaining_pool] = x[remaining_batch].detach()\n\n        if it % args.eval_frequency == 0:\n            x_eval = seed.clone()  # (1, n_channels, size, size)\n\n            eval_video = torch.empty(1, args.eval_iterations, 3, *x_eval.shape[2:])\n\n            for it_eval in range(args.eval_iterations):\n                x_eval = model(x_eval)\n                x_eval_out = to_rgb(x_eval[:, :4].detach().cpu())\n                eval_video[0, it_eval] = x_eval_out\n\n            writer.add_video(\"eval\", eval_video, it, fps=60)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/diffaugment/README.MD",
    "content": "# Data\nhttps://hanlab.mit.edu/projects/data-efficient-gans/datasets/100-shot-grumpy_cat.zip\n\nJust unzip it into `data/` and the code should work out of the box.\n"
  },
  {
    "path": "github_adventures/diffaugment/script.py",
    "content": "import argparse\nimport pathlib\nimport pprint\nfrom datetime import datetime\n\nimport kornia.augmentation as K\nimport torch\nimport torchvision.transforms as transforms\nfrom torch.utils.data import DataLoader\nfrom torch.utils.tensorboard import SummaryWriter\nfrom torchvision.utils import make_grid\nfrom tqdm import tqdm\n\nfrom utils import DatasetImages, Discriminator, Generator, init_weights_\n\n\ndef main(argv=None):\n    # CLI\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"name\", help=\"Name of the experiment\")\n    parser.add_argument(\n        \"-a\",\n        \"--augment\",\n        action=\"store_true\",\n        help=\"If True, we apply augmentations\",\n    )\n    parser.add_argument(\n        \"-b\", \"--batch-size\", type=int, default=16, help=\"Batch size\"\n    )\n    parser.add_argument(\n        \"--b1\",\n        type=float,\n        default=0.5,\n        help=\"Adam optimizer hyperparameter\",\n    )\n    parser.add_argument(\n        \"--b2\",\n        type=float,\n        default=0.999,\n        help=\"Adam optimizer hyperparameter\",\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--device\",\n        type=str,\n        default=\"cpu\",\n        choices=[\"cpu\", \"cuda\"],\n        help=\"Device to use\",\n    )\n    parser.add_argument(\n        \"--eval-frequency\",\n        type=int,\n        default=400,\n        help=\"Generate generator images every `eval_frequency` epochs\",\n    )\n    parser.add_argument(\n        \"--latent-dim\",\n        type=int,\n        default=100,\n        help=\"Dimensionality of the random noise\",\n    )\n    parser.add_argument(\n        \"--lr\", type=float, default=0.0002, help=\"Learning rate\"\n    )\n    parser.add_argument(\n        \"--ndf\",\n        type=int,\n        default=32,\n        help=\"Number of discriminator feature maps (after first convolution)\",\n    )\n    parser.add_argument(\n        \"--ngf\",\n        type=int,\n        
default=32,\n        help=\"Number of generator feature maps (before last transposed convolution)\",\n    )\n    parser.add_argument(\n        \"-n\",\n        \"--n-epochs\",\n        type=int,\n        default=200,\n        help=\"Number of training epochs\",\n    )\n    parser.add_argument(\n        \"--mosaic-size\",\n        type=int,\n        default=10,\n        help=\"Size of the side of the rectangular mosaic\",\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--prob\",\n        type=float,\n        default=0.9,\n        help=\"Probability of applying an augmentation\",\n    )\n\n    args = parser.parse_args(argv)\n    args_d = vars(args)\n    print(args)\n\n    img_size = 128\n\n    # Additional parameters\n    device = torch.device(args.device)\n    mosaic_kwargs = {\"nrow\": args.mosaic_size, \"normalize\": True}\n    n_mosaic_cells = args.mosaic_size * args.mosaic_size\n    sample_showcase_ix = (\n        0  # this one will be used to demonstrate the augmentations\n    )\n\n    augment_module = torch.nn.Sequential(\n        K.RandomAffine(degrees=0, translate=(1 / 8, 1 / 8), p=args.prob),\n        K.RandomErasing((0.0, 0.5), p=args.prob),\n    )\n\n    # Loss function\n    adversarial_loss = torch.nn.BCELoss()\n\n    # Initialize generator and discriminator\n    generator = Generator(latent_dim=args.latent_dim, ngf=args.ngf)\n    discriminator = Discriminator(\n        ndf=args.ndf, augment_module=augment_module if args.augment else None\n    )\n\n    generator.to(device)\n    discriminator.to(device)\n\n    # Initialize weights\n    generator.apply(init_weights_)\n    discriminator.apply(init_weights_)\n\n    # Configure data loader\n    data_path = pathlib.Path(\"data\")\n    tform = transforms.Compose(\n        [\n            transforms.Resize(img_size),\n            transforms.ToTensor(),\n            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),\n        ]\n    )\n    dataset = DatasetImages(\n        data_path,\n        
transform=tform,\n    )\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.batch_size,\n        shuffle=True,\n    )\n\n    # Optimizers\n    optimizer_G = torch.optim.Adam(\n        generator.parameters(), lr=args.lr, betas=(args.b1, args.b2)\n    )\n    optimizer_D = torch.optim.Adam(\n        discriminator.parameters(), lr=args.lr, betas=(args.b1, args.b2)\n    )\n\n    # Output path and metadata\n    output_path = pathlib.Path(\"outputs\") / args.name\n    output_path.mkdir(exist_ok=True, parents=True)\n\n    # Add other parameters (not included in CLI)\n    args_d[\"time\"] = datetime.now()\n    args_d[\"kornia\"] = str(augment_module)\n\n    # Prepare tensorboard writer\n    writer = SummaryWriter(output_path)\n\n    # Log hyperparameters as text\n    writer.add_text(\n        \"hyperparameter\",\n        pprint.pformat(args_d).replace(\n            \"\\n\", \"  \\n\"\n        ),  # markdown needs 2 spaces before newline\n        0,\n    )\n    # Log true data\n    writer.add_image(\n        \"true_data\",\n        make_grid(\n            torch.stack([dataset[i] for i in range(n_mosaic_cells)]),\n            **mosaic_kwargs\n        ),\n        0,\n    )\n    # Log augmented data\n    batch_showcase = dataset[sample_showcase_ix][None, ...].repeat(\n        n_mosaic_cells, 1, 1, 1\n    )\n    batch_showcase_aug = discriminator.augment_module(batch_showcase)\n    writer.add_image(\n        \"augmentations\", make_grid(batch_showcase_aug, **mosaic_kwargs), 0\n    )\n\n    # Prepare evaluation noise\n    z_eval = torch.randn(n_mosaic_cells, args.latent_dim).to(device)\n\n    for epoch in tqdm(range(args.n_epochs)):\n        for i, imgs in enumerate(dataloader):\n            n_samples, *_ = imgs.shape\n            batches_done = epoch * len(dataloader) + i\n\n            # Adversarial ground truths\n            valid = 0.9 * torch.ones(\n                n_samples, 1, device=device, dtype=torch.float32\n            )\n            fake = 
torch.zeros(n_samples, 1, device=device, dtype=torch.float32)\n\n            # D preparation\n            optimizer_D.zero_grad()\n\n            # D loss on reals\n            real_imgs = imgs.to(device)\n            d_x = discriminator(real_imgs)\n            real_loss = adversarial_loss(d_x, valid)\n            real_loss.backward()\n\n            # D loss on fakes\n            z = torch.randn(n_samples, args.latent_dim).to(device)\n            gen_imgs = generator(z)\n            d_g_z1 = discriminator(gen_imgs.detach())\n\n            fake_loss = adversarial_loss(d_g_z1, fake)\n            fake_loss.backward()\n\n            optimizer_D.step()  # we called backward twice, the result is a sum\n\n            # G preparation\n            optimizer_G.zero_grad()\n\n            # G loss\n            d_g_z2 = discriminator(gen_imgs)\n            g_loss = adversarial_loss(d_g_z2, valid)\n\n            g_loss.backward()\n            optimizer_G.step()\n\n            # Logging\n            if batches_done % 50 == 0:\n                writer.add_scalar(\"d_x\", d_x.mean().item(), batches_done)\n                writer.add_scalar(\"d_g_z1\", d_g_z1.mean().item(), batches_done)\n                writer.add_scalar(\"d_g_z2\", d_g_z2.mean().item(), batches_done)\n                writer.add_scalar(\n                    \"D_loss\", (real_loss + fake_loss).item(), batches_done\n                )\n                writer.add_scalar(\"G_loss\", g_loss.item(), batches_done)\n\n            if epoch % args.eval_frequency == 0 and i == 0:\n                generator.eval()\n                discriminator.eval()\n\n                # Generate fake images\n                gen_imgs_eval = generator(z_eval)\n\n                # Generate nice mosaic\n                writer.add_image(\n                    \"fake\",\n                    make_grid(gen_imgs_eval.data, **mosaic_kwargs),\n                    batches_done,\n                )\n\n                # Save checkpoint (and potentially 
overwrite an existing one)\n                torch.save(generator, output_path / \"model.pt\")\n\n                # Put the generator and discriminator back into training mode\n                generator.train()\n                discriminator.train()\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/diffaugment/utils.py",
    "content": "import torch.nn as nn\nfrom PIL import Image\nfrom torch.utils.data import Dataset\n\n\nclass DatasetImages(Dataset):\n    \"\"\"Dataset loading images from the hard drive.\n\n    Parameters\n    ----------\n    path : pathlib.Path\n        Path to the folder containing all the images.\n\n    transform : None or callable\n        The transform to be applied when yielding the image.\n\n    Attributes\n    ----------\n    all_paths : list\n        List of all paths to the `.jpg` images.\n    \"\"\"\n    def __init__(self, path, transform=None):\n        super().__init__()\n\n        self.all_paths = sorted([p for p in path.iterdir() if p.suffix == \".jpg\"])\n        self.transform = transform\n\n    def __len__(self):\n        \"\"\"Compute length of the dataset.\"\"\"\n        return len(self.all_paths)\n\n    def __getitem__(self, ix):\n        \"\"\"Get a single item.\"\"\"\n        img = Image.open(self.all_paths[ix])\n\n        if self.transform is not None:\n            img = self.transform(img)\n\n        return img\n\n\nclass Generator(nn.Module):\n    \"\"\"Generator network.\n\n    Parameters\n    ----------\n    latent_dim : int\n        The dimensionality of the input noise.\n\n    ngf : int\n        Number of generator filters. 
The first block outputs `ngf * 16`\n        filters and each consecutive block halves that number.\n\n    Attributes\n    ----------\n    main : nn.Sequential\n        The network, composed of `ConvTranspose2d`, `BatchNorm2d`\n        and `ReLU` blocks.\n    \"\"\"\n\n    def __init__(self, latent_dim, ngf=64):\n        super().__init__()\n        self.main = nn.Sequential(\n            nn.ConvTranspose2d(latent_dim, ngf * 16, 4, 1, 0, bias=False),\n            nn.BatchNorm2d(ngf * 16),\n            nn.ReLU(True),\n            # (ngf * 16) x 4 x 4\n            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),\n            nn.BatchNorm2d(ngf * 8),\n            nn.ReLU(True),\n            # (ngf * 8) x 8 x 8\n            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),\n            nn.BatchNorm2d(ngf * 4),\n            nn.ReLU(True),\n            # (ngf * 4) x 16 x 16\n            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),\n            nn.BatchNorm2d(ngf * 2),\n            nn.ReLU(True),\n            # (ngf * 2) x 32 x 32\n            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),\n            nn.BatchNorm2d(ngf),\n            nn.ReLU(True),\n            # ngf x 64 x 64\n            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),\n            nn.Tanh(),\n            # 3 x 128 x 128\n        )\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input noise of shape `(n_samples, latent_dim)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Generated images of shape `(n_samples, 3, 128, 128)`.\n        \"\"\"\n        x = x.reshape(*x.shape, 1, 1)  # (n_samples, latent_dim, 1, 1)\n        return self.main(x)\n\n\nclass Discriminator(nn.Module):\n    \"\"\"Discriminator network.\n\n    Parameters\n    ----------\n    ndf : 
int\n        Number of discriminator filters after the first convolution block.\n        Each consecutive block doubles the number.\n\n    augment_module : nn.Module or None\n        If provided, the Kornia module that performs differentiable\n        augmentation of the images.\n\n    Attributes\n    ----------\n    augment_module : nn.Module\n        The provided `augment_module` if given, otherwise an `nn.Identity`\n        mapping.\n    \"\"\"\n    def __init__(self, ndf=16, augment_module=None):\n        super().__init__()\n        self.main = nn.Sequential(\n            # 3 x 128 x 128\n            nn.Conv2d(3, ndf, 4, stride=2, padding=1, bias=False),\n            nn.LeakyReLU(0.2, inplace=True),\n            # ndf x 64 x 64\n            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),\n            nn.BatchNorm2d(ndf * 2),\n            nn.LeakyReLU(0.2, inplace=True),\n            # (ndf * 2) x 32 x 32\n            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),\n            nn.BatchNorm2d(ndf * 4),\n            nn.LeakyReLU(0.2, inplace=True),\n            # (ndf * 4) x 16 x 16\n            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),\n            nn.BatchNorm2d(ndf * 8),\n            nn.LeakyReLU(0.2, inplace=True),\n            # (ndf * 8) x 8 x 8\n            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),\n            nn.BatchNorm2d(ndf * 16),\n            nn.LeakyReLU(0.2, inplace=True),\n            # (ndf * 16) x 4 x 4\n            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),\n            nn.Sigmoid()\n            # 1 x 1 x 1\n        )\n        if augment_module is not None:\n            self.augment_module = augment_module\n        else:\n            self.augment_module = nn.Identity()\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input images of shape `(n_samples, 3, 128, 128)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Classification outputs of shape `(n_samples, 1)`.\n        \"\"\"\n        if self.training:\n            x = self.augment_module(x)\n\n        x = self.main(x)  # (n_samples, 1, 1, 1)\n        x = x.reshape(len(x), -1)  # (n_samples, 1)\n        return x\n\n\ndef init_weights_(module):\n    \"\"\"Initialize weights by sampling from a normal distribution.\n\n    Note that this operation modifies the weights in place.\n\n    Parameters\n    ----------\n    module : nn.Module\n        Module with trainable weights.\n    \"\"\"\n    cls_name = module.__class__.__name__\n\n    if cls_name in {\"Conv2d\", \"ConvTranspose2d\"}:\n        nn.init.normal_(module.weight.data, 0.0, 0.02)\n\n    elif cls_name == \"BatchNorm2d\":\n        nn.init.normal_(module.weight.data, 1.0, 0.02)\n        nn.init.constant_(module.bias.data, 0.0)\n"
  },
  {
    "path": "github_adventures/dino/data/README.md",
    "content": "The `Imagenette` dataset was used. You can find it here: https://github.com/fastai/imagenette (320 px version). \n"
  },
  {
    "path": "github_adventures/dino/data/imagenette_labels.json",
    "content": "{\"n01440764\": \"tench\", \"n02102040\": \"english_springer\", \"n02979186\": \"cassette_player\", \"n03000684\": \"chain_saw\", \"n03028079\": \"church\", \"n03394916\": \"french_horn\", \"n03417042\": \"garbage_truck\", \"n03425413\": \"gas_pump\", \"n03445777\": \"golf_ball\", \"n03888257\": \"parachute\"}"
  },
  {
    "path": "github_adventures/dino/evaluation.py",
    "content": "import numpy as np\nimport torch\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.neighbors import KNeighborsClassifier\n\n\ndef compute_knn(backbone, data_loader_train, data_loader_val):\n    \"\"\"Get CLS embeddings and use KNN classifier on them.\n\n    We load all embeddings in memory and use sklearn. Should\n    be doable.\n\n    Parameters\n    ----------\n    backbone : timm.models.vision_transformer.VisionTransformer\n        Vision transformer whose head is just an identity\n        mapping.\n\n    data_loader_train, data_loader_val : torch.utils.data.DataLoader\n        Training and validation dataloader that does not apply any\n        augmentations. Just casting to tensor and then normalizing.\n\n    Returns\n    -------\n    val_accuracy : float\n        Validation accuracy.\n    \"\"\"\n    device = next(backbone.parameters()).device\n\n    data_loaders = {\n        \"train\": data_loader_train,\n        \"val\": data_loader_val,\n    }\n    lists = {\n        \"X_train\": [],\n        \"y_train\": [],\n        \"X_val\": [],\n        \"y_val\": [],\n    }\n\n    for name, data_loader in data_loaders.items():\n        for imgs, y in data_loader:\n            imgs = imgs.to(device)\n            lists[f\"X_{name}\"].append(backbone(imgs).detach().cpu().numpy())\n            lists[f\"y_{name}\"].append(y.detach().cpu().numpy())\n\n    arrays = {k: np.concatenate(l) for k, l in lists.items()}\n\n    estimator = KNeighborsClassifier()\n    estimator.fit(arrays[\"X_train\"], arrays[\"y_train\"])\n    y_val_pred = estimator.predict(arrays[\"X_val\"])\n\n    acc = accuracy_score(arrays[\"y_val\"], y_val_pred)\n\n    return acc\n\ndef compute_embedding(backbone, data_loader):\n    \"\"\"Compute CLS embedding and prepare for TensorBoard.\n\n    Parameters\n    ----------\n    backbone : timm.models.vision_transformer.VisionTransformer\n        Vision transformer. 
The head should be an identity mapping.\n\n    data_loader : torch.utils.data.DataLoader\n        Validation dataloader that does not apply any augmentations. Just\n        casting to tensor and then normalizing.\n\n    Returns\n    -------\n    embs : torch.Tensor\n        Embeddings of shape `(n_samples, out_dim)`.\n\n    imgs : torch.Tensor\n        Images of shape `(n_samples, 3, height, width)`.\n\n    labels : list\n        List of strings representing the classes.\n    \"\"\"\n    device = next(backbone.parameters()).device\n\n    embs_l = []\n    imgs_l = []\n    labels = []\n\n    for img, y in data_loader:\n        img = img.to(device)\n        embs_l.append(backbone(img).detach().cpu())\n        imgs_l.append(((img * 0.224) + 0.45).cpu())  # undo norm\n        labels.extend([data_loader.dataset.classes[i] for i in y.tolist()])\n\n    embs = torch.cat(embs_l, dim=0)\n    imgs = torch.cat(imgs_l, dim=0)\n\n    return embs, imgs, labels\n"
  },
  {
    "path": "github_adventures/dino/train.py",
    "content": "import argparse\nimport json\nimport pathlib\n\nimport timm\nimport torch\nimport torchvision.transforms as transforms\nimport tqdm\nfrom torch.utils.data import DataLoader, SubsetRandomSampler\nfrom torch.utils.tensorboard import SummaryWriter\nfrom torchvision.datasets import ImageFolder\n\nfrom evaluation import compute_embedding, compute_knn\nfrom utils import DataAugmentation, Head, Loss, MultiCropWrapper, clip_gradients\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        \"DINO training CLI\",\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter,\n    )\n    parser.add_argument(\"-b\", \"--batch-size\", type=int, default=32)\n    parser.add_argument(\n        \"-d\", \"--device\", type=str, choices=(\"cpu\", \"cuda\"), default=\"cpu\"\n    )\n    parser.add_argument(\"-l\", \"--logging-freq\", type=int, default=200)\n    parser.add_argument(\"--momentum-teacher\", type=int, default=0.9995)\n    parser.add_argument(\"-c\", \"--n-crops\", type=int, default=4)\n    parser.add_argument(\"-e\", \"--n-epochs\", type=int, default=100)\n    parser.add_argument(\"-o\", \"--out-dim\", type=int, default=1024)\n    parser.add_argument(\"-t\", \"--tensorboard-dir\", type=str, default=\"logs\")\n    parser.add_argument(\"--clip-grad\", type=float, default=2.0)\n    parser.add_argument(\"--norm-last-layer\", action=\"store_true\")\n    parser.add_argument(\"--batch-size-eval\", type=int, default=64)\n    parser.add_argument(\"--teacher-temp\", type=float, default=0.04)\n    parser.add_argument(\"--student-temp\", type=float, default=0.1)\n    parser.add_argument(\"--pretrained\", action=\"store_true\")\n    parser.add_argument(\"-w\", \"--weight-decay\", type=float, default=0.4)\n\n    args = parser.parse_args()\n    print(vars(args))\n    # Parameters\n    vit_name, dim = \"vit_deit_small_patch16_224\", 384\n    path_dataset_train = pathlib.Path(\"data/imagenette2-320/train\")\n    path_dataset_val = 
pathlib.Path(\"data/imagenette2-320/val\")\n    path_labels = pathlib.Path(\"data/imagenette_labels.json\")\n\n    logging_path = pathlib.Path(args.tensorboard_dir)\n    device = torch.device(args.device)\n\n    n_workers = 4\n\n    # Data related\n    with path_labels.open(\"r\") as f:\n        label_mapping = json.load(f)\n\n    transform_aug = DataAugmentation(size=224, n_local_crops=args.n_crops - 2)\n    transform_plain = transforms.Compose(\n        [\n            transforms.ToTensor(),\n            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\n            transforms.Resize((224, 224)),\n        ]\n    )\n\n    dataset_train_aug = ImageFolder(path_dataset_train, transform=transform_aug)\n    dataset_train_plain = ImageFolder(path_dataset_train, transform=transform_plain)\n    dataset_val_plain = ImageFolder(path_dataset_val, transform=transform_plain)\n\n    if dataset_train_plain.classes != dataset_val_plain.classes:\n        raise ValueError(\"Inconsistent classes\")\n\n    data_loader_train_aug = DataLoader(\n        dataset_train_aug,\n        batch_size=args.batch_size,\n        shuffle=True,\n        drop_last=True,\n        num_workers=n_workers,\n        pin_memory=True,\n    )\n    data_loader_train_plain = DataLoader(\n        dataset_train_plain,\n        batch_size=args.batch_size_eval,\n        drop_last=False,\n        num_workers=n_workers,\n    )\n    data_loader_val_plain = DataLoader(\n        dataset_val_plain,\n        batch_size=args.batch_size_eval,\n        drop_last=False,\n        num_workers=n_workers,\n    )\n    data_loader_val_plain_subset = DataLoader(\n        dataset_val_plain,\n        batch_size=args.batch_size_eval,\n        drop_last=False,\n        sampler=SubsetRandomSampler(list(range(0, len(dataset_val_plain), 50))),\n        num_workers=n_workers,\n    )\n\n    # Logging\n    writer = SummaryWriter(logging_path)\n    writer.add_text(\"arguments\", json.dumps(vars(args)))\n\n    # Neural network 
related\n    student_vit = timm.create_model(vit_name, pretrained=args.pretrained)\n    teacher_vit = timm.create_model(vit_name, pretrained=args.pretrained)\n\n    student = MultiCropWrapper(\n        student_vit,\n        Head(\n            dim,\n            args.out_dim,\n            norm_last_layer=args.norm_last_layer,\n        ),\n    )\n    teacher = MultiCropWrapper(teacher_vit, Head(dim, args.out_dim))\n    student, teacher = student.to(device), teacher.to(device)\n\n    teacher.load_state_dict(student.state_dict())\n\n    for p in teacher.parameters():\n        p.requires_grad = False\n\n    # Loss related\n    loss_inst = Loss(\n        args.out_dim,\n        teacher_temp=args.teacher_temp,\n        student_temp=args.student_temp,\n    ).to(device)\n    lr = 0.0005 * args.batch_size / 256\n    optimizer = torch.optim.AdamW(\n        student.parameters(),\n        lr=lr,\n        weight_decay=args.weight_decay,\n    )\n\n    # Training loop\n    n_batches = len(dataset_train_aug) // args.batch_size\n    best_acc = 0\n    n_steps = 0\n\n    for e in range(args.n_epochs):\n        for i, (images, _) in tqdm.tqdm(\n            enumerate(data_loader_train_aug), total=n_batches\n        ):\n            if n_steps % args.logging_freq == 0:\n                student.eval()\n\n                # Embedding\n                embs, imgs, labels_ = compute_embedding(\n                    student.backbone,\n                    data_loader_val_plain_subset,\n                )\n                writer.add_embedding(\n                    embs,\n                    metadata=[label_mapping[l] for l in labels_],\n                    label_img=imgs,\n                    global_step=n_steps,\n                    tag=\"embeddings\",\n                )\n\n                # KNN\n                current_acc = compute_knn(\n                    student.backbone,\n                    data_loader_train_plain,\n                    data_loader_val_plain,\n                )\n                
writer.add_scalar(\"knn-accuracy\", current_acc, n_steps)\n                if current_acc > best_acc:\n                    torch.save(student, logging_path / \"best_model.pth\")\n                    best_acc = current_acc\n\n                student.train()\n\n            images = [img.to(device) for img in images]\n\n            teacher_output = teacher(images[:2])\n            student_output = student(images)\n\n            loss = loss_inst(student_output, teacher_output)\n\n            optimizer.zero_grad()\n            loss.backward()\n            clip_gradients(student, args.clip_grad)\n            optimizer.step()\n\n            with torch.no_grad():\n                for student_ps, teacher_ps in zip(\n                    student.parameters(), teacher.parameters()\n                ):\n                    teacher_ps.data.mul_(args.momentum_teacher)\n                    teacher_ps.data.add_(\n                        (1 - args.momentum_teacher) * student_ps.detach().data\n                    )\n\n            writer.add_scalar(\"train_loss\", loss, n_steps)\n\n            n_steps += 1\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/dino/utils.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\nfrom PIL import Image\n\n\nclass DataAugmentation:\n    \"\"\"Create crops of an input image together with additional augmentation.\n\n    It generates 2 global crops and `n_local_crops` local crops.\n\n    Parameters\n    ----------\n    global_crops_scale : tuple\n        Range of sizes for the global crops.\n\n    local_crops_scale : tuple\n        Range of sizes for the local crops.\n\n    n_local_crops : int\n        Number of local crops to create.\n\n    size : int\n        The size of the final image.\n\n    Attributes\n    ----------\n    global_1, global_2 : transforms.Compose\n        Two global transforms.\n\n    local : transforms.Compose\n        Local transform. Note that the augmentation is stochastic so one\n        instance is enough and will lead to different crops.\n    \"\"\"\n    def __init__(\n        self,\n        global_crops_scale=(0.4, 1),\n        local_crops_scale=(0.05, 0.4),\n        n_local_crops=8,\n        size=224,\n    ):\n        self.n_local_crops = n_local_crops\n        RandomGaussianBlur = lambda p: transforms.RandomApply(  # noqa\n            [transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2))],\n            p=p,\n        )\n\n        flip_and_jitter = transforms.Compose(\n            [\n                transforms.RandomHorizontalFlip(p=0.5),\n                transforms.RandomApply(\n                    [\n                        transforms.ColorJitter(\n                            brightness=0.4,\n                            contrast=0.4,\n                            saturation=0.2,\n                            hue=0.1,\n                        ),\n                    ]\n                ),\n                transforms.RandomGrayscale(p=0.2),\n            ]\n        )\n\n        normalize = transforms.Compose(\n            [\n                transforms.ToTensor(),\n                
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\n            ]\n        )\n\n        self.global_1 = transforms.Compose(\n            [\n                transforms.RandomResizedCrop(\n                    size,\n                    scale=global_crops_scale,\n                    interpolation=Image.BICUBIC,\n                ),\n                flip_and_jitter,\n                RandomGaussianBlur(1.0),  # always apply\n                normalize,\n            ],\n        )\n\n        self.global_2 = transforms.Compose(\n            [\n                transforms.RandomResizedCrop(\n                    size,\n                    scale=global_crops_scale,\n                    interpolation=Image.BICUBIC,\n                ),\n                flip_and_jitter,\n                RandomGaussianBlur(0.1),\n                transforms.RandomSolarize(170, p=0.2),\n                normalize,\n            ],\n        )\n\n        self.local = transforms.Compose(\n            [\n                transforms.RandomResizedCrop(\n                    size,\n                    scale=local_crops_scale,\n                    interpolation=Image.BICUBIC,\n                ),\n                flip_and_jitter,\n                RandomGaussianBlur(0.5),\n                normalize,\n            ],\n        )\n\n    def __call__(self, img):\n        \"\"\"Apply transformation.\n\n        Parameters\n        ----------\n        img : PIL.Image\n            Input image.\n\n        Returns\n        -------\n        all_crops : list\n            List of `torch.Tensor` representing different views of\n            the input `img`.\n        \"\"\"\n        all_crops = []\n        all_crops.append(self.global_1(img))\n        all_crops.append(self.global_2(img))\n\n        all_crops.extend([self.local(img) for _ in range(self.n_local_crops)])\n\n        return all_crops\n\n\nclass Head(nn.Module):\n    \"\"\"Network hooked up to the CLS token embedding.\n\n    Just a MLP with the last 
layer being normalized in a particular way.\n\n    Parameters\n    ----------\n    in_dim : int\n        The dimensionality of the token embedding.\n\n    out_dim : int\n        The dimensionality of the final layer (over which we compute the\n        softmax).\n\n    hidden_dim : int\n        Dimensionality of the hidden layers.\n\n    bottleneck_dim : int\n        Dimensionality of the second last layer.\n\n    n_layers : int\n        The number of layers.\n\n    norm_last_layer : bool\n        If True, then we freeze the norm of the weight of the last linear layer\n        to 1.\n\n    Attributes\n    ----------\n    mlp : nn.Sequential\n        Vanilla multi-layer perceptron.\n\n    last_layer : nn.Linear\n        Reparametrized linear layer with weight normalization. That means\n        that it will have `weight_g` and `weight_v` as learnable\n        parameters instead of a single `weight`.\n    \"\"\"\n\n    def __init__(\n        self,\n        in_dim,\n        out_dim,\n        hidden_dim=512,\n        bottleneck_dim=256,\n        n_layers=3,\n        norm_last_layer=False,\n    ):\n        super().__init__()\n        if n_layers == 1:\n            self.mlp = nn.Linear(in_dim, bottleneck_dim)\n        else:\n            layers = [nn.Linear(in_dim, hidden_dim)]\n            layers.append(nn.GELU())\n            for _ in range(n_layers - 2):\n                layers.append(nn.Linear(hidden_dim, hidden_dim))\n                layers.append(nn.GELU())\n            layers.append(nn.Linear(hidden_dim, bottleneck_dim))\n            self.mlp = nn.Sequential(*layers)\n\n        self.apply(self._init_weights)\n\n        self.last_layer = nn.utils.weight_norm(\n            nn.Linear(bottleneck_dim, out_dim, bias=False)\n        )\n        self.last_layer.weight_g.data.fill_(1)\n        if norm_last_layer:\n            self.last_layer.weight_g.requires_grad = False\n\n    def _init_weights(self, m):\n        \"\"\"Initialize learnable parameters.\"\"\"\n        if isinstance(m, 
nn.Linear):\n            nn.init.normal_(m.weight, std=0.02)\n            if m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Of shape `(n_samples, in_dim)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Of shape `(n_samples, out_dim)`.\n        \"\"\"\n        x = self.mlp(x)  # (n_samples, bottleneck_dim)\n        x = nn.functional.normalize(x, dim=-1, p=2)  # (n_samples, bottleneck_dim)\n        x = self.last_layer(x)  # (n_samples, out_dim)\n\n        return x\n\n\nclass MultiCropWrapper(nn.Module):\n    \"\"\"Convenience class for forward pass of multiple crops.\n\n    Parameters\n    ----------\n    backbone : timm.models.vision_transformer.VisionTransformer\n        Instantiated Vision Transformer. Note that we will take the `head`\n        attribute and replace it with `nn.Identity`.\n\n    new_head : Head\n        New head that is going to be put on top of the `backbone`.\n    \"\"\"\n    def __init__(self, backbone, new_head):\n        super().__init__()\n        backbone.head = nn.Identity()  # deactivate original head\n        self.backbone = backbone\n        self.new_head = new_head\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        The different crops are concatenated along the batch dimension\n        and then a single forward pass is run. 
The resulting tensor\n        is then chunked back to per-crop tensors.\n\n        Parameters\n        ----------\n        x : list\n            List of `torch.Tensor` each of shape `(n_samples, 3, size, size)`.\n\n        Returns\n        -------\n        tuple\n            Tuple of `torch.Tensor` each of shape `(n_samples, out_dim)` where\n            `out_dim` is determined by `Head`.\n        \"\"\"\n        n_crops = len(x)\n        concatenated = torch.cat(x, dim=0)  # (n_samples * n_crops, 3, size, size)\n        cls_embedding = self.backbone(concatenated)  # (n_samples * n_crops, in_dim)\n        logits = self.new_head(cls_embedding)  # (n_samples * n_crops, out_dim)\n        chunks = logits.chunk(n_crops)  # n_crops * (n_samples, out_dim)\n\n        return chunks\n\n\nclass Loss(nn.Module):\n    \"\"\"The loss function.\n\n    We subclass `nn.Module` because we want to create a buffer for the\n    logits center of the teacher.\n\n    Parameters\n    ----------\n    out_dim : int\n        The dimensionality of the final layer (over which we compute the\n        softmax).\n\n    teacher_temp, student_temp : float\n        Softmax temperatures of the teacher and the student, respectively.\n\n    center_momentum : float\n        Hyperparameter for the exponential moving average that determines\n        the center logits. The higher it is, the more the running average\n        matters.\n    \"\"\"\n    def __init__(\n        self, out_dim, teacher_temp=0.04, student_temp=0.1, center_momentum=0.9\n    ):\n        super().__init__()\n        self.student_temp = student_temp\n        self.teacher_temp = teacher_temp\n        self.center_momentum = center_momentum\n        self.register_buffer(\"center\", torch.zeros(1, out_dim))\n\n    def forward(self, student_output, teacher_output):\n        \"\"\"Evaluate loss.\n\n        Parameters\n        ----------\n        student_output, teacher_output : tuple\n            Tuple of tensors of shape `(n_samples, out_dim)` representing\n            logits. 
The length is equal to number of crops.\n            Note that student processed all crops and that the two initial crops\n            are the global ones.\n\n        Returns\n        -------\n        loss : torch.Tensor\n            Scalar representing the average loss.\n        \"\"\"\n        student_temp = [s / self.student_temp for s in student_output]\n        teacher_temp = [(t - self.center) / self.teacher_temp for t in teacher_output]\n\n        student_sm = [F.log_softmax(s, dim=-1) for s in student_temp]\n        teacher_sm = [F.softmax(t, dim=-1).detach() for t in teacher_temp]\n\n        total_loss = 0\n        n_loss_terms = 0\n\n        for t_ix, t in enumerate(teacher_sm):\n            for s_ix, s in enumerate(student_sm):\n                if t_ix == s_ix:\n                    continue\n\n                loss = torch.sum(-t * s, dim=-1)  # (n_samples,)\n                total_loss += loss.mean()  # scalar\n                n_loss_terms += 1\n\n        total_loss /= n_loss_terms\n        self.update_center(teacher_output)\n\n        return total_loss\n\n    @torch.no_grad()\n    def update_center(self, teacher_output):\n        \"\"\"Update center used for teacher output.\n\n        Compute the exponential moving average.\n\n        Parameters\n        ----------\n        teacher_output : tuple\n            Tuple of tensors of shape `(n_samples, out_dim)` where each\n            tensor represents a different crop.\n        \"\"\"\n        batch_center = torch.cat(teacher_output).mean(\n            dim=0, keepdim=True\n        )  # (1, out_dim)\n        self.center = self.center * self.center_momentum + batch_center * (\n            1 - self.center_momentum\n        )\n\ndef clip_gradients(model, clip=2.0):\n    \"\"\"Rescale norm of computed gradients.\n\n    Parameters\n    ----------\n    model : nn.Module\n        Module.\n\n    clip : float\n        Maximum norm.\n    \"\"\"\n    for p in model.parameters():\n        if p.grad is not None:\n        
    param_norm = p.grad.data.norm(2)\n            clip_coef = clip / (param_norm + 1e-6)\n            if clip_coef < 1:\n                p.grad.data.mul_(clip_coef)\n"
  },
  {
    "path": "github_adventures/dino/visualize_attentions.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1a3bd5ec\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import ipywidgets\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import timm\\n\",\n    \"import torch\\n\",\n    \"from torchvision.datasets import ImageFolder\\n\",\n    \"import torchvision.transforms as transforms\\n\",\n    \"from torchvision.utils import make_grid\\n\",\n    \"import torch.nn.functional as F\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a6eaa0ef\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Helpers\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"2c0b2e7c\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def get_last_attention(backbone, x):\\n\",\n    \"    \\\"\\\"\\\"Get the attention weights of CLS from the last self-attention layer.\\n\",\n    \"\\n\",\n    \"    Very hacky!\\n\",\n    \"\\n\",\n    \"    Parameters\\n\",\n    \"    ----------\\n\",\n    \"    backbone : timm.models.vision_transformer.VisionTransformer\\n\",\n    \"        Instantiated Vision Transformer. 
Note that we will in-place\\n\",\n    \"        take the `head` attribute and replace it with `nn.Identity`.\\n\",\n    \"\\n\",\n    \"    x : torch.Tensor\\n\",\n    \"        Batch of images of shape `(n_samples, 3, size, size)`.\\n\",\n    \"\\n\",\n    \"    Returns\\n\",\n    \"    -------\\n\",\n    \"    torch.Tensor\\n\",\n    \"        Attention weights `(n_samples, n_heads, n_patches)`.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    attn_module = backbone.blocks[-1].attn\\n\",\n    \"    n_heads = attn_module.num_heads\\n\",\n    \"\\n\",\n    \"    # define hook\\n\",\n    \"    inp = None\\n\",\n    \"    def fprehook(self, inputs):\\n\",\n    \"        nonlocal inp\\n\",\n    \"        inp = inputs[0]\\n\",\n    \"\\n\",\n    \"    # Register a hook\\n\",\n    \"    handle = attn_module.register_forward_pre_hook(fprehook)\\n\",\n    \"\\n\",\n    \"    # Run forward pass\\n\",\n    \"    _ = backbone(x)\\n\",\n    \"    handle.remove()\\n\",\n    \"\\n\",\n    \"    B, N, C = inp.shape\\n\",\n    \"    qkv = attn_module.qkv(inp).reshape(B, N, 3, n_heads, C // n_heads).permute(2, 0, 3, 1, 4)\\n\",\n    \"    q, k, v = qkv[0], qkv[1], qkv[2]\\n\",\n    \"\\n\",\n    \"    attn = (q @ k.transpose(-2, -1)) * attn_module.scale\\n\",\n    \"    attn = attn.softmax(dim=-1)\\n\",\n    \"\\n\",\n    \"    return attn[:, :, 0, 1:]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"57b72b84\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def threshold(attn, k=30):\\n\",\n    \"    n_heads = len(attn)\\n\",\n    \"    indices = attn.argsort(dim=1, descending=True)[:, k:]\\n\",\n    \"\\n\",\n    \"    for head in range(n_heads):\\n\",\n    \"        attn[head, indices[head]] = 0\\n\",\n    \"\\n\",\n    \"    attn /= attn.sum(dim=1, keepdim=True)\\n\",\n    \"\\n\",\n    \"    return attn\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"59e9009d\",\n   
\"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def visualize_attention(img, backbone, k=30):\\n\",\n    \"    \\\"\\\"\\\"Create attention image.\\n\",\n    \"\\n\",\n    \"    Parameteres\\n\",\n    \"    -----------\\n\",\n    \"    img : PIL.Image\\n\",\n    \"        RGB image.\\n\",\n    \"\\n\",\n    \"    backbone : timm.models.vision_transformer.VisionTransformer\\n\",\n    \"        The vision transformer.\\n\",\n    \"\\n\",\n    \"    Returns\\n\",\n    \"    -------\\n\",\n    \"    new_img : torch.Tensor\\n\",\n    \"        Image of shape (n_heads, 1, height, width).\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    # imply parameters\\n\",\n    \"\\n\",\n    \"    patch_size = backbone.patch_embed.proj.kernel_size[0]\\n\",\n    \"\\n\",\n    \"    transform = transforms.Compose([\\n\",\n    \"\\n\",\n    \"        transforms.Resize((224, 224)),\\n\",\n    \"        transforms.ToTensor(),\\n\",\n    \"        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\\n\",\n    \"        ]\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"    device = next(backbone.parameters()).device\\n\",\n    \"    x = transform(img)[None, ...].to(device)\\n\",\n    \"    attn = get_last_attention(backbone, x)[0]  # (n_heads, n_patches)\\n\",\n    \"    attn = attn / attn.sum(dim=1, keepdim=True)  # (n_heads, n_patches)\\n\",\n    \"    attn = threshold(attn, k)\\n\",\n    \"    attn = attn.reshape(-1, 14, 14)  # (n_heads, 14, 14)\\n\",\n    \"    attn = F.interpolate(attn.unsqueeze(0),\\n\",\n    \"        scale_factor=patch_size,\\n\",\n    \"        mode=\\\"nearest\\\"\\n\",\n    \"        )[0]\\n\",\n    \"\\n\",\n    \"    return attn\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"df0972ec\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Preparation\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d6e0d987\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    
\"models = {\\n\",\n    \"    \\\"supervised\\\": timm.create_model(\\\"vit_deit_small_patch16_224\\\", pretrained=True),\\n\",\n    \"    \\\"selfsupervised\\\": torch.load(\\\"best_model.pth\\\", map_location=\\\"cpu\\\").backbone,\\n\",\n    \"}\\n\",\n    \"dataset = ImageFolder(\\\"data/imagenette2-320/val\\\")\\n\",\n    \"\\n\",\n    \"colors = [\\\"yellow\\\", \\\"red\\\", \\\"green\\\", \\\"blue\\\"]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"690e3a1f\",\n   \"metadata\": {\n    \"scrolled\": false\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"@ipywidgets.interact\\n\",\n    \"def _(\\n\",\n    \"    i=ipywidgets.IntSlider(min=0, max=len(dataset) - 1, continuous_update=False),\\n\",\n    \"    k=ipywidgets.IntSlider(min=0, max=195, value=10, continuous_update=False),\\n\",\n    \"    model=ipywidgets.Dropdown(options=[\\\"supervised\\\", \\\"selfsupervised\\\"]),\\n\",\n    \"):\\n\",\n    \"    img = dataset[i][0]\\n\",\n    \"    attns = visualize_attention(img, models[model], k=k).detach().permute(1, 2, 0).numpy()\\n\",\n    \"\\n\",\n    \"    tform = transforms.Compose([\\n\",\n    \"\\n\",\n    \"        transforms.Resize((224, 224)),\\n\",\n    \"    ])\\n\",\n    \"    # original image\\n\",\n    \"    plt.imshow(tform(img))\\n\",\n    \"    plt.axis(\\\"off\\\")\\n\",\n    \"    plt.show()\\n\",\n    \"\\n\",\n    \"    kwargs = {\\\"vmin\\\": 0, \\\"vmax\\\": 0.24}\\n\",\n    \"    # Attentions\\n\",\n    \"    n_heads = 6\\n\",\n    \"\\n\",\n    \"    fig, axs = plt.subplots(2, 3, figsize=(10, 7))\\n\",\n    \"    \\n\",\n    \"    for head in range(n_heads):\\n\",\n    \"        ax = axs[head // 3, head % 3]\\n\",\n    \"        ax.imshow(attns[..., head], **kwargs)\\n\",\n    \"        ax.axis(\\\"off\\\")\\n\",\n    \"        \\n\",\n    \"    plt.tight_layout()\\n\",\n    \"        \\n\",\n    \"    plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   
\"id\": \"d83eae10\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# 3244, 1942, 3482, 688, 1509, 3709\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "github_adventures/dino/visualize_augmentations.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5801191a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import warnings\\n\",\n    \"\\n\",\n    \"warnings.filterwarnings(\\\"ignore\\\")\\n\",\n    \"import ipywidgets\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import numpy as np\\n\",\n    \"import torch\\n\",\n    \"from PIL import Image\\n\",\n    \"from torchvision.datasets import ImageFolder\\n\",\n    \"\\n\",\n    \"from utils import DataAugmentation\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ad4f7f91\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def to_numpy(t):\\n\",\n    \"    array = torch.clip((t * 0.224) + 0.45, 0, 1).permute(1, 2, 0).numpy()\\n\",\n    \"    return array\\n\",\n    \"    \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"db09874a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"transform = DataAugmentation(n_local_crops=2)\\n\",\n    \"dataset = ImageFolder(\\\"data/imagenette2-320/train/\\\", transform=transform)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"48738037\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"@ipywidgets.interact\\n\",\n    \"def _(\\n\",\n    \"    i=ipywidgets.IntSlider(min=0, max=len(dataset) - 1, continuous_update=False),\\n\",\n    \"    seed=ipywidgets.IntSlider(min=0, max=50, continuous_update=False),\\n\",\n    \"):\\n\",\n    \"    torch.manual_seed(seed)\\n\",\n    \"    all_crops, _ = dataset[i]\\n\",\n    \"    titles = [\\\"Global 1\\\", \\\"Global 2\\\", \\\"Local 1\\\", \\\"Local 2\\\"]\\n\",\n    \"    \\n\",\n    \"    original_img = np.array(Image.open(dataset.samples[i][0]))\\n\",\n    \"    _, ax_orig = plt.subplots(figsize=(15, 5))\\n\",\n    \"    ax_orig.imshow(original_img)\\n\",\n  
  \"    ax_orig.set_title(\\\"Original\\\")\\n\",\n    \"    ax_orig.axis(\\\"off\\\")\\n\",\n    \"    \\n\",\n    \"    \\n\",\n    \"    fig, axs = plt.subplots(2, 2, figsize=(10, 10))\\n\",\n    \"    \\n\",\n    \"    for j, title in enumerate(titles):\\n\",\n    \"        ax = axs[j // 2, j % 2]\\n\",\n    \"        ax.imshow(to_numpy(all_crops[j]))\\n\",\n    \"        ax.set_title(title)\\n\",\n    \"        ax.axis(\\\"off\\\")\\n\",\n    \"    fig.tight_layout()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "github_adventures/gpt/README.md",
    "content": "# GPT-2 custom implementation\n## Installation\n\n```shell\npip install -r requirements.txt\n```\n\n## Launching script\nTo copy the weights of an official model and generate some text, use the script\n`copy_and_generate.py`:\n\n```shell\n(gpt) gpt$ python copy_and_generate.py --help\nusage: Copy weights of a HF model and generate text. [-h] [--sample] [-s STEPS] [-r RANDOM_STATE]\n                                                     [-t TEMPERATURE] [-k TOP_K] [-v]\n                                                     {gpt2,gpt2-medium,gpt2-large,distilgpt2}\n                                                     initial_text\n\npositional arguments:\n  {gpt2,gpt2-medium,gpt2-large,distilgpt2}\n                        Pretrained model to use\n  initial_text          Initial text\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --sample              If True sample randomly otherwise take the most probable token (default: False)\n  -s STEPS, --steps STEPS\n                        Number of new tokens to generate (default: 30)\n  -r RANDOM_STATE, --random-state RANDOM_STATE\n                        Random state (default: None)\n  -t TEMPERATURE, --temperature TEMPERATURE\n                        Softmax logits temperature (default: 1)\n  -k TOP_K, --top-k TOP_K\n                        If specified, then selecting k most probable tokens (default: None)\n  -v, --verbose         If True, then verbose (default: False)\n\n```\n"
  },
  {
    "path": "github_adventures/gpt/copy_and_generate.py",
    "content": "import argparse\nimport logging\n\nimport torch\n\nfrom model import GPT\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom utils import copy_model, generate_token\n\nlogging.basicConfig(format=\"[%(levelname)s] %(asctime)s %(message)s\")\nlogger = logging.getLogger(__file__)\n\n\ndef main(argv=None):\n    \"\"\"Copy weights and generate some text.\"\"\"\n    parser = argparse.ArgumentParser(\n        \"Copy weights of a HF model and generate text.\",\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter,\n    )\n\n    parser.add_argument(\n        \"model_name\",\n        type=str,\n        choices=(\"gpt2\", \"gpt2-medium\", \"gpt2-large\", \"distilgpt2\"),\n        help=\"Pretrained model to use\",\n    )\n    parser.add_argument(\n        \"initial_text\",\n        type=str,\n        help=\"Initial text\",\n    )\n    parser.add_argument(\n        \"--sample\",\n        action=\"store_true\",\n        help=\"If True sample randomly otherwise take the most probable token\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--steps\",\n        default=30,\n        type=int,\n        help=\"Number of new tokens to generate\",\n    )\n    parser.add_argument(\"-r\", \"--random-state\", type=int, help=\"Random state\")\n    parser.add_argument(\n        \"-t\",\n        \"--temperature\",\n        default=1,\n        type=float,\n        help=\"Softmax logits temperature\",\n    )\n    parser.add_argument(\n        \"-k\",\n        \"--top-k\",\n        type=int,\n        help=\"If specified, then selecting k most probable tokens\",\n    )\n    parser.add_argument(\n        \"-v\", \"--verbose\", action=\"store_true\", help=\"If True, then verbose\"\n    )\n\n    args = parser.parse_args(argv)\n\n    # Setup logging\n    if args.verbose:\n        logger.setLevel(logging.INFO)\n    else:\n        logger.setLevel(logging.WARNING)\n\n    logger.info(f\"CLI parameters: {vars(args)}\")\n    tokenizer = AutoTokenizer.from_pretrained(args.model_name)\n\n    model_official = AutoModelForCausalLM.from_pretrained(args.model_name)\n    config_official = model_official.config\n\n    our_params = [\n        \"vocab_size\",\n        \"n_layer\",\n        \"n_embd\",\n        \"n_head\",\n        \"n_positions\",\n        \"attn_pdrop\",\n        \"embd_pdrop\",\n        \"resid_pdrop\",\n        \"layer_norm_epsilon\",\n    ]\n\n    config_ours = {k: getattr(config_official, k) for k in our_params}\n    logger.info(f\"Model hyperparameters: {config_ours}\")\n\n    model_ours = GPT(**config_ours)\n    model_ours.eval()\n\n    copy_model(model_official, model_ours)\n\n    token_ixs = tokenizer(args.initial_text)[\"input_ids\"]\n\n    if args.random_state is not None:\n        torch.manual_seed(args.random_state)\n\n    # Sample\n    for step in range(args.steps):\n        new_token_ix = generate_token(\n            model_ours,\n            token_ixs,\n            sample=args.sample,\n            top_k=args.top_k,\n            temperature=args.temperature,\n        )\n        token_ixs.append(new_token_ix)\n        logger.info(f\"Step {step} done\")\n\n    text = tokenizer.decode(token_ixs)\n    print(text)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/gpt/distribution_visualizations.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"896ffe86\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import ipywidgets\\n\",\n    \"\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import torch\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"09b6e1f4\",\n   \"metadata\": {},\n   \"source\": [\n    \"# <center> Applying temperature + keeping only top K values</center>\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2c7442cf\",\n   \"metadata\": {},\n   \"source\": [\n    \"$T=\\\\mbox{temperature}$ $$\\\\large P_i=\\\\frac{e^{\\\\frac{y_i}T}}{\\\\sum_{k=1}^n e^{\\\\frac{y_k}T}}$$\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"95833de6\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"@ipywidgets.interact\\n\",\n    \"def _(\\n\",\n    \"    n_tokens=ipywidgets.IntSlider(min=4, max=30, value=8, continuous_update=False),\\n\",\n    \"    random_state=ipywidgets.IntSlider(min=0, max=10, value=2, continuous_update=False),\\n\",\n    \"    temperature=ipywidgets.FloatSlider(min=0.1, max=10, value=1, continuous_update=False),\\n\",\n    \"    top_k=ipywidgets.IntSlider(min=1, max=20, value=8, continuous_update=False),\\n\",\n    \"    ):\\n\",\n    \"    # Preparations\\n\",\n    \"    top_k = min(top_k, n_tokens)\\n\",\n    \"    torch.manual_seed(random_state)\\n\",\n    \"    logits = 10 * torch.rand(n_tokens,)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"    # Generate original\\n\",\n    \"    probs_orig = torch.nn.functional.softmax(logits, dim=0).numpy()\\n\",\n    \"    \\n\",\n    \"    # Generate new\\n\",\n    \"    logits = logits / temperature\\n\",\n    \"    top_values, _ = torch.topk(logits, top_k)  # (top_k,)\\n\",\n    \"    logits[logits < top_values.min()] = -torch.inf\\n\",\n    \"    probs_new = torch.nn.functional.softmax(logits, dim=0).numpy()\\n\",\n    \"\\n\",\n    \"    # Plotting\\n\",\n    \"    fig, (ax_orig, ax_new) = plt.subplots(1, 2, sharey=True, figsize=(10, 2), dpi=100)\\n\",\n    \"    x = range(n_tokens)\\n\",\n    \"\\n\",\n    \"    ax_orig.bar(x, probs_orig)\\n\",\n    \"    ax_orig.set_ylim((0, 1))\\n\",\n    \"    ax_orig.set_title(\\\"Original\\\")\\n\",\n    \"    \\n\",\n    \"    ax_new.bar(x, probs_new)\\n\",\n    \"    ax_new.set_title(\\\"Temperature + top K\\\")\\n\",\n    \"    \\n\",\n    \"    plt.show()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "github_adventures/gpt/ipython_code.py",
    "content": ">>> import torch\n>>> from model import GPT\n>>> from transformers import AutoModelForCausalLM\n>>> hparams_names = [\n...     \"vocab_size\",\n...     \"n_layer\",\n...     \"n_embd\",\n...     \"n_head\",\n...     \"n_positions\",\n...     \"attn_pdrop\",\n...     \"embd_pdrop\",\n...     \"resid_pdrop\",\n...     \"layer_norm_epsilon\",\n...     ]\n...\n>>> model_name = \"gpt2\"\n>>> model_official = AutoModelForCausalLM.from_pretrained(model_name, tie_word_embeddings=False)\n>>> config_official = model_official.config\n>>> config_official\n>>> config_ours = {name: getattr(config_official, name) for name in hparams_names}\n>>> config_ours\n>>> model_ours = GPT(**config_ours)\n>>> sum(p.numel() for p in model_ours.parameters())\n>>> sum(p.numel() for p in model_official.parameters())\n>>> _ = model_official.eval()\n>>> _ = model_ours.eval()\n>>> idx = torch.tensor([[1, 123, 52, 28]], dtype=torch.long)\n>>> logits_official = model_official(idx).logits\n>>> logits_ours = model_ours(idx)\n>>> logits_official.shape\n>>> logits_ours.shape\n>>> torch.allclose(logits_ours, logits_official, rtol=0, atol=1e-3)\n>>> (logits_ours - logits_official).abs().max()\n>>> from utils import copy_model\n>>> copy_model(model_official, model_ours)\n>>> logits_official = model_official(idx).logits\n>>> logits_ours = model_ours(idx)\n>>> torch.allclose(logits_ours, logits_official, rtol=0, atol=1e-3)\n>>> (logits_ours - logits_official).abs().max()\n"
  },
  {
    "path": "github_adventures/gpt/model.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom transformers.activations import gelu_new\n\n\nclass CustomGELU(nn.Module):\n    \"\"\"GELU implementation taken from the `transformers`.\"\"\"\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\"\"\"\n        return gelu_new(x)\n\n\nclass Block(nn.Module):\n    \"\"\"Decoder block.\n\n    Parameters\n    ----------\n    n_embd : int\n        Dimensionality of the embeddings.\n\n    n_head : int\n        Number of attention heads.\n\n    n_positions : int\n        Maximum number of tokens.\n\n    attn_pdrop : float\n        Probability of dropout on attention weights.\n\n    resid_pdrop : float\n        Probability of dropout after applying the MLP.\n\n    layer_norm_epsilon : float\n        Hyperparameter of layer normalization.\n\n    Attributes\n    ----------\n    ln_1, ln_2 : nn.LayerNorm\n        Layer norms.\n\n    attention : nn.MultiheadAttention\n        Attention module.\n\n    mlp : nn.Sequential\n        Multilayer perceptron.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        *,\n        n_embd,\n        n_head,\n        n_positions,\n        attn_pdrop,\n        resid_pdrop,\n        layer_norm_epsilon,\n    ):\n        super().__init__()\n\n        self.ln_1 = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)\n        self.ln_2 = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)\n\n        self.attention = nn.MultiheadAttention(\n            embed_dim=n_embd,\n            num_heads=n_head,\n            dropout=attn_pdrop,\n            bias=True,\n            batch_first=True,\n        )\n        self.register_buffer(\n            \"mask\",\n            (1 - torch.tril(torch.ones(n_positions, n_positions))).to(\n                dtype=torch.bool\n            ),\n        )\n\n        self.mlp = nn.Sequential(\n            nn.Linear(n_embd, 4 * n_embd),\n            CustomGELU(),\n            nn.Linear(4 * n_embd, n_embd),\n            nn.Dropout(resid_pdrop),\n        )\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input tensor of shape `(batch_size, n_tokens, n_embd)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Output tensor of shape `(batch_size, n_tokens, n_embd)`.\n        \"\"\"\n        batch_size, n_tokens, n_embd = x.shape\n\n        x_ = self.ln_1(x)  # (batch_size, n_tokens, n_embd)\n\n        mask = self.mask[:n_tokens, :n_tokens]  # (n_tokens, n_tokens)\n\n        attn_out, _ = self.attention(\n            x_, x_, x_, attn_mask=mask, need_weights=False\n        )  # (batch_size, n_tokens, n_embd)\n        x = x + attn_out  # (batch_size, n_tokens, n_embd)\n        x = x + self.mlp(self.ln_2(x))  # (batch_size, n_tokens, n_embd)\n\n        return x\n\n\nclass GPT(nn.Module):\n    \"\"\"Entire GPT model.\n\n    Parameters\n    ----------\n    vocab_size : int\n        Number of tokens in the vocabulary.\n\n    n_layer : int\n        Number of decoder blocks to include.\n\n    n_embd : int\n        Dimensionality of the embeddings.\n\n    n_head : int\n        Number of attention heads.\n\n    n_positions : int\n        Maximum number of tokens.\n\n    attn_pdrop : float\n        Probability of dropout on attention weights.\n\n    embd_pdrop : float\n        Probability of dropout on the sum of embeddings.\n\n    resid_pdrop : float\n        Probability of dropout after applying the MLP.\n\n    layer_norm_epsilon : float\n        Hyperparameter of layer normalization.\n\n    Attributes\n    ----------\n    token_emb : nn.Embedding\n        Token embeddings.\n\n    pos_emb : nn.Embedding\n        Positional embedding.\n\n    drop : nn.Dropout\n        Dropout module to be applied on the sum of embeddings.\n\n    blocks : nn.Sequential\n        List of decoder blocks.\n\n    ln : nn.LayerNorm\n        Layer norm applied before applying `head`.\n\n    head : nn.Linear\n        Final linear layer.\n    \"\"\"\n\n    def __init__(\n        self,\n        *,\n        vocab_size,\n        n_layer,\n        n_embd,\n        n_head,\n        n_positions,\n        attn_pdrop,\n        embd_pdrop,\n        resid_pdrop,\n        layer_norm_epsilon,\n    ):\n        super().__init__()\n        self.n_positions = n_positions\n        self.token_emb = nn.Embedding(vocab_size, n_embd)\n        self.pos_emb = nn.Embedding(n_positions, n_embd)\n\n        self.drop = nn.Dropout(embd_pdrop)\n\n        self.blocks = nn.Sequential(\n            *[\n                Block(\n                    n_embd=n_embd,\n                    n_head=n_head,\n                    n_positions=n_positions,\n                    attn_pdrop=attn_pdrop,\n                    resid_pdrop=resid_pdrop,\n                    layer_norm_epsilon=layer_norm_epsilon,\n                )\n                for _ in range(n_layer)\n            ]\n        )\n        self.ln = nn.LayerNorm(n_embd, eps=layer_norm_epsilon)\n        self.head = nn.Linear(n_embd, vocab_size, bias=False)\n\n    def forward(self, idx):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        idx : torch.Tensor\n            Integer tensor of shape `(batch_size, n_tokens)` where each\n            element is in the range `[0, vocab_size)`.\n\n        Returns\n        -------\n        logits : torch.Tensor\n            Tensor of shape `(batch_size, n_tokens, vocab_size)`.\n        \"\"\"\n        batch_size, n_tokens = idx.shape\n        device = idx.device\n\n        if n_tokens > self.n_positions:\n            raise ValueError(\"There are too many tokens in the input\")\n\n        positions = torch.arange(n_tokens, device=device)  # (n_tokens,)\n\n        token_emb = self.token_emb(idx)  # (batch_size, n_tokens, n_embd)\n        pos_emb = self.pos_emb(positions)[None, ...]  # (1, n_tokens, n_embd)\n        x = self.drop(token_emb + pos_emb)  # (batch_size, n_tokens, n_embd)\n        x = self.blocks(x)  # (batch_size, n_tokens, n_embd)\n        x = self.ln(x)  # (batch_size, n_tokens, n_embd)\n        logits = self.head(x)  # (batch_size, n_tokens, vocab_size)\n\n        return logits\n"
  },
  {
    "path": "github_adventures/gpt/requirements.txt",
    "content": "ipython==7.30.1\nipywidgets==7.6.5\njupyter==1.0.0\nmatplotlib==3.5.1\nnumpy==1.21.5\ntorch==1.10.1\n-e git+https://github.com/huggingface/transformers.git@05fa1a7ac17bb7aa07b9e0c1e138ecb31a28bbfe#egg=transformers\n"
  },
  {
    "path": "github_adventures/gpt/utils.py",
    "content": "import torch\n\n\ndef copy_parameter(param_official, param_ours):\n    \"\"\"Copy values of one tensor to another tensor.\n\n    Parameters\n    ----------\n    param_official : torch.Tensor\n        The value of this tensor will be copied.\n\n    param_ours : torch.Tensor\n        This tensor will be overwritten in-place with the values from\n        `param_official`.\n    \"\"\"\n    if param_official.shape != param_ours.shape:\n        raise ValueError(\"The shapes of the provided tensors are different\")\n\n    with torch.no_grad():\n        param_ours.copy_(param_official)\n\n\ndef copy_block(block_official, block_ours):\n    \"\"\"Copy all parameters within a transformer block.\n\n    Parameters\n    ----------\n    block_official : transformers.models.gpt2.modeling_gpt2.GPT2Block\n        Block coming from the huggingface code.\n\n    block_ours : Block\n        Our block.\n    \"\"\"\n    b_a = block_official\n    b_b = block_ours\n\n    # LN 1\n    copy_parameter(b_a.ln_1.weight, b_b.ln_1.weight)\n    copy_parameter(b_a.ln_1.bias, b_b.ln_1.bias)\n\n    # Attention\n    copy_parameter(b_a.attn.c_attn.weight.T, b_b.attention.in_proj_weight)\n    copy_parameter(b_a.attn.c_attn.bias, b_b.attention.in_proj_bias)\n\n    copy_parameter(b_a.attn.c_proj.weight.T, b_b.attention.out_proj.weight)\n    copy_parameter(b_a.attn.c_proj.bias, b_b.attention.out_proj.bias)\n\n    # LN 2\n    copy_parameter(b_a.ln_2.weight, b_b.ln_2.weight)\n    copy_parameter(b_a.ln_2.bias, b_b.ln_2.bias)\n\n    # MLP\n    copy_parameter(b_a.mlp.c_fc.weight.T, b_b.mlp[0].weight)\n    copy_parameter(b_a.mlp.c_fc.bias, b_b.mlp[0].bias)\n\n    copy_parameter(b_a.mlp.c_proj.weight.T, b_b.mlp[2].weight)\n    copy_parameter(b_a.mlp.c_proj.bias, b_b.mlp[2].bias)\n\n\ndef copy_model(model_official, model_ours):\n    \"\"\"Copy all trainable weights.\n\n    Parameters\n    ----------\n    model_official : transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel\n        Huggingface 
model.\n\n    model_ours : GPT\n        Our model.\n    \"\"\"\n    m_a = model_official\n    m_b = model_ours\n\n    # Token and positional embeddings\n    copy_parameter(m_a.transformer.wpe.weight, m_b.pos_emb.weight)\n    copy_parameter(m_a.transformer.wte.weight, m_b.token_emb.weight)\n\n    # Blocks\n    for block_official, block_ours in zip(m_a.transformer.h, m_b.blocks):\n        copy_block(block_official, block_ours)\n\n    # Head\n    copy_parameter(m_a.transformer.ln_f.weight, m_b.ln.weight)\n    copy_parameter(m_a.transformer.ln_f.bias, m_b.ln.bias)\n    copy_parameter(m_a.lm_head.weight, m_b.head.weight)\n\n\n@torch.no_grad()\ndef generate_token(\n    model, token_ixs, temperature=1.0, sample=False, top_k=None\n):\n    \"\"\"Generate a single token given previous tokens.\n\n    Parameters\n    ----------\n    model : GPT\n        Our GPT model.\n\n    token_ixs : list\n        List of conditional input token ids.\n\n    temperature : float\n        The higher the more variability and vice versa.\n\n    sample : bool\n        If True, we sample from the distribution (=there is randomness). 
If\n        False, we just take the argmax (=there is no randomness).\n\n    top_k : int or None\n        If not None then we modify the distribution to only contain the `top_k`\n        most probable outcomes.\n\n    Returns\n    -------\n    new_token_ix : int\n        Index of the new token.\n    \"\"\"\n    context_token_ixs = token_ixs[-model.n_positions :]\n    ixs = torch.tensor(context_token_ixs).to(dtype=torch.long)[\n        None, :\n    ]  # (1, n_tokens)\n\n    logits_all = model(ixs)  # (1, n_tokens, vocab_size)\n    logits = logits_all[0, -1, :]  # (vocab_size,)\n    logits = logits / temperature  # (vocab_size,)\n\n    if top_k is not None:\n        # Find the top k biggest elements, set the remaining elements to -inf\n        top_values, _ = torch.topk(logits, top_k)  # (top_k,)\n        logits[logits < top_values.min()] = -torch.inf\n\n    probs = torch.nn.functional.softmax(logits, dim=0)  # (vocab_size,)\n\n    if sample:\n        new_token_ix = torch.multinomial(probs, num_samples=1)\n    else:\n        new_token_ix = probs.argmax()\n\n    return new_token_ix.item()\n"
  },
  {
    "path": "github_adventures/integer/README.md",
    "content": "# On-Line Encyclopedia of Integer Sequences\nYou can use `fetch_data.py` to download the sequences. However,\nI found out (after filming the video) that you can literally\ndownload all the sequences here:\nhttps://oeis.org/wiki/Welcome#Compressed_Versions\n\nSo you should probably do that and spare their API.\n\n# The GloVe embeddings\nThe ones that I used in the video are located here:\nhttps://nlp.stanford.edu/data/glove.6B.zip\n"
  },
  {
    "path": "github_adventures/integer/bert.py",
    "content": "import argparse\n\nimport numpy as np\nimport torch\nfrom torch.utils.tensorboard import SummaryWriter\nfrom transformers import BertModel, BertTokenizer\n\nfrom utils import create_classification_targets, train_classifier\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\"Evaluating BERT integer embeddings\")\n\n    parser.add_argument(\n        \"log_folder\",\n        type=str,\n        help=\"Folder where to log results\",\n    )\n    parser.add_argument(\n        \"--max-value-eval\",\n        type=int,\n        default=500,\n        help=\"Number of integers to run the evaluation on\",\n    )\n    args = parser.parse_args(argv)\n    model_name = \"bert-base-uncased\"\n\n    # Create writer\n    writer = SummaryWriter(args.log_folder)\n\n    tokenizer = BertTokenizer.from_pretrained(model_name)\n    model = BertModel.from_pretrained(model_name)\n\n    # Retrieve embeddings\n    to_find = list(map(str, range(args.max_value_eval)))\n    positions = np.array(tokenizer.convert_tokens_to_ids(to_find))\n    unk_token_position = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)\n    is_valid = positions != unk_token_position\n\n    print(\n        \"The following numbers are missing\",\n        [i for i, x in enumerate(is_valid) if not x],\n    )\n\n    arange = np.arange(args.max_value_eval)\n    numbers = arange[is_valid]\n    embeddings = (\n        model.embeddings.word_embeddings(torch.from_numpy(positions[is_valid]))\n        .detach()\n        .numpy()\n    )\n\n    ys_clf = create_classification_targets(numbers)\n\n    keys = sorted(ys_clf.keys())\n    metadata = np.array([numbers] + [ys_clf[k] for k in keys]).T.tolist()\n    metadata_header = [\"value\"] + keys\n\n    for name, y in ys_clf.items():\n        metrics = train_classifier(embeddings, y)\n        for metric_name, value in metrics.items():\n            writer.add_scalar(\n                f\"{name}/{metric_name}\",\n                value,\n            )\n\n    
writer.add_embedding(\n        embeddings,\n        metadata=metadata,\n        metadata_header=metadata_header,\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/integer/experiments.sh",
    "content": "set -x\n\nOUTPUT_PATH=results\nGLOVE_PATH=glove.6B.300d.txt\nSEQUENCES_PATH=raw_data.pkl\nMAX_VALUE_EVAL=500\n\npython glove.py --max-value-eval $MAX_VALUE_EVAL $GLOVE_PATH $OUTPUT_PATH/glove\npython bert.py --max-value-eval $MAX_VALUE_EVAL $OUTPUT_PATH/BERT\npython lstm.py \\\n    $SEQUENCES_PATH \\\n    $OUTPUT_PATH/LSTM \\\n    --batch-size 128 \\\n    --device cuda \\\n    --embedding-dim 128 \\\n    --hidden-dim 256 \\\n    --max-value-eval $MAX_VALUE_EVAL \\\n    --max-value 20000 \\\n    --n-epochs 20000 \\\n    --sequence-len 100\n"
  },
  {
    "path": "github_adventures/integer/fetch_data.py",
    "content": "import pathlib\nimport pickle\n\nimport requests\n\nfrom joblib import Parallel, delayed, parallel_backend\n\n\ndef get_sequence(sequence_id):\n    \"\"\"Get an integer sequence from the online OEIS.\n\n    Parameters\n    ----------\n    sequence_id : int\n        Unique identifier for the desired sequence.\n\n    Returns\n    -------\n    sequence : list\n        List of integers.\n\n    Raises\n    ------\n    HTTPError\n        It was not possible to get the given sequence.\n    \"\"\"\n    url = f\"https://oeis.org/search?fmt=json&q=id:A{sequence_id:07}\"\n    print(sequence_id)\n    response = requests.get(url)\n\n    response.raise_for_status()\n\n    data_str = response.json()[\"results\"][0][\"data\"]\n    sequence = [int(x) for x in data_str.split(\",\")]\n\n    return sequence\n\n\nif __name__ == \"__main__\":\n    # Parameters\n    n_sequences = 5000\n    start_id = 1  # seems like 1 - 340_000 are valid sequences\n    n_jobs = 64\n    backend = \"threading\"  # \"threading\" or \"loky\"\n\n    # Preparation\n    end_id = start_id + n_sequences\n    output_folder = pathlib.Path(\"data/\")\n    output_folder.mkdir(exist_ok=True, parents=True)\n    output_path = output_folder / f\"{start_id}_{end_id - 1}.pkl\"\n\n    with parallel_backend(backend, n_jobs=n_jobs):\n        res = Parallel()(delayed(get_sequence)(i) for i in range(start_id, end_id))\n\n    with output_path.open(\"wb\") as f:\n        pickle.dump(res, f)\n"
  },
  {
    "path": "github_adventures/integer/glove.py",
    "content": "import argparse\n\nimport numpy as np\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom utils import create_classification_targets, train_classifier\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\"Evaluating GloVe integer embeddings\")\n\n    parser.add_argument(\n        \"glove_path\",\n        type=str,\n        help=\"Path to a txt file holding the GloVe embeddings\",\n    )\n    parser.add_argument(\n        \"log_folder\",\n        type=str,\n        help=\"Folder where to log results\",\n    )\n    parser.add_argument(\n        \"--max-value-eval\",\n        type=int,\n        default=500,\n        help=\"Number of integers to run the evaluation on\",\n    )\n    parser.add_argument(\n        \"--dim\",\n        type=int,\n        default=300,\n        help=\"Dimensionality of the embeddings\",\n    )\n    args = parser.parse_args(argv)\n\n    # Create writer\n    writer = SummaryWriter(args.log_folder)\n\n    # Retrieve embeddings\n    to_find = set(map(str, range(args.max_value_eval)))\n    embeddings = np.empty((args.max_value_eval, args.dim))\n\n    with open(args.glove_path) as f:\n        for line in f:\n            token, *vector_ = line.split(\" \")\n\n            if token in to_find:\n                embeddings[int(token)] = list(map(float, vector_))\n                to_find.remove(token)\n\n    assert not to_find\n\n    arange = np.arange(args.max_value_eval)\n    ys_clf = create_classification_targets(arange)\n\n    keys = sorted(ys_clf.keys())\n    metadata = np.array([arange] + [ys_clf[k] for k in keys]).T.tolist()\n    metadata_header = [\"value\"] + keys\n\n    for name, y in ys_clf.items():\n        metrics = train_classifier(embeddings, y)\n        for metric_name, value in metrics.items():\n            writer.add_scalar(\n                f\"{name}/{metric_name}\",\n                value,\n            )\n\n    writer.add_embedding(\n        embeddings,\n        metadata=metadata,\n        metadata_header=metadata_header,\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/integer/lstm.py",
    "content": "import argparse\nimport json\nimport pathlib\nimport pickle\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport tqdm\nfrom torch.utils.data import DataLoader\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom utils import (\n    CustomDataset,\n    Network,\n    create_classification_targets,\n    train_classifier,\n)\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\"Embedding integers using LSTM\")\n\n    parser.add_argument(\n        \"data_path\", type=str, help=\"Path to the pickled sequences\"\n    )\n\n    parser.add_argument(\n        \"log_folder\", type=str, help=\"Folder where to log results\"\n    )\n\n    parser.add_argument(\n        \"-b\", \"--batch-size\", type=int, default=128, help=\"Batch size\"\n    )\n\n    parser.add_argument(\n        \"-d\", \"--dense-dim\", type=int, default=256, help=\"Dense dimension\"\n    )\n\n    parser.add_argument(\"--device\", type=str, default=\"cpu\", help=\"Device\")\n\n    parser.add_argument(\n        \"-e\",\n        \"--embedding-dim\",\n        type=int,\n        default=128,\n        help=\"Embedding dimension\",\n    )\n\n    parser.add_argument(\n        \"--hidden-dim\", type=int, default=256, help=\"Hidden dimension\"\n    )\n    parser.add_argument(\n        \"--max-value-eval\",\n        type=int,\n        default=500,\n        help=\"Evaluation limit\",\n    )\n\n    parser.add_argument(\n        \"-m\",\n        \"--max-value\",\n        type=int,\n        default=20000,\n        help=\"The maximum allowed value (non inclusive)\",\n    )\n\n    parser.add_argument(\n        \"-n\", \"--n-epochs\", type=int, default=100, help=\"Number of epochs\"\n    )\n\n    parser.add_argument(\n        \"-l\",\n        \"--sequence-len\",\n        type=int,\n        default=100,\n        help=\"The maximum length of a sequence\",\n    )\n\n    args = parser.parse_args(argv)\n\n    # Preparations\n    device = 
torch.device(args.device)\n    eval_frequency = 500\n\n    log_folder = pathlib.Path(args.log_folder)\n    model_path = log_folder / \"checkpoint.pth\"\n\n    writer = SummaryWriter(log_folder)\n    writer.add_text(\"parameters\", json.dumps(vars(args)))\n\n    # Dataset related\n    data_path = pathlib.Path(args.data_path)\n    with data_path.open(\"rb\") as f:\n        raw_sequences = pickle.load(f)\n\n    dataset = CustomDataset(\n        raw_sequences,\n        max_value=args.max_value,\n        sequence_len=args.sequence_len,\n    )\n\n    fig, ax = plt.subplots()\n    ax.hist(dataset.normalized_sequences.ravel(), bins=100)\n    ax.set_title(\n        f\"Number distribution (numbers={dataset.normalized_sequences.shape})\"\n    )\n    writer.add_figure(\"number distribution\", fig)\n\n    dataloader = DataLoader(\n        dataset,\n        shuffle=True,\n        batch_size=args.batch_size,\n        pin_memory=True,\n    )\n\n    # Network, loss and the optimizer\n    net = Network(\n        max_value=args.max_value,\n        hidden_dim=args.hidden_dim,\n        embedding_dim=args.embedding_dim,\n        dense_dim=args.dense_dim,\n    )\n\n    net.to(device)\n\n    loss_inst = nn.CrossEntropyLoss(\n        ignore_index=args.max_value,\n    )\n\n    optimizer = torch.optim.Adam(net.parameters())\n\n    # Validation preparation\n    max_value_eval = args.max_value_eval or args.max_value\n    arange = np.arange(max_value_eval)\n    ys_clf = create_classification_targets(arange)\n\n    keys = sorted(ys_clf.keys())\n    metadata = np.array([arange] + [ys_clf[k] for k in keys]).T.tolist()\n    metadata_header = [\"value\"] + keys\n\n    step = 0\n    for _ in range(args.n_epochs):\n        for x in tqdm.tqdm(dataloader):\n            x = x.to(device)\n            logits_ = net(x)  # (batch_size, sequence_len, max_value)\n\n            logits = logits_[:, :-1].permute(\n                0, 2, 1\n            )  # (batch_size, max_value, sequence_len - 1)\n            
target = x[:, 1:]  # (batch_size, sequence_len - 1)\n            loss = loss_inst(logits, target)\n\n            optimizer.zero_grad()\n            loss.backward()\n            optimizer.step()\n\n            writer.add_scalar(\"loss\", loss, step)\n\n            if step % eval_frequency == 0:\n                X = (\n                    net.embedding.weight.detach()\n                    .cpu()\n                    .numpy()[:max_value_eval]\n                )\n\n                writer.add_embedding(\n                    X,\n                    global_step=step,\n                    tag=\"Integer embeddings\",\n                    metadata=metadata,\n                    metadata_header=metadata_header,\n                )\n\n                for name, y in ys_clf.items():\n                    metrics = train_classifier(X, y)\n                    for metric_name, value in metrics.items():\n                        writer.add_scalar(\n                            f\"{name}/{metric_name}\",\n                            value,\n                            step,\n                        )\n                torch.save(net, model_path)\n\n            step += 1\n\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/integer/requirements.txt",
    "content": "joblib\nmatplotlib\nnumpy\nrequests\nscikit-learn\nsympy\ntensorboard\ntorch\ntransformers\n"
  },
  {
    "path": "github_adventures/integer/utils.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import StratifiedKFold, cross_validate\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\nfrom sympy.ntheory import isprime\nfrom torch.utils.data import Dataset\n\n\nclass CustomDataset(Dataset):\n    \"\"\"Dataset containing integer sequences.\n\n    Parameters\n    ----------\n    raw_sequences : list of list of str\n        Containing the original raw sequences. Note\n        that their length differs.\n\n    sequence_len : int\n        The lenght og the sequence. If the original sequence is shorter,\n        we just pad it with `max_value`. If the original sequence is longer\n        we simply cut if off.\n\n    max_value : int\n        The maximum allowed value (non inclusive). We will only consider\n        sequences that had the first `sequence_len` elements in\n        the range `[0, max_value)`.\n\n    Attributes\n    ----------\n    normalized_sequences : np.ndarray\n        2D array of shape `(n_sequences, sequence_len)`. 
It only contains\n        sequences that had the first `sequence_len` elements in\n        the range `[0, max_value)`.\n    \"\"\"\n\n    def __init__(\n        self,\n        raw_sequences,\n        sequence_len=80,\n        max_value=2000,\n    ):\n        filtered_sequences = list(\n            filter(\n                lambda seq: all(\n                    0 <= x < max_value for x in seq[:sequence_len]\n                ),\n                raw_sequences,\n            )\n        )\n\n        n_sequences = len(filtered_sequences)\n\n        self.normalized_sequences = max_value * np.ones(\n            (n_sequences, sequence_len),\n            dtype=np.int64,\n        )\n\n        for i, seq in enumerate(filtered_sequences):\n            actual_len = min(len(seq), sequence_len)\n            self.normalized_sequences[i, :actual_len] = seq[:actual_len]\n\n    def __len__(self):\n        \"\"\"Get the length of the dataset.\"\"\"\n        return len(self.normalized_sequences)\n\n    def __getitem__(self, ix):\n        \"\"\"Get a single sample of the dataset.\"\"\"\n        return self.normalized_sequences[ix]\n\n\nclass Network(nn.Module):\n    \"\"\"Network predicting next number in the sequence.\n\n    Parameters\n    ----------\n    max_value : int\n        Maximum integer value allowed inside of the sequence. We will\n        generate an embedding for each of the numbers in `[0, max_value]`.\n\n    embedding_dim : int\n        Dimensionality of the integer embeddings.\n\n    n_layers : int\n        Number of layers inside of the LSTM.\n\n    hidden_dim : int\n        Dimensionality of the hidden state (LSTM).\n\n    dense_dim : int\n        Dimensionality of the dense layer.\n\n    Attributes\n    ----------\n    embedding : torch.nn.Embedding\n        Embeddings of all the integers.\n\n    lstm : torch.nn.LSTM\n        LSTM subnetwork. 
Inputs integer embeddings and outputs\n        new hidden states.\n\n    linear : torch.nn.Linear\n        Inputs hidden states and transforms them.\n\n    classifier : torch.nn.Linear\n        Inputs the output of `linear` and outputs the logits\n        over all possible integers.\n    \"\"\"\n\n    def __init__(\n        self,\n        max_value=2000,\n        embedding_dim=100,\n        n_layers=2,\n        hidden_dim=64,\n        dense_dim=256,\n    ):\n        super().__init__()\n\n        self.embedding = nn.Embedding(\n            num_embeddings=max_value + 1,\n            embedding_dim=embedding_dim,\n            padding_idx=max_value,\n        )\n\n        self.lstm = nn.LSTM(\n            input_size=embedding_dim,\n            hidden_size=hidden_dim,\n            num_layers=n_layers,\n            batch_first=True,\n        )\n\n        self.linear = nn.Linear(\n            hidden_dim,\n            dense_dim,\n        )\n\n        self.classifier = nn.Linear(\n            dense_dim,\n            max_value,\n        )\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input tensor of shape `(batch_size, sequence_len)` with\n            dtype `torch.long`.\n\n        Returns\n        -------\n        logits : torch.Tensor\n            Logits over all possible integers of shape\n            `(batch_size, sequence_len, max_value)`.\n        \"\"\"\n        emb = self.embedding(x)  # (batch_size, sequence_len, embedding_dim)\n        h, *_ = self.lstm(emb)  # (batch_size, sequence_len, hidden_dim)\n        dense = torch.relu(\n            self.linear(h)\n        )  # (batch_size, sequence_len, dense_dim)\n        logits = self.classifier(\n            dense\n        )  # (batch_size, sequence_len, max_value)\n\n        return logits\n\n\ndef train_classifier(X, y, random_state=2):\n    \"\"\"Cross-validate a classification problem using logistic regression.\n\n    
Parameters\n    ----------\n    X : np.ndarray\n        2D array holding the features of shape `(n_samples, n_features)`.\n\n    y : np.ndarray\n        1D array holding the classification targets of shape `(n_samples,)`.\n\n    random_state : int\n        Guaranteeing reproducibility.\n\n    Returns\n    -------\n    metrics : dict\n        Holds train and validation accuracy averaged over all the folds.\n    \"\"\"\n    cv = StratifiedKFold(\n        n_splits=5,\n        random_state=random_state,\n        shuffle=True,\n    )\n\n    clf = make_pipeline(\n        StandardScaler(),\n        LogisticRegression(\n            max_iter=2000,\n            random_state=random_state,\n        ),\n    )\n\n    res = cross_validate(\n        clf,\n        X,\n        y,\n        return_train_score=True,\n        cv=cv,\n    )\n\n    metrics = {\n        \"train_acc\": res[\"train_score\"].mean(),\n        \"test_acc\": res[\"test_score\"].mean(),\n    }\n\n    return metrics\n\n\ndef create_classification_targets(indices):\n    \"\"\"Create multiple classification targets.\n\n    They represent common properties of integers.\n\n    Parameters\n    ----------\n    indices : np.ndarray\n        1D array holding the integers for which we want to compute\n        the targets.\n\n    Returns\n    -------\n    targets : dict\n        Keys are property names and the values are arrays of the same shape\n        as `indices` representing whether a given integer does / does not\n        have a given property.\n    \"\"\"\n\n    targets = {\n        \"divisibility_2\": (indices % 2 == 0).astype(float),\n        \"divisibility_3\": (indices % 3 == 0).astype(float),\n        \"divisibility_4\": (indices % 4 == 0).astype(float),\n        \"divisibility_5\": (indices % 5 == 0).astype(float),\n        \"divisibility_10\": (indices % 10 == 0).astype(float),\n        \"prime\": np.vectorize(isprime)(indices).astype(float),\n    }\n\n    return targets\n"
  },
  {
    "path": "github_adventures/lottery/README.md",
    "content": "# The Lottery Ticket Hypothesis\n## Installation\n```bash\npip install -r requirements.txt\n```\n\n## Running experiments\nThe training logic is implemented inside of the script `main.py`. To\nget more information about the CLI run\n\n```bash\npython main.py --help\n```\n\nIf you want to run an entire grid search over different hyperparameters\nyou can use the `parallel_launch.sh` script. Note that it depends on a tool\ncalled `parallel` ([more info](https://www.gnu.org/software/parallel/)). Note\nthat the script allows for dry runs (default behavior) and progress bars.\n\n```bash\n./parallel_launch.sh\n```\n"
  },
  {
    "path": "github_adventures/lottery/data.py",
    "content": "from torch.utils.data import Dataset\nfrom torchvision.datasets import MNIST\nfrom torchvision.transforms import Compose, Lambda, ToTensor\n\n\nclass MNISTDataset(Dataset):\n    \"\"\"MNIST dataset.\n\n    Feature images are automatically flattened.\n\n    Parameters\n    ----------\n    root : str\n        Directory where the actual data is located (or downloaded to).\n\n    train : bool\n        If True the training set is returned (60_000 samples). Otherwise\n        the validation set is returned (10_000 samples).\n\n    Attributes\n    ----------\n    tv_dataset : MNIST\n        Instance of the torchvision `MNIST` dataset class.\n    \"\"\"\n\n    def __init__(self, root, train=True, download=True):\n        transform = Compose(\n            [\n                ToTensor(),\n                Lambda(lambda x: x.ravel()),\n            ]\n        )\n\n        self.tv_dataset = MNIST(\n            root,\n            train=train,\n            download=download,\n            transform=transform,\n        )\n\n    def __len__(self):\n        \"\"\"Get the length of the dataset.\"\"\"\n        return len(self.tv_dataset)\n\n    def __getitem__(self, ix):\n        \"\"\"Get a selected sample.\n\n        Parameters\n        ----------\n        ix : int\n            Index of the sample to get.\n\n        Returns\n        -------\n        x : torch.Tensor\n            Flattened feature tensor of shape `(784,)`.\n\n        y : torch.Tensor\n            Scalar representing the ground truth label. Number between 0 and 9.\n        \"\"\"\n        return self.tv_dataset[ix]\n"
  },
  {
    "path": "github_adventures/lottery/main.py",
    "content": "import argparse\n\nimport torch\nimport torch.nn as nn\nimport tqdm\nfrom torch.utils.data import DataLoader\n\nimport wandb\nfrom data import MNISTDataset\nfrom utils import MLP, compute_stats, copy_weights_mlp, prune_mlp, reinit_mlp\n\n\ndef loop_dataloader(dataloader):\n    \"\"\"Loop infinitely over a dataloader.\n\n    Parameters\n    ----------\n    dataloader : DataLoader\n        DataLoader streaming batches of samples.\n\n    Yields\n    ------\n    X_batch : torch.Tensor\n        Batch of features.\n\n    y_batch : torch.Tensor\n        Batch of predictions.\n    \"\"\"\n    while True:\n        for x in iter(dataloader):\n            yield x\n\n\ndef train(\n    model,\n    dataloader_train,\n    loss_inst,\n    optimizer,\n    max_iter=10_000,\n    dataloader_val=None,\n    val_freq=500,\n):\n    \"\"\"Run the training loop.\n\n    Parameters\n    ----------\n    model : nn.Module\n        Neural network (in our case MLP).\n\n    dataloader_train : DataLoader\n        Dataloader yielding training samples.\n\n    loss_inst : callable\n        Computes the loss when called.\n\n    optimizer : torch.optim.Optimizer\n        Instance of an optimizer.\n\n    max_iter : int\n        The number of iterations we run the training for\n        (= number of graident descent steps).\n\n    dataloader_val : None or DataLoader\n        Dataloader yielding validation samples. 
If provided it will\n        also signal to us that we want to track metrics.\n\n    val_freq : int\n        How often the evaluation runs.\n    \"\"\"\n    iterable = loop_dataloader(dataloader_train)\n    iterable = tqdm.tqdm(iterable, total=max_iter)\n\n    it = 0\n    for X_batch, y_batch in iterable:\n        if it == max_iter:\n            break\n\n        logit_batch = model(X_batch)\n\n        loss = loss_inst(logit_batch, y_batch)\n        if dataloader_val is not None:\n            wandb.log({\"loss\": loss}, step=it)\n\n        optimizer.zero_grad()\n        loss.backward()\n        optimizer.step()\n\n        if it % val_freq == 0 and dataloader_val is not None:\n            is_equal = []\n\n            for X_batch_val, y_batch_val in dataloader_val:\n                is_equal.append(\n                    model(X_batch_val).argmax(dim=-1) == y_batch_val\n                )\n\n            is_equal_t = torch.cat(is_equal)\n            acc = is_equal_t.sum() / len(is_equal_t)\n            wandb.log({\"accuracy_val\": acc}, step=it)\n\n        it += 1\n\n\ndef main(argv=None):\n    \"\"\"Create CLI and run experiments.\"\"\"\n    parser = argparse.ArgumentParser(\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter\n    )\n\n    parser.add_argument(\n        \"-i\",\n        \"--max-iter\",\n        help=\"Number of iterations\",\n        type=int,\n        default=50000,\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch-size\",\n        help=\"Batch size\",\n        type=int,\n        default=60,\n    )\n    parser.add_argument(\n        \"--prune-iter\",\n        help=\"Number of prune iterations\",\n        type=int,\n        default=1,\n    )\n    parser.add_argument(\n        \"-m\",\n        \"--prune-method\",\n        help=\"Pruning method to employ\",\n        type=str,\n        choices=(\"l1\", \"random\"),\n        default=\"l1\",\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--prune-ratio\",\n        
help=\"Percentage of weights to remove\",\n        type=float,\n        default=0.2,\n    )\n    parser.add_argument(\n        \"--val-freq\",\n        help=\"How often to compute the validation accuracy\",\n        type=int,\n        default=250,\n    )\n    parser.add_argument(\n        \"-r\",\n        \"--reinitialize\",\n        help=\"If true, reinitializes randomly all weights after pruning\",\n        type=str,\n        choices=(\"true\", \"false\"),  # easy for hyperparameter search\n        default=\"false\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--random-state\",\n        help=\"Random state\",\n        type=int,\n    )\n    parser.add_argument(\n        \"--wandb-entity\",\n        help=\"W&B entity\",\n        type=str,\n        default=\"mildlyoverfitted\",\n    )\n    parser.add_argument(\n        \"--wandb-project\",\n        help=\"W&B project\",\n        type=str,\n    )\n    args = parser.parse_args(argv)\n\n    wandb.init(\n        project=args.wandb_project,\n        entity=args.wandb_entity,\n        config=vars(args),\n    )\n    wandb.define_metric(\"accuracy_val\", summary=\"max\")\n\n    dataset_train = MNISTDataset(\n        \"data\",\n        train=True,\n        download=True,\n    )\n    dataset_val = MNISTDataset(\n        \"data\",\n        train=False,\n        download=True,\n    )\n\n    if args.random_state is not None:\n        torch.manual_seed(args.random_state)\n\n    dataloader_train = DataLoader(\n        dataset_train, batch_size=args.batch_size, shuffle=True\n    )\n    dataloader_val = DataLoader(\n        dataset_val, batch_size=args.batch_size, shuffle=True\n    )\n\n    kwargs = dict(\n        n_features=28 * 28,\n        hidden_layer_sizes=(300, 100),\n        n_targets=10,\n    )\n\n    mlp = MLP(**kwargs)\n\n    mlp_copy = MLP(**kwargs)\n    mlp_copy.load_state_dict(mlp.state_dict())\n\n    loss_inst = nn.CrossEntropyLoss()\n    optimizer = torch.optim.Adam(mlp.parameters(), lr=1.2 * 
1e-3)\n\n    # Train and prune loop\n    if args.prune_ratio > 0:\n        per_round_prune_ratio = 1 - (1 - args.prune_ratio) ** (\n            1 / args.prune_iter\n        )\n\n        per_round_prune_ratios = [per_round_prune_ratio] * len(mlp.module_list)\n        per_round_prune_ratios[-1] /= 2\n\n        per_round_max_iter = int(args.max_iter / args.prune_iter)\n\n        for prune_it in range(args.prune_iter):\n            train(\n                mlp,\n                dataloader_train,\n                loss_inst,\n                optimizer,\n                max_iter=per_round_max_iter,\n            )\n            prune_mlp(mlp, per_round_prune_ratios, method=args.prune_method)\n\n            copy_weights_mlp(mlp_copy, mlp)\n\n            stats = compute_stats(mlp)\n            for name, stat in stats.items():\n                summary_name = f\"{name}_pruneiter={prune_it}\"\n                wandb.run.summary[summary_name] = stat\n\n    if args.reinitialize == \"true\":\n        reinit_mlp(mlp)\n\n    # Run actual training with a final pruned network\n    train(\n        mlp,\n        dataloader_train,\n        loss_inst,\n        optimizer,\n        max_iter=args.max_iter,\n        dataloader_val=dataloader_val,\n        val_freq=args.val_freq,\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/lottery/parallel_launch.sh",
    "content": "# Parallel parameters\nN_JOBS=4\nARGS=\"-P$N_JOBS --header :\" # arguments for parallel\n# ARGS=\"--bar \"$ARGS\nARGS=\"--dry-run \"$ARGS\n\n# Experiment parameters\nENTITY='mildlyoverfitted'\nPROJECT='lottery_parallel_2'  # it should already exist to avoid issues\n\nMAX_ITERS=(15000)\nPRUNE_ITERS=(1 5)\nPRUNE_METHODS=('l1' 'random')\nPRUNE_RATIOS=(0 0.1 0.25 0.5 0.8 0.9 0.93 0.97)\nREINITIALIZES=('true' 'false')\nRANDOM_STATES=(1 2 3 4 5)\n\nparallel $ARGS \\\n    python main.py \\\n        --max-iter={max_iter} \\\n        --prune-iter={prune_iter} \\\n        --prune-method={prune_method} \\\n        --prune-ratio={prune_ratio} \\\n        --random-state={random_state} \\\n        --reinitialize={reinitialize} \\\n        --wandb-entity=$ENTITY \\\n        --wandb-project=$PROJECT \\\n            ::: max_iter \"${MAX_ITERS[@]}\" \\\n            ::: prune_iter \"${PRUNE_ITERS[@]}\" \\\n            ::: prune_method \"${PRUNE_METHODS[@]}\" \\\n            ::: prune_ratio \"${PRUNE_RATIOS[@]}\" \\\n            ::: random_state \"${RANDOM_STATES[@]}\" \\\n            ::: reinitialize \"${REINITIALIZES[@]}\" \\\n"
  },
  {
    "path": "github_adventures/lottery/requirements.txt",
    "content": "numpy\npillow\nsix\ntorch\ntorch-vision\ntqdm\nwandb\n"
  },
  {
    "path": "github_adventures/lottery/utils.py",
    "content": "import math\n\nimport torch\nimport torch.nn as nn\nfrom torch.nn.utils.prune import l1_unstructured, random_unstructured\n\n\nclass MLP(nn.Module):\n    \"\"\"Multilayer perceptron.\n\n    The bias is included in all linear layers.\n\n    Parameters\n    ----------\n    n_features : int\n        Number of input features (pixels inside of MNIST images).\n\n    hidden_layer_sizes : tuple\n        Tuple of ints representing sizes of the hidden layers.\n\n    n_targets : int\n        Number of target classes (10 for MNIST).\n\n    Attributes\n    ----------\n    module_list : nn.ModuleList\n        List holding all the linear layers in the right order.\n    \"\"\"\n\n    def __init__(self, n_features, hidden_layer_sizes, n_targets):\n        super().__init__()\n\n        layer_sizes = (n_features,) + hidden_layer_sizes + (n_targets,)\n        layer_list = []\n\n        for i in range(len(layer_sizes) - 1):\n            layer_list.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))\n\n        self.module_list = nn.ModuleList(layer_list)\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Batch of features of shape `(batch_size, n_features)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Batch of predictions (logits) of shape `(batch_size, n_targets)`.\n        \"\"\"\n        n_layers = len(self.module_list)\n\n        for i, layer in enumerate(self.module_list):\n            x = layer(x)\n\n            if i < n_layers - 1:\n                x = nn.functional.relu(x)\n\n        return x\n\n\ndef prune_linear(linear, prune_ratio=0.3, method=\"l1\"):\n    \"\"\"Prune a linear layer.\n\n    Modifies the module in-place. 
We make an assumption that the bias\n    is included.\n\n    Parameters\n    ----------\n    linear : nn.Linear\n        Linear module containing a bias.\n\n    prune_ratio : float\n        Number between 0 and 1 representing the percentage of weights\n        to prune.\n\n    method : str, {\"l1\", \"random\"}\n        Pruning method to use.\n    \"\"\"\n    if method == \"l1\":\n        prune_func = l1_unstructured\n    elif method == \"random\":\n        prune_func = random_unstructured\n    else:\n        raise ValueError\n\n    prune_func(linear, \"weight\", prune_ratio)\n    prune_func(linear, \"bias\", prune_ratio)\n\n\ndef prune_mlp(mlp, prune_ratio=0.3, method=\"l1\"):\n    \"\"\"Prune each layer of the multilayer perceptron.\n\n    Modifies the module in-place. We make an assumption that each\n    linear layer has the bias included.\n\n    Parameters\n    ----------\n    mlp : MLP\n        Multilayer perceptron instance.\n\n    prune_ratio : float or list\n        Number between 0 and 1 representing the percentage of weights\n        to prune. 
If `list` then different ratio for each\n        layer.\n\n    method : str, {\"l1\", \"random\"}\n        Pruning method to use.\n    \"\"\"\n    if isinstance(prune_ratio, float):\n        prune_ratios = [prune_ratio] * len(mlp.module_list)\n    elif isinstance(prune_ratio, list):\n        if len(prune_ratio) != len(mlp.module_list):\n            raise ValueError(\"Incompatible number of prune ratios provided\")\n\n        prune_ratios = prune_ratio\n    else:\n        raise TypeError\n\n    for prune_ratio, linear in zip(prune_ratios, mlp.module_list):\n        prune_linear(linear, prune_ratio=prune_ratio, method=method)\n\n\ndef check_pruned_linear(linear):\n    \"\"\"Check if a Linear module was pruned.\n\n    We require both the bias and the weight to be pruned.\n\n    Parameters\n    ----------\n    linear : nn.Linear\n        Linear module containing a bias.\n\n    Returns\n    -------\n    bool\n        True if the model has been pruned.\n    \"\"\"\n    params = {param_name for param_name, _ in linear.named_parameters()}\n    expected_params = {\"weight_orig\", \"bias_orig\"}\n\n    return params == expected_params\n\n\ndef reinit_linear(linear):\n    \"\"\"Reinitialize a linear layer.\n\n    This is an in-place operation.\n    If the module has some pruning logic we are not going to remove it\n    and we only initialize the underlying tensors - `weight_orig` and\n    `bias_orig`.\n\n    Parameters\n    ----------\n    linear : nn.Linear\n        Linear model containing a bias.\n    \"\"\"\n    is_pruned = check_pruned_linear(linear)\n\n    # Get parameters of interest\n    if is_pruned:\n        weight = linear.weight_orig\n        bias = linear.bias_orig\n    else:\n        weight = linear.weight\n        bias = linear.bias\n\n    # Initialize weight\n    nn.init.kaiming_uniform_(weight, a=math.sqrt(5))\n\n    # Initialize bias\n    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(weight)\n    bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0\n    
nn.init.uniform_(bias, -bound, bound)\n\n\ndef reinit_mlp(mlp):\n    \"\"\"Reinitialize all layers of the MLP.\n\n    Parameters\n    ----------\n    mlp : MLP\n        Multi-layer perceptron.\n    \"\"\"\n    for linear in mlp.module_list:\n        reinit_linear(linear)\n\n\ndef copy_weights_linear(linear_unpruned, linear_pruned):\n    \"\"\"Copy weights from an unpruned model to a pruned model.\n\n    Modifies `linear_pruned` in place.\n\n    Parameters\n    ----------\n    linear_unpruned : nn.Linear\n        Linear model with a bias that was not pruned.\n\n    linear_pruned : nn.Linear\n        Linear model with a bias that was pruned.\n    \"\"\"\n    assert check_pruned_linear(linear_pruned)\n    assert not check_pruned_linear(linear_unpruned)\n\n    with torch.no_grad():\n        linear_pruned.weight_orig.copy_(linear_unpruned.weight)\n        linear_pruned.bias_orig.copy_(linear_unpruned.bias)\n\n\ndef copy_weights_mlp(mlp_unpruned, mlp_pruned):\n    \"\"\"Copy weights of an unpruned network to a pruned network.\n\n    Modifies `mlp_pruned` in place.\n\n    Parameters\n    ----------\n    mlp_unpruned : MLP\n        MLP model that was not pruned.\n\n    mlp_pruned : MLP\n        MLP model that was pruned.\n    \"\"\"\n    zipped = zip(mlp_unpruned.module_list, mlp_pruned.module_list)\n\n    for linear_unpruned, linear_pruned in zipped:\n        copy_weights_linear(linear_unpruned, linear_pruned)\n\n\ndef compute_stats(mlp):\n    \"\"\"Compute important statistics related to pruning.\n\n    Parameters\n    ----------\n    mlp : MLP\n        Multilayer perceptron.\n\n    Returns\n    -------\n    dict\n        Statistics.\n    \"\"\"\n    stats = {}\n    total_params = 0\n    total_pruned_params = 0\n\n    for layer_ix, linear in enumerate(mlp.module_list):\n        assert check_pruned_linear(linear)\n\n        weight_mask = linear.weight_mask\n        bias_mask = linear.bias_mask\n\n        params = weight_mask.numel() + bias_mask.numel()\n        
pruned_params = (weight_mask == 0).sum() + (bias_mask == 0).sum()\n\n        total_params += params\n        total_pruned_params += pruned_params\n\n        stats[f\"layer{layer_ix}_total_params\"] = params\n        stats[f\"layer{layer_ix}_pruned_params\"] = pruned_params\n        stats[f\"layer{layer_ix}_actual_prune_ratio\"] = pruned_params / params\n\n    stats[\"total_params\"] = total_params\n    stats[\"total_pruned_params\"] = total_pruned_params\n    stats[\"actual_prune_ratio\"] = total_pruned_params / total_params\n\n    return stats\n"
  },
  {
    "path": "github_adventures/mixer/README.md",
    "content": "Note that the `official.py` is just a copy of the\ncode provided in `https://arxiv.org/abs/2105.01601` and probably here\n`https://github.com/google-research/vision_transformer`. Please refer to those\nsources for licensing information.\n"
  },
  {
    "path": "github_adventures/mixer/official.py",
    "content": "import einops\nimport flax.linen as nn\nimport jax.numpy as jnp\n\n\nclass MlpBlock(nn.Module):\n    mlp_dim: int\n\n    @nn.compact\n    def __call__(self, x):\n        y = nn.Dense(self.mlp_dim)(x)\n        y = nn.gelu(y)\n        return nn.Dense(x.shape[-1])(y)\n\n\nclass MixerBlock(nn.Module):\n    tokens_mlp_dim: int\n    channels_mlp_dim: int\n\n    @nn.compact\n    def __call__(self, x):\n        y = nn.LayerNorm()(x)  # (n_samples, n_patches, hidden_dim)\n        y = jnp.swapaxes(y, 1, 2)\n        y = MlpBlock(self.tokens_mlp_dim, name=\"token_mixing\")(y)\n        y = jnp.swapaxes(y, 1, 2)\n        x = x + y\n        y = nn.LayerNorm()(x)\n        return x + MlpBlock(self.channels_mlp_dim, name=\"channel_mixing\")(y)\n\n\nclass MlpMixer(nn.Module):\n    num_classes: int\n    num_blocks: int\n    patch_size: int\n    hidden_dim: int\n    tokens_mlp_dim: int\n    channels_mlp_dim: int\n\n    @nn.compact\n    def __call__(self, x):\n        s = self.patch_size\n        x = nn.Conv(self.hidden_dim, (s, s), strides=(s, s), name=\"stem\")(x)\n        x = einops.rearrange(x, \"n h w c -> n (h w) c\")\n        for _ in range(self.num_blocks):\n            x = MixerBlock(self.tokens_mlp_dim, self.channels_mlp_dim)(x)\n        x = nn.LayerNorm(name=\"pre_head_layer_norm\")(x)\n        x = jnp.mean(x, axis=1)\n        return nn.Dense(\n            self.num_classes, name=\"head\", kernel_init=nn.initializers.zeros\n        )(x)\n"
  },
  {
    "path": "github_adventures/mixer/ours.py",
    "content": "import einops\nimport torch.nn as nn\n\n\nclass MlpBlock(nn.Module):\n    \"\"\"Multilayer perceptron.\n\n    Parameters\n    ----------\n    dim : int\n        Input and output dimension of the entire block. Inside of the mixer\n        it will either be equal to `n_patches` or `hidden_dim`.\n\n    mlp_dim : int\n        Dimension of the hidden layer.\n\n    Attributes\n    ----------\n    linear_1, linear_2 : nn.Linear\n        Linear layers.\n\n    activation : nn.GELU\n        Activation.\n    \"\"\"\n\n    def __init__(self, dim, mlp_dim=None):\n        super().__init__()\n\n        mlp_dim = dim if mlp_dim is None else mlp_dim\n        self.linear_1 = nn.Linear(dim, mlp_dim)\n        self.activation = nn.GELU()\n        self.linear_2 = nn.Linear(mlp_dim, dim)\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input tensor of shape `(n_samples, n_channels, n_patches)` or\n            `(n_samples, n_patches, n_channels)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Output tensor that has exactly the same shape as the input `x`.\n        \"\"\"\n        x = self.linear_1(x)  # (n_samples, *, mlp_dim)\n        x = self.activation(x)  # (n_samples, *, mlp_dim)\n        x = self.linear_2(x)  # (n_samples, *, dim)\n        return x\n\n\nclass MixerBlock(nn.Module):\n    \"\"\"Mixer block that contains two `MlpBlock`s and two `LayerNorm`s.\n\n    Parameters\n    ----------\n    n_patches : int\n        Number of patches the image is split up into.\n\n    hidden_dim : int\n        Dimensionality of patch embeddings.\n\n    tokens_mlp_dim : int\n        Hidden dimension for the `MlpBlock` when doing token mixing.\n\n    channels_mlp_dim : int\n        Hidden dimension for the `MlpBlock` when doing channel mixing.\n\n    Attributes\n    ----------\n    norm_1, norm_2 : nn.LayerNorm\n        Layer normalization.\n\n    token_mlp_block 
: MlpBlock\n        Token mixing MLP.\n\n    channel_mlp_block : MlpBlock\n        Channel mixing MLP.\n    \"\"\"\n\n    def __init__(\n        self, *, n_patches, hidden_dim, tokens_mlp_dim, channels_mlp_dim\n    ):\n        super().__init__()\n\n        self.norm_1 = nn.LayerNorm(hidden_dim)\n        self.norm_2 = nn.LayerNorm(hidden_dim)\n\n        self.token_mlp_block = MlpBlock(n_patches, tokens_mlp_dim)\n        self.channel_mlp_block = MlpBlock(hidden_dim, channels_mlp_dim)\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Tensor of shape `(n_samples, n_patches, hidden_dim)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Tensor of the same shape as `x`, i.e.\n            `(n_samples, n_patches, hidden_dim)`.\n        \"\"\"\n        y = self.norm_1(x)  # (n_samples, n_patches, hidden_dim)\n        y = y.permute(0, 2, 1)  # (n_samples, hidden_dim, n_patches)\n        y = self.token_mlp_block(y)  # (n_samples, hidden_dim, n_patches)\n        y = y.permute(0, 2, 1)  # (n_samples, n_patches, hidden_dim)\n        x = x + y  # (n_samples, n_patches, hidden_dim)\n        y = self.norm_2(x)  # (n_samples, n_patches, hidden_dim)\n        res = x + self.channel_mlp_block(\n            y\n        )  # (n_samples, n_patches, hidden_dim)\n        return res\n\n\nclass MlpMixer(nn.Module):\n    \"\"\"Entire network.\n\n    Parameters\n    ----------\n    image_size : int\n        Height and width (assuming it is a square) of the input image.\n\n    patch_size : int\n        Height and width (assuming it is a square) of the patches. 
Note\n        that we assume that `image_size % patch_size == 0`.\n\n    tokens_mlp_dim : int\n        Hidden dimension for the `MlpBlock` when doing the token mixing.\n\n    channels_mlp_dim : int\n        Hidden dimension for the `MlpBlock` when doing the channel mixing.\n\n    n_classes : int\n        Number of classes for classification.\n\n    hidden_dim : int\n        Dimensionality of patch embeddings.\n\n    n_blocks : int\n        The number of `MixerBlock`s in the architecture.\n\n    Attributes\n    ----------\n    patch_embedder : nn.Conv2d\n        Splits the image up into multiple patches and then embeds each of them\n        (using shared weights).\n\n    blocks : nn.ModuleList\n        List of `MixerBlock` instances.\n\n    pre_head_norm : nn.LayerNorm\n        Layer normalization applied just before the classification head.\n\n    head_classifier : nn.Linear\n        The classification head.\n    \"\"\"\n\n    def __init__(\n        self,\n        *,\n        image_size,\n        patch_size,\n        tokens_mlp_dim,\n        channels_mlp_dim,\n        n_classes,\n        hidden_dim,\n        n_blocks,\n    ):\n        super().__init__()\n        n_patches = (image_size // patch_size) ** 2  # assumes divisibility\n\n        self.patch_embedder = nn.Conv2d(\n            3,\n            hidden_dim,\n            kernel_size=patch_size,\n            stride=patch_size,\n        )\n        self.blocks = nn.ModuleList(\n            [\n                MixerBlock(\n                    n_patches=n_patches,\n                    hidden_dim=hidden_dim,\n                    tokens_mlp_dim=tokens_mlp_dim,\n                    channels_mlp_dim=channels_mlp_dim,\n                )\n                for _ in range(n_blocks)\n            ]\n        )\n\n        self.pre_head_norm = nn.LayerNorm(hidden_dim)\n        self.head_classifier = nn.Linear(hidden_dim, n_classes)\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        
----------\n        x : torch.Tensor\n            Input batch of square images of shape\n            `(n_samples, n_channels, image_size, image_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Class logits of shape `(n_samples, n_classes)`.\n        \"\"\"\n        x = self.patch_embedder(\n            x\n        )  # (n_samples, hidden_dim, n_patches ** (1/2), n_patches ** (1/2))\n        x = einops.rearrange(\n            x, \"n c h w -> n (h w) c\"\n        )  # (n_samples, n_patches, hidden_dim)\n        for mixer_block in self.blocks:\n            x = mixer_block(x)  # (n_samples, n_patches, hidden_dim)\n\n        x = self.pre_head_norm(x)  # (n_samples, n_patches, hidden_dim)\n        x = x.mean(dim=1)  # (n_samples, hidden_dim)\n        y = self.head_classifier(x)  # (n_samples, n_classes)\n\n        return y\n"
  },
  {
    "path": "github_adventures/mixer/test_compare.py",
    "content": "import jax\nimport numpy as np\nimport pytest\nimport torch\n\nfrom official import MlpMixer as OfficialMixer\nfrom ours import MlpMixer as OurMixer\n\n\n@pytest.mark.parametrize(\"image_size\", [6, 12])\n@pytest.mark.parametrize(\"patch_size\", [2, 3])\n@pytest.mark.parametrize(\"hidden_dim\", [4, 5])\n@pytest.mark.parametrize(\"n_blocks\", [1, 2])\n@pytest.mark.parametrize(\"n_classes\", [4, 8])\n@pytest.mark.parametrize(\"tokens_mlp_dim\", [2, 4])\n@pytest.mark.parametrize(\"channels_mlp_dim\", [3, 6])\ndef test_compare(\n    image_size,\n    patch_size,\n    hidden_dim,\n    n_blocks,\n    n_classes,\n    tokens_mlp_dim,\n    channels_mlp_dim,\n):\n    # Create Flax model\n    model_flax = OfficialMixer(\n        num_classes=n_classes,\n        num_blocks=n_blocks,\n        patch_size=patch_size,\n        hidden_dim=hidden_dim,\n        tokens_mlp_dim=tokens_mlp_dim,\n        channels_mlp_dim=channels_mlp_dim,\n    )\n    key1, key2 = jax.random.split(jax.random.PRNGKey(0))\n    x = jax.random.normal(key1, (11, image_size, image_size, 3))  # Dummy input\n    params = model_flax.init(key2, x)  # initialization call\n\n    n_params_flax = sum(\n        jax.tree_leaves(jax.tree_map(lambda x: np.prod(x.shape), params))\n    )\n    shape_flax = model_flax.apply(params, x).shape\n\n    # Create Torch model\n    model_torch = OurMixer(\n        image_size=image_size,\n        patch_size=patch_size,\n        hidden_dim=hidden_dim,\n        n_blocks=n_blocks,\n        n_classes=n_classes,\n        tokens_mlp_dim=tokens_mlp_dim,\n        channels_mlp_dim=channels_mlp_dim,\n    )\n\n    n_params_torch = sum(\n        p.numel() for p in model_torch.parameters() if p.requires_grad\n    )\n    shape_torch = model_torch(torch.rand(11, 3, image_size, image_size)).shape\n\n    assert n_params_flax == n_params_torch\n    assert shape_flax == shape_torch == (11, n_classes)\n"
  },
  {
    "path": "github_adventures/mixup/launch_experiments.sh",
    "content": "set -x\n\nN_EPOCHS=100000\nN_SAMPLES=1000\nSEED=123\nTBOARD_DIR=tb_results/$SEED\n\npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/no_regularization\npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/weight_decay --weight-decay 0.6\npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/dropout -p 0.2 \npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/mixup --mixup \npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/input_mixup -k 0 1 --mixup\npython train.py -r $SEED -n $N_EPOCHS -s $N_SAMPLES $TBOARD_DIR/hidden_layers_mixup -k 1 4 --mixup \n"
  },
  {
    "path": "github_adventures/mixup/train.py",
    "content": "import argparse\nimport json\n\nimport numpy as np\nimport torch\nfrom sklearn.model_selection import train_test_split\nfrom torch.utils.data import DataLoader\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom utils import (\n    CustomDataset,\n    MLPClassifierMixup,\n    generate_prediction_img,\n    generate_spirals,\n)\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\"Training\")\n\n    # Parameters\n    parser.add_argument(\n        \"logpath\",\n        type=str,\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch-size\",\n        type=int,\n        default=32,\n        help=\"Batch size\",\n    )\n    parser.add_argument(\n        \"--mixup\",\n        action=\"store_true\",\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--dropout-probability\",\n        type=float,\n        default=0,\n        help=\"The probability of dropout\",\n    )\n    parser.add_argument(\n        \"--hidden-dims\",\n        nargs=\"+\",\n        type=int,\n        default=(32, 32, 32),\n        help=\"Hidden dimensions of the MLP\",\n    )\n    parser.add_argument(\n        \"-c\",\n        \"--n-cycles\",\n        type=float,\n        default=2,\n        help=\"Number of cycles when creating the spiral dataset\",\n    )\n    parser.add_argument(\n        \"-n\",\n        \"--n-epochs\",\n        type=int,\n        default=100,\n        help=\"Number of epochs\",\n    )\n    parser.add_argument(\n        \"-k\",\n        \"--mixing-layer\",\n        type=int,\n        nargs=2,\n        default=(None, None),\n        help=\"The range of k to sample from\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--n-samples\",\n        type=int,\n        default=1000,\n        help=\"Number of samples\",\n    )\n    parser.add_argument(\n        \"-r\",\n        \"--random-state\",\n        type=int,\n        default=5,\n        help=\"Random state\",\n    )\n    parser.add_argument(\n        
\"--weight-decay\",\n        type=float,\n        default=0.0,\n        help=\"Weight decay\",\n    )\n\n    args = parser.parse_args(argv)\n\n    device = torch.device(\"cpu\")\n    dtype = torch.float32\n\n    np.random.seed(args.random_state)\n    torch.manual_seed(args.random_state)\n\n    # Dataset preparation\n    X, y = generate_spirals(\n        args.n_samples,\n        noise_std=0,\n        n_cycles=args.n_cycles,\n    )\n\n    X_train, X_test, y_train, y_test = train_test_split(\n        X,\n        y,\n        test_size=0.9,\n        shuffle=True,\n        stratify=y,\n    )\n\n    X_test_t = torch.from_numpy(X_test).to(device, dtype)\n\n    dataset_train = CustomDataset(X_train, y_train)\n\n    dataloader_train = DataLoader(\n        dataset_train,\n        batch_size=2 * args.batch_size,\n        drop_last=True,\n        shuffle=True,\n    )\n\n    # Model and loss definition\n    model = MLPClassifierMixup(\n        n_features=2,\n        hidden_dims=tuple(args.hidden_dims),\n        p=args.dropout_probability,\n    )\n    model.to(device, dtype)\n\n    optimizer = torch.optim.AdamW(\n        model.parameters(),\n        weight_decay=args.weight_decay,\n    )\n\n    loss_fn = torch.nn.BCEWithLogitsLoss()\n\n    # Summary\n    writer = SummaryWriter(args.logpath)\n    writer.add_text(\"hparams\", json.dumps(vars(args)))\n\n    # Training + evaluation loop\n    bs = args.batch_size\n    n_steps = 0\n    for e in range(args.n_epochs):\n        for X_batch, y_batch in dataloader_train:\n            X_batch, y_batch = X_batch.to(device, dtype), y_batch.to(\n                device, dtype\n            )\n            if args.mixup:\n                k_min, k_max = args.mixing_layer\n                k_min = k_min or 0\n                k_max = k_max or model.n_hidden + 1\n\n                k = np.random.randint(k_min, k_max)\n                lam = np.random.beta(2, 2)\n                writer.add_scalar(\"k\", k, n_steps)\n                
writer.add_scalar(\"lambda\", lam, n_steps)\n\n                h = model(X_batch, start=0, end=k)  # (2 * batch_size, *)\n\n                h_mixed = lam * h[:bs] + (1 - lam) * h[bs:]  # (batch_size, *)\n                y_mixed = lam * y_batch[:bs] + (1 - lam) * y_batch[bs:]  # (batch_size,)\n\n                logits = model(h_mixed, start=k, end=None)  # (batch_size, 1)\n                loss = loss_fn(logits.squeeze(), y_mixed)\n\n            else:\n                logits = model(X_batch[:bs])  # (batch_size, 1)\n                loss = loss_fn(logits.squeeze(), y_batch[:bs])\n\n            optimizer.zero_grad()\n            loss.backward()\n            optimizer.step()\n\n            # Logging\n            writer.add_scalar(\"loss_train\", loss, n_steps)\n\n            if n_steps % 2500 == 0:\n                model.eval()\n                fig_gen = generate_prediction_img(\n                    model,\n                    X_train,\n                    X_test,\n                    y_train,\n                    y_test,\n                )\n                writer.add_figure(\"test\", next(fig_gen))\n                writer.add_figure(\"contour\", next(fig_gen), n_steps)\n                writer.add_figure(\"contour_train\", next(fig_gen), n_steps)\n\n                with torch.no_grad():\n                    logits_test = model(X_test_t).squeeze().detach().cpu()\n\n                acc_test = (\n                    torch.sigmoid(logits_test).round().numpy() == y_test\n                ).sum() / len(y_test)\n                loss_test = loss_fn(logits_test, torch.from_numpy(y_test))\n\n                writer.add_scalar(\"loss_test\", loss_test, n_steps)\n                writer.add_scalar(\"accuracy_test\", acc_test, n_steps)\n\n                model.train()\n\n            n_steps += 1\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/mixup/utils.py",
    "content": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom matplotlib.colors import ListedColormap\nfrom torch.utils.data import Dataset\n\n\nclass MLPClassifierMixup(nn.Module):\n    \"\"\"Multilayer perceptron with inbuilt mixup logic.\n\n    Assuming binary classification.\n\n    Parameters\n    ----------\n    n_features : int\n        Number of features.\n\n    hidden_dims : tuple\n        The sizes of the hidden layers.\n\n    p : float\n        Dropout probability.\n\n    Attributes\n    ----------\n    hidden_layers : nn.ModuleList\n        List of hidden layers that are each composed of a `Linear`,\n        `LeakyReLU` and `Dropout` modules.\n\n    n_hidden : int\n        Number of hidden layers.\n\n    clf : nn.Linear\n        The classifier at the end of the pipeline.\n    \"\"\"\n\n    def __init__(self, n_features, hidden_dims, p=0):\n        super().__init__()\n        dims = (n_features,) + hidden_dims\n\n        self.n_hidden = len(hidden_dims)\n        self.hidden_layers = nn.ModuleList(\n            [\n                nn.Sequential(\n                    nn.Linear(dims[i], dims[i + 1]),\n                    nn.LeakyReLU(0.2),\n                    nn.Dropout(p),\n                )\n                for i in range(self.n_hidden)\n            ]\n        )\n        self.clf = nn.Linear(dims[-1], 1)\n\n    def forward(self, x, start=0, end=None):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input of shape `(n_samples, dim)`. Note that the dim\n            will depend on `start`.\n\n        start : int\n            The hidden layer where the forward pass starts (inclusive). We\n            use a convention of `start=0` and `end=0` as a noop and the input\n            tensor is returned. Useful for implementing input mixing.\n\n        end : int or None\n            The ending hidden layer (exclusive). 
If None, run through\n            the last hidden layer and then also apply the classifier.\n        \"\"\"\n        for module in self.hidden_layers[start:end]:\n            x = module(x)\n\n        if end is None:\n            x = self.clf(x)\n\n        return x\n\n\nclass CustomDataset(Dataset):\n    \"\"\"Custom classification dataset assuming we have X and y loaded in memory.\n\n    Parameters\n    ----------\n    X : np.ndarray\n        Features of shape `(n_samples, n_features)`.\n\n    y : np.ndarray\n        Targets of shape `(n_samples,)`.\n    \"\"\"\n\n    def __init__(self, X, y):\n        if len(X) != len(y):\n            raise ValueError(\"Inconsistent number of samples\")\n\n        classes = np.unique(y)\n        if not np.array_equal(np.sort(classes), np.array([0, 1])):\n            raise ValueError(\"Targets must contain exactly the classes 0 and 1\")\n\n        self.X = X\n        self.y = y\n\n    def __len__(self):\n        \"\"\"Compute the length of the dataset.\"\"\"\n        return len(self.X)\n\n    def __getitem__(self, ix):\n        \"\"\"Return a single sample.\"\"\"\n        return self.X[ix], self.y[ix]\n\n\ndef generate_spirals(\n    n_samples,\n    noise_std=0.05,\n    n_cycles=2,\n    random_state=None,\n):\n    \"\"\"Generate a two-spirals dataset.\n\n    Parameters\n    ----------\n    n_samples : int\n        Number of samples to generate. For simplicity, an even number\n        is required. The targets (2 spirals) are perfectly balanced.\n\n    noise_std : float\n        Standard deviation of the noise added to the spirals.\n\n    n_cycles : int\n        Number of revolutions the spirals make.\n\n    random_state : int or None\n        Controls randomness.\n\n    Returns\n    -------\n    X : np.ndarray\n        Features of shape `(n_samples, n_features)`.\n\n    y : np.ndarray\n        Targets of shape `(n_samples,)`. 
There are two\n        classes 0 and 1 representing the two spirals.\n    \"\"\"\n    if n_samples % 2 != 0:\n        raise ValueError(\"The number of samples needs to be even\")\n\n    n_samples_per_class = n_samples // 2\n\n    angle_1 = np.linspace(0, n_cycles * 2 * np.pi, n_samples_per_class)\n    angle_2 = np.pi + angle_1\n    radius = np.linspace(0.2, 2, n_samples_per_class)\n\n    x_1 = radius * np.cos(angle_1)\n    y_1 = radius * np.sin(angle_1)\n\n    x_2 = radius * np.cos(angle_2)\n    y_2 = radius * np.sin(angle_2)\n\n    X = np.concatenate(\n        [\n            np.stack([x_1, y_1], axis=1),\n            np.stack([x_2, y_2], axis=1),\n        ],\n        axis=0,\n    )\n    y = np.zeros((n_samples,))\n    y[n_samples_per_class:] = 1.0\n\n    if random_state is not None:\n        np.random.seed(random_state)\n\n    new_ixs = np.random.permutation(n_samples)\n\n    X = X[new_ixs] + np.random.normal(\n        loc=0, scale=noise_std, size=(n_samples, 2)\n    )\n    y = y[new_ixs]\n\n    return X, y\n\n\ndef generate_prediction_img(\n    model,\n    X_train,\n    X_test,\n    y_train,\n    y_test,\n):\n    \"\"\"Generate contour and scatter plots with predictions.\n\n    Parameters\n    ----------\n    model : MLPClassifierMixup\n        Instance of a multilayer perceptron.\n\n    X_train, X_test : np.ndarray\n        Train and test features of shape `(n_samples, n_features)`.\n\n    y_train, y_test : np.ndarray\n        Train and test targets of shape `(n_samples,)`.\n\n    Yields\n    ------\n    matplotlib.figure.Figure\n        Different figures.\n    \"\"\"\n    device = next(model.parameters()).device\n    dtype = next(model.parameters()).dtype\n\n    cm = plt.cm.RdBu\n    cm_bright = ListedColormap([\"#FF0000\", \"#0000FF\"])\n\n    delta = 0.5\n\n    xlim = (X_test[:, 0].min() - delta, X_test[:, 0].max() + delta)\n    ylim = (X_test[:, 1].min() - delta, X_test[:, 1].max() + delta)\n\n    n = 50\n    xx, yy = np.meshgrid(\n        np.linspace(xlim[0], 
xlim[1], n),\n        np.linspace(ylim[0], ylim[1], n),\n    )\n    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)\n\n    with torch.no_grad():\n        logits = model(torch.from_numpy(grid).to(device, dtype))\n\n    probs = torch.sigmoid(logits)[:, 0].detach().cpu().numpy()\n\n    probs = probs.reshape(xx.shape)\n\n    fig, ax = plt.subplots(1, 1, dpi=170)\n\n    ax.scatter(\n        X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors=\"k\"\n    )\n    ax.set_title(\"Test data\")\n\n    yield fig\n    ax.cla()\n\n    ax.contourf(xx, yy, probs, cmap=cm, alpha=0.8)\n    ax.set_title(\"Prediction contours\")\n\n    yield fig\n\n    ax.scatter(\n        X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors=\"k\"\n    )\n    ax.set_title(\"Train data + prediction contours\")\n\n    yield fig\n"
  },
  {
    "path": "github_adventures/ner_evaluation/README.md",
    "content": "* https://github.com/huggingface/evaluate/blob/af3c30561d840b83e54fc5f7150ea58046d6af69/metrics/seqeval/seqeval.py#L120\n* https://github.com/chakki-works/seqeval/blob/cd01b5210eaa65e691c22320aba56f2be9e9fc43/seqeval/metrics/sequence_labeling.py#L1\n\n\n"
  },
  {
    "path": "github_adventures/ner_evaluation/ours.py",
    "content": "import re\nimport pandas as pd\nfrom sklearn.metrics import classification_report\n\n\ndef check_valid(annots: list[str]) -> bool:\n    allowed_pattern = re.compile(r\"^(O$|B-.+$|I-.+$)\")\n\n    annots = [\"O\"] + annots\n    n = len(annots)\n\n    if any(allowed_pattern.match(annot) is None for annot in annots):\n        return False\n\n    for i in range(1, n):\n        annot = annots[i]\n\n        if annot.startswith(\"I-\"):\n            if annots[i - 1] == \"O\" or annots[i - 1][2:] != annot[2:]:\n                return False\n\n\n    return True\n\ndef get_etypes(annots: list[str]) -> list[None | str]:\n    return [annot[2:] if annot != \"O\" else None for annot in annots]\n\n\ndef get_entities(annots: list[str]) -> list[dict[str, int | str]]:\n    if not check_valid(annots):\n        raise ValueError(\"Invalid input.\")\n\n    annots = [\"O\"] + annots + [\"O\"]\n    etypes = get_etypes(annots)\n    n = len(annots)\n\n    start_patterns = {\n        (\"O\", \"B-\"),  # [\"O\", \"B-LOC\"]\n        (\"B-\", \"B-\"),  # [\"B-PERSON\", \"B-LOC\"]\n        (\"I-\", \"B-\"),  # [\"B-LOC\", \"I-LOC\", \"B-PERSON\"]\n    }\n\n    end_patterns = {\n        (\"I-\", \"O\"), # [\"B-LOC\", \"I-LOC\", \"O\"]\n        (\"B-\", \"O\"), # [\"B-LOC\", \"O\"]\n        (\"B-\", \"B-\"),  # [\"B-PERSON\", \"B-LOC\"]\n        (\"I-\", \"B-\"),  # [\"B-LOC\", \"I-LOC\", \"B-PERSON\"]\n    }\n\n    entities: list[dict[str, int | str]] = []\n\n\n    i = 1\n    start = None\n\n    while i < n:\n        prev, curr = annots[i - 1], annots[i]\n        pattern = (prev[:2], curr[:2])\n\n\n        if pattern in end_patterns and start is not None:\n            entities.append(\n                {\n                    \"start\": start - 1,\n                    \"end\": i - 2,\n                    \"etype\": etypes[i - 1],\n\n                }\n            )\n\n            start = None\n\n        if pattern in start_patterns:\n            start = i\n\n        i += 1\n\n    
return entities\n\n\ndef get_report(annots_true: list[str], annots_pred: list[str]) -> dict:\n    if len(annots_true) != len(annots_pred):\n        raise ValueError(\"Unequal lengths\")\n\n    entities_true = pd.DataFrame(get_entities(annots_true))\n    entities_pred = pd.DataFrame(get_entities(annots_pred))\n\n    entities_true = entities_true.rename(columns={\"etype\": \"etype_true\"})\n    entities_pred = entities_pred.rename(columns={\"etype\": \"etype_pred\"})\n\n    df_merge = entities_true.merge(\n        entities_pred, on=[\"start\", \"end\"], how=\"outer\"\n    )\n    df = df_merge.fillna(\"\")\n\n    labels = (\n        set(df[\"etype_true\"].tolist()) | set(df[\"etype_pred\"].tolist())\n    ) - {\"\"}\n\n    report = classification_report(\n        df[\"etype_true\"],\n        df[\"etype_pred\"],\n        output_dict=True,\n        labels=list(labels),\n    )\n    return report\n"
  },
  {
    "path": "github_adventures/ner_evaluation/test_ours.py",
    "content": "import pytest\nfrom seqeval.metrics import classification_report as cr\nfrom seqeval.scheme import IOB2\nfrom ours import check_valid, get_entities, get_etypes, get_report\n\n\n@pytest.mark.parametrize(\n    \"inp,out\",\n    [\n        ([], True),\n        ([\"NONSENSE\", \"O\"], False),\n        ([\"O\", \"O\", \"O\"], True),\n        ([\"B-\"], False),\n        ([\"O\", \"I-ORG\", \"O\"], False),\n        ([\"O\", \"B-ORG\", \"I-PERSON\"], False),\n        ([\"O\", \"B-ORG\", \"B-PERSON\"], True),\n        ([\"O\", \"SOMETHING\", \"B-PERSON\"], False),\n        ([\"O-\", \"O\", \"O\"], False),\n        ([\"B-A\", \"O\", \"B-T\"], True),\n        ([\"I-a\", \"B-a\", \"B-a\", \"I-a\", \"I-a\", \"O\"], False),\n    ],\n)\ndef test_check_valid(inp, out):\n    assert check_valid(inp) == out\n\n\n@pytest.mark.parametrize(\n    \"inp,out\",\n    [\n        ([], []),\n        ([\"O\", \"O\", \"O\"], [None, None, None]),\n        ([\"O\", \"B-ORG\", \"O\"], [None, \"ORG\", None]),\n        ([\"O\", \"B-ORG\", \"B-ORG\"], [None, \"ORG\", \"ORG\"]),\n        ([\"O\", \"B-PERSON\", \"I-PERSON\"], [None, \"PERSON\", \"PERSON\"]),\n        ([\"B-A\", \"O\", \"B-T\"], [\"A\", None, \"T\"]),\n    ],\n)\ndef test_get_etypes(inp, out):\n    assert get_etypes(inp) == out\n\n\n@pytest.mark.parametrize(\n    \"inp,out\",\n    [\n        ([\"O\", \"O\", \"O\"], []),\n        ([\"O\", \"B-ORG\", \"O\"], [{\"start\": 1, \"end\": 1, \"etype\": \"ORG\"}]),\n        (\n            [\"O\", \"B-ORG\", \"B-ORG\"],\n            [\n                {\"start\": 1, \"end\": 1, \"etype\": \"ORG\"},\n                {\"start\": 2, \"end\": 2, \"etype\": \"ORG\"},\n            ],\n        ),\n        ([\"O\", \"B-PERSON\", \"I-PERSON\"], [{\"start\": 1, \"end\": 2, \"etype\": \"PERSON\"}]),\n        (\n            [\"B-A\", \"O\", \"B-T\"],\n            [\n                {\"start\": 0, \"end\": 0, \"etype\": \"A\"},\n                {\"start\": 2, \"end\": 2, \"etype\": \"T\"},\n    
        ],\n        ),\n        ([\"B-LOC\", \"I-LOC\", \"I-LOC\"], [{\"start\": 0, \"end\": 2, \"etype\": \"LOC\"}]),\n        (\n            [\"B-A\", \"I-A\", \"B-T\"],\n            [\n                {\"start\": 0, \"end\": 1, \"etype\": \"A\"},\n                {\"start\": 2, \"end\": 2, \"etype\": \"T\"},\n            ],\n        ),\n    ],\n)\ndef test_get_entities(inp, out):\n    assert get_entities(inp) == out\n\n\n@pytest.mark.parametrize(\n    \"annots_true,annots_pred\",\n    [\n        (\n            [\"O\", \"B-PERSON\", \"I-PERSON\", \"O\"],\n            [\"O\", \"B-PERSON\", \"I-PERSON\", \"O\"],\n        ),\n        (\n            [\"O\", \"B-PERSON\", \"I-PERSON\", \"B-LOC\"],\n            [\"O\", \"B-PERSON\", \"I-PERSON\", \"O\"],\n        ),\n        (\n            [\"O\", \"B-PERSON\", \"I-PERSON\", \"O\"],\n            [\"O\", \"O\", \"B-PERSON\", \"O\"],\n        ),\n        (\n            [\"B-PERSON\", \"B-LOC\", \"I-LOC\", \"B-DATE\"],\n            [\"B-PERSON\", \"B-DATE\", \"B-PERSON\", \"B-DATE\"],\n        ),\n        (\n            [\"B-PERSON\", \"I-PERSON\", \"I-PERSON\", \"O\", \"O\", \"B-LOC\", \"B-DATE\"],\n            [\"B-PERSON\", \"I-PERSON\", \"I-PERSON\", \"O\", \"O\", \"B-LOC\", \"B-DATE\"],\n        ),\n        (\n            [\"B-PERSON\", \"O\", \"O\", \"O\", \"B-LOC\", \"I-LOC\", \"O\", \"B-LOC\"],\n            [\"B-PERSON\", \"O\", \"B-DATE\", \"O\", \"B-LOC\", \"I-LOC\", \"I-LOC\", \"I-LOC\"],\n        ),\n        (\n            [\"B-PERSON\", \"I-PERSON\", \"O\", \"B-LOC\", \"I-LOC\", \"O\", \"B-PERSON\", \"B-PERSON\", \"B-LOC\"],\n            [\"B-PERSON\", \"I-PERSON\", \"O\", \"B-LOC\", \"B-LOC\", \"O\", \"B-PERSON\", \"B-PERSON\", \"B-LOC\"],\n        ),\n    ],\n)\ndef test_get_report(annots_true, annots_pred):\n    report = get_report(annots_true, annots_pred)\n    seqeval_report = 
cr([annots_true], [annots_pred], scheme=IOB2, mode=\"strict\", output_dict=True)\n\n    keys_to_delete = {\"accuracy\", \"micro avg\"}\n\n    for rep in (report, seqeval_report):\n        for key in keys_to_delete:\n            try:\n                rep.pop(key)\n            except KeyError:\n                pass\n\n\n    assert report == seqeval_report\n"
  },
  {
    "path": "github_adventures/ner_evaluation/try.py",
    "content": "import pprint\nimport evaluate\n\n\nmetric = evaluate.load(\"seqeval\")\n\n\n# Tom Cruise is great\nannots_true = [\"B-PERSON\", \"I-PERSON\", \"O\", \"O\"]\n# annots_pred = [\"B-PERSON\", \"I-PERSON\", \"O\", \"O\"]\n# annots_pred = [\"O\", \"O\", \"O\", \"O\"]\n# annots_pred = [\"B-PERSON\", \"O\", \"O\", \"O\"]\nannots_pred = [\"B-LOCATION\", \"I-LOCATION\", \"O\", \"O\"]\n\n\nresult = metric.compute(references=[annots_true], predictions=[annots_pred])\n\npprint.pprint(result)\n"
  },
  {
    "path": "github_adventures/neuron/README.md",
    "content": "# Installation\n\n```bash\npip install -r requirements.txt\n```\n\n# Running training\nTo run the same experiments as in the video run\n\n```bash\n./launch.sh\n```\n\nHowever, feel free to check the contents of the `launch.sh` for single\nexperiments.\n\n# Evaluation and pretrained models\nThis repo contains multiple pretrained models inside of `pretrained/`. They\nare all `.pkl` files and they were created by pickling `solutions.Solution`\nsubclasses. To load them inside of Python run something along these lines\n\n```python\nimport pickle\n\nsolution_path = \"pretrained/invariant_ours.pkl\"  # you can change this\n\nwith open(solution_path, \"rb\") as f:\n    solution = pickle.load(f)[0]\n\n```\n\nYou can also run any of the below scripts to reproduce the results from\nthe end of the video.\n\n\n```bash\nEPISODES=30\n\npython evaluate_shuffling.py -e $EPISODES\npython evaluate_noise.py -e $EPISODES\npython evaluate_video.py -e $EPISODES\n```\n"
  },
  {
    "path": "github_adventures/neuron/evaluate_noise.py",
    "content": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickle\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport seaborn as sns\n\nfrom tasks import IncompatibleNFeatures, Task\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser()\n\n    parser.add_argument(\n        \"-e\",\n        \"--n-episodes\",\n        type=int,\n        default=200,\n    )\n    args = parser.parse_args(argv)\n\n    # Prepare solutions and tasks\n    checkpoint_path = pathlib.Path(\"pretrained\") / \"invariant_official.pkl\"\n    assert checkpoint_path.exists()\n\n    with checkpoint_path.open(\"rb\") as f:\n        obj = pickle.load(f)\n\n        if len(obj) == 1:\n            solution_inst = obj[0]\n        elif len(obj) == 2:\n            solver, solution_inst = obj\n            solution_inst.set_params(solver.result.xfavorite)\n        else:\n            raise ValueError\n\n    results = []\n\n    for n_noise_features in range(0, 30, 5):\n        for shuffle in [True, False]:\n            print(f\"{n_noise_features=}, {shuffle=}\")\n            task = Task(\n                render=False,\n                n_noise_features=n_noise_features,\n                shuffle_on_reset=shuffle,\n                env_seed=None,\n                feature_seed=None,\n            )\n            for episode_ix in range(args.n_episodes):\n                reward = task.rollout(solution_inst)\n                results.append(\n                    {\n                        \"n_noise_features\": n_noise_features,\n                        \"shuffle\": shuffle,\n                        \"episode_ix\": episode_ix,\n                        \"reward\": reward,\n                    }\n                )\n\n    results_df = pd.DataFrame(results)\n    fig, ax = plt.subplots(1, 1, figsize=(10, 5), dpi=300)\n\n    sns.violinplot(\n        data=results_df,\n        x=\"n_noise_features\",\n        y=\"reward\",\n    
    hue=\"shuffle\",\n        split=True,\n        inner=\"quart\",\n        linewidth=1,\n        palette=\"muted\",\n        ax=ax,\n        scale=\"count\",\n    )\n    sns.despine(left=True)\n    ax.set_ylim(0, 1000)\n    ax.grid(True)\n\n    fig.tight_layout()\n    fig.savefig(\"invariant_model_noise.png\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/neuron/evaluate_shuffling.py",
    "content": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickle\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport seaborn as sns\n\nfrom tasks import IncompatibleNFeatures, Task\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser()\n\n    parser.add_argument(\n        \"-e\",\n        \"--n-episodes\",\n        type=int,\n        default=200,\n    )\n    args = parser.parse_args(argv)\n\n    # Prepare solutions and tasks\n    checkpoints = {}\n\n    checkpoint_folder = pathlib.Path(\"pretrained\")\n    assert checkpoint_folder.exists()\n\n    checkpoint_paths = [\n        checkpoint_folder / \"linear.pkl\",\n        checkpoint_folder / \"linear_augment.pkl\",\n        checkpoint_folder / \"MLP.pkl\",\n        checkpoint_folder / \"MLP_augment.pkl\",\n        checkpoint_folder / \"invariant_ours.pkl\",\n        checkpoint_folder / \"invariant_official.pkl\",\n    ]\n\n    for path in checkpoint_paths:\n        with path.open(\"rb\") as f:\n            obj = pickle.load(f)\n\n            if len(obj) == 1:\n                solution_inst = obj[0]\n            elif len(obj) == 2:\n                solver, solution_inst = obj\n                solution_inst.set_params(solver.result.xfavorite)\n            else:\n                raise ValueError\n\n        checkpoints[path.stem] = solution_inst\n\n    results = []\n\n    for model_name, solution_inst in checkpoints.items():\n        for shuffle in [True, False]:\n            print(f\"{model_name=}, {shuffle=}\")\n            task = Task(\n                render=False,\n                n_noise_features=0,\n                shuffle_on_reset=shuffle,\n                env_seed=None,\n                feature_seed=None,\n            )\n            for episode_ix in range(args.n_episodes):\n                reward = task.rollout(solution_inst)\n                results.append(\n                    {\n                        
\"model\": model_name,\n                        \"shuffle\": shuffle,\n                        \"episode_ix\": episode_ix,\n                        \"reward\": reward,\n                    }\n                )\n\n    results_df = pd.DataFrame(results)\n    fig, ax = plt.subplots(1, 1, figsize=(10, 5), dpi=300)\n\n    sns.violinplot(\n        data=results_df,\n        x=\"model\",\n        y=\"reward\",\n        hue=\"shuffle\",\n        split=True,\n        inner=\"quart\",\n        linewidth=1,\n        palette=\"muted\",\n        ax=ax,\n        scale=\"count\",\n        order=sorted(checkpoints.keys()),\n    )\n    sns.despine(left=True)\n    ax.set_ylim(0, 1000)\n    ax.grid(True)\n\n    fig.tight_layout()\n    fig.savefig(\"all_models_shuffling.png\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/neuron/evaluate_video.py",
    "content": "\"\"\"Assumes you have already trained your model and you have a checkpoint.\"\"\"\nimport argparse\nimport pathlib\nimport pickle\n\nfrom gym.wrappers import Monitor\nimport matplotlib.pyplot as plt\n\nfrom tasks import IncompatibleNFeatures, Task\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser()\n\n    parser.add_argument(\n        \"-e\",\n        \"--n-episodes\",\n        type=int,\n        default=2,\n    )\n    args = parser.parse_args(argv)\n\n    # Prepare solutions and tasks\n    checkpoints = {}\n\n    checkpoint_folder = pathlib.Path(\"pretrained\")\n    assert checkpoint_folder.exists()\n\n    checkpoint_paths = [\n        checkpoint_folder / \"linear.pkl\",\n        checkpoint_folder / \"linear_augment.pkl\",\n        checkpoint_folder / \"MLP.pkl\",\n        checkpoint_folder / \"MLP_augment.pkl\",\n        checkpoint_folder / \"invariant_ours.pkl\",\n        checkpoint_folder / \"invariant_official.pkl\",\n    ]\n    checkpoint_paths = checkpoint_paths\n\n    for path in checkpoint_paths:\n        with path.open(\"rb\") as f:\n            obj = pickle.load(f)\n\n            if len(obj) == 1:\n                solution_inst = obj[0]\n            elif len(obj) == 2:\n                solver, solution_inst = obj\n                solution_inst.set_params(solver.result.xfavorite)\n            else:\n                raise ValueError\n\n        checkpoints[path.stem] = solution_inst\n\n    for model_name, solution_inst in checkpoints.items():\n        for shuffle in [True, False]:\n            for episode_ix in range(args.n_episodes):\n                print(f\"{model_name=}, {shuffle=}\")\n                task = Task(\n                    render=False,\n                    n_noise_features=0,\n                    shuffle_on_reset=shuffle,\n                    env_seed=None,\n                    feature_seed=None,\n                )\n\n                task.env = Monitor(\n                    task.env,\n                    
f\"videos/{model_name}/{shuffle}/{episode_ix}/\",\n                )\n                reward = task.rollout(solution_inst)\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/neuron/launch.sh",
    "content": "OUTPUT_FOLDER=log_dir\n\npython trainer.py --max-iter 1000 linear $OUTPUT_FOLDER/linear\npython trainer.py --max-iter 1000 --shuffle-on-reset linear $OUTPUT_FOLDER/linear_augment\npython trainer.py --max-iter 1000 MLP $OUTPUT_FOLDER/MLP\npython trainer.py --max-iter 2000 --shuffle-on-reset MLP $OUTPUT_FOLDER/MLP_augment\npython trainer.py --max-iter 14000 invariant $OUTPUT_FOLDER/invariant\n"
  },
  {
    "path": "github_adventures/neuron/requirements.txt",
    "content": "cma\ngym\ngym-cartpole-swingup\nmatplotlib\nnumpy\npandas\nseaborn\ntensorboard\ntorch\ntqdm\n"
  },
  {
    "path": "github_adventures/neuron/solutions.py",
    "content": "import abc\n\nimport numpy as np\nimport torch\n\nfrom torch_utils import PermutationInvariantNetwork, MLP\n\n\nclass Solution(abc.ABC):\n    \"\"\"Solution abstract class.\n\n    Attributes\n    ----------\n    policy : torch.nn.Module\n        Network that holds all the learnable parameters.\n    \"\"\"\n\n    @abc.abstractmethod\n    def clone(self, obs):\n        \"\"\"Create a copy of the current solution without any links to self.\"\"\"\n\n    @abc.abstractmethod\n    def get_action(self, obs):\n        \"\"\"Determine the next action given the observation array.\"\"\"\n\n    @abc.abstractmethod\n    def get_n_features(self):\n        \"\"\"Get the number of features expected by the model.\n\n        If None then the model can process variable-sized feature\n        vectors.\n        \"\"\"\n\n    @abc.abstractmethod\n    def reset(self):\n        \"\"\"Reset solution.\n\n        Will be called at the beginning of each rollout.\n\n        Does not mean we will \"reinitialize\" the weights of `policy`.\n        \"\"\"\n\n    def get_params(self):\n        \"\"\"Get learnable parameters of the solution.\n\n        Returns\n        -------\n        params : np.ndarray\n            1D array containing all parameters.\n        \"\"\"\n        params_l = []\n\n        for p in self.policy.parameters():\n            params_l.append(p.numpy().ravel())\n\n        params = np.concatenate(params_l)\n\n        return params\n\n    def set_params(self, params):\n        \"\"\"Set the learnable parameters.\n\n        Parameters\n        ----------\n        params : np.ndarray\n            1D array containing all parameters.\n\n        Returns\n        -------\n        self : Solution\n        \"\"\"\n        start_ix, end_ix = 0, 0\n\n        for p in self.policy.parameters():\n            end_ix = start_ix + np.prod(p.shape)\n            p.data = torch.from_numpy(\n                params[start_ix:end_ix].reshape(p.shape)\n            ).float()\n            
start_ix = end_ix\n\n        return self\n\n    def get_n_params(self):\n        return len(self.get_params())\n\n\nclass MLPSolution(Solution):\n    \"\"\"Multilayer perceptron solution.\n\n    Parameters\n    ----------\n    n_features : int\n        Number of input features.\n\n    hidden_layer_sizes : tuple\n        Tuple of int that defines the sizes of all hidden layers.\n\n    Attributes\n    ----------\n    kwargs : dict\n        All parameters necessary to instantiate the class.\n\n    policy : MLP\n        Policy network - multilayer perceptron.\n    \"\"\"\n\n    def __init__(self, n_features=5, hidden_layer_sizes=(16,)):\n        self.kwargs = {\n            \"n_features\": n_features,\n            \"hidden_layer_sizes\": hidden_layer_sizes,\n        }\n        self.dtype = torch.float32\n\n        self.policy = MLP(n_features, hidden_layer_sizes)\n        self.policy.to(self.dtype)\n        self.policy.eval()\n\n    def clone(self):\n        old_policy = self.policy\n        new_solution = self.__class__(**self.kwargs)\n\n        new_solution.policy.load_state_dict(\n            old_policy.state_dict(),\n        )\n\n        return new_solution\n\n    def get_action(self, obs):\n        y = self.policy(torch.from_numpy(obs).to(self.dtype))\n\n        action = y.item()\n        return action\n\n    def get_n_features(self):\n        return self.kwargs[\"n_features\"]\n\n    def reset(self):\n        pass\n\n\nclass PermutationInvariantSolution(Solution):\n    \"\"\"Permutation invariant solution.\n\n    Parameters\n    ----------\n    n_embeddings : int\n        Number of rows in the Q tensor.\n\n    proj_dim : int\n        Size of the space to which we project the K and Q tensors.\n\n    hidden_size : int\n        Dimensionality of the Q and K tensors before linear projections.\n\n    Attributes\n    ----------\n    kwargs : dict\n        All parameters necessary to instantiate the class\n\n    dtype : torch.dtype\n        Dtype of both the network 
weights and input features.\n\n    policy : PermutationInvariantNetwork\n        Policy network.\n\n    prev_action : float\n        Stores the previous action. Automatically updated each time we call\n        `get_action`.\n    \"\"\"\n\n    def __init__(\n        self,\n        n_embeddings=16,\n        proj_dim=32,\n        hidden_size=8,\n    ):\n        self.kwargs = {\n            \"n_embeddings\": n_embeddings,\n            \"proj_dim\": proj_dim,\n            \"hidden_size\": hidden_size,\n        }\n        self.policy = PermutationInvariantNetwork(\n            n_embeddings=n_embeddings,\n            proj_dim=proj_dim,\n            hidden_size=hidden_size,\n        )\n        self.dtype = torch.float32\n\n        self.policy.to(self.dtype)\n        self.policy.eval()\n\n        self.prev_action = 0  # will be continuously updated\n\n    def clone(self):\n        old_policy = self.policy\n        new_solution = self.__class__(**self.kwargs)\n\n        new_solution.policy.load_state_dict(\n            old_policy.state_dict(),\n        )\n\n        return new_solution\n\n    def get_action(self, obs):\n        y = self.policy(torch.from_numpy(obs).to(self.dtype), self.prev_action)\n\n        action = y.item()\n        self.prev_action = action\n\n        return action\n\n    def reset(self):\n        self.policy.attention_neuron.hx = None\n        self.prev_action = 0  # reset the running previous action\n\n    def get_n_features(self):\n        return None\n"
  },
  {
    "path": "github_adventures/neuron/tasks.py",
    "content": "import gym\nimport gym_cartpole_swingup  # noqa has a sideffect\nimport numpy as np\n\nN_ORIGINAL_FEATURES = 5\n\n\nclass IncompatibleNFeatures(Exception):\n    \"\"\"Raised when observation and model number of features does not match.\"\"\"\n\n\nclass Task:\n    \"\"\"Cartpoleswingup task.\n\n    Parameters\n    ----------\n    render : bool\n        If True, we render each step into a video frame.\n\n    shuffle_on_reset : bool\n        If True, the features are randomly shuffled before each rollout.\n\n    n_noise_features : int\n        Number of noise features added to the observation vector.\n\n    env_seed : None or int\n        Random state controling the underlying `gym.Env`.\n\n    feature_seed : None or int\n        Random state controling the shuffling and noise features.\n\n    max_episode_steps : int\n        Maximum number of steps per episode (=rollout). After his number\n        `done=True` automatically.\n\n    Attributes\n    ----------\n    n_features : int\n        Overall number of features (original + noise).\n\n    perm_ix : np.ndarray\n        1D array storing a permutation indices of the features.\n\n    env : gym.Env\n        Environment.\n\n    rnd : RandomState\n        Random state.\n    \"\"\"\n\n    def __init__(\n        self,\n        render=False,\n        shuffle_on_reset=False,\n        n_noise_features=0,\n        env_seed=None,\n        feature_seed=None,\n        max_episode_steps=1000,\n    ):\n\n        self.env = gym.make(\"CartPoleSwingUp-v1\")\n        self.env._max_episode_steps = max_episode_steps\n        self.shuffle_on_reset = shuffle_on_reset\n        self.render = render\n        self.n_noise_features = n_noise_features\n\n        self.n_features = N_ORIGINAL_FEATURES + n_noise_features\n\n        self.perm_ix = np.arange(self.n_features)\n        self.noise_std = 0.1\n\n        # Set seeds\n        self.env.seed(env_seed)\n        self.rnd = np.random.RandomState(seed=feature_seed)\n\n    def 
reset_for_rollout(self):\n        \"\"\"Generate a new permutation of the features.\n\n        It is going to be called at the beginning of each episode.\n        Note that the permutation stays constant throughout the episode.\n        \"\"\"\n        self.perm_ix = np.arange(self.n_features)\n\n        if self.shuffle_on_reset:\n            self.rnd.shuffle(self.perm_ix)\n\n    def modify_obs(self, obs):\n        \"\"\"Modify raw observations.\n\n        Parameters\n        ----------\n        obs : np.ndarray\n            Raw observation/feature array of shape `(5,)`.\n\n        Returns\n        -------\n        obs_modified : np.ndarray\n            Modified observation array of shape `(5 + n_noise_features,)`.\n            If `shuffle_on_reset` then the order of the features is going\n            to change.\n        \"\"\"\n        noise = self.rnd.randn(self.n_noise_features) * self.noise_std\n        obs_and_noise = np.concatenate([obs, noise], axis=0)\n        obs_modified = obs_and_noise[self.perm_ix]\n\n        return obs_modified\n\n    def rollout(self, solution):\n        \"\"\"Run a single episode/rollout.\n\n        Parameters\n        ----------\n        solution : solutions.Solution\n            Instance of a solution that yields an action given an\n            observation.\n\n        Returns\n        -------\n        ep_reward : int\n            Overall episode reward computed as a sum of per step rewards.\n        \"\"\"\n        # sanity check\n        n_features_solution = solution.get_n_features()\n        n_features_task = self.n_features\n\n        if (\n            n_features_solution is not None\n            and n_features_solution != n_features_task\n        ):\n            raise IncompatibleNFeatures\n\n        self.reset_for_rollout()\n        solution.reset()  # important for PermutationInvariantSolution\n\n        obs = self.env.reset()\n        if self.render:\n            self.env.render()\n\n        ep_reward = 0\n        done = 
False\n\n        while not done:\n            obs_modified = self.modify_obs(obs)\n            action = solution.get_action(obs_modified)\n            obs, reward, done, _ = self.env.step(action)\n\n            ep_reward += reward\n            if self.render:\n                self.env.render()\n\n        return ep_reward\n"
  },
  {
    "path": "github_adventures/neuron/torch_utils.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\n\n\nclass MLP(nn.Module):\n    \"\"\"Multilayer perceptron policy network.\n\n    Parameters\n    ----------\n    n_features : int\n        Number of input features.\n\n    hidden_layer_sizes : tuple\n        Tuple of int that defines the sizes of all hidden layers.\n\n    Attributes\n    ----------\n    net : nn.Sequential\n        The actual network.\n    \"\"\"\n\n    def __init__(self, n_features, hidden_layer_sizes):\n        super().__init__()\n\n        layer_sizes = (n_features,) + hidden_layer_sizes + (1,)\n\n        layers = []\n\n        for i in range(len(layer_sizes) - 1):\n            in_features = layer_sizes[i]\n            out_features = layer_sizes[i + 1]\n            layers.extend(\n                [\n                    nn.Linear(in_features, out_features),\n                    nn.Tanh(),\n                ]\n            )\n\n        self.net = nn.Sequential(*layers)\n\n        for p in self.parameters():\n            p.requires_grad = False\n\n\n    def forward(self, obs):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        obs : torch.Tensor\n            1D tensor representing the input observation of shape\n            `(n_features,)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Scalar between -1 and 1 representing the action.\n        \"\"\"\n\n        return self.net(obs[None, :])[0]\n\n\ndef pos_table(n_embeddings, hidden_size):\n    \"\"\"Create a table of positional encodings.\n\n    Parameters\n    ----------\n    n_embeddings : int\n        Number of rows of the table.\n\n    hidden_size : int\n        Number of columns of the table.\n\n    Returns\n    -------\n    tab : np.ndarray\n        2D array holding the positional encodings.\n    \"\"\"\n\n    def get_angle(x, h):\n        return x / np.power(10000, 2 * (h // 2) / hidden_size)\n\n    def get_angle_vec(x):\n        return [get_angle(x, j) for j in 
range(hidden_size)]\n\n    tab = np.array([get_angle_vec(i) for i in range(n_embeddings)]).astype(\n        float\n    )\n    tab[:, 0::2] = np.sin(tab[:, 0::2])\n    tab[:, 1::2] = np.cos(tab[:, 1::2])\n\n    return tab\n\n\nclass AttentionMatrix(nn.Module):\n    \"\"\"Generates attention matrix using the key and query tensors.\n\n    Parameters\n    ----------\n    proj_dim : int\n        Size of the space to which we project the K and Q tensors.\n\n    hidden_size : int\n        Dimensionality of the Q and K tensors before linear projections.\n\n    scale : bool\n        If True, then the attention matrix will be divided by\n        `proj_dim ** (1 / 2)` elementwise.\n\n    Attributes\n    ----------\n    proj_q, proj_k : torch.nn.Linear\n        Linear models projecting the Q and K tensors.\n\n    scalar : float\n        Number used for scaling the attention matrix elementwise.\n    \"\"\"\n\n    def __init__(self, hidden_size, proj_dim, scale=True):\n        super().__init__()\n\n        self.proj_q = nn.Linear(\n            in_features=hidden_size, out_features=proj_dim, bias=False\n        )\n        self.proj_k = nn.Linear(\n            in_features=hidden_size, out_features=proj_dim, bias=False\n        )\n        if scale:\n            self.scalar = np.sqrt(proj_dim)\n        else:\n            self.scalar = 1\n\n    def forward(self, data_q, data_k):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        data_q : torch.Tensor\n            Query tensor of shape `(n_embeddings, hidden_size)`.\n\n        data_k : torch.Tensor\n            Key tensor of shape `(n_features, hidden_size)`.\n\n        Returns\n        -------\n        attention_weights : torch.Tensor\n            Attention weights (don't sum up to 1 in general) of shape\n            `(n_embeddings, n_features)`.\n        \"\"\"\n        q = self.proj_q(data_q)  # (n_embeddings, proj_dim)\n        k = self.proj_k(data_k)  # (n_features, proj_dim)\n        dot = q 
@ k.T  # (n_embeddings, n_features)\n        dot_scaled = torch.div(dot, self.scalar)  # (n_embeddings, n_features)\n        attention_weights = torch.tanh(\n            dot_scaled\n        )  # (n_embeddings, n_features)\n\n        return attention_weights\n\n\nclass AttentionNeuron(nn.Module):\n    \"\"\"Permutation invariant layer.\n\n    Parameters\n    ----------\n    n_embeddings : int\n        Number of rows in the Q tensor. In our case it is equal to the length\n        of the latent code `m`.\n\n    proj_dim : int\n        Size of the space to which we project the K and Q tensors.\n\n    hidden_size : int\n        The dimensionality of the Q and K tensors before linear projections.\n\n    Attributes\n    ----------\n    hx : tuple or None\n        If not None then a tuple of 2 hidden state tensors (LSTM specific)\n\n    lstm : nn.LSTMCell\n        LSTM cell that inputs a hidden state and an observation and\n        outputs a new hidden state.\n\n    attention_matrix : AttentionMatrix\n        Attention matrix (only needs Q and K tensors).\n\n    Q : torch.Tensor\n        Query tensor that is not learnable since it is populated with\n        positional encodings.\n    \"\"\"\n\n    def __init__(\n        self,\n        n_embeddings=16,\n        proj_dim=32,\n        hidden_size=8,\n    ):\n        super().__init__()\n        self.n_embeddings = n_embeddings\n        self.proj_dim = proj_dim\n        self.hidden_size = hidden_size\n\n        # Modules\n        self.hx = None\n        self.lstm = nn.LSTMCell(input_size=2, hidden_size=hidden_size)\n\n        self.attention_matrix = AttentionMatrix(\n            hidden_size=hidden_size,\n            proj_dim=proj_dim,\n            scale=False,\n        )\n\n        self.register_buffer(\n            \"Q\",\n            torch.from_numpy(\n                pos_table(\n                    n_embeddings,\n                    hidden_size,\n                )\n            ).float(),\n        )\n\n    def forward(self, 
obs, prev_action):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        obs : torch.Tensor\n            1D tensor representing the input observations of shape\n            `(n_features,)`.\n\n        prev_action : float\n            Number between -1 and 1 based on what the previous action was.\n\n        Returns\n        -------\n        latent_code : torch.Tensor\n            1D tensor representing the latent code of shape `(n_embeddings,)`.\n\n        attn_weights : torch.Tensor\n            2D tensor of shape `(n_embeddings, n_features)` representing\n            attention weights.\n        \"\"\"\n        n_features = len(obs)\n        prev_action = float(prev_action)\n\n        obs_and_act = torch.cat(\n            [\n                obs[:, None],\n                torch.ones(n_features, 1) * prev_action,\n            ],\n            dim=-1,\n        )  # (n_features, 2)\n\n        if self.hx is None:\n            self.hx = (\n                torch.zeros(n_features, self.hidden_size),\n                torch.zeros(n_features, self.hidden_size),\n            )\n\n        self.hx = self.lstm(\n            obs_and_act, self.hx\n        )  # Tuple[(n_features, hidden_size)]\n\n        data_q = self.Q  # (n_embeddings, hidden_size)\n        data_k = self.hx[0]  # (n_features, hidden_size)\n        data_v = obs[:, None]  # (n_features, 1)\n\n        attn_weights = self.attention_matrix(\n            data_q=data_q, data_k=data_k\n        )  # (n_embeddings, n_features)\n\n        latent_code_ = torch.tanh(attn_weights @ data_v)  # (n_embeddings, 1)\n        latent_code = latent_code_.squeeze()  # (n_embeddings,)\n\n        return latent_code, attn_weights\n\n\nclass PermutationInvariantNetwork(nn.Module):\n    \"\"\"Permutation invariant policy network.\n\n    Parameters\n    ----------\n    n_embeddings : int\n        Number of rows in the Q tensor.\n\n    proj_dim : int\n        Size of the space to which we project the K and Q 
tensors.\n\n    hidden_size : int\n        Dimensionality of the Q and K matrices before linear projections.\n\n    Attributes\n    ----------\n    attention_neuron : AttentionNeuron\n        Permutation invariant layer that generates latent codes.\n\n    linear : nn.Linear\n        Maps the latent code into a single number.\n    \"\"\"\n\n    def __init__(\n        self,\n        n_embeddings=16,\n        proj_dim=32,\n        hidden_size=8,\n    ):\n        super().__init__()\n\n        self.attention_neuron = AttentionNeuron(\n            n_embeddings=n_embeddings,\n            proj_dim=proj_dim,\n            hidden_size=hidden_size,\n        )\n\n        self.linear = nn.Linear(n_embeddings, 1)\n\n        for p in self.parameters():\n            p.requires_grad = False\n\n    def forward(self, obs, prev_action):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        obs : torch.Tensor\n            1D tensor representing the input observations of shape\n            `(n_features,)`.\n\n        prev_action : float\n            Number between -1 and 1 based on what the previous action was.\n\n        Returns\n        -------\n        y : torch.Tensor\n            Scalar tensor with a value in range (-1, 1) representing the\n            next action.\n        \"\"\"\n\n        latent_code, _ = self.attention_neuron(\n            obs, prev_action\n        )  # (n_embeddings,)\n\n        y_ = torch.tanh(self.linear(latent_code[None, :]))  # (1, 1)\n        y = y_[0]  # (1,)\n\n        return y\n"
  },
  {
    "path": "github_adventures/neuron/trainer.py",
    "content": "import argparse\nimport json\nimport multiprocessing as mp\nimport pathlib\nimport pickle\nfrom functools import partial\n\nimport cma\nimport numpy as np\nimport tqdm\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom solutions import (\n    MLPSolution,\n    PermutationInvariantSolution,\n)\nfrom tasks import Task, N_ORIGINAL_FEATURES\n\n\ndef save(folder, n_iter, solver, solution_inst):\n    \"\"\"Save checkpoint.\n\n    Parameters\n    ----------\n    folder : str\n        Output folder.\n\n    n_iter : int\n        Iteration that corresponds to the checkpoint.\n\n    solver : cma.CMAEvolutionStrategy\n        Solver instance.\n\n    solution_inst : Solution\n        Solution instance.\n    \"\"\"\n    folder = pathlib.Path(folder)\n    folder.mkdir(parents=True, exist_ok=True)\n\n    path = folder / f\"{n_iter}.pkl\"\n\n    with path.open(\"wb\") as f:\n        obj = (solver, solution_inst)\n        pickle.dump(obj, f)\n\n\ndef get_fitness(\n    solution_inst,\n    *,\n    shuffle_on_reset,\n    n_episodes,\n    n_noise_features,\n    env_seed,\n    feature_seed,\n):\n    \"\"\"Get fitness function used by the CMA optimizer/solver.\n\n    Can be run independently on a single worker.\n\n\n    Returns\n    -------\n    fitness : list\n        List of floats of length `n_episodes` holding the per episode reward.\n    \"\"\"\n    task = Task(\n        render=False,\n        shuffle_on_reset=shuffle_on_reset,\n        n_noise_features=n_noise_features,\n        env_seed=env_seed,\n        feature_seed=feature_seed,\n    )\n    fitness = [task.rollout(solution_inst) for _ in range(n_episodes)]\n\n    return fitness\n\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser(\n        \"Training\",\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter,\n    )\n\n    parser.add_argument(\n        \"solution\",\n        type=str,\n        choices=(\n            \"linear\",\n            \"MLP\",\n            \"invariant\",\n        
),\n    )\n    parser.add_argument(\n        \"log_dir\",\n        type=str,\n        help=\"Logging folder\",\n    )\n    parser.add_argument(\n        \"--checkpoint\",\n        type=str,\n        help=\"Pickled solver and solution\",\n    )\n    parser.add_argument(\n        \"--env-seed\",\n        type=int,\n    )\n    parser.add_argument(\n        \"--eval-frequency\",\n        type=int,\n        default=25,\n    )\n    parser.add_argument(\n        \"--feature-seed\",\n        type=int,\n    )\n    parser.add_argument(\n        \"-m\",\n        \"--max-iter\",\n        type=int,\n        default=10000,\n        help=\"Maximum number of iterations\",\n    )\n    parser.add_argument(\n        \"-e\",\n        \"--n-episodes\",\n        type=int,\n        default=16,\n        help=\"Number of rollouts for fitness evaluation\",\n    )\n    parser.add_argument(\n        \"-j\",\n        \"--n-jobs\",\n        type=int,\n        default=-1,\n        help=\"Number of processes\",\n    )\n    parser.add_argument(\n        \"-n\",\n        \"--n-noise-features\",\n        type=int,\n        default=0,\n        help=\"Number of noise features\",\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--population-size\",\n        type=int,\n        default=256,\n        help=\"Number of solutions per generation\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--shuffle-on-reset\",\n        action=\"store_true\",\n        help=\"Shuffle features before each rollout\",\n    )\n\n    args = parser.parse_args(argv)\n\n    writer = SummaryWriter(args.log_dir)\n    writer.add_text(\"parameters\", json.dumps(vars(args)))\n\n    # Solution map\n    if args.solution == \"linear\":\n        solution_inst = MLPSolution(\n            n_features=N_ORIGINAL_FEATURES + args.n_noise_features,\n            hidden_layer_sizes=tuple(),\n        )\n\n    elif args.solution == \"MLP\":\n        solution_inst = MLPSolution(\n            n_features=N_ORIGINAL_FEATURES + 
args.n_noise_features,\n            hidden_layer_sizes=(16,),\n        )\n\n    elif args.solution == \"invariant\":\n        solution_inst = PermutationInvariantSolution(\n            n_embeddings=16,\n            proj_dim=32,\n            hidden_size=8,\n        )\n\n    else:\n        raise ValueError\n\n    # Prepare solver\n    if args.checkpoint is None:\n        x0 = np.zeros(solution_inst.get_n_params())\n        solver = cma.CMAEvolutionStrategy(\n            x0=x0,\n            sigma0=0.1,\n            inopts={\n                \"popsize\": args.population_size,\n                \"seed\": 42,\n                \"randn\": np.random.randn,\n            },\n        )\n    else:\n        with open(args.checkpoint, \"rb\") as f:\n            solver, solution_inst_ = pickle.load(f)\n\n            assert isinstance(solution_inst, solution_inst_.__class__)\n\n            solution_inst = solution_inst_\n\n    get_fitness_partial = partial(\n        get_fitness,\n        n_episodes=args.n_episodes,\n        shuffle_on_reset=args.shuffle_on_reset,\n        n_noise_features=args.n_noise_features,\n        env_seed=args.env_seed,\n        feature_seed=args.feature_seed,\n    )\n\n    if args.n_jobs == -1:\n        n_jobs = mp.cpu_count()\n    else:\n        n_jobs = args.n_jobs\n\n\n    with mp.Pool(processes=n_jobs) as pool:\n        for n_iter in tqdm.tqdm(range(args.max_iter)):\n            try:\n                params_set = solver.ask()\n                iterable = [\n                    solution_inst.clone().set_params(p) for p in params_set\n                ]\n                rewards = pool.map(get_fitness_partial, iterable)\n                pos_fitnesses = [np.mean(r) for r in rewards]\n\n                neg_fitnesses = [-x for x in pos_fitnesses]\n\n                all_parameters = np.concatenate(params_set)\n                metrics = {\n                    \"parameter_mean\": all_parameters.mean(),\n                    \"parameter_std\": all_parameters.std(),\n 
                   \"mean\": np.mean(pos_fitnesses),\n                    \"max (generation)\": np.max(pos_fitnesses),\n                    \"max (overall)\": -solver.result.fbest,\n                }\n\n                for metric_name, metric in metrics.items():\n                    writer.add_scalar(metric_name, metric, global_step=n_iter)\n\n                if (n_iter % args.eval_frequency == 0) or (\n                    n_iter == (args.max_iter - 1)\n                ):\n                    save(args.log_dir, n_iter, solver, solution_inst)\n\n                solver.tell(params_set, neg_fitnesses)\n\n            except KeyboardInterrupt:\n                save(\n                    args.log_dir,\n                    n_iter,\n                    solver,\n                    solution_inst,\n                )\n                break\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/pondernet/experiment_1.sh",
    "content": "set -x \nSEED=$RANDOM\nLAMBDAS=(0.1 0.3 0.5 0.7 0.9)\n\nfor lambda in ${LAMBDAS[@]}\ndo\n\tpython train.py \\\n\t\t--batch-size 128 \\\n\t\t--beta 0.01 \\\n\t\t--device cuda \\\n\t\t--eval-frequency 4000 \\\n\t\t--n-iter 100000 \\\n\t\t--n-hidden 128 \\\n\t\t--lambda-p $lambda \\\n\t\t--n-elems 15 \\\n\t\tresults/experiment_a/$SEED/lambda_$lambda\ndone\n"
  },
  {
    "path": "github_adventures/pondernet/experiment_2.sh",
    "content": "set -x \nSEED=$RANDOM\n\npython train.py \\\n\t--batch-size 128 \\\n\t--beta 0.01 \\\n\t--eval-frequency 4000 \\\n\t--device cuda \\\n\t--lambda-p 0.2 \\\n\t--n-elems 30 \\\n\t--n-iter 1500000 \\\n\t--n-hidden 128 \\\n\t--n-nonzero 1 25 \\\n\tresults/experiment_b/$SEED\n"
  },
  {
    "path": "github_adventures/pondernet/requirements.txt",
    "content": "matplotlib\nnumpy\ntensorboard\ntorch\ntqdm\n"
  },
  {
    "path": "github_adventures/pondernet/train.py",
    "content": "from argparse import ArgumentParser\nimport json\nimport pathlib\n\nimport matplotlib.pyplot as plt\nimport torch\nimport torch.nn as nn\nfrom torch.utils.data import DataLoader\nfrom torch.utils.tensorboard import SummaryWriter\nfrom tqdm import tqdm\n\nfrom utils import (\n    ParityDataset,\n    PonderNet,\n    ReconstructionLoss,\n    RegularizationLoss,\n)\n\n\n@torch.no_grad()\ndef evaluate(dataloader, module):\n    \"\"\"Compute relevant metrics.\n\n    Parameters\n    ----------\n    dataloader : DataLoader\n        Dataloader that yields batches of `x` and `y`.\n\n    module : PonderNet\n        Our pondering network.\n\n    Returns\n    -------\n    metrics_single : dict\n        Scalar metrics. The keys are names and the values are `torch.Tensor`.\n        These metrics are computed as mean values over the entire dataset.\n\n    metrics_per_step : dict\n        Per step metrics. The keys are names and the values are `torch.Tensor`\n        of shape `(max_steps,)`. 
These metrics are computed as mean values over\n        the entire dataset.\n\n    \"\"\"\n    # Infer device and dtype from the module parameters\n    param = next(module.parameters())\n    device, dtype = param.device, param.dtype\n\n    metrics_single_ = {\n        \"accuracy_halted\": [],\n        \"halting_step\": [],\n    }\n    metrics_per_step_ = {\n        \"accuracy\": [],\n        \"p\": [],\n    }\n\n    for x_batch, y_true_batch in dataloader:\n        x_batch = x_batch.to(device, dtype)  # (batch_size, n_elems)\n        y_true_batch = y_true_batch.to(device, dtype)  # (batch_size,)\n\n        y_pred_batch, p, halting_step = module(x_batch)\n        y_halted_batch = y_pred_batch.gather(\n            dim=0,\n            index=halting_step[None, :] - 1,\n        )[\n            0\n        ]  # (batch_size,)\n\n        # Computing single metrics (mean over samples in the batch)\n        accuracy_halted = (\n            ((y_halted_batch > 0) == y_true_batch).to(torch.float32).mean()\n        )\n\n        metrics_single_[\"accuracy_halted\"].append(accuracy_halted)\n        metrics_single_[\"halting_step\"].append(\n            halting_step.to(torch.float).mean()\n        )\n\n        # Computing per step metrics (mean over samples in the batch)\n        accuracy = (\n            ((y_pred_batch > 0) == y_true_batch[None, :])\n            .to(torch.float32)\n            .mean(dim=1)\n        )\n\n        metrics_per_step_[\"accuracy\"].append(accuracy)\n        metrics_per_step_[\"p\"].append(p.mean(dim=1))\n\n    metrics_single = {\n        name: torch.stack(values).mean(dim=0).cpu().numpy()\n        for name, values in metrics_single_.items()\n    }\n\n    metrics_per_step = {\n        name: torch.stack(values).mean(dim=0).cpu().numpy()\n        for name, values in metrics_per_step_.items()\n    }\n\n    return metrics_single, metrics_per_step\n\n\ndef plot_distributions(target, predicted):\n    \"\"\"Create a barplot.\n\n    Parameters\n    ----------\n    target, predicted : 
np.ndarray\n        Arrays of shape `(max_steps,)` representing the target and predicted\n        probability distributions.\n\n    Returns\n    -------\n    matplotlib.Figure\n    \"\"\"\n    support = list(range(1, len(target) + 1))\n\n    fig, ax = plt.subplots(dpi=140)\n\n    ax.bar(\n        support,\n        target,\n        color=\"red\",\n        label=f\"Target - Geometric({target[0].item():.2f})\",\n    )\n\n    ax.bar(\n        support,\n        predicted,\n        color=\"green\",\n        width=0.4,\n        label=\"Predicted\",\n    )\n\n    ax.set_ylim(0, 0.6)\n    ax.set_xticks(support)\n    ax.legend()\n    ax.grid()\n\n    return fig\n\n\ndef plot_accuracy(accuracy):\n    \"\"\"Create a barplot representing accuracy over different halting steps.\n\n    Parameters\n    ----------\n    accuracy : np.array\n        1D array representing accuracy if we were to take the output after\n        the corresponding step.\n\n    Returns\n    -------\n    matplotlib.Figure\n    \"\"\"\n    support = list(range(1, len(accuracy) + 1))\n\n    fig, ax = plt.subplots(dpi=140)\n\n    ax.bar(\n        support,\n        accuracy,\n        label=\"Accuracy over different steps\",\n    )\n\n    ax.set_ylim(0, 1)\n    ax.set_xticks(support)\n    ax.legend()\n    ax.grid()\n\n    return fig\n\n\ndef main(argv=None):\n    \"\"\"CLI for training.\"\"\"\n    parser = ArgumentParser()\n\n    parser.add_argument(\n        \"log_folder\",\n        type=str,\n        help=\"Folder where tensorboard logging is saved\",\n    )\n    parser.add_argument(\n        \"--batch-size\",\n        type=int,\n        default=128,\n        help=\"Batch size\",\n    )\n    parser.add_argument(\n        \"--beta\",\n        type=float,\n        default=0.01,\n        help=\"Regularization loss coefficient\",\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--device\",\n        type=str,\n        choices={\"cpu\", \"cuda\"},\n        default=\"cpu\",\n        help=\"Device to 
use\",\n    )\n    parser.add_argument(\n        \"--eval-frequency\",\n        type=int,\n        default=10_000,\n        help=\"Evaluation is run every `eval_frequency` steps\",\n    )\n    parser.add_argument(\n        \"--lambda-p\",\n        type=float,\n        default=0.4,\n        help=\"True probability of success for a geometric distribution\",\n    )\n    parser.add_argument(\n        \"--n-iter\",\n        type=int,\n        default=1_000_000,\n        help=\"Number of gradient steps\",\n    )\n    parser.add_argument(\n        \"--n-elems\",\n        type=int,\n        default=64,\n        help=\"Number of elements\",\n    )\n    parser.add_argument(\n        \"--n-hidden\",\n        type=int,\n        default=64,\n        help=\"Number of hidden elements in the reccurent cell\",\n    )\n    parser.add_argument(\n        \"--n-nonzero\",\n        type=int,\n        nargs=2,\n        default=(None, None),\n        help=\"Lower and upper bound on nonzero elements in the training set\",\n    )\n    parser.add_argument(\n        \"--max-steps\",\n        type=int,\n        default=20,\n        help=\"Maximum number of pondering steps\",\n    )\n\n    # Parameters\n    args = parser.parse_args(argv)\n    print(args)\n\n    device = torch.device(args.device)\n    dtype = torch.float32\n    n_eval_samples = 1000\n    batch_size_eval = 50\n\n    if args.n_nonzero[0] is None and args.n_nonzero[1] is None:\n        threshold = int(0.3 * args.n_elems)\n        range_nonzero_easy = (1, threshold)\n        range_nonzero_hard = (args.n_elems - threshold, args.n_elems)\n    else:\n        range_nonzero_easy = (1, args.n_nonzero[1])\n        range_nonzero_hard = (args.n_nonzero[1] + 1, args.n_elems)\n\n    # Tensorboard\n    log_folder = pathlib.Path(args.log_folder)\n    writer = SummaryWriter(log_folder)\n    writer.add_text(\"parameters\", json.dumps(vars(args)))\n\n    # Prepare data\n    dataloader_train = DataLoader(\n        ParityDataset(\n            
n_samples=args.batch_size * args.n_iter,\n            n_elems=args.n_elems,\n            n_nonzero_min=args.n_nonzero[0],\n            n_nonzero_max=args.n_nonzero[1],\n        ),\n        batch_size=args.batch_size,\n    )  # consider specifying `num_workers` for speedups\n    eval_dataloaders = {\n        \"test\": DataLoader(\n            ParityDataset(\n                n_samples=n_eval_samples,\n                n_elems=args.n_elems,\n                n_nonzero_min=args.n_nonzero[0],\n                n_nonzero_max=args.n_nonzero[1],\n            ),\n            batch_size=batch_size_eval,\n        ),\n        f\"{range_nonzero_easy[0]}_{range_nonzero_easy[1]}\": DataLoader(\n            ParityDataset(\n                n_samples=n_eval_samples,\n                n_elems=args.n_elems,\n                n_nonzero_min=range_nonzero_easy[0],\n                n_nonzero_max=range_nonzero_easy[1],\n            ),\n            batch_size=batch_size_eval,\n        ),\n        f\"{range_nonzero_hard[0]}_{range_nonzero_hard[1]}\": DataLoader(\n            ParityDataset(\n                n_samples=n_eval_samples,\n                n_elems=args.n_elems,\n                n_nonzero_min=range_nonzero_hard[0],\n                n_nonzero_max=range_nonzero_hard[1],\n            ),\n            batch_size=batch_size_eval,\n        ),\n    }\n\n    # Model preparation\n    module = PonderNet(\n        n_elems=args.n_elems,\n        n_hidden=args.n_hidden,\n        max_steps=args.max_steps,\n    )\n    module = module.to(device, dtype)\n\n    # Loss preparation\n    loss_rec_inst = ReconstructionLoss(\n        nn.BCEWithLogitsLoss(reduction=\"none\")\n    ).to(device, dtype)\n\n    loss_reg_inst = RegularizationLoss(\n        lambda_p=args.lambda_p,\n        max_steps=args.max_steps,\n    ).to(device, dtype)\n\n    # Optimizer\n    optimizer = torch.optim.Adam(\n        module.parameters(),\n        lr=0.0003,\n    )\n\n    # Training and evaluation loops\n    iterator = 
tqdm(enumerate(dataloader_train), total=args.n_iter)\n    for step, (x_batch, y_true_batch) in iterator:\n        x_batch = x_batch.to(device, dtype)\n        y_true_batch = y_true_batch.to(device, dtype)\n\n        y_pred_batch, p, halting_step = module(x_batch)\n\n        loss_rec = loss_rec_inst(\n            p,\n            y_pred_batch,\n            y_true_batch,\n        )\n\n        loss_reg = loss_reg_inst(\n            p,\n        )\n\n        loss_overall = loss_rec + args.beta * loss_reg\n\n        optimizer.zero_grad()\n        loss_overall.backward()\n        torch.nn.utils.clip_grad_norm_(module.parameters(), 1)\n        optimizer.step()\n\n        # Logging\n        writer.add_scalar(\"loss_rec\", loss_rec, step)\n        writer.add_scalar(\"loss_reg\", loss_reg, step)\n        writer.add_scalar(\"loss_overall\", loss_overall, step)\n\n        # Evaluation\n        if step % args.eval_frequency == 0:\n            module.eval()\n\n            for dataloader_name, dataloader in eval_dataloaders.items():\n                metrics_single, metrics_per_step = evaluate(\n                    dataloader,\n                    module,\n                )\n                fig_dist = plot_distributions(\n                    loss_reg_inst.p_g.cpu().numpy(),\n                    metrics_per_step[\"p\"],\n                )\n                writer.add_figure(\n                    f\"distributions/{dataloader_name}\", fig_dist, step\n                )\n\n                fig_acc = plot_accuracy(metrics_per_step[\"accuracy\"])\n                writer.add_figure(\n                    f\"accuracy_per_step/{dataloader_name}\", fig_acc, step\n                )\n\n                for metric_name, metric_value in metrics_single.items():\n                    writer.add_scalar(\n                        f\"{metric_name}/{dataloader_name}\",\n                        metric_value,\n                        step,\n                    )\n\n            torch.save(module, log_folder / 
\"checkpoint.pth\")\n\n            module.train()\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/pondernet/utils.py",
"content": "import torch\nimport torch.nn as nn\nfrom torch.utils.data import Dataset\n\n\nclass ParityDataset(Dataset):\n    \"\"\"Parity of vectors - binary classification dataset.\n\n    Parameters\n    ----------\n    n_samples : int\n        Number of samples to generate.\n\n    n_elems : int\n        Size of the vectors.\n\n    n_nonzero_min, n_nonzero_max : int or None\n        Minimum (inclusive) and maximum (inclusive) number of nonzero\n        elements in the feature vector. If not specified then `(1, n_elems)`.\n    \"\"\"\n\n    def __init__(\n        self,\n        n_samples,\n        n_elems,\n        n_nonzero_min=None,\n        n_nonzero_max=None,\n    ):\n        self.n_samples = n_samples\n        self.n_elems = n_elems\n\n        self.n_nonzero_min = 1 if n_nonzero_min is None else n_nonzero_min\n        self.n_nonzero_max = (\n            n_elems if n_nonzero_max is None else n_nonzero_max\n        )\n\n        assert 0 <= self.n_nonzero_min <= self.n_nonzero_max <= n_elems\n\n    def __len__(self):\n        \"\"\"Get the number of samples.\"\"\"\n        return self.n_samples\n\n    def __getitem__(self, idx):\n        \"\"\"Get a feature vector and its parity (target).\n\n        Note that the generating process is random.\n        \"\"\"\n        x = torch.zeros((self.n_elems,))\n        n_non_zero = torch.randint(\n            self.n_nonzero_min, self.n_nonzero_max + 1, (1,)\n        ).item()\n        x[:n_non_zero] = torch.randint(0, 2, (n_non_zero,)) * 2 - 1\n        x = x[torch.randperm(self.n_elems)]\n\n        y = (x == 1.0).sum() % 2\n\n        return x, y\n\n\nclass PonderNet(nn.Module):\n    \"\"\"Network that ponders.\n\n    Parameters\n    ----------\n    n_elems : int\n        Number of features in the vector.\n\n    n_hidden : int\n        Hidden layer size of the recurrent cell.\n\n    max_steps : int\n        Maximum number of steps the network can \"ponder\" for.\n\n    allow_halting : bool\n        If True, then the 
forward pass is allowed to halt before\n        reaching the maximum steps.\n\n    Attributes\n    ----------\n    cell : nn.GRUCell\n        Learnable GRU cell that maps the previous hidden state and the input\n        to a new hidden state.\n\n    output_layer : nn.Linear\n        Linear module that serves as the binary classifier. It inputs\n        the hidden state.\n\n    lambda_layer : nn.Linear\n        Linear module that generates the halting probability at each step.\n\n    \"\"\"\n\n    def __init__(\n        self, n_elems, n_hidden=64, max_steps=20, allow_halting=False\n    ):\n        super().__init__()\n\n        self.max_steps = max_steps\n        self.n_hidden = n_hidden\n        self.allow_halting = allow_halting\n\n        self.cell = nn.GRUCell(n_elems, n_hidden)\n        self.output_layer = nn.Linear(n_hidden, 1)\n        self.lambda_layer = nn.Linear(n_hidden, 1)\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Batch of input features of shape `(batch_size, n_elems)`.\n\n        Returns\n        -------\n        y : torch.Tensor\n            Tensor of shape `(max_steps, batch_size)` representing\n            the predictions for each step and each sample. In case\n            `allow_halting=True` then the shape is\n            `(steps, batch_size)` where `1 <= steps <= max_steps`.\n\n        p : torch.Tensor\n            Tensor of shape `(max_steps, batch_size)` representing\n            the halting probabilities. Sums over rows (fixing a sample)\n            are 1. In case `allow_halting=True` then the shape is\n            `(steps, batch_size)` where `1 <= steps <= max_steps`.\n\n        halting_step : torch.Tensor\n            An integer for each sample in the batch that corresponds to\n            the step when it was halted. The shape is `(batch_size,)`. 
The\n            minimal value is 1 because we always run at least one step.\n        \"\"\"\n        batch_size, _ = x.shape\n        device = x.device\n\n        h = x.new_zeros(batch_size, self.n_hidden)\n\n        un_halted_prob = x.new_ones(batch_size)\n\n        y_list = []\n        p_list = []\n\n        halting_step = torch.zeros(\n            batch_size,\n            dtype=torch.long,\n            device=device,\n        )\n\n        for n in range(1, self.max_steps + 1):\n            if n == self.max_steps:\n                lambda_n = x.new_ones(batch_size)  # (batch_size,)\n            else:\n                lambda_n = torch.sigmoid(self.lambda_layer(h))[\n                    :, 0\n                ]  # (batch_size,)\n\n            # Store relevant outputs\n            y_list.append(self.output_layer(h)[:, 0])  # (batch_size,)\n            p_list.append(un_halted_prob * lambda_n)  # (batch_size,)\n\n            halting_step = torch.maximum(\n                n\n                * (halting_step == 0)\n                * torch.bernoulli(lambda_n).to(torch.long),\n                halting_step,\n            )\n\n            # Prepare for next iteration\n            un_halted_prob = un_halted_prob * (1 - lambda_n)\n            h = self.cell(x, h)\n\n            # Potentially stop if all samples halted\n            if self.allow_halting and (halting_step > 0).sum() == batch_size:\n                break\n\n        y = torch.stack(y_list)\n        p = torch.stack(p_list)\n\n        return y, p, halting_step\n\n\nclass ReconstructionLoss(nn.Module):\n    \"\"\"Weighted average of per step losses.\n\n    Parameters\n    ----------\n    loss_func : callable\n        Loss function that accepts `y_pred` and `y_true` as arguments. Both\n        of these tensors have shape `(batch_size,)`. 
It outputs a loss for\n        each sample in the batch.\n    \"\"\"\n\n    def __init__(self, loss_func):\n        super().__init__()\n\n        self.loss_func = loss_func\n\n    def forward(self, p, y_pred, y_true):\n        \"\"\"Compute loss.\n\n        Parameters\n        ----------\n        p : torch.Tensor\n            Probability of halting of shape `(max_steps, batch_size)`.\n\n        y_pred : torch.Tensor\n            Predicted outputs of shape `(max_steps, batch_size)`.\n\n        y_true : torch.Tensor\n            True targets of shape `(batch_size,)`.\n\n        Returns\n        -------\n        loss : torch.Tensor\n            Scalar representing the reconstruction loss. It is nothing else\n            than a weighted sum of per step losses.\n        \"\"\"\n        max_steps, _ = p.shape\n        total_loss = p.new_tensor(0.0)\n\n        for n in range(max_steps):\n            loss_per_sample = p[n] * self.loss_func(\n                y_pred[n], y_true\n            )  # (batch_size,)\n            total_loss = total_loss + loss_per_sample.mean()  # (1,)\n\n        return total_loss\n\n\nclass RegularizationLoss(nn.Module):\n    \"\"\"Enforce the halting distribution to resemble the geometric distribution.\n\n    Parameters\n    ----------\n    lambda_p : float\n        The single parameter determining uniquely the geometric distribution.\n        Note that the expected value of this distribution is going to be\n        `1 / lambda_p`.\n\n    max_steps : int\n        Maximum number of pondering steps.\n    \"\"\"\n\n    def __init__(self, lambda_p, max_steps=20):\n        super().__init__()\n\n        p_g = torch.zeros((max_steps,))\n        not_halted = 1.0\n\n        for k in range(max_steps):\n            p_g[k] = not_halted * lambda_p\n            not_halted = not_halted * (1 - lambda_p)\n\n        self.register_buffer(\"p_g\", p_g)\n        self.kl_div = nn.KLDivLoss(reduction=\"batchmean\")\n\n    def forward(self, p):\n        \"\"\"Compute 
loss.\n\n        Parameters\n        ----------\n        p : torch.Tensor\n            Probability of halting of shape `(steps, batch_size)`.\n\n        Returns\n        -------\n        loss : torch.Tensor\n            Scalar representing the regularization loss.\n        \"\"\"\n        steps, batch_size = p.shape\n\n        p = p.transpose(0, 1)  # (batch_size, max_steps)\n\n        p_g_batch = self.p_g[None, :steps].expand_as(\n            p\n        )  # (batch_size, max_steps)\n\n        return self.kl_div(p.log(), p_g_batch)\n"
  },
  {
    "path": "github_adventures/product_quantization/README.md",
"content": "# Installation\n\nRun the following to get all the dependencies.\n```\npip install -r requirements.txt\n```\n\n# Faiss 101\nThe code for the short intro to FAISS can be found in `faiss_101_ipython.py`.\nNote that you can use `parse.py` to turn the raw fasttext embeddings\ninto a numpy array. See `run_all.sh` for example usage.\n\n# Custom PQ implementation\nThe custom PQ implementation can be found in `custom.py`.\n\n\n# End to end script\nThe script `run_all.sh` does the following:\n\n* Download fasttext embeddings\n* Train multiple indexes (faiss + custom) using the embeddings\n* Serve gradio apps for similarity search comparing different indexes\n\n\n```\nchmod +x run_all.sh\n./run_all.sh\n```\n\nDon't forget to kill the Gradio processes with `pkill -f gradio` once you\ndon't need them anymore.\n"
  },
  {
    "path": "github_adventures/product_quantization/convert.py",
"content": "import argparse\nimport logging\nimport pathlib\nimport pickle\n\nimport faiss\n\nfrom custom import CustomIndexPQ\n\nlogger = logging.getLogger(__name__)\nlogging.basicConfig(level=logging.INFO)\n\n\ndef from_faiss(faiss_index: faiss.swigfaiss.IndexPQ) -> CustomIndexPQ:\n    if not faiss_index.is_trained:\n        raise ValueError(\"The faiss index is not trained\")\n\n    if faiss_index.ntotal == 0:\n        raise ValueError(\"The faiss index has no codes\")\n\n    d = faiss_index.d\n    m = faiss_index.code_size\n    nbits = faiss_index.pq.nbits\n    k = 2**nbits\n    ntotal = faiss_index.ntotal\n\n    custom_index = CustomIndexPQ(d=d, m=m, nbits=nbits)\n    centers = faiss.vector_to_array(faiss_index.pq.centroids).reshape(\n        m, k, d // m\n    )\n\n    logger.info(\"Copying centers from the faiss index\")\n    for i in range(m):\n        custom_index.estimators[i].cluster_centers_ = centers[i]\n    custom_index.is_trained = True\n\n    logger.info(\"Copying codes from the faiss index\")\n    custom_index.codes = faiss.vector_to_array(faiss_index.codes).reshape(\n        ntotal, m\n    )\n\n    return custom_index\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(\"Convert from faiss to custom\")\n    parser.add_argument(\n        \"faiss_index_path\",\n        type=pathlib.Path,\n        help=\"Path to a faiss index\",\n    )\n    parser.add_argument(\n        \"output_index_path\",\n        type=pathlib.Path,\n        help=\"Path to a new custom index with faiss parameters\",\n    )\n\n    args = parser.parse_args()\n\n    faiss_index = faiss.read_index(str(args.faiss_index_path))\n    custom_index = from_faiss(faiss_index)\n\n    with args.output_index_path.open(\"wb\") as f:\n        pickle.dump(custom_index, f)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "github_adventures/product_quantization/custom.py",
    "content": "from __future__ import annotations\n\nimport logging\n\nimport numpy as np\nfrom sklearn.cluster import KMeans\nfrom sklearn.metrics.pairwise import euclidean_distances\n\nlogger = logging.getLogger(__name__)\n\nBITS2DTYPE = {\n    8: np.uint8,\n}\n\n\nclass CustomIndexPQ:\n    \"\"\"Custom IndexPQ implementation.\n\n    Parameters\n    ----------\n    d\n        Dimensionality of the original vectors.\n\n    m\n        Number of segments.\n\n    nbits\n        Number of bits.\n\n    estimator_kwargs\n        Additional hyperparameters passed onto the sklearn KMeans\n        class.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        d: int,\n        m: int,\n        nbits: int,\n        **estimator_kwargs: str | int,\n    ) -> None:\n        if d % m != 0:\n            raise ValueError(\"d needs to be a multiple of m\")\n\n        if nbits not in BITS2DTYPE:\n            raise ValueError(f\"Unsupported number of bits {nbits}\")\n\n        self.m = m\n        self.k = 2**nbits\n        self.d = d\n        self.ds = d // m\n\n        self.estimators = [\n            KMeans(n_clusters=self.k, **estimator_kwargs) for _ in range(m)\n        ]\n        logger.info(f\"Creating following estimators: {self.estimators[0]!r}\")\n\n        self.is_trained = False\n\n        self.dtype = BITS2DTYPE[nbits]\n        self.dtype_orig = np.float32\n\n        self.codes: np.ndarray | None = None\n\n    def train(self, X: np.ndarray) -> None:\n        \"\"\"Train all KMeans estimators.\n\n        Parameters\n        ----------\n        X\n            Array of shape `(n, d)` and dtype `float32`.\n\n        \"\"\"\n        if self.is_trained:\n            raise ValueError(\"Training multiple times is not allowed\")\n\n        for i in range(self.m):\n            estimator = self.estimators[i]\n            X_i = X[:, i * self.ds : (i + 1) * self.ds]\n\n            logger.info(f\"Fitting KMeans for the {i}-th segment\")\n            estimator.fit(X_i)\n\n        
self.is_trained = True\n\n\n    def encode(self, X: np.ndarray) -> np.ndarray:\n        \"\"\"Encode original features into codes.\n\n        Parameters\n        ----------\n        X\n            Array of shape `(n_queries, d)` of dtype `np.float32`.\n\n        Returns\n        -------\n        result\n            Array of shape `(n_queries, m)` of dtype `np.uint8`.\n        \"\"\"\n        n = len(X)\n        result = np.empty((n, self.m), dtype=self.dtype)\n\n        for i in range(self.m):\n            estimator = self.estimators[i]\n            X_i = X[:, i * self.ds : (i + 1) * self.ds]\n            result[:, i] = estimator.predict(X_i)\n\n        return result\n\n    def add(self, X: np.ndarray) -> None:\n        \"\"\"Add vectors to the database (their encoded versions).\n\n        Parameters\n        ----------\n        X\n            Array of shape `(n_codes, d)` of dtype `np.float32`.\n        \"\"\"\n        if not self.is_trained:\n            raise ValueError(\"The quantizer needs to be trained first.\")\n        self.codes = self.encode(X)\n\n    def compute_asymmetric_distances(self, X: np.ndarray) -> np.ndarray:\n        \"\"\"Compute asymmetric distances to all database codes.\n\n        Parameters\n        ----------\n        X\n            Array of shape `(n_queries, d)` of dtype `np.float32`.\n\n        Returns\n        -------\n        distances\n            Array of shape `(n_queries, n_codes)` of dtype `np.float32`.\n\n        \"\"\"\n        if not self.is_trained:\n            raise ValueError(\"The quantizer needs to be trained first.\")\n\n        if self.codes is None:\n            raise ValueError(\"No codes detected. 
You need to run `add` first\")\n\n        n_queries = len(X)\n        n_codes = len(self.codes)\n\n        distance_table = np.empty(\n            (n_queries, self.m, self.k), dtype=self.dtype_orig\n        )  # (n_queries, m, k)\n\n        for i in range(self.m):\n            X_i = X[:, i * self.ds : (i + 1) * self.ds]  # (n_queries, ds)\n            centers = self.estimators[i].cluster_centers_  # (k, ds)\n            distance_table[:, i, :] = euclidean_distances(\n                X_i, centers, squared=True\n            )\n\n        distances = np.zeros((n_queries, n_codes), dtype=self.dtype_orig)\n\n        for i in range(self.m):\n            distances += distance_table[:, i, self.codes[:, i]]\n\n        return distances\n\n    def search(self, X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:\n        \"\"\"Find k closest database codes to given queries.\n\n        Parameters\n        ----------\n        X\n            Array of shape `(n_queries, d)` of dtype `np.float32`.\n\n        k\n            The number of closest codes to look for.\n\n        Returns\n        -------\n        distances\n            Array of shape `(n_queries, k)`.\n\n        indices\n            Array of shape `(n_queries, k)`.\n        \"\"\"\n        n_queries = len(X)\n        distances_all = self.compute_asymmetric_distances(X)\n\n        indices = np.argsort(distances_all, axis=1)[:, :k]\n\n        distances = np.empty((n_queries, k), dtype=np.float32)\n        for i in range(n_queries):\n            distances[i] = distances_all[i][indices[i]]\n\n        return distances, indices\n"
  },
  {
    "path": "github_adventures/product_quantization/faiss_101_ipython.py",
    "content": "import numpy as np\nimport faiss\n\n# Load fast text embeddings\nembs = np.load(\"parsed_fasttext/embs.npy\")  # change path if necessary\nembs.shape\nembs.nbytes / 1e6\n\n# Prepare parameters\nd = embs.shape[1]\nm = 10\nnbits = 8\nk = 2 ** nbits\nk\n\n# Construct index\nindex = faiss.IndexPQ(d, m, nbits)\nindex.is_trained\n\n# Try encoding without any training\nindex.sa_encode(embs[:2])\n\n# Train the model\nindex.train(embs)\nindex.is_trained\nindex.ntotal\n\n# Add vectors to the database\nindex.add(embs)\nindex.ntotal\n\ncodes = faiss.vector_to_array(index.codes).reshape(index.ntotal, m)\ncodes[:3]\ncodes.nbytes / 1e6\n\n# Try searching - EXHAUSTIVE SEARCH\nindex.search(embs[:3], 4)\n\n# Quickly show that with flat index distances are precise\nflat_index = faiss.IndexFlatL2(d)\nflat_index.train(embs)\nflat_index.add(embs)\nflat_index.search(embs[:3], 4)\n"
  },
  {
    "path": "github_adventures/product_quantization/generate_index.py",
"content": "from __future__ import annotations\n\nimport argparse\nimport logging\nimport pathlib\nimport pickle\n\nimport faiss\nimport numpy as np\n\nfrom custom import CustomIndexPQ\n\nlogger = logging.getLogger(__name__)\nlogging.basicConfig(level=logging.INFO)\n\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\n    \"input_path\",\n    type=pathlib.Path,\n    help=\"Path to the full embeddings array\",\n)\nparser.add_argument(\n    \"index_type\",\n    type=str,\n    choices=[\"faiss-flat\", \"faiss-pq\", \"our-pq\"],\n    help=\"Type of index to generate\",\n)\nparser.add_argument(\n    \"output_path\",\n    type=pathlib.Path,\n    help=\"Path to where to store the index\"\n)\n\nargs, unknown_kwargs = parser.parse_known_args()\nhyperparams: dict[str, int | str] = {}\n\nfor i in range(0, len(unknown_kwargs), 2):\n    key_raw, value_raw = unknown_kwargs[i], unknown_kwargs[i + 1]\n\n    key = key_raw.lstrip(\"-\")  # strip the leading dashes only\n    value = int(value_raw) if value_raw.isnumeric() else value_raw\n    hyperparams[key] = value\n\nlogger.info(f\"The following hyperparameters were detected {hyperparams}\")\nlogger.info(\"Loading embeddings\")\nembs = np.load(args.input_path)\nn, d = embs.shape\n\nif args.index_type == \"faiss-flat\":\n    logger.info(\"Instantiating IndexFlatL2\")\n    index = faiss.IndexFlatL2(d)\n\nelif args.index_type == \"faiss-pq\":\n    logger.info(\"Instantiating IndexPQ\")\n    arguments = [d, hyperparams[\"m\"], hyperparams[\"nbits\"]]\n    index = faiss.IndexPQ(*arguments)\n\nelif args.index_type == \"our-pq\":\n    logger.info(\"Instantiating CustomIndexPQ\")\n    index = CustomIndexPQ(d, **hyperparams)\n\nlogger.info(\"Training the index\")\nindex.train(embs)\n\nlogger.info(\"Adding all embeddings to the index\")\nindex.add(embs)\n\nlogger.info(f\"Writing index to disk - {args.output_path}\")\n\nif args.index_type == \"our-pq\":\n    with args.output_path.open(\"wb\") as f:\n        pickle.dump(index, f)\n\nelse:\n    
faiss.write_index(index, str(args.output_path))\n"
  },
  {
    "path": "github_adventures/product_quantization/parse.py",
    "content": "from __future__ import annotations\n\nimport argparse\nimport io\nimport logging\nimport pathlib\nimport tqdm\n\nimport numpy as np\n\nlogger = logging.getLogger(__name__)\nlogging.basicConfig(level=logging.INFO)\n\n\ndef get_embeddings(path: str, maximum: int | None = None) -> tuple[list[str], np.ndarray]:\n    fin = io.open(path, 'r', encoding='utf-8', newline='\\n', errors='ignore')\n    n, d = map(int, fin.readline().split())\n    n = n if maximum is None else min(n, maximum)\n\n    embs: np.ndarray = np.empty((n, d), dtype=np.float32)\n    words: list[str] = []\n\n    for i, line in tqdm.tqdm(enumerate(fin)):\n        if maximum is not None and i == maximum:\n            break\n\n        tokens = line.rstrip().split(' ')\n\n        words.append(tokens[0])\n        embs[i] = list(map(float, tokens[1:]))\n    \n    return words, embs\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\n    \"fasttext_path\",\n    type=pathlib.Path,\n    help=\"Path to fasttext embeddings.\",\n)\nparser.add_argument(\n    \"output_dir\",\n    type=pathlib.Path,\n    help=\"Directory where we store the words and the embeddings.\"\n)\nparser.add_argument(\n    \"-m\",\n    \"--max\",\n    type=int,\n    help=\"Maximum number of embeddings to parse.\"\n)\n\nargs = parser.parse_args()\n\npath_embs = args.output_dir / \"embs.npy\"\npath_words = args.output_dir / \"words.txt\"\n\nargs.output_dir.mkdir(exist_ok=True, parents=True)\n\nlogger.info(\"Parsing\")\nwords, embs = get_embeddings(args.fasttext_path, maximum=args.max)\n\nlogger.info(\"Saving words\")\nwith path_words.open(\"w\") as f:\n    for word in words:\n        f.write(word + \"\\n\")\n    \nlogger.info(\"Saving embeddings\")\nnp.save(path_embs, embs)\n\n"
  },
  {
    "path": "github_adventures/product_quantization/requirements.txt",
    "content": "faiss-cpu==1.7.2\ngradio==3.0.17\nnumpy==1.22.4\npandas==1.4.2\nscikit-learn==1.1.1\n"
  },
  {
    "path": "github_adventures/product_quantization/run_all.sh",
    "content": "set -ex\n\n# Parameters\nURL=https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz\nRAW_FASTTEXT=raw_fasttext.vec\nMAX_WORDS=100000\nOUTPUT_FOLDER=new_results  # no slash\nSCIKIT_KWARGS='--n_init 1 --max_iter 30 --init random'\n\n# Download fasttext embeddings\nif [ ! -f $RAW_FASTTEXT ]\nthen\n    curl $URL --output $RAW_FASTTEXT.gz\n    gzip -d $RAW_FASTTEXT.gz\nfi\n\nmkdir $OUTPUT_FOLDER\n\n# Parse raw data\npython parse.py $RAW_FASTTEXT $OUTPUT_FOLDER -m $MAX_WORDS\n\n# Generate a couple of different indexes\npython generate_index.py \\\n    $OUTPUT_FOLDER/embs.npy \\\n    faiss-flat \\\n    $OUTPUT_FOLDER/flat.faiss\n\npython generate_index.py \\\n    $OUTPUT_FOLDER/embs.npy \\\n    faiss-pq \\\n    $OUTPUT_FOLDER/faisspq_m4_nbits8.faiss \\\n    --m 4 \\\n    --nbits 8\n\npython generate_index.py \\\n    $OUTPUT_FOLDER/embs.npy \\\n    faiss-pq \\\n    $OUTPUT_FOLDER/faisspq_m12_nbits8.faiss \\\n    --m 12 \\\n    --nbits 8\n\npython generate_index.py \\\n    $OUTPUT_FOLDER/embs.npy \\\n    our-pq \\\n    $OUTPUT_FOLDER/custompq_m4_nbits8.pkl \\\n    --m 4 \\\n    --nbits 8 \\\n    $SCIKIT_KWARGS\n\npython generate_index.py \\\n    $OUTPUT_FOLDER/embs.npy \\\n    our-pq \\\n    $OUTPUT_FOLDER/custompq_m12_nbits8.pkl \\\n    --m 12 \\\n    --nbits 8 \\\n    $SCIKIT_KWARGS\n\n# Convert faiss index into custom index\npython convert.py \\\n    $OUTPUT_FOLDER/faisspq_m12_nbits8.faiss \\\n    $OUTPUT_FOLDER/converted_faisspq_m12_nbits8.pkl\n\n\n# Run webapp\n\nGRADIO_SERVER_PORT=7777 python run_gradio.py \\\n    $OUTPUT_FOLDER/flat.faiss \\\n    $OUTPUT_FOLDER/faisspq_m12_nbits8.faiss \\\n    $OUTPUT_FOLDER/converted_faisspq_m12_nbits8.pkl \\\n    $OUTPUT_FOLDER/words.txt \\\n    &\n\nGRADIO_SERVER_PORT=7778 python run_gradio.py \\\n    $OUTPUT_FOLDER/flat.faiss \\\n    $OUTPUT_FOLDER/faisspq_m4_nbits8.faiss \\\n    $OUTPUT_FOLDER/faisspq_m12_nbits8.faiss \\\n    $OUTPUT_FOLDER/words.txt \\\n    &\n\n\nGRADIO_SERVER_PORT=7779 python 
run_gradio.py \\\n    $OUTPUT_FOLDER/flat.faiss \\\n    $OUTPUT_FOLDER/custompq_m4_nbits8.pkl \\\n    $OUTPUT_FOLDER/custompq_m12_nbits8.pkl \\\n    $OUTPUT_FOLDER/words.txt \\\n    &\n# make sure to kill the gradio processes pkill -f gradio\n"
  },
  {
    "path": "github_adventures/product_quantization/run_gradio.py",
    "content": "from __future__ import annotations\n\nimport argparse\nimport logging\nimport pathlib\nimport pickle\nimport time\nfrom functools import partial\nfrom typing import Any\n\nimport faiss\nimport gradio as gr\nimport numpy as np\nimport pandas as pd\n\nlogger = logging.getLogger(__name__)\nlogging.basicConfig(level=logging.INFO)\n\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\n    \"exact_index_path\",\n    type=pathlib.Path,\n    help=\"Path to the exact index\",\n)\nparser.add_argument(\n    \"approximate_index_path\",\n    type=pathlib.Path,\n    nargs=\"+\",\n    help=\"Path to the approximate index\",\n)\nparser.add_argument(\n    \"words_path\",\n    type=pathlib.Path,\n    help=\"Path to the text file containing words\",\n)\n\nargs = parser.parse_args()\n\ndef run(\n    word: str,\n    k: int,\n    exact_index,\n    approximate_indexes: dict[str, Any],\n    words: list[str],\n    word2ix: dict[str, int],\n) -> tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]:\n    metrics = {}\n\n    emb = exact_index.reconstruct(word2ix[word])\n\n    start = time.monotonic()\n    D, I = exact_index.search(emb[None, :], k)\n    metrics[\"time_exact\"] = time.monotonic() - start\n    D, I = D[0], I[0]\n\n    df_e = pd.DataFrame({\n        \"ix\": I,\n        \"distance\": D,\n        \"word\": [words[i] for i in I],\n    })\n    dfs_a = []\n\n    for name, approximate_index in approximate_indexes.items():\n        start = time.monotonic()\n        D, I = approximate_index.search(emb[None, :], k)\n        metrics[f\"time_approximate_{name}\"] = time.monotonic() - start\n        D, I = D[0], I[0]\n\n        df_a = pd.DataFrame({\n            \"ix\": I,\n            \"distance\": D,\n            \"word\": [words[i] for i in I],\n        })\n        dfs_a.append(df_a)\n\n        metrics[f\"recall_{name}\"] = len(np.intersect1d(df_e.word.unique(), df_a.word.unique())) / k\n\n    return df_e, *dfs_a, metrics\n\n\nlogger.info(f\"Loading words 
{args.words_path}\")\nwords = args.words_path.read_text().strip().split(\"\\n\")\nword2ix = {word: i for i, word in enumerate(words)}\n\nlogger.info(f\"Loading exact index {args.exact_index_path}\")\nexact_index = faiss.read_index(str(args.exact_index_path))\n\nlogger.info(f\"Loading approximate indexes {args.approximate_index_path}\")\n\napproximate_indexes = {\n}\n\nfor path in args.approximate_index_path:\n    if path.suffix in {\".pkl\", \"pickle\"}:\n        with path.open(\"rb\") as f:\n            approximate_indexes[path.stem] = pickle.load(f)\n\n    else:\n        approximate_indexes[path.stem] = faiss.read_index(str(path))\n\n# Sanity checks\nassert isinstance(exact_index, faiss.IndexFlat)\n# assert len(words) == exact_index.ntotal == approximate_index.ntotal\n\nrun_partial = partial(\n    run,\n    exact_index=exact_index,\n    approximate_indexes=approximate_indexes,\n    words=words,\n    word2ix=word2ix,\n)\n\nsetattr(run_partial, \"__name__\", \"run_function\")\n\ndemo = gr.Interface(\n    fn=run_partial,\n    inputs=[\n        gr.Textbox(lines=1, placeholder=\"Word here...\"),\n        gr.Slider(minimum=1, maximum=20, value=5, step=1),\n    ],\n    outputs=[\n        gr.DataFrame(label=\"exact\"),\n        *[gr.DataFrame(label=name) for name in approximate_indexes.keys()],\n        gr.JSON(label=\"metrics\"),\n        ],\n    allow_flagging=\"never\",\n\n)\n\ndemo.launch()\n"
  },
  {
    "path": "github_adventures/siren/activations.py",
    "content": "import pathlib\nfrom functools import partial\n\nimport torch\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom core import ImageSiren\n\ntorch.manual_seed(2)\n\ninit_functions = {\n        \"ones\": torch.nn.init.ones_,\n        \"eye\": torch.nn.init.eye_,\n        \"default\": partial(torch.nn.init.kaiming_uniform_, a=5 ** (1 / 2)),\n        \"paper\": None,\n}\n\nfor fname, func in init_functions.items():\n    path = pathlib.Path.cwd() / \"tensorboard_logs\" / fname\n    writer = SummaryWriter(path)\n\n    def fh(inst, inp, out, number=0):\n        layer_name = f\"{number}_{inst.__class__.__name__}\"\n        writer.add_histogram(layer_name, out)\n\n    model = ImageSiren(\n            hidden_layers=10,\n            hidden_features=200,\n            first_omega=30,\n            hidden_omega=30,\n            custom_init_function_=func,\n    )\n\n    for i, layer in enumerate(model.net.modules()):\n        if not i:\n            continue\n        layer.register_forward_hook(partial(fh, number=(i + 1) // 2))\n\n    inp = 2 * (torch.rand(10000, 2) - 0.5)\n    writer.add_histogram(\"0\", inp)\n    res = model(inp)\n"
  },
  {
    "path": "github_adventures/siren/core.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom scipy.ndimage import laplace, sobel\nfrom torch.utils.data import Dataset\n\n\ndef paper_init_(weight, is_first=False, omega=1):\n    \"\"\"Initialize the weigth of the Linear layer.\n\n    Parameters\n    ----------\n    weight : torch.Tensor\n        The learnable 2D weight matrix.\n\n    is_first : bool\n        If True, this Linear layer is the very first one in the network.\n\n    omega : float\n        Hyperparamter.\n    \"\"\"\n    in_features = weight.shape[1]\n\n    with torch.no_grad():\n        if is_first:\n            bound = 1 / in_features\n        else:\n            bound = np.sqrt(6 / in_features) / omega\n\n        weight.uniform_(-bound, bound)\n\n\nclass SineLayer(nn.Module):\n    \"\"\"Linear layer followed by the sine activation.\n\n    Parameters\n    ----------\n    in_features : int\n        Number of input features.\n\n    out_features : int\n        Number of output features.\n\n    bias : bool\n        If True, the bias is included.\n\n    is_first : bool\n        If True, then it represents the first layer of the network. Note that\n        it influences the initialization scheme.\n\n    omega : int\n        Hyperparameter. 
Determines scaling.\n\n    custom_init_function_ : None or callable\n        If None, then we are going to use the `paper_init_` defined above.\n        Otherwise, any callable that modifies the `weight` parameter in place.\n\n    Attributes\n    ----------\n    linear : nn.Linear\n        Linear layer.\n    \"\"\"\n    def __init__(\n            self,\n            in_features,\n            out_features,\n            bias=True,\n            is_first=False,\n            omega=30,\n            custom_init_function_=None,\n    ):\n        super().__init__()\n        self.omega = omega\n        self.linear = nn.Linear(in_features, out_features, bias=bias)\n\n        if custom_init_function_ is None:\n            paper_init_(self.linear.weight, is_first=is_first, omega=omega)\n        else:\n            custom_init_function_(self.linear.weight)\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Tensor of shape `(n_samples, in_features)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Tensor of shape `(n_samples, out_features).\n        \"\"\"\n        return torch.sin(self.omega * self.linear(x))\n\nclass ImageSiren(nn.Module):\n    \"\"\"Network composed of SineLayers.\n\n    Parameters\n    ----------\n    hidden_features : int\n        Number of hidden features (each hidden layer the same).\n\n    hidden_layers : int\n        Number of hidden layers.\n\n    first_omega, hidden_omega : float\n        Hyperparameter influencing scaling.\n\n    custom_init_function_ : None or callable\n        If None, then we are going to use the `paper_init_` defined above.\n        Otherwise any callable that modifies the `weight` parameter in place.\n\n    Attributes\n    ----------\n    net : nn.Sequential\n        Sequential collection of `SineLayer` and `nn.Linear` at the end.\n    \"\"\"\n    def __init__(\n            self,\n            hidden_features,\n          
  hidden_layers=1,\n            first_omega=30,\n            hidden_omega=30,\n            custom_init_function_=None,\n            ):\n        super().__init__()\n        in_features = 2\n        out_features = 1\n\n        net = []\n        net.append(\n                SineLayer(\n                    in_features,\n                    hidden_features,\n                    is_first=True,\n                    custom_init_function_=custom_init_function_,\n                    omega=first_omega,\n            )\n        )\n\n        for _ in range(hidden_layers):\n            net.append(\n                    SineLayer(\n                        hidden_features,\n                        hidden_features,\n                        is_first=False,\n                        custom_init_function_=custom_init_function_,\n                        omega=hidden_omega,\n                )\n            )\n\n        final_linear = nn.Linear(hidden_features, out_features)\n        if custom_init_function_ is None:\n            paper_init_(final_linear.weight, is_first=False, omega=hidden_omega)\n        else:\n            custom_init_function_(final_linear.weight)\n\n        net.append(final_linear)\n        self.net = nn.Sequential(*net)\n\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Tensor of shape `(n_samples, 2)` representing the 2D pixel coordinates.\n\n        Returns\n        -------\n        torch.Tensor\n            Tensor of shape `(n_samples, 1)` representing the predicted\n            intensities.\n        \"\"\"\n        return self.net(x)\n\n\ndef generate_coordinates(n):\n    \"\"\"Generate regular grid of 2D coordinates on [0, n] x [0, n].\n\n    Parameters\n    ----------\n    n : int\n        Number of points per dimension.\n\n    Returns\n    -------\n    coords_abs : np.ndarray\n        Array of row and column coordinates of shape `(n ** 2, 2)`.\n    \"\"\"\n    rows, cols 
= np.meshgrid(range(n), range(n), indexing=\"ij\")\n    coords_abs = np.stack([rows.ravel(), cols.ravel()], axis=-1)\n\n    return coords_abs\n\nclass PixelDataset(Dataset):\n    \"\"\"Dataset yielding coordinates, intensitives and (higher) derivatives.\n\n    Parameters\n    ----------\n    img : np.ndarray\n        2D image representing a grayscale image.\n\n    Attributes\n    ----------\n    size : int\n        Height and width of the square image.\n\n    coords_abs : np.ndarray\n        Array of shape `(size ** 2, 2)` representing all coordinates of the\n        `img`.\n\n    grad : np.ndarray\n        Array of shape `(size, size, 2)` representing the approximate\n        gradient in the two directions.\n\n    grad_norm : np.ndarray\n        Array of shape `(size, size)` representing the approximate gradient\n        norm of `img`.\n\n    laplace : np.ndarray\n        Array of shape `(size, size)` representing the approximate laplace operator.\n    \"\"\"\n    def __init__(self, img):\n        if not (img.ndim == 2 and img.shape[0] == img.shape[1]):\n            raise ValueError(\"Only 2D square images are supported.\")\n\n        self.img = img\n        self.size = img.shape[0]\n        self.coords_abs = generate_coordinates(self.size)\n        self.grad = np.stack([sobel(img, axis=0), sobel(img, axis=1)], axis=-1)\n        self.grad_norm = np.linalg.norm(self.grad, axis=-1)\n        self.laplace = laplace(img)\n\n    def __len__(self):\n        \"\"\"Determine the number of samples (pixels).\"\"\"\n        return self.size ** 2\n\n    def __getitem__(self, idx):\n        \"\"\"Get all relevant data for a single coordinate.\"\"\"\n        coords_abs = self.coords_abs[idx]\n        r, c = coords_abs\n\n        coords = 2 * ((coords_abs / self.size) - 0.5)\n\n        return {\n            \"coords\": coords,\n            \"coords_abs\": coords_abs,\n            \"intensity\": self.img[r, c],\n            \"grad_norm\": self.grad_norm[r, c],\n            
\"grad\": self.grad[r, c],\n            \"laplace\": self.laplace[r, c],\n        }\n\n\nclass GradientUtils:\n    @staticmethod\n    def gradient(target, coords):\n        \"\"\"Compute the gradient with respect to input.\n\n        Parameters\n        ----------\n        target : torch.Tensor\n            2D tensor of shape `(n_coords, ?)` representing the targets.\n\n        coords : torch.Tensor\n            2D tensor fo shape `(n_coords, 2)` representing the coordinates.\n\n        Returns\n        -------\n        grad : torch.Tensor\n            2D tensor of shape `(n_coords, 2)` representing the gradient.\n        \"\"\"\n        return torch.autograd.grad(\n            target, coords, grad_outputs=torch.ones_like(target), create_graph=True\n        )[0]\n\n    @staticmethod\n    def divergence(grad, coords):\n        \"\"\"Compute divergence.\n\n        Parameters\n        ----------\n        grad : torch.Tensor\n            2D tensor of shape `(n_coords, 2)` representing the gradient wrt\n            x and y.\n\n        coords : torch.Tensor\n            2D tensor of shape `(n_coords, 2)` representing the coordinates.\n\n        Returns\n        -------\n        div : torch.Tensor\n            2D tensor of shape `(n_coords, 1)` representing the divergence.\n\n        Notes\n        -----\n        In a 2D case this will give us f_{xx} + f_{yy}.\n        \"\"\"\n        div = 0.0\n        for i in range(coords.shape[1]):\n            div += torch.autograd.grad(\n                grad[..., i], coords, torch.ones_like(grad[..., i]), create_graph=True,\n            )[0][..., i : i + 1]\n        return div\n\n    @staticmethod\n    def laplace(target, coords):\n        \"\"\"Compute laplace operator.\n\n        Parameters\n        ----------\n        target : torch.Tensor\n            2D tesnor of shape `(n_coords, 1)` representing the targets.\n\n        coords : torch.Tensor\n            2D tensor of shape `(n_coords, 2)` representing the coordinates.\n\n      
  Returns\n        -------\n        torch.Tensor\n            2D tensor of shape `(n_coords, 1)` representing the laplace.\n        \"\"\"\n        grad = GradientUtils.gradient(target, coords)\n        return GradientUtils.divergence(grad, coords)\n"
  },
  {
    "path": "github_adventures/siren/train.py",
    "content": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nfrom torch.nn import Linear, ReLU, Sequential\nfrom torch.utils.data import DataLoader\nimport tqdm\n\nfrom core import GradientUtils, ImageSiren, PixelDataset\n\n\n# Image loading\nimg_ = plt.imread(\"dog.png\")\ndownsampling_factor = 4\nimg = 2 * (img_ - 0.5)\nimg = img[::downsampling_factor, ::downsampling_factor]\nsize = img.shape[0]\n\ndataset = PixelDataset(img)\n\n# Parameters\nn_epochs = 100\nbatch_size = int(size ** 2)\nlogging_freq = 20\n\nmodel_name = \"siren\"  # \"siren\", \"mlp_relu\"\nhidden_features = 256\nhidden_layers = 3\n\ntarget = \"intensity\"  # \"intensity\", \"grad\", \"laplace\"\n\n\n# Model creation\nif model_name == \"siren\":\n    model = ImageSiren(\n        hidden_features,\n        hidden_layers=hidden_layers,\n        hidden_omega=30,\n    )\nelif model_name == \"mlp_relu\":\n    layers = [Linear(2, hidden_features), ReLU()]\n\n    for _ in range(hidden_layers):\n        layers.append(Linear(hidden_features, hidden_features))\n        layers.append(ReLU())\n\n    layers.append(Linear(hidden_features, 1))\n\n    model = Sequential(*layers)\n\n    for module in model.modules():\n        if not isinstance(module, Linear):\n            continue\n        torch.nn.init.xavier_normal_(module.weight)\nelse:\n    raise ValueError(\"Unsupported model\")\n\ndataloader = DataLoader(dataset, batch_size=batch_size)\noptim = torch.optim.Adam(lr=1e-4, params=model.parameters())\n\n# Training loop\nfor e in range(n_epochs):\n    losses = []\n    for d_batch in tqdm.tqdm(dataloader):\n        x_batch = d_batch[\"coords\"].to(torch.float32)\n        x_batch.requires_grad = True\n\n        y_true_batch = d_batch[\"intensity\"].to(torch.float32)\n        y_true_batch = y_true_batch[:, None]\n\n        y_pred_batch = model(x_batch)\n\n        if target == \"intensity\":\n            loss = ((y_true_batch - y_pred_batch) ** 2).mean()\n\n        elif target == \"grad\":\n    
        y_pred_g_batch = GradientUtils.gradient(y_pred_batch, x_batch)\n            y_true_g_batch = d_batch[\"grad\"].to(torch.float32)\n            loss = ((y_true_g_batch - y_pred_g_batch) ** 2).mean()\n\n        elif target == \"laplace\":\n            y_pred_l_batch = GradientUtils.laplace(y_pred_batch, x_batch)\n            y_true_l_batch = d_batch[\"laplace\"].to(torch.float32)[:, None]\n            loss = ((y_true_l_batch - y_pred_l_batch) ** 2).mean()\n\n        else:\n            raise ValueError(\"Unrecognized target\")\n\n        losses.append(loss.item())\n\n\n        optim.zero_grad()\n        loss.backward()\n        optim.step()\n\n    print(e, np.mean(losses))\n\n    if e % logging_freq == 0:\n        pred_img = np.zeros_like(img)\n        pred_img_grad_norm = np.zeros_like(img)\n        pred_img_laplace = np.zeros_like(img)\n\n        orig_img = np.zeros_like(img)\n        for d_batch in tqdm.tqdm(dataloader):\n            coords = d_batch[\"coords\"].to(torch.float32)\n            coords.requires_grad = True\n            coords_abs = d_batch[\"coords_abs\"].numpy()\n\n            pred = model(coords)\n            pred_n = pred.detach().numpy().squeeze()\n            pred_g = (\n                GradientUtils.gradient(pred, coords)\n                .norm(dim=-1)\n                .detach()\n                .numpy()\n                .squeeze()\n            )\n            pred_l = GradientUtils.laplace(pred, coords).detach().numpy().squeeze()\n\n            pred_img[coords_abs[:, 0], coords_abs[:, 1]] = pred_n\n            pred_img_grad_norm[coords_abs[:, 0], coords_abs[:, 1]] = pred_g\n            pred_img_laplace[coords_abs[:, 0], coords_abs[:, 1]] = pred_l\n\n        fig, axs = plt.subplots(3, 2, constrained_layout=True)\n        axs[0, 0].imshow(dataset.img, cmap=\"gray\")\n        axs[0, 1].imshow(pred_img, cmap=\"gray\")\n\n        axs[1, 0].imshow(dataset.grad_norm, cmap=\"gray\")\n        axs[1, 1].imshow(pred_img_grad_norm, cmap=\"gray\")\n\n 
       axs[2, 0].imshow(dataset.laplace, cmap=\"gray\")\n        axs[2, 1].imshow(pred_img_laplace, cmap=\"gray\")\n\n        for row in axs:\n            for ax in row:\n                ax.set_axis_off()\n\n        fig.suptitle(f\"Iteration: {e}\")\n        axs[0, 0].set_title(\"Ground truth\")\n        axs[0, 1].set_title(\"Prediction\")\n\n        plt.savefig(f\"visualization/{e}.png\")\n"
  },
  {
    "path": "github_adventures/vision_transformer/classes.txt",
    "content": "tench, Tinca_tinca\ngoldfish, Carassius_auratus\ngreat_white_shark, white_shark, man-eater, man-eating_shark, Carcharodon_carcharias\ntiger_shark, Galeocerdo_cuvieri\nhammerhead, hammerhead_shark\nelectric_ray, crampfish, numbfish, torpedo\nstingray\ncock\nhen\nostrich, Struthio_camelus\nbrambling, Fringilla_montifringilla\ngoldfinch, Carduelis_carduelis\nhouse_finch, linnet, Carpodacus_mexicanus\njunco, snowbird\nindigo_bunting, indigo_finch, indigo_bird, Passerina_cyanea\nrobin, American_robin, Turdus_migratorius\nbulbul\njay\nmagpie\nchickadee\nwater_ouzel, dipper\nkite\nbald_eagle, American_eagle, Haliaeetus_leucocephalus\nvulture\ngreat_grey_owl, great_gray_owl, Strix_nebulosa\nEuropean_fire_salamander, Salamandra_salamandra\ncommon_newt, Triturus_vulgaris\neft\nspotted_salamander, Ambystoma_maculatum\naxolotl, mud_puppy, Ambystoma_mexicanum\nbullfrog, Rana_catesbeiana\ntree_frog, tree-frog\ntailed_frog, bell_toad, ribbed_toad, tailed_toad, Ascaphus_trui\nloggerhead, loggerhead_turtle, Caretta_caretta\nleatherback_turtle, leatherback, leathery_turtle, Dermochelys_coriacea\nmud_turtle\nterrapin\nbox_turtle, box_tortoise\nbanded_gecko\ncommon_iguana, iguana, Iguana_iguana\nAmerican_chameleon, anole, Anolis_carolinensis\nwhiptail, whiptail_lizard\nagama\nfrilled_lizard, Chlamydosaurus_kingi\nalligator_lizard\nGila_monster, Heloderma_suspectum\ngreen_lizard, Lacerta_viridis\nAfrican_chameleon, Chamaeleo_chamaeleon\nKomodo_dragon, Komodo_lizard, dragon_lizard, giant_lizard, Varanus_komodoensis\nAfrican_crocodile, Nile_crocodile, Crocodylus_niloticus\nAmerican_alligator, Alligator_mississipiensis\ntriceratops\nthunder_snake, worm_snake, Carphophis_amoenus\nringneck_snake, ring-necked_snake, ring_snake\nhognose_snake, puff_adder, sand_viper\ngreen_snake, grass_snake\nking_snake, kingsnake\ngarter_snake, grass_snake\nwater_snake\nvine_snake\nnight_snake, Hypsiglena_torquata\nboa_constrictor, Constrictor_constrictor\nrock_python, rock_snake, 
Python_sebae\nIndian_cobra, Naja_naja\ngreen_mamba\nsea_snake\nhorned_viper, cerastes, sand_viper, horned_asp, Cerastes_cornutus\ndiamondback, diamondback_rattlesnake, Crotalus_adamanteus\nsidewinder, horned_rattlesnake, Crotalus_cerastes\ntrilobite\nharvestman, daddy_longlegs, Phalangium_opilio\nscorpion\nblack_and_gold_garden_spider, Argiope_aurantia\nbarn_spider, Araneus_cavaticus\ngarden_spider, Aranea_diademata\nblack_widow, Latrodectus_mactans\ntarantula\nwolf_spider, hunting_spider\ntick\ncentipede\nblack_grouse\nptarmigan\nruffed_grouse, partridge, Bonasa_umbellus\nprairie_chicken, prairie_grouse, prairie_fowl\npeacock\nquail\npartridge\nAfrican_grey, African_gray, Psittacus_erithacus\nmacaw\nsulphur-crested_cockatoo, Kakatoe_galerita, Cacatua_galerita\nlorikeet\ncoucal\nbee_eater\nhornbill\nhummingbird\njacamar\ntoucan\ndrake\nred-breasted_merganser, Mergus_serrator\ngoose\nblack_swan, Cygnus_atratus\ntusker\nechidna, spiny_anteater, anteater\nplatypus, duckbill, duckbilled_platypus, duck-billed_platypus, Ornithorhynchus_anatinus\nwallaby, brush_kangaroo\nkoala, koala_bear, kangaroo_bear, native_bear, Phascolarctos_cinereus\nwombat\njellyfish\nsea_anemone, anemone\nbrain_coral\nflatworm, platyhelminth\nnematode, nematode_worm, roundworm\nconch\nsnail\nslug\nsea_slug, nudibranch\nchiton, coat-of-mail_shell, sea_cradle, polyplacophore\nchambered_nautilus, pearly_nautilus, nautilus\nDungeness_crab, Cancer_magister\nrock_crab, Cancer_irroratus\nfiddler_crab\nking_crab, Alaska_crab, Alaskan_king_crab, Alaska_king_crab, Paralithodes_camtschatica\nAmerican_lobster, Northern_lobster, Maine_lobster, Homarus_americanus\nspiny_lobster, langouste, rock_lobster, crawfish, crayfish, sea_crawfish\ncrayfish, crawfish, crawdad, crawdaddy\nhermit_crab\nisopod\nwhite_stork, Ciconia_ciconia\nblack_stork, Ciconia_nigra\nspoonbill\nflamingo\nlittle_blue_heron, Egretta_caerulea\nAmerican_egret, great_white_heron, Egretta_albus\nbittern\ncrane\nlimpkin, 
Aramus_pictus\nEuropean_gallinule, Porphyrio_porphyrio\nAmerican_coot, marsh_hen, mud_hen, water_hen, Fulica_americana\nbustard\nruddy_turnstone, Arenaria_interpres\nred-backed_sandpiper, dunlin, Erolia_alpina\nredshank, Tringa_totanus\ndowitcher\noystercatcher, oyster_catcher\npelican\nking_penguin, Aptenodytes_patagonica\nalbatross, mollymawk\ngrey_whale, gray_whale, devilfish, Eschrichtius_gibbosus, Eschrichtius_robustus\nkiller_whale, killer, orca, grampus, sea_wolf, Orcinus_orca\ndugong, Dugong_dugon\nsea_lion\nChihuahua\nJapanese_spaniel\nMaltese_dog, Maltese_terrier, Maltese\nPekinese, Pekingese, Peke\nShih-Tzu\nBlenheim_spaniel\npapillon\ntoy_terrier\nRhodesian_ridgeback\nAfghan_hound, Afghan\nbasset, basset_hound\nbeagle\nbloodhound, sleuthhound\nbluetick\nblack-and-tan_coonhound\nWalker_hound, Walker_foxhound\nEnglish_foxhound\nredbone\nborzoi, Russian_wolfhound\nIrish_wolfhound\nItalian_greyhound\nwhippet\nIbizan_hound, Ibizan_Podenco\nNorwegian_elkhound, elkhound\notterhound, otter_hound\nSaluki, gazelle_hound\nScottish_deerhound, deerhound\nWeimaraner\nStaffordshire_bullterrier, Staffordshire_bull_terrier\nAmerican_Staffordshire_terrier, Staffordshire_terrier, American_pit_bull_terrier, pit_bull_terrier\nBedlington_terrier\nBorder_terrier\nKerry_blue_terrier\nIrish_terrier\nNorfolk_terrier\nNorwich_terrier\nYorkshire_terrier\nwire-haired_fox_terrier\nLakeland_terrier\nSealyham_terrier, Sealyham\nAiredale, Airedale_terrier\ncairn, cairn_terrier\nAustralian_terrier\nDandie_Dinmont, Dandie_Dinmont_terrier\nBoston_bull, Boston_terrier\nminiature_schnauzer\ngiant_schnauzer\nstandard_schnauzer\nScotch_terrier, Scottish_terrier, Scottie\nTibetan_terrier, chrysanthemum_dog\nsilky_terrier, Sydney_silky\nsoft-coated_wheaten_terrier\nWest_Highland_white_terrier\nLhasa, Lhasa_apso\nflat-coated_retriever\ncurly-coated_retriever\ngolden_retriever\nLabrador_retriever\nChesapeake_Bay_retriever\nGerman_short-haired_pointer\nvizsla, 
Hungarian_pointer\nEnglish_setter\nIrish_setter, red_setter\nGordon_setter\nBrittany_spaniel\nclumber, clumber_spaniel\nEnglish_springer, English_springer_spaniel\nWelsh_springer_spaniel\ncocker_spaniel, English_cocker_spaniel, cocker\nSussex_spaniel\nIrish_water_spaniel\nkuvasz\nschipperke\ngroenendael\nmalinois\nbriard\nkelpie\nkomondor\nOld_English_sheepdog, bobtail\nShetland_sheepdog, Shetland_sheep_dog, Shetland\ncollie\nBorder_collie\nBouvier_des_Flandres, Bouviers_des_Flandres\nRottweiler\nGerman_shepherd, German_shepherd_dog, German_police_dog, alsatian\nDoberman, Doberman_pinscher\nminiature_pinscher\nGreater_Swiss_Mountain_dog\nBernese_mountain_dog\nAppenzeller\nEntleBucher\nboxer\nbull_mastiff\nTibetan_mastiff\nFrench_bulldog\nGreat_Dane\nSaint_Bernard, St_Bernard\nEskimo_dog, husky\nmalamute, malemute, Alaskan_malamute\nSiberian_husky\ndalmatian, coach_dog, carriage_dog\naffenpinscher, monkey_pinscher, monkey_dog\nbasenji\npug, pug-dog\nLeonberg\nNewfoundland, Newfoundland_dog\nGreat_Pyrenees\nSamoyed, Samoyede\nPomeranian\nchow, chow_chow\nkeeshond\nBrabancon_griffon\nPembroke, Pembroke_Welsh_corgi\nCardigan, Cardigan_Welsh_corgi\ntoy_poodle\nminiature_poodle\nstandard_poodle\nMexican_hairless\ntimber_wolf, grey_wolf, gray_wolf, Canis_lupus\nwhite_wolf, Arctic_wolf, Canis_lupus_tundrarum\nred_wolf, maned_wolf, Canis_rufus, Canis_niger\ncoyote, prairie_wolf, brush_wolf, Canis_latrans\ndingo, warrigal, warragal, Canis_dingo\ndhole, Cuon_alpinus\nAfrican_hunting_dog, hyena_dog, Cape_hunting_dog, Lycaon_pictus\nhyena, hyaena\nred_fox, Vulpes_vulpes\nkit_fox, Vulpes_macrotis\nArctic_fox, white_fox, Alopex_lagopus\ngrey_fox, gray_fox, Urocyon_cinereoargenteus\ntabby, tabby_cat\ntiger_cat\nPersian_cat\nSiamese_cat, Siamese\nEgyptian_cat\ncougar, puma, catamount, mountain_lion, painter, panther, Felis_concolor\nlynx, catamount\nleopard, Panthera_pardus\nsnow_leopard, ounce, Panthera_uncia\njaguar, panther, Panthera_onca, Felis_onca\nlion, king_of_beasts, 
Panthera_leo\ntiger, Panthera_tigris\ncheetah, chetah, Acinonyx_jubatus\nbrown_bear, bruin, Ursus_arctos\nAmerican_black_bear, black_bear, Ursus_americanus, Euarctos_americanus\nice_bear, polar_bear, Ursus_Maritimus, Thalarctos_maritimus\nsloth_bear, Melursus_ursinus, Ursus_ursinus\nmongoose\nmeerkat, mierkat\ntiger_beetle\nladybug, ladybeetle, lady_beetle, ladybird, ladybird_beetle\nground_beetle, carabid_beetle\nlong-horned_beetle, longicorn, longicorn_beetle\nleaf_beetle, chrysomelid\ndung_beetle\nrhinoceros_beetle\nweevil\nfly\nbee\nant, emmet, pismire\ngrasshopper, hopper\ncricket\nwalking_stick, walkingstick, stick_insect\ncockroach, roach\nmantis, mantid\ncicada, cicala\nleafhopper\nlacewing, lacewing_fly\ndragonfly, darning_needle, devil's_darning_needle, sewing_needle, snake_feeder, snake_doctor, mosquito_hawk, skeeter_hawk\ndamselfly\nadmiral\nringlet, ringlet_butterfly\nmonarch, monarch_butterfly, milkweed_butterfly, Danaus_plexippus\ncabbage_butterfly\nsulphur_butterfly, sulfur_butterfly\nlycaenid, lycaenid_butterfly\nstarfish, sea_star\nsea_urchin\nsea_cucumber, holothurian\nwood_rabbit, cottontail, cottontail_rabbit\nhare\nAngora, Angora_rabbit\nhamster\nporcupine, hedgehog\nfox_squirrel, eastern_fox_squirrel, Sciurus_niger\nmarmot\nbeaver\nguinea_pig, Cavia_cobaya\nsorrel\nzebra\nhog, pig, grunter, squealer, Sus_scrofa\nwild_boar, boar, Sus_scrofa\nwarthog\nhippopotamus, hippo, river_horse, Hippopotamus_amphibius\nox\nwater_buffalo, water_ox, Asiatic_buffalo, Bubalus_bubalis\nbison\nram, tup\nbighorn, bighorn_sheep, cimarron, Rocky_Mountain_bighorn, Rocky_Mountain_sheep, Ovis_canadensis\nibex, Capra_ibex\nhartebeest\nimpala, Aepyceros_melampus\ngazelle\nArabian_camel, dromedary, Camelus_dromedarius\nllama\nweasel\nmink\npolecat, fitch, foulmart, foumart, Mustela_putorius\nblack-footed_ferret, ferret, Mustela_nigripes\notter\nskunk, polecat, wood_pussy\nbadger\narmadillo\nthree-toed_sloth, ai, Bradypus_tridactylus\norangutan, orang, orangutang, 
Pongo_pygmaeus\ngorilla, Gorilla_gorilla\nchimpanzee, chimp, Pan_troglodytes\ngibbon, Hylobates_lar\nsiamang, Hylobates_syndactylus, Symphalangus_syndactylus\nguenon, guenon_monkey\npatas, hussar_monkey, Erythrocebus_patas\nbaboon\nmacaque\nlangur\ncolobus, colobus_monkey\nproboscis_monkey, Nasalis_larvatus\nmarmoset\ncapuchin, ringtail, Cebus_capucinus\nhowler_monkey, howler\ntiti, titi_monkey\nspider_monkey, Ateles_geoffroyi\nsquirrel_monkey, Saimiri_sciureus\nMadagascar_cat, ring-tailed_lemur, Lemur_catta\nindri, indris, Indri_indri, Indri_brevicaudatus\nIndian_elephant, Elephas_maximus\nAfrican_elephant, Loxodonta_africana\nlesser_panda, red_panda, panda, bear_cat, cat_bear, Ailurus_fulgens\ngiant_panda, panda, panda_bear, coon_bear, Ailuropoda_melanoleuca\nbarracouta, snoek\neel\ncoho, cohoe, coho_salmon, blue_jack, silver_salmon, Oncorhynchus_kisutch\nrock_beauty, Holocanthus_tricolor\nanemone_fish\nsturgeon\ngar, garfish, garpike, billfish, Lepisosteus_osseus\nlionfish\npuffer, pufferfish, blowfish, globefish\nabacus\nabaya\nacademic_gown, academic_robe, judge's_robe\naccordion, piano_accordion, squeeze_box\nacoustic_guitar\naircraft_carrier, carrier, flattop, attack_aircraft_carrier\nairliner\nairship, dirigible\naltar\nambulance\namphibian, amphibious_vehicle\nanalog_clock\napiary, bee_house\napron\nashcan, trash_can, garbage_can, wastebin, ash_bin, ash-bin, ashbin, dustbin, trash_barrel, trash_bin\nassault_rifle, assault_gun\nbackpack, back_pack, knapsack, packsack, rucksack, haversack\nbakery, bakeshop, bakehouse\nbalance_beam, beam\nballoon\nballpoint, ballpoint_pen, ballpen, Biro\nBand_Aid\nbanjo\nbannister, banister, balustrade, balusters, handrail\nbarbell\nbarber_chair\nbarbershop\nbarn\nbarometer\nbarrel, cask\nbarrow, garden_cart, lawn_cart, wheelbarrow\nbaseball\nbasketball\nbassinet\nbassoon\nbathing_cap, swimming_cap\nbath_towel\nbathtub, bathing_tub, bath, tub\nbeach_wagon, station_wagon, wagon, estate_car, beach_waggon, station_waggon, 
waggon\nbeacon, lighthouse, beacon_light, pharos\nbeaker\nbearskin, busby, shako\nbeer_bottle\nbeer_glass\nbell_cote, bell_cot\nbib\nbicycle-built-for-two, tandem_bicycle, tandem\nbikini, two-piece\nbinder, ring-binder\nbinoculars, field_glasses, opera_glasses\nbirdhouse\nboathouse\nbobsled, bobsleigh, bob\nbolo_tie, bolo, bola_tie, bola\nbonnet, poke_bonnet\nbookcase\nbookshop, bookstore, bookstall\nbottlecap\nbow\nbow_tie, bow-tie, bowtie\nbrass, memorial_tablet, plaque\nbrassiere, bra, bandeau\nbreakwater, groin, groyne, mole, bulwark, seawall, jetty\nbreastplate, aegis, egis\nbroom\nbucket, pail\nbuckle\nbulletproof_vest\nbullet_train, bullet\nbutcher_shop, meat_market\ncab, hack, taxi, taxicab\ncaldron, cauldron\ncandle, taper, wax_light\ncannon\ncanoe\ncan_opener, tin_opener\ncardigan\ncar_mirror\ncarousel, carrousel, merry-go-round, roundabout, whirligig\ncarpenter's_kit, tool_kit\ncarton\ncar_wheel\ncash_machine, cash_dispenser, automated_teller_machine, automatic_teller_machine, automated_teller, automatic_teller, ATM\ncassette\ncassette_player\ncastle\ncatamaran\nCD_player\ncello, violoncello\ncellular_telephone, cellular_phone, cellphone, cell, mobile_phone\nchain\nchainlink_fence\nchain_mail, ring_mail, mail, chain_armor, chain_armour, ring_armor, ring_armour\nchain_saw, chainsaw\nchest\nchiffonier, commode\nchime, bell, gong\nchina_cabinet, china_closet\nChristmas_stocking\nchurch, church_building\ncinema, movie_theater, movie_theatre, movie_house, picture_palace\ncleaver, meat_cleaver, chopper\ncliff_dwelling\ncloak\nclog, geta, patten, sabot\ncocktail_shaker\ncoffee_mug\ncoffeepot\ncoil, spiral, volute, whorl, helix\ncombination_lock\ncomputer_keyboard, keypad\nconfectionery, confectionary, candy_store\ncontainer_ship, containership, container_vessel\nconvertible\ncorkscrew, bottle_screw\ncornet, horn, trumpet, trump\ncowboy_boot\ncowboy_hat, ten-gallon_hat\ncradle\ncrane\ncrash_helmet\ncrate\ncrib, cot\nCrock_Pot\ncroquet_ball\ncrutch\ncuirass\ndam, 
dike, dyke\ndesk\ndesktop_computer\ndial_telephone, dial_phone\ndiaper, nappy, napkin\ndigital_clock\ndigital_watch\ndining_table, board\ndishrag, dishcloth\ndishwasher, dish_washer, dishwashing_machine\ndisk_brake, disc_brake\ndock, dockage, docking_facility\ndogsled, dog_sled, dog_sleigh\ndome\ndoormat, welcome_mat\ndrilling_platform, offshore_rig\ndrum, membranophone, tympan\ndrumstick\ndumbbell\nDutch_oven\nelectric_fan, blower\nelectric_guitar\nelectric_locomotive\nentertainment_center\nenvelope\nespresso_maker\nface_powder\nfeather_boa, boa\nfile, file_cabinet, filing_cabinet\nfireboat\nfire_engine, fire_truck\nfire_screen, fireguard\nflagpole, flagstaff\nflute, transverse_flute\nfolding_chair\nfootball_helmet\nforklift\nfountain\nfountain_pen\nfour-poster\nfreight_car\nFrench_horn, horn\nfrying_pan, frypan, skillet\nfur_coat\ngarbage_truck, dustcart\ngasmask, respirator, gas_helmet\ngas_pump, gasoline_pump, petrol_pump, island_dispenser\ngoblet\ngo-kart\ngolf_ball\ngolfcart, golf_cart\ngondola\ngong, tam-tam\ngown\ngrand_piano, grand\ngreenhouse, nursery, glasshouse\ngrille, radiator_grille\ngrocery_store, grocery, food_market, market\nguillotine\nhair_slide\nhair_spray\nhalf_track\nhammer\nhamper\nhand_blower, blow_dryer, blow_drier, hair_dryer, hair_drier\nhand-held_computer, hand-held_microcomputer\nhandkerchief, hankie, hanky, hankey\nhard_disc, hard_disk, fixed_disk\nharmonica, mouth_organ, harp, mouth_harp\nharp\nharvester, reaper\nhatchet\nholster\nhome_theater, home_theatre\nhoneycomb\nhook, claw\nhoopskirt, crinoline\nhorizontal_bar, high_bar\nhorse_cart, horse-cart\nhourglass\niPod\niron, smoothing_iron\njack-o'-lantern\njean, blue_jean, denim\njeep, landrover\njersey, T-shirt, tee_shirt\njigsaw_puzzle\njinrikisha, ricksha, rickshaw\njoystick\nkimono\nknee_pad\nknot\nlab_coat, laboratory_coat\nladle\nlampshade, lamp_shade\nlaptop, laptop_computer\nlawn_mower, mower\nlens_cap, lens_cover\nletter_opener, paper_knife, 
paperknife\nlibrary\nlifeboat\nlighter, light, igniter, ignitor\nlimousine, limo\nliner, ocean_liner\nlipstick, lip_rouge\nLoafer\nlotion\nloudspeaker, speaker, speaker_unit, loudspeaker_system, speaker_system\nloupe, jeweler's_loupe\nlumbermill, sawmill\nmagnetic_compass\nmailbag, postbag\nmailbox, letter_box\nmaillot\nmaillot, tank_suit\nmanhole_cover\nmaraca\nmarimba, xylophone\nmask\nmatchstick\nmaypole\nmaze, labyrinth\nmeasuring_cup\nmedicine_chest, medicine_cabinet\nmegalith, megalithic_structure\nmicrophone, mike\nmicrowave, microwave_oven\nmilitary_uniform\nmilk_can\nminibus\nminiskirt, mini\nminivan\nmissile\nmitten\nmixing_bowl\nmobile_home, manufactured_home\nModel_T\nmodem\nmonastery\nmonitor\nmoped\nmortar\nmortarboard\nmosque\nmosquito_net\nmotor_scooter, scooter\nmountain_bike, all-terrain_bike, off-roader\nmountain_tent\nmouse, computer_mouse\nmousetrap\nmoving_van\nmuzzle\nnail\nneck_brace\nnecklace\nnipple\nnotebook, notebook_computer\nobelisk\noboe, hautboy, hautbois\nocarina, sweet_potato\nodometer, hodometer, mileometer, milometer\noil_filter\norgan, pipe_organ\noscilloscope, scope, cathode-ray_oscilloscope, CRO\noverskirt\noxcart\noxygen_mask\npacket\npaddle, boat_paddle\npaddlewheel, paddle_wheel\npadlock\npaintbrush\npajama, pyjama, pj's, jammies\npalace\npanpipe, pandean_pipe, syrinx\npaper_towel\nparachute, chute\nparallel_bars, bars\npark_bench\nparking_meter\npassenger_car, coach, carriage\npatio, terrace\npay-phone, pay-station\npedestal, plinth, footstall\npencil_box, pencil_case\npencil_sharpener\nperfume, essence\nPetri_dish\nphotocopier\npick, plectrum, plectron\npickelhaube\npicket_fence, paling\npickup, pickup_truck\npier\npiggy_bank, penny_bank\npill_bottle\npillow\nping-pong_ball\npinwheel\npirate, pirate_ship\npitcher, ewer\nplane, carpenter's_plane, woodworking_plane\nplanetarium\nplastic_bag\nplate_rack\nplow, plough\nplunger, plumber's_helper\nPolaroid_camera, Polaroid_Land_camera\npole\npolice_van, police_wagon, 
paddy_wagon, patrol_wagon, wagon, black_Maria\nponcho\npool_table, billiard_table, snooker_table\npop_bottle, soda_bottle\npot, flowerpot\npotter's_wheel\npower_drill\nprayer_rug, prayer_mat\nprinter\nprison, prison_house\nprojectile, missile\nprojector\npuck, hockey_puck\npunching_bag, punch_bag, punching_ball, punchball\npurse\nquill, quill_pen\nquilt, comforter, comfort, puff\nracer, race_car, racing_car\nracket, racquet\nradiator\nradio, wireless\nradio_telescope, radio_reflector\nrain_barrel\nrecreational_vehicle, RV, R.V.\nreel\nreflex_camera\nrefrigerator, icebox\nremote_control, remote\nrestaurant, eating_house, eating_place, eatery\nrevolver, six-gun, six-shooter\nrifle\nrocking_chair, rocker\nrotisserie\nrubber_eraser, rubber, pencil_eraser\nrugby_ball\nrule, ruler\nrunning_shoe\nsafe\nsafety_pin\nsaltshaker, salt_shaker\nsandal\nsarong\nsax, saxophone\nscabbard\nscale, weighing_machine\nschool_bus\nschooner\nscoreboard\nscreen, CRT_screen\nscrew\nscrewdriver\nseat_belt, seatbelt\nsewing_machine\nshield, buckler\nshoe_shop, shoe-shop, shoe_store\nshoji\nshopping_basket\nshopping_cart\nshovel\nshower_cap\nshower_curtain\nski\nski_mask\nsleeping_bag\nslide_rule, slipstick\nsliding_door\nslot, one-armed_bandit\nsnorkel\nsnowmobile\nsnowplow, snowplough\nsoap_dispenser\nsoccer_ball\nsock\nsolar_dish, solar_collector, solar_furnace\nsombrero\nsoup_bowl\nspace_bar\nspace_heater\nspace_shuttle\nspatula\nspeedboat\nspider_web, spider's_web\nspindle\nsports_car, sport_car\nspotlight, spot\nstage\nsteam_locomotive\nsteel_arch_bridge\nsteel_drum\nstethoscope\nstole\nstone_wall\nstopwatch, stop_watch\nstove\nstrainer\nstreetcar, tram, tramcar, trolley, trolley_car\nstretcher\nstudio_couch, day_bed\nstupa, tope\nsubmarine, pigboat, sub, U-boat\nsuit, suit_of_clothes\nsundial\nsunglass\nsunglasses, dark_glasses, shades\nsunscreen, sunblock, sun_blocker\nsuspension_bridge\nswab, swob, mop\nsweatshirt\nswimming_trunks, bathing_trunks\nswing\nswitch, electric_switch, 
electrical_switch\nsyringe\ntable_lamp\ntank, army_tank, armored_combat_vehicle, armoured_combat_vehicle\ntape_player\nteapot\nteddy, teddy_bear\ntelevision, television_system\ntennis_ball\nthatch, thatched_roof\ntheater_curtain, theatre_curtain\nthimble\nthresher, thrasher, threshing_machine\nthrone\ntile_roof\ntoaster\ntobacco_shop, tobacconist_shop, tobacconist\ntoilet_seat\ntorch\ntotem_pole\ntow_truck, tow_car, wrecker\ntoyshop\ntractor\ntrailer_truck, tractor_trailer, trucking_rig, rig, articulated_lorry, semi\ntray\ntrench_coat\ntricycle, trike, velocipede\ntrimaran\ntripod\ntriumphal_arch\ntrolleybus, trolley_coach, trackless_trolley\ntrombone\ntub, vat\nturnstile\ntypewriter_keyboard\numbrella\nunicycle, monocycle\nupright, upright_piano\nvacuum, vacuum_cleaner\nvase\nvault\nvelvet\nvending_machine\nvestment\nviaduct\nviolin, fiddle\nvolleyball\nwaffle_iron\nwall_clock\nwallet, billfold, notecase, pocketbook\nwardrobe, closet, press\nwarplane, military_plane\nwashbasin, handbasin, washbowl, lavabo, wash-hand_basin\nwasher, automatic_washer, washing_machine\nwater_bottle\nwater_jug\nwater_tower\nwhiskey_jug\nwhistle\nwig\nwindow_screen\nwindow_shade\nWindsor_tie\nwine_bottle\nwing\nwok\nwooden_spoon\nwool, woolen, woollen\nworm_fence, snake_fence, snake-rail_fence, Virginia_fence\nwreck\nyawl\nyurt\nweb_site, website, internet_site, site\ncomic_book\ncrossword_puzzle, crossword\nstreet_sign\ntraffic_light, traffic_signal, stoplight\nbook_jacket, dust_cover, dust_jacket, dust_wrapper\nmenu\nplate\nguacamole\nconsomme\nhot_pot, hotpot\ntrifle\nice_cream, icecream\nice_lolly, lolly, lollipop, popsicle\nFrench_loaf\nbagel, beigel\npretzel\ncheeseburger\nhotdog, hot_dog, red_hot\nmashed_potato\nhead_cabbage\nbroccoli\ncauliflower\nzucchini, courgette\nspaghetti_squash\nacorn_squash\nbutternut_squash\ncucumber, cuke\nartichoke, globe_artichoke\nbell_pepper\ncardoon\nmushroom\nGranny_Smith\nstrawberry\norange\nlemon\nfig\npineapple, ananas\nbanana\njackfruit, jak, 
jack\ncustard_apple\npomegranate\nhay\ncarbonara\nchocolate_sauce, chocolate_syrup\ndough\nmeat_loaf, meatloaf\npizza, pizza_pie\npotpie\nburrito\nred_wine\nespresso\ncup\neggnog\nalp\nbubble\ncliff, drop, drop-off\ncoral_reef\ngeyser\nlakeside, lakeshore\npromontory, headland, head, foreland\nsandbar, sand_bar\nseashore, coast, seacoast, sea-coast\nvalley, vale\nvolcano\nballplayer, baseball_player\ngroom, bridegroom\nscuba_diver\nrapeseed\ndaisy\nyellow_lady's_slipper, yellow_lady-slipper, Cypripedium_calceolus, Cypripedium_parviflorum\ncorn\nacorn\nhip, rose_hip, rosehip\nbuckeye, horse_chestnut, conker\ncoral_fungus\nagaric\ngyromitra\nstinkhorn, carrion_fungus\nearthstar\nhen-of-the-woods, hen_of_the_woods, Polyporus_frondosus, Grifola_frondosa\nbolete\near, spike, capitulum\ntoilet_tissue, toilet_paper, bathroom_tissue\n"
  },
  {
    "path": "github_adventures/vision_transformer/custom.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\"Split image into patches and then embed them.\n\n    Parameters\n    ----------\n    img_size : int\n        Size of the image (it is a square).\n\n    patch_size : int\n        Size of the patch (it is a square).\n\n    in_chans : int\n        Number of input channels.\n\n    embed_dim : int\n        The emmbedding dimension.\n\n    Attributes\n    ----------\n    n_patches : int\n        Number of patches inside of our image.\n\n    proj : nn.Conv2d\n        Convolutional layer that does both the splitting into patches\n        and their embedding.\n    \"\"\"\n    def __init__(self, img_size, patch_size, in_chans=3, embed_dim=768):\n        super().__init__()\n        self.img_size = img_size\n        self.patch_size = patch_size\n        self.n_patches = (img_size // patch_size) ** 2\n\n\n        self.proj = nn.Conv2d(\n                in_chans,\n                embed_dim,\n                kernel_size=patch_size,\n                stride=patch_size,\n        )\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, in_chans, img_size, img_size)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_patches, embed_dim)`.\n        \"\"\"\n        x = self.proj(\n                x\n            )  # (n_samples, embed_dim, n_patches ** 0.5, n_patches ** 0.5)\n        x = x.flatten(2)  # (n_samples, embed_dim, n_patches)\n        x = x.transpose(1, 2)  # (n_samples, n_patches, embed_dim)\n\n        return x\n\n\nclass Attention(nn.Module):\n    \"\"\"Attention mechanism.\n\n    Parameters\n    ----------\n    dim : int\n        The input and out dimension of per token features.\n\n    n_heads : int\n        Number of attention heads.\n\n    qkv_bias : bool\n        If True then we include bias to the query, key and value 
projections.\n\n    attn_p : float\n        Dropout probability applied to the query, key and value tensors.\n\n    proj_p : float\n        Dropout probability applied to the output tensor.\n\n    Attributes\n    ----------\n    scale : float\n        Normalizing constant for the dot product.\n\n    qkv : nn.Linear\n        Linear projection for the query, key and value.\n\n    proj : nn.Linear\n        Linear mapping that takes in the concatenated output of all attention\n        heads and maps it into a new space.\n\n    attn_drop, proj_drop : nn.Dropout\n        Dropout layers.\n    \"\"\"\n    def __init__(self, dim, n_heads=12, qkv_bias=True, attn_p=0., proj_p=0.):\n        super().__init__()\n        self.n_heads = n_heads\n        self.dim = dim\n        self.head_dim = dim // n_heads\n        self.scale = self.head_dim ** -0.5\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_p)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_p)\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_patches + 1, dim)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_patches + 1, dim)`.\n        \"\"\"\n        n_samples, n_tokens, dim = x.shape\n\n        if dim != self.dim:\n            raise ValueError(\"Input dimension does not match layer dimension.\")\n\n        qkv = self.qkv(x)  # (n_samples, n_patches + 1, 3 * dim)\n        qkv = qkv.reshape(\n                n_samples, n_tokens, 3, self.n_heads, self.head_dim\n        )  # (n_samples, n_patches + 1, 3, n_heads, head_dim)\n        qkv = qkv.permute(\n                2, 0, 3, 1, 4\n        )  # (3, n_samples, n_heads, n_patches + 1, head_dim)\n\n        q, k, v = qkv[0], qkv[1], qkv[2]\n        k_t = k.transpose(-2, -1)  # (n_samples, n_heads, head_dim, n_patches + 1)\n        dp = (\n           q @ k_t\n        ) * self.scale  # 
(n_samples, n_heads, n_patches + 1, n_patches + 1)\n        attn = dp.softmax(dim=-1)  # (n_samples, n_heads, n_patches + 1, n_patches + 1)\n        attn = self.attn_drop(attn)\n\n        weighted_avg = attn @ v  # (n_samples, n_heads, n_patches + 1, head_dim)\n        weighted_avg = weighted_avg.transpose(\n                1, 2\n        )  # (n_samples, n_patches + 1, n_heads, head_dim)\n        weighted_avg = weighted_avg.flatten(2)  # (n_samples, n_patches + 1, dim)\n\n        x = self.proj(weighted_avg)  # (n_samples, n_patches + 1, dim)\n        x = self.proj_drop(x)  # (n_samples, n_patches + 1, dim)\n\n        return x\n\n\nclass MLP(nn.Module):\n    \"\"\"Multilayer perceptron.\n\n    Parameters\n    ----------\n    in_features : int\n        Number of input features.\n\n    hidden_features : int\n        Number of nodes in the hidden layer.\n\n    out_features : int\n        Number of output features.\n\n    p : float\n        Dropout probability.\n\n    Attributes\n    ----------\n    fc1 : nn.Linear\n        The first linear layer.\n\n    act : nn.GELU\n        GELU activation function.\n\n    fc2 : nn.Linear\n        The second linear layer.\n\n    drop : nn.Dropout\n        Dropout layer.\n    \"\"\"\n    def __init__(self, in_features, hidden_features, out_features, p=0.):\n        super().__init__()\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = nn.GELU()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(p)\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_patches + 1, in_features)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_patches + 1, out_features)`.\n        \"\"\"\n        x = self.fc1(\n                x\n        )  # (n_samples, n_patches + 1, hidden_features)\n        x = self.act(x)  # (n_samples, n_patches + 
1, hidden_features)\n        x = self.drop(x)  # (n_samples, n_patches + 1, hidden_features)\n        x = self.fc2(x)  # (n_samples, n_patches + 1, out_features)\n        x = self.drop(x)  # (n_samples, n_patches + 1, out_features)\n\n        return x\n\n\nclass Block(nn.Module):\n    \"\"\"Transformer block.\n\n    Parameters\n    ----------\n    dim : int\n        Embedding dimension.\n\n    n_heads : int\n        Number of attention heads.\n\n    mlp_ratio : float\n        Determines the hidden dimension size of the `MLP` module with respect\n        to `dim`.\n\n    qkv_bias : bool\n        If True then we include bias to the query, key and value projections.\n\n    p, attn_p : float\n        Dropout probability.\n\n    Attributes\n    ----------\n    norm1, norm2 : LayerNorm\n        Layer normalization.\n\n    attn : Attention\n        Attention module.\n\n    mlp : MLP\n        MLP module.\n    \"\"\"\n    def __init__(self, dim, n_heads, mlp_ratio=4.0, qkv_bias=True, p=0., attn_p=0.):\n        super().__init__()\n        self.norm1 = nn.LayerNorm(dim, eps=1e-6)\n        self.attn = Attention(\n                dim,\n                n_heads=n_heads,\n                qkv_bias=qkv_bias,\n                attn_p=attn_p,\n                proj_p=p\n        )\n        self.norm2 = nn.LayerNorm(dim, eps=1e-6)\n        hidden_features = int(dim * mlp_ratio)\n        self.mlp = MLP(\n                in_features=dim,\n                hidden_features=hidden_features,\n                out_features=dim,\n                p=p,\n        )\n\n    def forward(self, x):\n        \"\"\"Run forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, n_patches + 1, dim)`.\n\n        Returns\n        -------\n        torch.Tensor\n            Shape `(n_samples, n_patches + 1, dim)`.\n        \"\"\"\n        x = x + self.attn(self.norm1(x))\n        x = x + self.mlp(self.norm2(x))\n\n        return x\n\n\nclass VisionTransformer(nn.Module):\n    
\"\"\"Simplified implementation of the Vision transformer.\n\n    Parameters\n    ----------\n    img_size : int\n        Both height and the width of the image (it is a square).\n\n    patch_size : int\n        Both height and the width of the patch (it is a square).\n\n    in_chans : int\n        Number of input channels.\n\n    n_classes : int\n        Number of classes.\n\n    embed_dim : int\n        Dimensionality of the token/patch embeddings.\n\n    depth : int\n        Number of blocks.\n\n    n_heads : int\n        Number of attention heads.\n\n    mlp_ratio : float\n        Determines the hidden dimension of the `MLP` module.\n\n    qkv_bias : bool\n        If True then we include bias to the query, key and value projections.\n\n    p, attn_p : float\n        Dropout probability.\n\n    Attributes\n    ----------\n    patch_embed : PatchEmbed\n        Instance of `PatchEmbed` layer.\n\n    cls_token : nn.Parameter\n        Learnable parameter that will represent the first token in the sequence.\n        It has `embed_dim` elements.\n\n    pos_emb : nn.Parameter\n        Positional embedding of the cls token + all the patches.\n        It has `(n_patches + 1) * embed_dim` elements.\n\n    pos_drop : nn.Dropout\n        Dropout layer.\n\n    blocks : nn.ModuleList\n        List of `Block` modules.\n\n    norm : nn.LayerNorm\n        Layer normalization.\n    \"\"\"\n    def __init__(\n            self,\n            img_size=384,\n            patch_size=16,\n            in_chans=3,\n            n_classes=1000,\n            embed_dim=768,\n            depth=12,\n            n_heads=12,\n            mlp_ratio=4.,\n            qkv_bias=True,\n            p=0.,\n            attn_p=0.,\n    ):\n        super().__init__()\n\n        self.patch_embed = PatchEmbed(\n                img_size=img_size,\n                patch_size=patch_size,\n                in_chans=in_chans,\n                embed_dim=embed_dim,\n        )\n        self.cls_token = 
nn.Parameter(torch.zeros(1, 1, embed_dim))\n        self.pos_embed = nn.Parameter(\n                torch.zeros(1, 1 + self.patch_embed.n_patches, embed_dim)\n        )\n        self.pos_drop = nn.Dropout(p=p)\n\n        self.blocks = nn.ModuleList(\n            [\n                Block(\n                    dim=embed_dim,\n                    n_heads=n_heads,\n                    mlp_ratio=mlp_ratio,\n                    qkv_bias=qkv_bias,\n                    p=p,\n                    attn_p=attn_p,\n                )\n                for _ in range(depth)\n            ]\n        )\n\n        self.norm = nn.LayerNorm(embed_dim, eps=1e-6)\n        self.head = nn.Linear(embed_dim, n_classes)\n\n\n    def forward(self, x):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Shape `(n_samples, in_chans, img_size, img_size)`.\n\n        Returns\n        -------\n        logits : torch.Tensor\n            Logits over all the classes - `(n_samples, n_classes)`.\n        \"\"\"\n        n_samples = x.shape[0]\n        x = self.patch_embed(x)\n\n        cls_token = self.cls_token.expand(\n                n_samples, -1, -1\n        )  # (n_samples, 1, embed_dim)\n        x = torch.cat((cls_token, x), dim=1)  # (n_samples, 1 + n_patches, embed_dim)\n        x = x + self.pos_embed  # (n_samples, 1 + n_patches, embed_dim)\n        x = self.pos_drop(x)\n\n        for block in self.blocks:\n            x = block(x)\n\n        x = self.norm(x)\n\n        cls_token_final = x[:, 0]  # just the CLS token\n        x = self.head(cls_token_final)\n\n        return x\n"
  },
  {
    "path": "github_adventures/vision_transformer/forward.py",
    "content": "import numpy as np\nfrom PIL import Image\nimport torch\n\nk = 10\n\nimagenet_labels = dict(enumerate(open(\"classes.txt\")))\n\nmodel = torch.load(\"model.pth\")\nmodel.eval()\n\nimg = (np.array(Image.open(\"cat.png\")) / 128) - 1  # in the range -1, 1\ninp = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).to(torch.float32)\nlogits = model(inp)\nprobs = torch.nn.functional.softmax(logits, dim=-1)\n\ntop_probs, top_ixs = probs[0].topk(k)\n\nfor i, (ix_, prob_) in enumerate(zip(top_ixs, top_probs)):\n    ix = ix_.item()\n    prob = prob_.item()\n    cls = imagenet_labels[ix].strip()\n    print(f\"{i}: {cls:<45} --- {prob:.4f}\")\n"
  },
  {
    "path": "github_adventures/vision_transformer/verify.py",
    "content": "import numpy as np\nimport timm\nimport torch\nfrom custom import VisionTransformer\n\n# Helpers\ndef get_n_params(module):\n    return sum(p.numel() for p in module.parameters() if p.requires_grad)\n\ndef assert_tensors_equal(t1, t2):\n    a1, a2 = t1.detach().numpy(), t2.detach().numpy()\n\n    np.testing.assert_allclose(a1, a2)\n\nmodel_name = \"vit_base_patch16_384\"\nmodel_official = timm.create_model(model_name, pretrained=True)\nmodel_official.eval()\nprint(type(model_official))\n\ncustom_config = {\n        \"img_size\": 384,\n        \"in_chans\": 3,\n        \"patch_size\": 16,\n        \"embed_dim\": 768,\n        \"depth\": 12,\n        \"n_heads\": 12,\n        \"qkv_bias\": True,\n        \"mlp_ratio\": 4,\n}\n\nmodel_custom = VisionTransformer(**custom_config)\nmodel_custom.eval()\n\n\nfor (n_o, p_o), (n_c, p_c) in zip(\n        model_official.named_parameters(), model_custom.named_parameters()\n):\n    assert p_o.numel() == p_c.numel()\n    print(f\"{n_o} | {n_c}\")\n\n    p_c.data[:] = p_o.data\n\n    assert_tensors_equal(p_c.data, p_o.data)\n\ninp = torch.rand(1, 3, 384, 384)\nres_c = model_custom(inp)\nres_o = model_official(inp)\n\n# Asserts\nassert get_n_params(model_custom) == get_n_params(model_official)\nassert_tensors_equal(res_c, res_o)\n\n# Save custom model\ntorch.save(model_custom, \"model.pth\")\n"
  },
  {
    "path": "mini_tutorials/bentoml/README.md",
    "content": "1. [Resources](#resources)\n2. [Installation](#installation)\n3. [Instructions](#instructions)\n    1. [`bentoml`](#bentoml)\n    1. [`bentoctl`](#bentoctl)\n    1. [`aws` CLI](#aws-cli)\n4. [Sketches](#sketches)\n\n# Resources\n* https://docs.bentoml.com/en/latest/\n* https://github.com/bentoml/bentoctl\n* https://github.com/bentoml/aws-sagemaker-deploy\n\n# Installation\n```bash\npip install -r requirements.txt\n```\nSee below the actual versions at the time of making the video\n```txt\nbentoctl==0.4.0\nbentoml==1.1.9\nboto3==1.29.0\nnumpy==1.26.2\npydantic==2.5.1\npydantic_core==2.14.3\nscikit-learn==1.3.2\n```\n\n# Instructions\n## `bentoml`\nCreating a model\n```bash\npython create_model.py\n```\n\nListing all existing models\n```bash\nbentoml models list\n```\n\nBuild a bento\n```bash\nbentoml build\n```\n\nList all existing bentos\n```bash\nbentoml list\n```\n\nServe a bento locally\n```bash\nbentoml serve $BENTO\n```\n\nServe a `service.py` (development)\n```bash\nbentoml serve service.py\n```\n\n## `bentoctl`\nInstall SageMaker operator\n```bash\nbentoctl operator install aws-sagemaker\n```\n\nInitialize\n```bash\nbentoctl init\n```\n\nATTENTION: All of the below assumes that you have correctly set up AWS\nsecret keys and permissions.\n\nBuild custom customized SageMaker image and push to ECR\n\n```bash\nbentoctl build -f deployment_config.yaml -b $BENTO\n```\n\n\nInitialize terraform\n```bash\nterraform init\n```\n\nLook at what changes will be applied\n\n```bash\nterraform plan -var-file=bentoctl.tfvars\n```\n\nActually apply changes\n```bash\nterraform apply -var-file=bentoctl.tfvars\n```\n\nSend request to the API Gateway\n```bash\ncurl -X 'POST' \"$URL/classify\"   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{            \n  \"sepal_width\": 0,\n  \"sepal_length\": 0,\n  \"petal_width\": 0,\n  \"petal_length\": 0\n}'\n```\n\nDestroy resources (not including ECR)\n\n```bash\nterraform destroy 
-var-file=bentoctl.tfvars\n```\n\nDestroy resources (including ECR)\n\n```bash\nbentoctl destroy\n```\n\n## `aws` CLI\nDescribe repositories\n```bash\naws ecr describe-repositories\n```\n\nList all images in the repository `amazing-iris`\n```bash\naws ecr list-images --repository-name=amazing-iris\n```\n\nList SageMaker models\n```bash\naws sagemaker list-models\n```\n\nList SageMaker endpoints\n```bash\naws sagemaker list-endpoints\n```\n\n\n# Sketches\n<img width=\"1250\" alt=\"bentoml-overview\" src=\"https://github.com/jankrepl/mildlyoverfitted/assets/18519371/eeb60c7f-2bbd-40df-9d89-95c0a720c16b\">\n<img width=\"1069\" alt=\"sklearn-sagemaker\" src=\"https://github.com/jankrepl/mildlyoverfitted/assets/18519371/58848152-cffb-4ec2-8b8f-b25a3d6647c0\">\n"
  },
  {
    "path": "mini_tutorials/bentoml/bentofile.yaml",
    "content": "service: \"service:svc\"\ninclude:\n- \"service.py\"\npython:\n  packages:\n  - pydantic\n  - scikit-learn\nmodels:\n- iris_clf:latest\n"
  },
  {
    "path": "mini_tutorials/bentoml/create_model.py",
    "content": "import bentoml\n\nfrom sklearn import datasets\nfrom sklearn import svm\n\niris = datasets.load_iris()\nX, y = iris.data, iris.target\n\nclf = svm.SVC(gamma=\"scale\")\nclf.fit(X, y)\n\nsaved_model = bentoml.sklearn.save_model(\"iris_clf\", clf)\nprint(saved_model)\n"
  },
  {
    "path": "mini_tutorials/bentoml/requirements.txt",
    "content": "bentoctl\nbentoml\nboto3\nnumpy\npydantic\nscikit-learn\n"
  },
  {
    "path": "mini_tutorials/bentoml/service.py",
    "content": "from typing import Literal\n\nimport bentoml\n\nfrom pydantic import BaseModel\nfrom bentoml.io import JSON\n\n\niris_clf_runner = bentoml.sklearn.get(\"iris_clf:latest\").to_runner()\n\nsvc = bentoml.Service(\"iris_classifier\", runners=[iris_clf_runner])\n\nclass Request(BaseModel):\n    sepal_width: float\n    sepal_length: float\n    petal_width: float\n    petal_length: float\n\nclass Response(BaseModel):\n    label: Literal[\"setosa\", \"versicolor\", \"virginica\"]\n\n\n@svc.api(input=JSON(pydantic_model=Request), output=JSON(pydantic_model=Response))\ndef classify(request: Request) -> Response:\n    input_ = [\n        request.sepal_width,\n        request.sepal_length,\n        request.petal_width,\n        request.petal_length,\n    ]\n\n    label_index = iris_clf_runner.predict.run([input_])[0]\n    label = [\"setosa\", \"versicolor\", \"virginica\"][label_index]\n\n    return Response(label=label)\n\n\n\n\n"
  },
  {
    "path": "mini_tutorials/custom_optimizer_in_pytorch/custom.py",
    "content": "import numpy as np\nimport torch\nfrom torch.optim import Optimizer\n\nclass WeirdDescent(Optimizer):\n    \"\"\"Take a coordinate descent step for a random parameter.\n\n    And also, make every 100th step way bigger.\n    \"\"\"\n    def __init__(self, parameters, lr=1e-3):\n        defaults = {\"lr\": lr}\n        super().__init__(parameters, defaults)\n\n    def step(self, closure=None):\n        loss = None\n        if closure is not None:\n            loss = closure()\n\n        if not self.state:\n            self.state[\"step\"] = 1\n        else:\n            self.state[\"step\"] += 1\n\n        c = 1\n        if self.state[\"step\"] % 100 == 0:\n            c = 100\n\n        grad = None\n        while grad is None:\n            param_group = np.random.choice(self.param_groups)\n            tensor = np.random.choice(param_group[\"params\"])\n            grad = tensor.grad.data\n\n        element_ix = np.random.randint(tensor.numel())\n\n        mask_flat = torch.zeros(tensor.numel())\n        mask_flat[element_ix] = 1\n        mask = mask_flat.reshape(tensor.shape)\n\n        tensor.data.add_(grad * mask, alpha=-param_group[\"lr\"] * c)\n\n        return loss\n"
  },
  {
    "path": "mini_tutorials/custom_optimizer_in_pytorch/src.py",
    "content": "from matplotlib.animation import FuncAnimation\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nfrom torch.optim import Adam, SGD\nfrom tqdm import tqdm\n\nfrom custom import WeirdDescent\n\ndef rosenbrock(xy):\n    \"\"\"Evaluate Rosenbrock function.\n\n    Parameters\n    ----------\n    xy : tuple\n        Two element tuple of floats representing the x resp. y coordinates.\n\n    Returns\n    -------\n    float\n        The Rosenbrock function evaluated at the point `xy`.\n    \"\"\"\n    x, y = xy\n\n    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2\n\ndef run_optimization(xy_init, optimizer_class, n_iter, **optimizer_kwargs):\n    \"\"\"Run optimization finding the minimum of the Rosenbrock function.\n\n    Parameters\n    ----------\n    xy_init : tuple\n        Two floats representing the x resp. y coordinates.\n\n    optimizer_class : object\n        Optimizer class.\n\n    n_iter : int\n        Number of iterations to run the optimization for.\n\n    optimizer_kwargs : dict\n        Additional parameters to be passed into the optimizer.\n\n    Returns\n    -------\n    path : np.ndarray\n        2D array of shape `(n_iter + 1, 2)`. Where the rows represent the\n        iteration and the columns represent the x resp. 
y coordinates.\n    \"\"\"\n    xy_t = torch.tensor(xy_init, requires_grad=True)\n    optimizer = optimizer_class([xy_t], **optimizer_kwargs)\n\n    path = np.empty((n_iter + 1, 2))\n    path[0, :] = xy_init\n\n    for i in tqdm(range(1, n_iter + 1)):\n        optimizer.zero_grad()\n        loss = rosenbrock(xy_t)\n        loss.backward()\n        torch.nn.utils.clip_grad_norm_(xy_t, 1.0)\n        optimizer.step()\n\n        path[i, :] = xy_t.detach().numpy()\n\n    return path\n\ndef create_animation(paths,\n                     colors,\n                     names,\n                     figsize=(12, 12),\n                     x_lim=(-2, 2),\n                     y_lim=(-1, 3),\n                     n_seconds=5):\n    \"\"\"Create an animation.\n\n    Parameters\n    ----------\n    paths : list\n        List of arrays representing the paths (history of x,y coordinates) the\n        optimizer went through.\n\n    colors :  list\n        List of strings representing colors for each path.\n\n    names : list\n        List of strings representing names for each path.\n\n    figsize : tuple\n        Size of the figure.\n\n    x_lim, y_lim : tuple\n        Range of the x resp. 
y axis.\n\n    n_seconds : int\n        Number of seconds the animation should last.\n\n    Returns\n    -------\n    anim : FuncAnimation\n        Animation of the paths of all the optimizers.\n    \"\"\"\n    if not (len(paths) == len(colors) == len(names)):\n        raise ValueError(\"paths, colors and names must have the same length\")\n\n    path_length = max(len(path) for path in paths)\n\n    n_points = 300\n    x = np.linspace(*x_lim, n_points)\n    y = np.linspace(*y_lim, n_points)\n    X, Y = np.meshgrid(x, y)\n    Z = rosenbrock([X, Y])\n\n    minimum = (1.0, 1.0)\n\n    fig, ax = plt.subplots(figsize=figsize)\n    ax.contour(X, Y, Z, 90, cmap=\"jet\")\n\n    scatters = [ax.scatter([],\n                           [],\n                           label=label,\n                           c=c) for c, label in zip(colors, names)]\n\n    ax.legend(prop={\"size\": 25})\n    ax.plot(*minimum, \"rD\")\n\n    def animate(i):\n        for path, scatter in zip(paths, scatters):\n            scatter.set_offsets(path[:i, :])\n\n        ax.set_title(str(i))\n\n    ms_per_frame = 1000 * n_seconds / path_length\n\n    anim = FuncAnimation(fig, animate, frames=path_length, interval=ms_per_frame)\n\n    return anim\n\nif __name__ == \"__main__\":\n    xy_init = (.3, .8)\n    n_iter = 1500\n\n    path_adam = run_optimization(xy_init, Adam, n_iter)\n    path_sgd = run_optimization(xy_init, SGD, n_iter, lr=1e-3)\n    path_weird = run_optimization(xy_init, WeirdDescent, n_iter, lr=1e-3)\n\n    freq = 10\n\n    paths = [path_adam[::freq], path_sgd[::freq], path_weird[::freq]]\n    colors = [\"green\", \"blue\", \"black\"]\n    names = [\"Adam\", \"SGD\", \"Weird\"]\n\n    anim = create_animation(paths,\n                            colors,\n                            names,\n                            figsize=(12, 7),\n                            x_lim=(-.1, 1.1),\n                            y_lim=(-.1, 1.1),\n                            n_seconds=7)\n\n    anim.save(\"result.gif\")\n    print(path_weird[-15:])\n"
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/Dockerfile",
    "content": "FROM huggingface/transformers-pytorch-gpu\n\nRUN python3 -c \"from transformers import AutoModel;AutoModel.from_pretrained('bert-base-uncased')\"\nRUN python3 -c \"from transformers import AutoTokenizer;AutoTokenizer.from_pretrained('bert-base-uncased')\"\n\nRUN pip install fastapi uvicorn\n\nEXPOSE 8888\nENTRYPOINT [\"transformers-cli\", \"serve\", \"--port=8888\", \"--host=0.0.0.0\", \"--task=fill-mask\", \"--model=bert-base-uncased\"]\n"
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/DockerfileConda",
    "content": "FROM continuumio/miniconda3\n\nRUN conda install -c conda-forge pytorch-cpu\nRUN conda install -c conda-forge fastapi\nRUN conda install -c conda-forge uvicorn\nRUN conda install -c huggingface transformers\nRUN conda install -c conda-forge huggingface_hub=0.2.1\n\nRUN python3 -c \"from transformers import AutoModel;AutoModel.from_pretrained('bert-base-uncased')\"\nRUN python3 -c \"from transformers import AutoTokenizer;AutoTokenizer.from_pretrained('bert-base-uncased')\"\n\n\nEXPOSE 8888\nENTRYPOINT [\"transformers-cli\", \"serve\", \"--port=8888\", \"--host=0.0.0.0\", \"--task=fill-mask\", \"--model=bert-base-uncased\"]\n\n"
  },
  {
    "path": "mini_tutorials/deploying_on_kubernetes/README.md",
    "content": "# Relevant commands\n\n## Creating an API\n```bash\ntransformers-cli serve --task=fill-mask --model=bert-base-uncased\n```\n\n```bash\ncurl http://localhost:8888 | jq \n```\n\n```bash\ncurl -X POST http://localhost:8888/forward  -H \"accept: application/json\" -H \"Content-Type: application/json\" -d '{\"inputs\": \"Today is going to be a [MASK] day\"}' | jq \n```\n\n## Containerization\nBuild first image.\n```bash\ndocker build -t cool-api:v1 .\n```\n\nBuild second image.\n```bash\ndocker build -t cool-api:v2 -f DockerfileConda .\n```\n\nRun image.\n```bash\ndocker run -it --rm -P cool-api:v2\n```\n\n## Deploying on Kubernetes\nStart a minikube cluster.\n```bash\nminikube start\n```\n\nGet all objects across all namespaces.\n```bash\nkubectl get all -A\n```\n\nList images.\n```bash\nminikube image list\n```\n\nLoad an image.\n```bash\nminikube image cool-api:v2\n```\n\nCreate a deployment.\n```bash\nkubectl create deploy cool-deploy --image=cool-api:v2\n```\n\nCreate a service.\n```bash\nkubectl expose deploy/cool-deploy --name=cool-service --target-port=8888 --port=1234\n```\n\nScale up.\n```bash\nkubectl scale deploy/cool-deploy --replicas=3\n```\n\nGet logs.\n```bash\nkubectl logs -f PODFULLNAME\n```\n"
  },
  {
    "path": "mini_tutorials/embedding/README.md",
    "content": "# Training data\nThe Dracula book can be found here: https://archive.org/stream/draculabr00stokuoft/draculabr00stokuoft_djvu.txt\n"
  },
  {
    "path": "mini_tutorials/embedding/Visualize.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"incredible-backup\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import ipywidgets\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import pandas as pd\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"proud-accreditation\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"df = pd.read_csv(\\\"res.csv\\\")\\n\",\n    \"last_epoch = df[\\\"epoch\\\"].max()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"canadian-nightlife\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"@ipywidgets.interact\\n\",\n    \"def f(epoch=ipywidgets.IntSlider(min=0, max=last_epoch , continuous_update=False)):\\n\",\n    \"    fig, ax = plt.subplots(1, 1, figsize=(12, 8))\\n\",\n    \"    ax.set_xlim([-2, 2])\\n\",\n    \"    ax.set_ylim([-2, 2])\\n\",\n    \"    df_iter = df[df[\\\"epoch\\\"] == epoch]\\n\",\n    \"    df_iter.plot(kind='scatter', x='dim_0',y='dim_1', ax=ax, c=\\\"red\\\")\\n\",\n    \"    df_iter[['dim_0','dim_1','character']].apply(lambda row:\\n\",\n    \"                                                 ax.text(row[\\\"dim_0\\\"] + 0.02,\\n\",\n    \"                                                         row[\\\"dim_1\\\"] + 0.01,\\n\",\n    \"                                                         row[\\\"character\\\"],\\n\",\n    \"                                                         fontsize=18),\\n\",\n    \"                                                 axis=1)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"early-vinyl\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  
\"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.5\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "mini_tutorials/embedding/src.py",
    "content": "from collections import Counter, defaultdict\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom torch.nn import Embedding, Linear, LSTM, Module\nimport torch.nn.functional as F\nfrom torch.utils.data import Dataset, DataLoader, SubsetRandomSampler\nfrom tqdm import tqdm\n\n\nclass CharacterDataset(Dataset):\n    \"\"\"Custom dataset.\n\n    Parameters\n    ----------\n    text : str\n        Input text that will be used to create the entire database.\n\n    window_size : int\n        Number of characters to use as input features.\n\n    vocab_size : int\n        Number of characters in the vocabulary. Note that the last character\n        is always reserved for a special \"~\" out-of-vocabulary character.\n\n    Attributes\n    ----------\n    ch2ix : defaultdict\n        Mapping from the character to the position of that character in the\n        vocabulary. Note that all characters that are not in the vocabulary\n        will get mapped into the index `vocab_size - 1`.\n\n    ix2ch : dict\n        Mapping from the character position in the vocabulary to the actual\n        character.\n\n    vocabulary : list\n        List of all characters. 
`len(vocabulary) == vocab_size`.\n    \"\"\"\n    def __init__(self, text, window_size=1, vocab_size=50):\n        self.text = text.replace(\"\\n\", \" \")\n        self.window_size = window_size\n        self.ch2ix = defaultdict(lambda: vocab_size - 1)\n\n        most_common_ch2ix = {\n            x[0]: i\n            for i, x in enumerate(Counter(self.text).most_common()[: (vocab_size - 1)])\n        }\n        self.ch2ix.update(most_common_ch2ix)\n        self.ch2ix[\"~\"] = vocab_size - 1\n\n        self.ix2ch = {v: k for k, v in self.ch2ix.items()}\n        self.vocabulary = [self.ix2ch[i] for i in range(vocab_size)]\n\n    def __len__(self):\n        return len(self.text) - self.window_size\n\n    def __getitem__(self, ix):\n        X = torch.LongTensor(\n            [self.ch2ix[c] for c in self.text[ix : ix + self.window_size]]\n        )\n        y = self.ch2ix[self.text[ix + self.window_size]]\n\n        return X, y\n\n\nclass Network(Module):\n    \"\"\"Custom network predicting the next character of a string.\n\n    Parameters\n    ----------\n    vocab_size : int\n        The number of characters in the vocabulary.\n\n    embedding_dim : int\n        Dimension of the character embedding vectors.\n\n    dense_dim : int\n        Number of neurons in the linear layer that follows the LSTM.\n\n    hidden_dim : int\n        Size of the LSTM hidden state.\n\n    max_norm : int\n        If any of the embedding vectors has a higher L2 norm than `max_norm`\n        it is rescaled.\n\n    n_layers : int\n        Number of the layers of the LSTM.\n    \"\"\"\n    def __init__(\n        self,\n        vocab_size,\n        embedding_dim=2,\n        dense_dim=32,\n        hidden_dim=8,\n        max_norm=2,\n        n_layers=1,\n    ):\n        super().__init__()\n\n        self.embedding = Embedding(\n                vocab_size,\n                embedding_dim,\n                padding_idx=vocab_size - 1,\n                norm_type=2,\n                
max_norm=max_norm,\n        )\n        self.lstm = LSTM(\n                embedding_dim, hidden_dim, batch_first=True, num_layers=n_layers\n        )\n        self.linear_1 = Linear(hidden_dim, dense_dim)\n        self.linear_2 = Linear(dense_dim, vocab_size)\n\n\n    def forward(self, x, h=None, c=None):\n        \"\"\"Run the forward pass.\n\n        Parameters\n        ----------\n        x : torch.Tensor\n            Input tensor of shape `(n_samples, window_size)` of dtype\n            `torch.int64`.\n\n        h, c : torch.Tensor or None\n            Hidden states of the LSTM.\n\n        Returns\n        -------\n        logits : torch.Tensor\n            Tensor of shape `(n_samples, vocab_size)`.\n\n        h, c : torch.Tensor or None\n            Hidden states of the LSTM.\n        \"\"\"\n        emb = self.embedding(x)  # (n_samples, window_size, embedding_dim)\n        if h is not None and c is not None:\n            _, (h, c) = self.lstm(emb, (h, c))\n        else:\n            _, (h, c) = self.lstm(emb)  # (n_layers, n_samples, hidden_dim)\n\n        h_mean = h.mean(dim=0)  # (n_samples, hidden_dim)\n        x = self.linear_1(h_mean)  # (n_samples, dense_dim)\n        logits = self.linear_2(x)  # (n_samples, vocab_size)\n\n        return logits, h, c\n\ndef compute_loss(cal, net, dataloader):\n    \"\"\"Compute the average loss over a dataset.\"\"\"\n    net.eval()\n    all_losses = []\n    for X_batch, y_batch in dataloader:\n        probs, _, _ = net(X_batch)\n\n        all_losses.append(cal(probs, y_batch).item())\n\n    return np.mean(all_losses)\n\ndef generate_text(n_chars, net, dataset, initial_text=\"Hello\", random_state=None):\n    \"\"\"Generate text with the character-level model.\n\n    Parameters\n    ----------\n    n_chars : int\n        Number of characters to generate.\n\n    net : Module\n        Character-level model.\n\n    dataset : CharacterDataset\n        Instance of the `CharacterDataset`.\n\n    initial_text : str\n        The starting text to be used as the initial condition for the model.\n\n    random_state : None or int\n        If not None, then the result is reproducible.\n\n    Returns\n    -------\n    res : str\n        Generated text.\n    \"\"\"\n    if not initial_text:\n        raise ValueError(\"You need to specify the initial text\")\n\n    res = initial_text\n    net.eval()\n    h, c = None, None\n\n    if random_state is not None:\n        np.random.seed(random_state)\n\n    for _ in range(n_chars):\n        previous_chars = initial_text if res == initial_text else res[-1]\n        features = torch.LongTensor([[dataset.ch2ix[c] for c in previous_chars]])\n        logits, h, c = net(features, h, c)\n        probs = F.softmax(logits[0], dim=0).detach().numpy()\n        new_ch = np.random.choice(dataset.vocabulary, p=probs)\n        res += new_ch\n\n    return res\n\nif __name__ == \"__main__\":\n    with open(\"text.txt\", \"r\") as f:\n        text = f.read()\n\n    # Hyperparameters model\n    vocab_size = 70\n    window_size = 10\n    embedding_dim = 2\n    hidden_dim = 16\n    dense_dim = 32\n    n_layers = 1\n    max_norm = 2\n\n    # Training config\n    n_epochs = 25\n    train_val_split = 0.8\n    batch_size = 128\n    random_state = 13\n\n    torch.manual_seed(random_state)\n\n    loss_f = torch.nn.CrossEntropyLoss()\n    dataset = CharacterDataset(text, window_size=window_size, vocab_size=vocab_size)\n\n    n_samples = len(dataset)\n    split_ix = int(n_samples * train_val_split)\n\n    train_indices, val_indices = np.arange(split_ix), np.arange(split_ix, n_samples)\n\n    train_dataloader = DataLoader(\n            dataset, sampler=SubsetRandomSampler(train_indices), batch_size=batch_size\n    )\n    val_dataloader = DataLoader(\n            dataset, sampler=SubsetRandomSampler(val_indices), batch_size=batch_size\n    )\n\n    net = Network(\n            vocab_size,\n            hidden_dim=hidden_dim,\n            n_layers=n_layers,\n            dense_dim=dense_dim,\n            embedding_dim=embedding_dim,\n            max_norm=max_norm,\n    )\n    optimizer = torch.optim.Adam(\n            net.parameters(),\n            lr=1e-2,\n    )\n\n    emb_history = []\n\n    for e in range(n_epochs + 1):\n        net.train()\n        for X_batch, y_batch in tqdm(train_dataloader):\n            if e == 0:\n                break\n\n            optimizer.zero_grad()\n            probs, _, _ = net(X_batch)\n            loss = loss_f(probs, y_batch)\n            loss.backward()\n\n            optimizer.step()\n\n        train_loss = compute_loss(loss_f, net, train_dataloader)\n        val_loss = compute_loss(loss_f, net, val_dataloader)\n        print(f\"Epoch: {e}, {train_loss=:.3f}, {val_loss=:.3f}\")\n\n        # Generate one sentence\n        initial_text = \"I hope it works \"\n        generated_text = generate_text(\n            100, net, dataset, initial_text=initial_text, random_state=random_state\n        )\n        print(generated_text)\n\n        # Prepare DataFrame\n        weights = net.embedding.weight.detach().clone().numpy()\n\n        df = pd.DataFrame(weights, columns=[f\"dim_{i}\" for i in range(embedding_dim)])\n        df[\"epoch\"] = e\n        df[\"character\"] = dataset.vocabulary\n\n        emb_history.append(df)\n\n    final_df = pd.concat(emb_history)\n    final_df.to_csv(\"res.csv\", index=False)\n"
  },
  {
    "path": "mini_tutorials/fewshot_text_classification/classify.py",
    "content": "import pathlib\n\nimport jinja2\nimport openai\n\n\npath = pathlib.Path(\"template.jinja2\")\n\nwith path.open() as f:\n    prompt_template = jinja2.Template(f.read())\n\nlabels = [\n    {\"label\": 0, \"description\": \"negative sentiment\"},\n    {\"label\": 1, \"description\": \"neutral sentiment\"},\n    {\"label\": 2, \"description\": \"positive sentiment\"},\n]\n\nexamples = [\n    {\"text\": \"Today was a horrible day\", \"label\": 0},\n    {\"text\": \"Yesterday was a great day\", \"label\": 2},\n]\n\ntext = \"I loved the TV show\"\n\nprompt = prompt_template.render(\n    examples=examples,\n    labels=labels,\n    text=text,\n)\nprint(prompt)\n\ncompletion = openai.ChatCompletion.create(\n  model=\"gpt-3.5-turbo\",\n  messages=[\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": prompt}\n  ]\n)\n\nprint(completion.choices[0].message)\n"
  },
  {
    "path": "mini_tutorials/fewshot_text_classification/template.jinja2",
    "content": "I want you to classify text for me.\nSee below all the possible labels and their description\n{% for item in labels %}\n\"\"\"\ndescription: {{ item.description }}\nlabel: {{ item.label }}\n\"\"\"\n{% endfor %}\n{% if examples %}\nSee below a couple of examples\n{% for item in examples %}\n\"\"\"\ntext: {{ item.text }}\nlabel: {{ item.label }}\n\"\"\"\n{% endfor %}\n{% endif %}\n\nHere is the text that needs to be classified\n\"\"\"\ntext: {{ text }}\nlabel:\n"
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/explain.py",
    "content": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torchvision.models as models\n\nfrom utils import compute_gradient, read_image, scale_grad, to_array\n\n\ndef func(inp, net=None, target=None):\n    \"\"\"Get logit of a target class.\n\n    Parameters\n    ----------\n    inp : torch.Tensor\n        Input image (single image batch).\n\n    net : torch.nn.Module\n        Classifier network.\n\n    target : int\n        Imagenet ground truth label id.\n\n    Returns\n    -------\n    logit : torch.Tensor\n        Logit of the `target` class.\n    \"\"\"\n    out = net(inp)\n    logit = out[0, target]\n\n    return logit\n\ndef compute_integrated_gradients(inp, baseline, net, target, n_steps=100):\n    \"\"\"Compute integrated gradients.\n\n    Parameters\n    ----------\n    inp : torch.Tensor\n        Input image (single image batch) of shape `(1, 3, *, *)`.\n\n    baseline : torch.Tensor\n        Basline image of the same shape as the `inp`.\n\n    net : torch.nn.Module\n        Classifier network.\n\n    target : int\n        Imagenet ground truth label id.\n\n    n_steps : int\n        Number of steps between the `inp` and `baseline` tensors.\n\n    Returns\n    -------\n    ig : torch.Tensor\n        Integrated gradients with the same shape as the `inp`.\n\n    inp_grad : torch.Tensor\n        Gradient with respect to the `inp` tensor. 
Same shape as `inp`.\n    \"\"\"\n    path = [baseline + a * (inp - baseline) for a in np.linspace(0, 1, n_steps)]\n    grads = [compute_gradient(func, x, net=net, target=target) for x in path]\n\n    ig = (inp - baseline) * torch.cat(grads[:-1]).mean(dim=0, keepdims=True)\n\n    return ig, grads[-1]\n\nif __name__ == \"__main__\":\n    net = models.resnet18(pretrained=True)\n    net.eval()\n\n    tensor = read_image(\"img.jpg\")\n    arr = to_array(tensor)\n\n    n_steps = 100\n    baseline = -1.5 * torch.ones_like(tensor)\n\n    ig, inp_grad = compute_integrated_gradients(\n            tensor, baseline, net, 291, n_steps=n_steps\n    )\n\n    ig_scaled = scale_grad(ig)\n    inp_grad_scaled = scale_grad(inp_grad)\n\n    _, (ax_baseline, ax_img, ax_inp_grad, ax_ig) = plt.subplots(1, 4, figsize=(19.20,10.80))\n\n    ax_baseline.imshow(to_array(baseline))\n    ax_img.imshow(arr)\n    ax_inp_grad.imshow(arr * inp_grad_scaled)\n    ax_ig.imshow(arr * ig_scaled)\n\n    ax_baseline.set_title(\"Baseline\")\n    ax_img.set_title(\"Input\")\n    ax_inp_grad.set_title(\"Gradient input\")\n    ax_ig.set_title(\"Integrated gradients\")\n\n    ax_baseline.axis(\"off\")\n    ax_img.axis(\"off\")\n    ax_inp_grad.axis(\"off\")\n    ax_ig.axis(\"off\")\n\n    plt.savefig(\"res_2.png\")\n"
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/fool.py",
    "content": "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torchvision.models as models\n\nfrom utils import compute_gradient, read_image, to_array\n\ndef func(inp, net=None, target=None):\n    \"\"\"Compute negative log likelihood.\n\n    Parameters\n    ----------\n    inp : torch.Tensor\n        Input image (single image batch).\n\n    net : torch.nn.Module\n        Classifier network.\n\n    target : int\n        Imagenet ground truth label id.\n\n    Returns\n    -------\n    loss : torch.Tensor\n        Loss for the `inp` image.\n    \"\"\"\n    out = net(inp)\n    loss = torch.nn.functional.nll_loss(out, target=torch.LongTensor([target]))\n\n    print(f\"Loss: {loss.item()}\")\n    return loss\n\ndef attack(tensor, net, eps=1e-3, n_iter=50):\n    \"\"\"Run the Fast Sign Gradient Method (FSGM) attack.\n\n    Parameters\n    ----------\n    tensor : torch.Tensor\n        The input image of shape `(1, 3, 224, 224)`.\n\n    net : torch.nn.Module\n        Classifier network.\n\n    eps : float\n        Determines how much we modify the image in a single iteration.\n\n    n_iter : int\n        Number of iterations.\n\n    Returns\n    -------\n    new_tensor : torch.Tensor\n        New image that is a modification of the input image that \"fools\"\n        the classifier.\n    \"\"\"\n    new_tensor = tensor.detach().clone()\n\n    orig_prediction = net(tensor).argmax()\n    print(f\"Original prediction: {orig_prediction.item()}\")\n\n    for i in range(n_iter):\n        net.zero_grad()\n\n        grad = compute_gradient(\n                func, new_tensor, net=net, target=orig_prediction.item()\n                )\n        new_tensor = torch.clamp(new_tensor + eps * grad.sign(), -2, 2)\n        new_prediction = net(new_tensor).argmax()\n\n        if orig_prediction != new_prediction:\n            print(f\"We fooled the network after {i} iterations!\")\n            print(f\"New prediction: {new_prediction.item()}\")\n            
break\n\n    return new_tensor, orig_prediction.item(), new_prediction.item()\n\n\nif __name__ == \"__main__\":\n    net = models.resnet18(pretrained=True)\n    net.eval()\n\n    tensor = read_image(\"img.jpg\")\n\n    new_tensor, orig_prediction, new_prediction = attack(\n            tensor, net, eps=1e-3, n_iter=100\n            )\n\n    _, (ax_orig, ax_new, ax_diff) = plt.subplots(1, 3, figsize=(19.20,10.80))\n    arr = to_array(tensor)\n    new_arr = to_array(new_tensor)\n    diff_arr = np.abs(arr - new_arr).mean(axis=-1)\n    diff_arr = diff_arr / diff_arr.max()\n\n    ax_orig.imshow(arr)\n    ax_new.imshow(new_arr)\n    ax_diff.imshow(diff_arr, cmap=\"gray\")\n\n    ax_orig.axis(\"off\")\n    ax_new.axis(\"off\")\n    ax_diff.axis(\"off\")\n\n    ax_orig.set_title(f\"Original: {orig_prediction}\")\n    ax_new.set_title(f\"Modified: {new_prediction}\")\n    ax_diff.set_title(\"Difference\")\n\n    plt.savefig(\"res_1.png\")\n"
  },
  {
    "path": "mini_tutorials/gradient_wrt_input/utils.py",
    "content": "from PIL import Image\nimport torch\nfrom torchvision.transforms import (CenterCrop, Compose, Normalize, Resize,\n                                    ToTensor)\n\ndef compute_gradient(func, inp, **kwargs):\n    \"\"\"Compute the gradient with respect to `inp`.\n\n    Parameters\n    ----------\n    func : callable\n        Function that takes in `inp` and `kwargs` and returns a single element\n        tensor.\n\n    inp : torch.Tensor\n        The tensor that we want to get the gradients for. Needs to be a leaf\n        node.\n\n    **kwargs : dict\n        Additional keyword arguments passed into `func`.\n\n    Returns\n    -------\n    grad : torch.Tensor\n        Tensor of the same shape as `inp` that is representing the gradient.\n    \"\"\"\n    inp.requires_grad = True\n\n    loss = func(inp, **kwargs)\n    loss.backward()\n\n    inp.requires_grad = False\n\n    return inp.grad.data\n\n\ndef read_image(path):\n    \"\"\"Load image from disk and convert to torch.Tensor.\n\n    Parameters\n    ----------\n    path : str\n        Path to the image.\n\n    Returns\n    -------\n    tensor : torch.Tensor\n        Single sample batch containing our image (ready to be used with\n        pretrained networks). 
The shape is `(1, 3, 224, 224)`.\n    \"\"\"\n    img = Image.open(path)\n\n    transform = Compose([Resize(256),\n                         CenterCrop(224),\n                         ToTensor(),\n                         Normalize(mean=[0.485, 0.456, 0.406],\n                                   std=[0.229, 0.224, 0.225])])\n\n    tensor_ = transform(img)\n    tensor = tensor_.unsqueeze(0)\n\n    return tensor\n\ndef to_array(tensor):\n    \"\"\"Convert torch.Tensor to np.ndarray.\n\n    Parameters\n    ----------\n    tensor : torch.Tensor\n        Tensor of shape `(1, 3, *, *)` representing one sample batch of images.\n\n    Returns\n    -------\n    arr : np.ndarray\n        Array of shape `(*, *, 3)` representing an image that can be plotted\n        directly.\n    \"\"\"\n    tensor_ = tensor.squeeze()\n\n    unnormalize_transform = Compose([Normalize(mean=[0, 0, 0],\n                                               std=[1 / 0.229, 1 / 0.224, 1 / 0.225]),\n                                     Normalize(mean=[-0.485, -0.456, -0.406],\n                                               std=[1, 1, 1])])\n    arr_ = unnormalize_transform(tensor_)\n    arr = arr_.permute(1, 2, 0).detach().numpy()\n\n    return arr\n\ndef scale_grad(grad):\n    \"\"\"Scale gradient tensor.\n\n    Parameters\n    ----------\n    grad : torch.Tensor\n        Gradient of shape `(1, 3, *, *)`.\n\n    Returns\n    -------\n    grad_arr : np.ndarray\n        Array of shape `(*, *, 1)`.\n    \"\"\"\n    grad_arr = torch.abs(grad).mean(dim=1).detach().permute(1, 2, 0)\n    grad_arr /= grad_arr.quantile(0.98)\n    grad_arr = torch.clamp(grad_arr, 0, 1)\n\n    return grad_arr.numpy()\n"
  },
  {
    "path": "mini_tutorials/haiku_basics/buffers_in_torch.py",
    "content": "import torch\nbn = torch.nn.BatchNorm1d(5)\nbn.state_dict()\n\nfor name, p in bn.named_buffers():\n    print(name, p, p.requires_grad)\n\nfor name, p in bn.named_parameters():\n    print(name, p, p.requires_grad)\n"
  },
  {
    "path": "mini_tutorials/haiku_basics/parameter.py",
    "content": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\n\ndef foo(x: jnp.ndarray) -> jnp.ndarray:\n    c = hk.get_parameter(\"c\", x.shape, init=hk.initializers.RandomNormal(1))\n\n    res = c + x\n\n    key = hk.next_rng_key()\n    mask = jax.random.bernoulli(key, 0.5, x.shape)\n\n    return res * mask * 2\n\n\nfoo_transformed = hk.transform(foo)\n\ninit_key = jax.random.PRNGKey(24)\napply_key_seq = hk.PRNGSequence(init_key)\n\nx = jnp.ones((2, 5))\nparams = foo_transformed.init(init_key, x)\n\nfor _ in range(2):\n    res = foo_transformed.apply(params, next(apply_key_seq), x)\n    print(res)\n"
  },
  {
    "path": "mini_tutorials/haiku_basics/reallife.py",
    "content": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\ndef foo(x: jnp.ndarray) -> jnp.ndarray:\n    mlp = hk.nets.MLP([4, 5, 1])\n\n    loss = mlp(x).mean()\n\n    return loss\n\n\nfoo_transformed = hk.without_apply_rng(hk.transform(foo))\n\ninit_key = jax.random.PRNGKey(3452)\nx = jnp.ones((2, 3))\nparams = foo_transformed.init(init_key, x)\n\ngrad_foo = jax.jit(jax.grad(foo_transformed.apply))\n\ngrads = grad_foo(params, x)\n"
  },
  {
    "path": "mini_tutorials/haiku_basics/requirements.txt",
    "content": "-e git+ssh://git@github.com/deepmind/dm-haiku.git@386efc098fd52a5cf728e7d13442138ab25eb235#egg=dm_haiku\njax==0.3.5\njaxlib==0.3.5\n"
  },
  {
    "path": "mini_tutorials/haiku_basics/state.py",
    "content": "from __future__ import annotations\n\nimport haiku as hk\nimport jax\nimport jax.numpy as jnp\n\n\ndef foo(x: jnp.ndarray) -> jnp.ndarray:\n    c = hk.get_parameter(\"c\", x.shape, init=hk.initializers.RandomNormal(1))\n\n    counter = hk.get_state(\n        \"counter\", shape=[], dtype=jnp.int32, init=jnp.ones\n    )\n    hk.set_state(\"counter\", counter + 1)\n    res = c + x + counter\n\n    return res \n\nfoo_transformed = hk.transform_with_state(foo)\ninit_key = jax.random.PRNGKey(32)\n\nx = jnp.ones((2, 5))\nparams, state = foo_transformed.init(init_key, x)\n\nfor i in range(2):\n    print(f\"After {i} iterations\")\n\n    res, state = foo_transformed.apply(params, state, None, x)\n    print(state)\n    print(res)\n\n"
  },
  {
    "path": "mini_tutorials/httpx_rate_limiting/script.py",
    "content": "import asyncio\nimport logging\n\nimport httpx\n\nlogger = logging.getLogger()\nlogging.getLogger(\"httpx\").setLevel(logging.WARNING)\nlogging.basicConfig(format=\"%(asctime)s %(name)s %(message)s\", level=logging.INFO)\n\n\nasync def send_request(client: httpx.AsyncClient, semaphore: asyncio.Semaphore) -> int:\n    url = \"https://pokeapi.co/api/v2/pokemon/ditto\"\n    async with semaphore:\n        logger.info(\"Sending request\")\n        response = await client.get(url)\n        logger.info(\"Response received\")\n\n    return response.status_code\n\n\nasync def main() -> int:\n    semaphore = asyncio.Semaphore(5)\n    async with httpx.AsyncClient() as client:\n        tasks = [asyncio.create_task(send_request(client, semaphore)) for _ in range(10)]\n        status_codes = await asyncio.gather(*tasks)\n\n    logger.info(\"All work done\")\n\n    return 0 if all(c == 200 for c in status_codes) else 1\n\n\nif __name__ == \"__main__\":\n    raise SystemExit(asyncio.run(main()))\n"
  },
  {
    "path": "mini_tutorials/mocking_neural_networks/app.py",
    "content": "import logging\nimport sys\n\nimport numpy as np\nimport torch\nfrom transformers import AutoModelForMaskedLM, AutoTokenizer\n\ndef get_top_k(sequence, tokenizer, model, k=10):\n    \"\"\"Get the top k most probable tokens to fill the gap with.\n\n    Parameters\n    ----------\n    sequence : str\n        String containing the [MASK] token.\n\n    tokenizer : BertFastTokenizer\n        Tokenizer.\n\n    model : BertForMaskedLM\n        Model.\n\n    k : int\n        Number of the top results to return.\n\n    Returns\n    -------\n    top_vocab_indices : torch.Tensor\n        1D tensor representing the indices of the top tokens.\n    \"\"\"\n    batch_enc = tokenizer(sequence, return_tensors=\"pt\")\n    mask_ix = torch.where(batch_enc[\"input_ids\"] == tokenizer.mask_token_id)[1]\n    logits = model(**batch_enc).logits\n\n    top_vocab_indices = torch.topk(logits[0, mask_ix.item(), :], k)[1]\n\n    return top_vocab_indices\n\nif __name__ == \"__main__\":\n    logging.disable(logging.WARNING)\n\n    tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n    model = AutoModelForMaskedLM.from_pretrained(\"bert-base-uncased\")\n\n    sequence = sys.argv[1]\n\n    top_indices = get_top_k(sequence, tokenizer, model, 5)\n    top_tokens = [tokenizer.decode(torch.tensor([ix])) for ix in top_indices]\n\n    winner = top_tokens[0]\n    print(np.random.permutation(top_tokens))\n    guess = input(\"Who do you think is the winner? \").strip()\n\n    if guess == winner:\n        print(\"You won!!!\")\n    else:\n        print(\"You lost!!!\")\n\n    print(\"\\nTrue ranking\")\n    for i, x in enumerate(top_tokens):\n        print(i, x)\n"
  },
  {
    "path": "mini_tutorials/mocking_neural_networks/test.py",
    "content": "from unittest.mock import Mock\n\nimport pytest\nimport torch\nfrom transformers import (AutoTokenizer, AutoModelForMaskedLM, BatchEncoding,\n                          BertForMaskedLM, BertTokenizerFast)\n\nfrom app import get_top_k\n\n@pytest.mark.parametrize(\"k\", [5, 7])\ndef test_with_real_objects(k):\n    tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n    model = AutoModelForMaskedLM.from_pretrained(\"bert-base-uncased\")\n\n    sequence = \"Hello [MASK]\"\n    res = get_top_k(sequence, tokenizer, model, k)\n\n    assert isinstance(res, torch.Tensor)\n    assert res.shape == (k,)\n\n@pytest.mark.parametrize(\"k\", [5, 7])\ndef test_with_mock_objects(k):\n    sequence = \"Hello [MASK]\"\n    vocab_size = 1000\n\n    data = {\"input_ids\": torch.tensor([[101, 555, 103, 102]])}\n    be = BatchEncoding(data=data)\n\n    logits = torch.rand(1, 4, vocab_size)\n\n    tokenizer_m = Mock(spec=BertTokenizerFast,\n                       return_value=be,\n                       mask_token_id=103)\n    model_m = Mock(spec=BertForMaskedLM)\n    model_m.return_value.logits = logits\n\n    res = get_top_k(sequence,\n                    tokenizer_m,\n                    model_m,\n                    k=k)\n\n    assert isinstance(res, torch.Tensor)\n    assert res.shape == (k,)\n"
  },
  {
    "path": "mini_tutorials/numpy_equality_testing/test.py",
    "content": "import numpy as np\nimport pytest\n\ndef get_arrays():\n    \"\"\"Create 4 arrays that are all similar but different.\n\n    Returns\n    -------\n    a : np.ndarray\n        Reference array.\n\n    a_eps : np.ndarray\n        Same shape as `a`, however, the values are slightly different.\n\n    a_dim : np.ndarray\n        One extra dimension compared to `a`, however, the values are the same.\n\n    a_nan : np.ndarray\n        Same shape and same values, however, one entry is set to `np.nan`.\n    \"\"\"\n    eps = 1e-5\n\n    a = np.array([[1.2, 5.12, 2.4], [5.5, 8.8, 1.55]])\n    a_eps = a + eps\n    a_dim = a[None, :]  # shape (1, 2, 3)\n    a_nan = a.copy()\n    a_nan[0, 1] = np.nan\n\n    return a, a_eps, a_dim, a_nan\n\n\ndef test___eq__():\n    a, *_ = get_arrays()\n\n    with pytest.raises(ValueError):\n        assert a == a\n\ndef test___eq__all():\n    a, a_eps, a_dim, a_nan = get_arrays()\n\n    assert (a == a).all()\n    assert not (a == a_eps).all()\n    assert (a == a_dim).all()\n    assert not (a_nan == a_nan).all()\n\ndef test_array_equal():\n    a, a_eps, a_dim, a_nan = get_arrays()\n\n    assert np.array_equal(a, a)\n    assert not np.array_equal(a, a_eps)\n    assert not np.array_equal(a, a_dim)\n    assert not np.array_equal(a_nan, a_nan)\n    assert np.array_equal(a_nan, a_nan, equal_nan=True)\n\n\ndef test_allclose():\n    a, a_eps, a_dim, a_nan = get_arrays()\n\n    atol = 1e-5\n\n    assert np.allclose(a, a, atol=atol)\n    assert np.allclose(a, a_eps, atol=atol)\n    assert np.allclose(a, a_dim, atol=atol)\n    assert not np.allclose(a_nan, a_nan, atol=atol)\n    assert np.allclose(a_nan, a_nan, atol=atol, equal_nan=True)\n\ndef test_testing_array_equal():\n    a, a_eps, a_dim, a_nan = get_arrays()\n\n    np.testing.assert_array_equal(a, a)\n    # np.testing.assert_array_equal(a, a_eps)\n    # np.testing.assert_array_equal(a, a_dim)\n    np.testing.assert_array_equal(a_nan, a_nan)\n\ndef test_testing_allclose():\n    a, 
a_eps, a_dim, a_nan = get_arrays()\n\n    atol = 1e-5\n\n    np.testing.assert_allclose(a, a, atol=atol)\n    np.testing.assert_allclose(a, a_eps, atol=atol)\n    # np.testing.assert_allclose(a, a_dim, atol=atol)\n    np.testing.assert_allclose(a_nan, a_nan, atol=atol)\n    # np.testing.assert_allclose(a_nan, a_nan, atol=atol, equal_nan=False)\n"
  },
  {
    "path": "mini_tutorials/openai_function_calling/example.py",
    "content": "import json\nimport logging\nimport operator\nimport sys\nimport datetime\nimport openai\nimport yfinance as yf\n\nTODAY = datetime.date.today().strftime(\"%Y/%m/%d\")\n\nlogging.basicConfig(level=logging.WARNING, format=\"%(asctime)s %(message)s\")\n\nlogger = logging.getLogger(__name__)\nlogger.setLevel(logging.INFO)\n\n\ndef get_price(symbol: str, date: str) -> float:\n    logger.info(f\"Calling get_price with {symbol=} and {date=}\")\n\n    history = yf.download(\n        symbol, start=date, period=\"1d\", interval=\"1d\", progress=False\n    )\n\n    return history[\"Close\"].iloc[0].item()\n\n\ndef calculate(a: float, b: float, op: str) -> float:\n    logger.info(f\"Calling calculate with {a=}, {b=} and {op=}\")\n\n    return getattr(operator, op)(a, b)\n\n\nget_price_metadata = {\n    \"name\": \"get_price\",\n    \"description\": \"Get closing price of a financial instrument on a given date\",\n    \"parameters\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"symbol\": {\n                \"type\": \"string\",\n                \"description\": \"Ticker symbol of a financial instrument\",\n            },\n            \"date\": {\n                \"type\": \"string\",\n                \"description\": \"Date in the format YYYY-MM-DD\",\n            },\n        },\n        \"required\": [\"symbol\", \"date\"],\n    },\n}\n\ncalculate_metadata = {\n    \"name\": \"calculate\",\n    \"description\": \"General purpose calculator\",\n    \"parameters\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"a\": {\n                \"type\": \"number\",\n                \"description\": \"First entry\",\n            },\n            \"b\": {\n                \"type\": \"number\",\n                \"description\": \"Second entry\",\n            },\n            \"op\": {\n                \"type\": \"string\",\n                \"enum\": [\"mul\", \"add\", \"truediv\", \"sub\"],\n                
\"description\": \"Binary operation\",\n            },\n        },\n        \"required\": [\"a\", \"b\", \"op\"],\n    },\n}\n\n\nmessages = [\n    {\"role\": \"user\", \"content\": sys.argv[1]},\n    {\n        \"role\": \"system\",\n        \"content\": \"You are a helpful financial investor who overlooks the \"\n        f\"performance of stocks. Today is {TODAY}. Note that the \"\n        \"format of the date is YYYY/MM/DD\",\n    },\n]\n\nwhile True:\n    response = openai.ChatCompletion.create(\n        model=\"gpt-3.5-turbo-0613\",\n        temperature=0,\n        messages=messages,\n        functions=[get_price_metadata, calculate_metadata],\n    )\n    message = response[\"choices\"][0][\"message\"]\n    messages.append(message)\n\n    if \"function_call\" not in message:\n        break\n\n    # call custom functions\n    function_name = message[\"function_call\"][\"name\"]\n    kwargs = json.loads(message[\"function_call\"][\"arguments\"])\n\n    if function_name == \"get_price\":\n        output = str(get_price(**kwargs))\n    elif function_name == \"calculate\":\n        output = str(calculate(**kwargs))\n    else:\n        raise ValueError\n\n    messages.append({\"role\": \"function\", \"name\": function_name, \"content\": output})\n\nprint(\"*\" * 80)\nprint([m[\"role\"] for m in messages])\nprint(\"*\" * 80)\nprint(messages[-1][\"content\"])\n"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/README.md",
    "content": "# Description\n## Installation\n\nRun the following command to deploy a simple OpenSearch DB locally.\n \n```bash\ndocker run -p 9200:9200 -p 9600:9600 -e \"DISABLE_SECURITY_PLUGIN=true\" -e \"discovery.type=single-node\" --name opensearch-node -d opensearchproject/opensearch:latest\n```\nThe version of the image was `2.10.0` at the time of making the video.\n\nTo install the Python dependencies run\n```bash\npip install opensearch-py cohere\n```\nAgain, I did not hardcode any version, but the versions at the time of\nmaking the video were\n\n```bash\ncohere==4.27\nopensearch-py==2.3.1\n```\n\n## Contents\n* `answer.py` - scripts that does RAG question answering - requires question as the only argument\n* `input.txt` - each line corresponds to a document to be added to OpenSearch(except for emtpy lines and comments)\n* `upload_data.py` - load `input.txt` into OpenSearch\n\n\nNote that to use the `answer.py` you need to get a Cohere API token and\nthen export \n```bash\nexport COHERE_API_KEY=VERYSECRET\npython answer.py 'What is the meaning of life?'\n```\n\n## Postman\nYou can import the `postman_collection.json` in Postman and then\nsimply add the following 3 variables in your environment\n\n* `OpenSearchURL` - will be `http://localhost:9200` if you follow the above instructions\n* `CohereURL` - should be `https://api.cohere.ai/v1`\n* `CohereAPIKey` - you need to generate this yourself\n\n# Diagrams\n\n## RAG with embeddings\n<img width=\"1165\" alt=\"rag-with-embeddings\" src=\"https://github.com/jankrepl/mildlyoverfitted/assets/18519371/678e69eb-96a9-4fa1-bcff-8c848d69f10a\">\n\n## RAG with reranking\n<img width=\"1169\" alt=\"rag-with-reranking\" src=\"https://github.com/jankrepl/mildlyoverfitted/assets/18519371/45ea091b-5724-4117-bfec-d219afdd9f40\">\n"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/answer.py",
    "content": "import os\nimport sys\n\n\nimport cohere\nfrom opensearchpy import OpenSearch\n\n# Helper\ndef generate_prompt(question: str, contexts: str):\n    prompt = (\n        \"Given the following extracted parts of a long document and a \"\n        'question, create a final answer with references (\"SOURCES\").'\n        \"If you don't know the answer, just say that you don't know, don't try \"\n        'to make up an answer. ALWAYS return a \"SOURCES\" part in your answer.\\n'\n    )\n\n    prompt += f\"QUESTION: {question}\\n\"\n    prompt += \"\".join(\n        [f\"SOURCE {i}: {context}\\n\" for i, context in enumerate(contexts)]\n    )\n    prompt += \"ANSWER: \"\n\n    return prompt\n\n\n# PARAMETERS\nINDEX_NAME = \"cool_index\"\nFIELD_NAME = \"stuff\"\nRETRIEVER_K = 5\nRERANKER_K = 2\nCOHERE_API_KEY = os.environ[\"COHERE_API_KEY\"]\n\nquestion = sys.argv[1]\n\n# Instantiate clients\nos_client = OpenSearch(\n    hosts=[\n        {\n            \"host\": \"localhost\",\n            \"port\": 9200,\n        }\n    ]\n)\ncohere_client = cohere.Client(COHERE_API_KEY)\n\n# Retrieve\nos_results = os_client.search(\n    body={\n        \"query\": {\n            \"match\": {\n                FIELD_NAME: question\n            }\n        }\n    },\n    size=RETRIEVER_K\n)\ncontexts = [x[\"_source\"][FIELD_NAME] for x in os_results[\"hits\"][\"hits\"]]\nprint(\"OpenSearch: \", contexts)\n\n# Rerank\ncohere_results = cohere_client.rerank(\n    model=\"rerank-english-v2.0\",\n    query=question,\n    documents=contexts,\n    top_n=RERANKER_K,\n)\nreranked_contexts = [r.document[\"text\"] for r in cohere_results]\nprint(\"Cohere Reranked: \", reranked_contexts)\n\n\n# Chat completion\nprompt = generate_prompt(question, reranked_contexts)\n\nresponse = cohere_client.chat(\n    chat_history=[],\n    message=prompt\n)\n\nprint(\"Answer: \", response.text)\n"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/input.txt",
    "content": "# AGE AND FAVOURITE FOOD - 'What is the favourite food of Charles?', 'Who prefers vegetables the most?'\nAdam is older than Ben\nBen is older then Charles\nAdam eats a lot of carrots\nBen's favourite food is an apple\nCharles loves KFC\nWhatever, this sentence does not really contain anything super important\n\n# SPORTING EVENTS - 'What country managed to become world football champion after 2050'?\nBrazil won the Fifa World Cup in 2070\nFrance is pretty good at football and won many championships\nFinland has won many ice hockey world cups\nJamaica won the Athletics World Cup in 2055\nMexico won the Golf World Cup in 2050\n"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/postman_collection.json",
    "content": "{\n\t\"info\": {\n\t\t\"name\": \"Retrieval augmented generation\",\n\t\t\"schema\": \"https://schema.getpostman.com/json/collection/v2.1.0/collection.json\"\n\t},\n\t\"item\": [\n\t\t{\n\t\t\t\"name\": \"OpenSearch\",\n\t\t\t\"item\": [\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Get all indices\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"GET\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{OpenSearchURL}}/_cat/indices?v=true&s=index\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"_cat\",\n\t\t\t\t\t\t\t\t\"indices\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"query\": [\n\t\t\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\t\t\"key\": \"v\",\n\t\t\t\t\t\t\t\t\t\"value\": \"true\"\n\t\t\t\t\t\t\t\t},\n\t\t\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\t\t\"key\": \"s\",\n\t\t\t\t\t\t\t\t\t\"value\": \"index\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Create index\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"PUT\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n  \\\"settings\\\": {\\n    \\\"index\\\": {\\n      \\\"number_of_shards\\\": 1,\\n      \\\"number_of_replicas\\\": 1\\n    }\\n  },\\n  \\\"mappings\\\": {\\n    \\\"properties\\\": {\\n      \\\"stuff\\\": {\\n        \\\"type\\\": \\\"text\\\"\\n      }\\n    }\\n  }\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{OpenSearchURL}}/cool_index\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"cool_index\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": 
[]\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Delete index\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"DELETE\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{OpenSearchURL}}/cool_index\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"cool_index\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Add document\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n  \\\"stuff\\\": \\\"This is just some document\\\"\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{OpenSearchURL}}/cool_index/_doc\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"cool_index\",\n\t\t\t\t\t\t\t\t\"_doc\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"List all documents\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n    \\\"query\\\": {\\n        \\\"match_all\\\": {}\\n    }\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": 
\"{{OpenSearchURL}}/cool_index/_search\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"cool_index\",\n\t\t\t\t\t\t\t\t\"_search\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Lexical (BM 25) search\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n    \\\"query\\\": {\\n        \\\"match\\\": {\\n            \\\"stuff\\\": \\\"Some document\\\"\\n        }\\n    }\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{OpenSearchURL}}/cool_index/_search\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{OpenSearchURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"cool_index\",\n\t\t\t\t\t\t\t\t\"_search\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t}\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t}\n\t\t\t]\n\t\t},\n\t\t{\n\t\t\t\"name\": \"Cohere\",\n\t\t\t\"item\": [\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Embed\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n  \\\"texts\\\": [\\n    \\\"hello\\\",\\n    \\\"goodbye\\\"\\n  ],\\n  \\\"truncate\\\": \\\"END\\\"\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{CohereURL}}/embed\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{CohereURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"embed\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"description\": 
\"[https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed)\"\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Rerank\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \"{\\n  \\\"return_documents\\\": false,\\n  \\\"max_chunks_per_doc\\\": 10,\\n  \\\"query\\\": \\\"What is the capital of the United States?\\\",\\n  \\\"documents\\\": [\\n    \\\"Carson City is the capital city of the American state of Nevada.\\\",\\n    \\\"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.\\\",\\n    \\\"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.\\\",\\n    \\\"Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. 
As of 2017, capital punishment is legal in 30 of the 50 states.\\\"\\n  ]\\n}\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{CohereURL}}/rerank\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{CohereURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"rerank\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"description\": \"[https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed)\"\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"name\": \"Chat\",\n\t\t\t\t\t\"request\": {\n\t\t\t\t\t\t\"method\": \"POST\",\n\t\t\t\t\t\t\"header\": [],\n\t\t\t\t\t\t\"body\": {\n\t\t\t\t\t\t\t\"mode\": \"raw\",\n\t\t\t\t\t\t\t\"raw\": \" {\\n    \\\"chat_history\\\": [\\n      {\\\"role\\\": \\\"USER\\\", \\\"message\\\": \\\"Who discovered gravity?\\\"},\\n      {\\\"role\\\": \\\"CHATBOT\\\", \\\"message\\\": \\\"The man who is widely credited with discovering gravity is Sir Isaac Newton\\\"}\\n    ],\\n    \\\"message\\\": \\\"What year was he born?\\\"\\n  }\",\n\t\t\t\t\t\t\t\"options\": {\n\t\t\t\t\t\t\t\t\"raw\": {\n\t\t\t\t\t\t\t\t\t\"language\": \"json\"\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\"raw\": \"{{CohereURL}}/chat\",\n\t\t\t\t\t\t\t\"host\": [\n\t\t\t\t\t\t\t\t\"{{CohereURL}}\"\n\t\t\t\t\t\t\t],\n\t\t\t\t\t\t\t\"path\": [\n\t\t\t\t\t\t\t\t\"chat\"\n\t\t\t\t\t\t\t]\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"description\": \"\"\n\t\t\t\t\t},\n\t\t\t\t\t\"response\": []\n\t\t\t\t}\n\t\t\t],\n\t\t\t\"auth\": {\n\t\t\t\t\"type\": \"bearer\",\n\t\t\t\t\"bearer\": [\n\t\t\t\t\t{\n\t\t\t\t\t\t\"key\": \"token\",\n\t\t\t\t\t\t\"value\": \"{{CohereAPIKey}}\",\n\t\t\t\t\t\t\"type\": \"string\"\n\t\t\t\t\t}\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"event\": [\n\t\t\t\t{\n\t\t\t\t\t\"listen\": 
\"prerequest\",\n\t\t\t\t\t\"script\": {\n\t\t\t\t\t\t\"type\": \"text/javascript\",\n\t\t\t\t\t\t\"exec\": [\n\t\t\t\t\t\t\t\"\"\n\t\t\t\t\t\t]\n\t\t\t\t\t}\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"listen\": \"test\",\n\t\t\t\t\t\"script\": {\n\t\t\t\t\t\t\"type\": \"text/javascript\",\n\t\t\t\t\t\t\"exec\": [\n\t\t\t\t\t\t\t\"\"\n\t\t\t\t\t\t]\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t]\n\t\t}\n\t]\n}\n"
  },
  {
    "path": "mini_tutorials/rag_with_reranking/upload_data.py",
    "content": "from pathlib import Path\nfrom opensearchpy import OpenSearch\n\nINPUT_FILE = \"input.txt\"\nINDEX_NAME = \"cool_index\"\nFIELD_NAME = \"stuff\"\n\nclient = OpenSearch(\n    hosts=[\n        {\n            \"host\": \"localhost\",\n            \"port\": 9200,\n        }\n    ]\n)\n\nprint(client.ping())\n\nwith Path(INPUT_FILE).open() as f:\n    i = 0\n    for line in f.read().splitlines():\n        if not line or line.startswith(\"#\"):\n            continue\n\n        print(f\"Adding {i}\")\n        client.index(index=INDEX_NAME, body={FIELD_NAME: line})\n        i += 1\n"
  },
  {
    "path": "mini_tutorials/visualizing_activations_with_forward_hooks/src.py",
    "content": "import pathlib\n\nimport torch\nimport torch.nn.functional as F\nfrom torch.nn import Linear, Module\nfrom torch.utils.tensorboard import SummaryWriter\n\nclass Network(Module):\n    def __init__(self):\n        super().__init__()\n\n        self.fc_1 = Linear(10, 20)\n        self.fc_2 = Linear(20, 30)\n        self.fc_3 = Linear(30, 2)\n\n\n    def forward(self, x):\n        x = self.fc_1(x)\n        x = self.fc_2(x)\n        x = self.fc_3(x)\n\n        x = F.relu(x)\n\n        return x\n\nif __name__ == \"__main__\":\n    log_dir = pathlib.Path.cwd() / \"tensorboard_logs\"\n    writer = SummaryWriter(log_dir)\n\n    x = torch.rand(1, 10)\n    net = Network()\n\n    def activation_hook(inst, inp, out):\n        \"\"\"Run activation hook.\n\n        Parameters\n        ----------\n        inst : torch.nn.Module\n            The layer we want to attach the hook to.\n        inp : tuple of torch.Tensor\n            The input to the `forward` method.\n        out : torch.Tensor\n            The output of the `forward` method.\n        \"\"\"\n        print(\"Here\")\n        writer.add_histogram(repr(inst), out)\n\n    handle_1 = net.fc_1.register_forward_hook(activation_hook)\n    net.fc_2.register_forward_hook(activation_hook)\n    net.fc_3.register_forward_hook(activation_hook)\n\n    y = net(x)\n    handle_1.remove()\n    y = net(x)\n"
  }
]