Repository: sanjeevanahilan/nanoChatGPT
Branch: master
Commit: a77c85c2cdfc
Files: 36
Total size: 549.4 KB

Directory structure:
nanoChatGPT/

├── .gitattributes
├── LICENSE
├── README.md
├── bench.py
├── chatgpt_dev_teaching.ipynb
├── config/
│   ├── config.yaml
│   ├── config_reward.yaml
│   ├── config_rl.yaml
│   ├── eval_gpt2.py
│   ├── eval_gpt2_large.py
│   ├── eval_gpt2_medium.py
│   ├── eval_gpt2_xl.py
│   ├── finetune_shakespeare.py
│   ├── train_gpt2.py
│   └── train_shakespeare_char.py
├── configurator.py
├── data/
│   ├── openai_summarize_tldr/
│   │   └── prepare.py
│   ├── openwebtext/
│   │   ├── prepare.py
│   │   └── readme.md
│   ├── shakespeare/
│   │   ├── prepare.py
│   │   └── readme.md
│   └── shakespeare_char/
│       ├── prepare.py
│       └── readme.md
├── model.py
├── requirements.txt
├── sample.py
├── scaling_laws.ipynb
├── train.py
├── train_reward_model.py
├── train_reward_model_simple.py
├── train_rl.py
├── trainers/
│   ├── reward_trainer.py
│   ├── rl_trainer.py
│   └── trainer.py
├── transformer_sizing.ipynb
└── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitattributes
================================================
# Override jupyter in Github language stats for more accurate estimate of repo code languages
# reference: https://github.com/github/linguist/blob/master/docs/overrides.md#generated-code
*.ipynb linguist-generated


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2022 Andrej Karpathy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# nanoChatGPT

A crude RLHF (Reinforcement Learning from Human Feedback) layer on top of nanoGPT, to test an idea I had: that you can backpropagate through the reward function rather than use policy gradients. I have verified it works for a very basic example where you incentivise the network to produce words containing 'and'. The trick is to use the Straight-Through Gumbel-Softmax estimator.
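The Gumbel-Softmax sample at the heart of the trick can be sketched in a few lines of plain Python (a paraphrase for illustration, not the repo's actual code; the straight-through step, which forwards a hard one-hot while letting gradients flow through the soft sample, needs an autograd framework like PyTorch and is noted in the docstring):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one Gumbel-Softmax sample: a soft, differentiable one-hot vector.

    In a PyTorch setting, the straight-through estimator would then forward
    hard = one_hot(argmax(y)) while keeping the soft gradient:
        y_st = hard - y.detach() + y
    """
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1)
    g = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    z = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(z)                                # subtract max for stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
# `sample` is a probability vector; as tau -> 0 it approaches a hard one-hot.
```

Lower temperatures give samples closer to a discrete token choice, at the cost of higher-variance gradients; that trade-off is what the temperature setting in the RL config controls.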

Also check out chatgpt_dev_teaching.ipynb and the YouTube video explaining fine-tuning with RL: https://m.youtube.com/watch?v=soqTT0o1ZKo

Prepare data:

```
$ python data/shakespeare/prepare.py
```

Once the data is prepared, start training. The configs assume CUDA; if you don't have a GPU, change the device to `cpu` in the config.

```
$ python train.py # settings in config/config.yaml
```

Once a basic model is trained, you can fine-tune a reward model for an underlying reward rule.

```
$ python train_reward_model_simple.py # settings in config/config_reward.yaml
```

This creates a multi-head model on top of the existing one. Once the reward model is trained sufficiently, you can train the RL policy using:

```
$ python train_rl.py # settings in config/config_rl.yaml
```

The default config uses the Gumbel trick, but it can be set to PG to do policy gradient instead (the latter still needs a critic implementation, etc.). I have validated that the Gumbel method works, given that the preceding steps also worked. I am curious to see whether this would scale to large models - let me know if you're able to test it.

Model output after a short amount of training produces results like:

```
hand hand thousand the thousand the hand hand hand hand thousand
```

If you're feeling adventurous you can also try:

```
$ python train_reward_model.py
```

This uses the Reddit TL;DR dataset. The pipeline should work, but I have not actually fine-tuned on it at all.

References:

Gumbel
https://arxiv.org/pdf/1611.01144.pdf

InstructGPT
https://arxiv.org/abs/2203.02155

Below is Andrej Karpathy's original README; make sure you have installed the relevant packages.
________

# nanoGPT

![nanoGPT](assets/nanogpt.jpg)

The simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of [minGPT](https://github.com/karpathy/minGPT) that prioritizes teeth over education. Still under active development, but currently the file `train.py` reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training. The code itself is plain and readable: `train.py` is a ~300-line boilerplate training loop and `model.py` a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI. That's it.

![repro124m](assets/gpt2_124M_loss.png)

Because the code is so simple, it is very easy to hack to your needs, train new models from scratch, or finetune pretrained checkpoints (e.g. biggest one currently available as a starting point would be the GPT-2 1.3B model from OpenAI).

## install

Dependencies:

- [pytorch](https://pytorch.org) <3
- [numpy](https://numpy.org/install/) <3
- `pip install transformers` for huggingface transformers <3 (to load GPT-2 checkpoints)
- `pip install datasets` for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
- `pip install tiktoken` for OpenAI's fast BPE code <3
- `pip install wandb` for optional logging <3
- `pip install tqdm`

## quick start

If you are not a deep learning professional and you just want to feel the magic and get your feet wet, the fastest way to get started is to train a character-level GPT on the works of Shakespeare. First, we download it as a single (1MB) file and turn it from raw text into one large stream of integers:

```
$ python data/shakespeare_char/prepare.py
```

This creates a `train.bin` and `val.bin` in that data directory. Now it is time to train your GPT. The size of it very much depends on the computational resources of your system:
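Under the hood, `prepare.py` just maps characters to integer ids and dumps them as a flat binary of uint16 values; a toy sketch of that encoding step (with a made-up mini-corpus, not the script's actual code):

```python
import array
import os
import tempfile

text = "hello shakespeare"
# toy character vocabulary; the real prepare.py builds this from the corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = [stoi[ch] for ch in text]

# store as raw uint16 values - the same on-disk format train.py memmaps
path = os.path.join(tempfile.gettempdir(), "toy_train.bin")
with open(path, "wb") as f:
    array.array("H", ids).tofile(f)  # "H" = unsigned 16-bit integer

# read the file back and confirm the round-trip is lossless
back = array.array("H")
with open(path, "rb") as f:
    back.fromfile(f, len(ids))
assert list(back) == ids
```

uint16 works because both the 65-character Shakespeare vocabulary and the ~50k GPT-2 BPE vocabulary fit in 16 bits, which halves disk and memory traffic versus int32.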

**I have a GPU**. Great, we can quickly train a baby GPT with the settings provided in the [config/train_shakespeare_char.py](config/train_shakespeare_char.py) config file:

```
$ python train.py config/train_shakespeare_char.py
```
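That config's key settings amount to roughly the following overrides (paraphrased from the numbers described below, in the same key=value style the config files use; see the file itself for the full list):

```python
out_dir = 'out-shakespeare-char'
dataset = 'shakespeare_char'
block_size = 256   # context of up to 256 characters
n_layer = 6        # 6-layer Transformer
n_head = 6         # 6 attention heads per layer
n_embd = 384       # 384 feature channels
```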

If you peek inside it, you'll see that we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697. Based on the configuration, the model checkpoints are being written into the `--out_dir` directory `out-shakespeare-char`. So once the training finishes we can sample from the best model by pointing the sampling script at this directory:

```
$ python sample.py --out_dir=out-shakespeare-char
```

This generates a few samples, for example:

```
ANGELO:
And cowards it be strawn to my bed,
And thrust the gates of my threats,
Because he that ale away, and hang'd
An one with him.

DUKE VINCENTIO:
I thank your eyes against it.

DUKE VINCENTIO:
Then will answer him to save the malm:
And what have you tyrannous shall do this?

DUKE VINCENTIO:
If you have done evils of all disposition
To end his power, the day of thrust for a common men
That I leave, to fight with over-liking
Hasting in a roseman.
```

lol  `¯\_(ツ)_/¯`. Not bad for a character-level model after 3 minutes of training on a GPU. Better results are quite likely obtainable by instead finetuning a pretrained GPT-2 model on this dataset (see finetuning section later).

**I only have a macbook** (or other cheap computer). No worries, we can still train a GPT but we want to dial things down a notch. I recommend getting the bleeding edge PyTorch nightly ([select it here](https://pytorch.org/get-started/locally/) when installing) as it is currently quite likely to make your code more efficient. But even without it, a simple train run could look as follows:

```
$ python train.py config/train_shakespeare_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0
```

Here, since we are running on CPU instead of GPU we must set both `--device=cpu` and also turn off PyTorch 2.0 compile with `--compile=False`. Then when we evaluate we get a noisier but faster estimate (`--eval_iters=20`, down from 200), our context size is only 64 characters instead of 256, and the batch size only 12 examples per iteration, not 64. We'll also use a much smaller Transformer (4 layers, 4 heads, 128 embedding size), and decrease the number of iterations to 2000 (and correspondingly usually decay the learning rate to around max_iters with `--lr_decay_iters`). Because our network is so small we also ease down on regularization (`--dropout=0.0`). This still runs in ~3 minutes, but gets us a loss of only 1.88 and therefore also worse samples - but it's still good fun:

```
GLEORKEN VINGHARD III:
Whell's the couse, the came light gacks,
And the for mought you in Aut fries the not high shee
bot thou the sought bechive in that to doth groan you,
No relving thee post mose the wear
```

Not bad for ~3 minutes on a CPU, for a hint of the right character gestalt. If you're willing to wait longer, feel free to tune the hyperparameters, increase the size of the network, the context length (`--block_size`), the length of training, etc.

Finally, on Apple Silicon Macbooks and with a recent PyTorch version make sure to add `--device mps` (short for "Metal Performance Shaders"); PyTorch then uses the on-chip GPU that can *significantly* accelerate training (2-3X) and allow you to use larger networks. See [Issue 28](https://github.com/karpathy/nanoGPT/issues/28) for more.

## reproducing GPT-2

A more serious deep learning professional may be more interested in reproducing GPT-2 results. So here we go - we first tokenize the dataset, in this case the [OpenWebText](https://openwebtext2.readthedocs.io/en/latest/), an open reproduction of OpenAI's (private) WebText:

```
$ python data/openwebtext/prepare.py
```

This downloads and tokenizes the [OpenWebText](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which hold the GPT-2 BPE token ids in one sequence, stored as raw uint16 bytes. Then we're ready to kick off training. To reproduce GPT-2 (124M) you'll want at least an 8X A100 40GB node and run:

```
$ torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
```

This will run for about 4 days using PyTorch Distributed Data Parallel (DDP) and go down to loss of ~2.85. Now, a GPT-2 model just evaluated on OWT gets a val loss of about 3.11, but if you finetune it it will come down to ~2.85 territory (due to an apparent domain gap), making the two models ~match.

If you're in a cluster environment and you are blessed with multiple GPU nodes you can make GPU go brrrr e.g. across 2 nodes like:

```
Run on the first (master) node with example IP 123.456.123.456:
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr=123.456.123.456 --master_port=1234 train.py
Run on the worker node:
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py
```

It is a good idea to benchmark your interconnect (e.g. iperf3). In particular, if you don't have Infiniband then also prepend `NCCL_IB_DISABLE=1` to the above launches. Your multinode training will work, but most likely _crawl_. By default checkpoints are periodically written to the `--out_dir`. We can sample from the model by simply `$ python sample.py`.

Finally, to train on a single GPU simply run the `$ python train.py` script. Have a look at all of its args, the script tries to be very readable, hackable and transparent. You'll most likely want to tune a number of those variables depending on your needs.

## baselines

OpenAI GPT-2 checkpoints allow us to get some baselines in place for openwebtext. We can get the numbers as follows:

```
$ python train.py eval_gpt2
$ python train.py eval_gpt2_medium
$ python train.py eval_gpt2_large
$ python train.py eval_gpt2_xl
```

and observe the following losses on train and val:

| model | params | train loss | val loss |
| ------| ------ | ---------- | -------- |
| gpt2 | 124M         | 3.11  | 3.12     |
| gpt2-medium | 350M  | 2.85  | 2.84     |
| gpt2-large | 774M   | 2.66  | 2.67     |
| gpt2-xl | 1558M     | 2.56  | 2.54     |

However, we have to note that GPT-2 was trained on (closed, never released) WebText, while OpenWebText is just a best-effort open reproduction of this dataset. This means there is a dataset domain gap. Indeed, taking the GPT-2 (124M) checkpoint and finetuning on OWT directly for a while reaches loss down to ~2.85. This then becomes the more appropriate baseline w.r.t. reproduction.

## finetuning

Finetuning is no different than training, we just make sure to initialize from a pretrained model and train with a smaller learning rate. For an example of how to finetune a GPT on new text go to `data/shakespeare` and run `prepare.py` to download the tiny shakespeare dataset and render it into a `train.bin` and `val.bin`, using the OpenAI BPE tokenizer from GPT-2. Unlike OpenWebText this will run in seconds. Finetuning can take very little time, e.g. on a single GPU just a few minutes. Run an example finetuning like:

```
$ python train.py config/finetune_shakespeare.py
```

This will load the config parameter overrides in `config/finetune_shakespeare.py` (I didn't tune them much though). Basically, we initialize from a GPT-2 checkpoint with `init_from` and train as normal, except shorter and with a small learning rate. If you're running out of memory try decreasing the model size (they are `{'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'}`) or possibly decreasing the `block_size` (context length). The best checkpoint (lowest validation loss) will be in the `out_dir` directory, e.g. in `out-shakespeare` by default, per the config file. You can then sample from it with `$ python sample.py --out_dir=out-shakespeare`:

```
THEODORE:
Thou shalt sell me to the highest bidder: if I die,
I sell thee to the first; if I go mad,
I sell thee to the second; if I
lie, I sell thee to the third; if I slay,
I sell thee to the fourth: so buy or sell,
I tell thee again, thou shalt not sell my
possession.

JULIET:
And if thou steal, thou shalt not sell thyself.

THEODORE:
I do not steal; I sell the stolen goods.

THEODORE:
Thou know'st not what thou sell'st; thou, a woman,
Thou art ever a victim, a thing of no worth:
Thou hast no right, no right, but to be sold.
```

Whoa there, GPT, entering some dark place over there. I didn't really tune the hyperparameters in the config too much, feel free to try!
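For orientation, the kinds of overrides a finetuning config like `config/finetune_shakespeare.py` sets look roughly like this (the values below are illustrative placeholders, not the repo's exact numbers; `init_from` and `out_dir` are the settings named above):

```python
# illustrative finetuning overrides, in the config files' key=value style
out_dir = 'out-shakespeare'
init_from = 'gpt2'       # start from a pretrained OpenAI checkpoint
dataset = 'shakespeare'
max_iters = 2000         # much shorter than pretraining
learning_rate = 3e-5     # small learning rate for finetuning
decay_lr = False
```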

## sampling / inference

Use the script `sample.py` to sample either from pre-trained GPT-2 models released by OpenAI, or from a model you trained yourself. For example, here is a way to sample from the largest available `gpt2-xl` model:

```
$ python sample.py \
    --init_from=gpt2-xl \
    --start="What is the answer to life, the universe, and everything?" \
    --num_samples=5 --max_new_tokens=100
```

If you'd like to sample from a model you trained, use the `--out_dir` to point the code appropriately. You can also prompt the model with some text from a file, e.g. `$ python sample.py --start=FILE:prompt.txt`.

## efficiency notes

For simple model benchmarking and profiling, `bench.py` might be useful. It's identical to what happens in the meat of the training loop of `train.py`, but omits much of the other complexities.

Note that the code by default uses [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/). At the time of writing (Dec 29, 2022) this makes `torch.compile()` available in the nightly release. The improvement from the one line of code is noticeable, e.g. cutting down iteration time from ~250ms / iter to 135ms / iter. Nice work PyTorch team!

## todos

- Investigate and add FSDP instead of DDP
- Eval zero-shot perplexities on standard evals (e.g. LAMBADA? HELM? etc.)
- Finetune the finetuning script, I think the hyperparams are not great
- Schedule for linear batch size increase during training
- Incorporate other embeddings (rotary, alibi)
- Separate out the optim buffers from model params in checkpoints I think
- Additional logging around network health (e.g. gradient clip events, magnitudes)
- Few more investigations around better init etc.

## troubleshooting

Note that by default this repo uses PyTorch 2.0 (i.e. `torch.compile`). This is fairly new and experimental, and not yet available on all platforms (e.g. Windows). If you're running into related error messages try to disable this by adding `--compile=False` flag. This will slow down the code but at least it will run.

For some context on this repository, GPT, and language modeling it might be helpful to watch my [Zero To Hero series](https://karpathy.ai/zero-to-hero.html). Specifically, the [GPT video](https://www.youtube.com/watch?v=kCc8FmEb1nY) is popular if you have some prior language modeling context.

For more questions/discussions feel free to stop by **#nanoGPT** on Discord:

[![](https://dcbadge.vercel.app/api/server/3zy8kqD9Cp?compact=true&style=flat)](https://discord.gg/3zy8kqD9Cp)

## acknowledgements

All nanoGPT experiments are powered by GPUs on [Lambda labs](https://lambdalabs.com), my favorite Cloud GPU provider. Thank you Lambda labs for sponsoring nanoGPT!


================================================
FILE: bench.py
================================================
"""
A much shorter version of train.py for benchmarking
"""
import os
from contextlib import nullcontext
import numpy as np
import time
import torch
from model import GPTConfig, GPT

# -----------------------------------------------------------------------------
batch_size = 12
block_size = 1024
bias = False
real_data = True
seed = 1337
device = 'cuda' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.
dtype = 'bfloat16' # 'float32' or 'bfloat16' or 'float16'
compile = True # use PyTorch 2.0 to compile the model to be faster
profile = False # use pytorch profiler, or just simple benchmarking?
exec(open('configurator.py').read()) # overrides from command line or config file
# -----------------------------------------------------------------------------

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn
device_type = 'cuda' if 'cuda' in device else 'cpu' # for later use in torch.autocast
ptdtype = {'float32': torch.float32, 'bfloat16': torch.bfloat16, 'float16': torch.float16}[dtype]
ctx = nullcontext() if device_type == 'cpu' else torch.amp.autocast(device_type=device_type, dtype=ptdtype)

# data loading init
if real_data:
    dataset = 'openwebtext'
    data_dir = os.path.join('data', dataset)
    train_data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
    def get_batch(split):
        data = train_data # note ignore split in benchmarking script
        ix = torch.randint(len(data) - block_size, (batch_size,))
        x = torch.stack([torch.from_numpy((data[i:i+block_size]).astype(np.int64)) for i in ix])
        y = torch.stack([torch.from_numpy((data[i+1:i+1+block_size]).astype(np.int64)) for i in ix])
        x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
        return x, y
else:
    # alternatively, if fixed data is desired to not care about data loading
    x = torch.randint(50304, (batch_size, block_size), device=device)
    y = torch.randint(50304, (batch_size, block_size), device=device)
    get_batch = lambda split: (x, y)

# model init
gptconf = GPTConfig(
    block_size = block_size, # how far back does the model look? i.e. context size
    n_layer = 12, n_head = 12, n_embd = 768, # size of the model
    dropout = 0, # for determinism
    bias = bias,
)
model = GPT(gptconf)
model.to(device)

optimizer = model.configure_optimizers(weight_decay=1e-2, learning_rate=1e-4, betas=(0.9, 0.95), device_type=device_type)

if compile:
    print("Compiling model...")
    model = torch.compile(model) # pytorch 2.0

if profile:
    # useful docs on pytorch profiler:
    # - tutorial https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html
    # - api https://pytorch.org/docs/stable/profiler.html#torch.profiler.profile
    wait, warmup, active = 5, 5, 5
    num_steps = wait + warmup + active
    with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./bench_log'),
        record_shapes=False,
        profile_memory=False,
        with_stack=False, # incurs an additional overhead, disable if not needed
        with_flops=True,
        with_modules=False, # only for torchscript models atm
    ) as prof:

        X, Y = get_batch('train')
        for k in range(num_steps):
            with ctx:
                logits, loss = model(X, Y)
            X, Y = get_batch('train')
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()
            lossf = loss.item()
            print(f"{k}/{num_steps} loss: {lossf:.4f}")

            prof.step() # notify the profiler at end of each step

else:

    # simple benchmarking
    torch.cuda.synchronize()
    for stage, num_steps in enumerate([10, 20]): # burnin, then benchmark
        t0 = time.time()
        X, Y = get_batch('train')
        for k in range(num_steps):
            with ctx:
                logits, loss = model(X, Y)
            X, Y = get_batch('train')
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()
            lossf = loss.item()
            print(f"{k}/{num_steps} loss: {lossf:.4f}")
        torch.cuda.synchronize()
        t1 = time.time()
        dt = t1-t0
        mfu = model.estimate_mfu(batch_size * 1 * num_steps, dt)
        if stage == 1:
            print(f"time per iteration: {dt/num_steps*1000:.4f}ms, MFU: {mfu*100:.2f}%")


================================================
FILE: chatgpt_dev_teaching.ipynb
================================================
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyNVW45SCGk3DW7yIfDYKg2T",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU",
    "gpuClass": "premium",
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "49506f92d9d042d4a6111f3a1b9f79e1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_95f4423f202b44f9b5f6fbc250b5a300",
              "IPY_MODEL_e78ecc50321f42328d47999c01244651",
              "IPY_MODEL_cc422e07c3ce4ba99c84bda8b7d7c596"
            ],
            "layout": "IPY_MODEL_8be555a8bb6e4068a42e4312e75a30c2"
          }
        },
        "95f4423f202b44f9b5f6fbc250b5a300": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_b7f7646e496849b8bb3239b64b93e193",
            "placeholder": "​",
            "style": "IPY_MODEL_b728662de8ac41888e838e47e67d6aa9",
            "value": "Downloading (…)lve/main/config.json: 100%"
          }
        },
        "e78ecc50321f42328d47999c01244651": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_70a9186eb9e64d9ea5b456f43622d979",
            "max": 949,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_7a5dedee422f42b49ecac65d3138c167",
            "value": 949
          }
        },
        "cc422e07c3ce4ba99c84bda8b7d7c596": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_c2e28b80efc74a549af7bdf3eee46eda",
            "placeholder": "​",
            "style": "IPY_MODEL_057de1e97cad4a2297ad3db02c2481d6",
            "value": " 949/949 [00:00&lt;00:00, 45.7kB/s]"
          }
        },
        "8be555a8bb6e4068a42e4312e75a30c2": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "b7f7646e496849b8bb3239b64b93e193": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "b728662de8ac41888e838e47e67d6aa9": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "70a9186eb9e64d9ea5b456f43622d979": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "7a5dedee422f42b49ecac65d3138c167": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "c2e28b80efc74a549af7bdf3eee46eda": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "057de1e97cad4a2297ad3db02c2481d6": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "daabe15b3af34e1098a7e2f620ae1f36": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_c8fd1744cb744200b0bcc8629d173bf1",
              "IPY_MODEL_7a52e4742e594e25a28feb18c417c076",
              "IPY_MODEL_513d02e2c28440d9897d1aa097d1e743"
            ],
            "layout": "IPY_MODEL_d147cf4c15ff449598d6b429c90db0ae"
          }
        },
        "c8fd1744cb744200b0bcc8629d173bf1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_c74d25684a4d4eafbd3546dcff921cb0",
            "placeholder": "​",
            "style": "IPY_MODEL_aa57d59560114d55bede331307257f6b",
            "value": "Downloading pytorch_model.bin: 100%"
          }
        },
        "7a52e4742e594e25a28feb18c417c076": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_e5c0bc2de75347c39dc9ca23a8129650",
            "max": 539679413,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_a18d35c8ff2649e397ed983a6f06e959",
            "value": 539679413
          }
        },
        "513d02e2c28440d9897d1aa097d1e743": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_5e3d4cc0ec9d4c0ca0d0136e97aca2ce",
            "placeholder": "​",
            "style": "IPY_MODEL_3d4f1a5c8128451884bd21fef234b89b",
            "value": " 540M/540M [00:08&lt;00:00, 67.9MB/s]"
          }
        },
        "d147cf4c15ff449598d6b429c90db0ae": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "c74d25684a4d4eafbd3546dcff921cb0": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "aa57d59560114d55bede331307257f6b": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "e5c0bc2de75347c39dc9ca23a8129650": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "a18d35c8ff2649e397ed983a6f06e959": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "5e3d4cc0ec9d4c0ca0d0136e97aca2ce": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "3d4f1a5c8128451884bd21fef234b89b": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "9645d7c59b0546159890f042ca420632": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_8893c2570f464d9697dda12cc42023d5",
              "IPY_MODEL_0c71d29e93f24885802ad5fa6d240675",
              "IPY_MODEL_7b859eccf54e41698cf179d68bb109d0"
            ],
            "layout": "IPY_MODEL_725944a581244ab6a2a4b61afa29c1c0"
          }
        },
        "8893c2570f464d9697dda12cc42023d5": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_e536fea041f7441aa8cc93a448492e33",
            "placeholder": "​",
            "style": "IPY_MODEL_8e5f96a8ec714284a7f000af96300f67",
            "value": "Downloading (…)okenizer_config.json: 100%"
          }
        },
        "0c71d29e93f24885802ad5fa6d240675": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_04984bbb402c4bd4afad349e629e2452",
            "max": 338,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_4a07648ab35047c8b61e32e3a61d970b",
            "value": 338
          }
        },
        "7b859eccf54e41698cf179d68bb109d0": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_bd5178cf064049678d4efac17e47a5da",
            "placeholder": "​",
            "style": "IPY_MODEL_ae235b9add2d41ee9f808a2b4d43568b",
            "value": " 338/338 [00:00&lt;00:00, 25.9kB/s]"
          }
        },
        "725944a581244ab6a2a4b61afa29c1c0": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "e536fea041f7441aa8cc93a448492e33": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "8e5f96a8ec714284a7f000af96300f67": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "04984bbb402c4bd4afad349e629e2452": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "4a07648ab35047c8b61e32e3a61d970b": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "bd5178cf064049678d4efac17e47a5da": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "ae235b9add2d41ee9f808a2b4d43568b": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "8fb0a31c3ecf4a4880803e81e1bc581d": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_8bc84935b3964e338d00ea1bb7c3c59f",
              "IPY_MODEL_d729ac6a87174fbda03175c21859ea82",
              "IPY_MODEL_b80bd9149ae34878bfeeb4fca815231c"
            ],
            "layout": "IPY_MODEL_40007a4900824bd88d3a180fd0a8325a"
          }
        },
        "8bc84935b3964e338d00ea1bb7c3c59f": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_ce6627b477bc430d968fd186443b34e3",
            "placeholder": "​",
            "style": "IPY_MODEL_2f2cf4fd7a434da5be62bd088360a6ba",
            "value": "Downloading (…)solve/main/vocab.txt: 100%"
          }
        },
        "d729ac6a87174fbda03175c21859ea82": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_31aa09c7a2ea443b8a177c4f91a7d427",
            "max": 843438,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_f17f1682313e49df97e399508c15716a",
            "value": 843438
          }
        },
        "b80bd9149ae34878bfeeb4fca815231c": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_dde9d315de174921844ebb72865d1d3f",
            "placeholder": "​",
            "style": "IPY_MODEL_386eb97816cd41e188dbfa1fcd299b16",
            "value": " 843k/843k [00:01&lt;00:00, 753kB/s]"
          }
        },
        "40007a4900824bd88d3a180fd0a8325a": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "ce6627b477bc430d968fd186443b34e3": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "2f2cf4fd7a434da5be62bd088360a6ba": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "31aa09c7a2ea443b8a177c4f91a7d427": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "f17f1682313e49df97e399508c15716a": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "dde9d315de174921844ebb72865d1d3f": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "386eb97816cd41e188dbfa1fcd299b16": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "d5af658c67874c67b08bd8ef72411ea1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_aba6fb6586534c1f86ce65b7c8bde51f",
              "IPY_MODEL_a6878fe333e243dc81ec0d36bb0dce19",
              "IPY_MODEL_d1efe95c3d55440386db28242a8f38d1"
            ],
            "layout": "IPY_MODEL_ecfb1bf7ae8a4328ac982234127e5c2b"
          }
        },
        "aba6fb6586534c1f86ce65b7c8bde51f": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_4847c36ed46d49429e37f4b9b7862a57",
            "placeholder": "​",
            "style": "IPY_MODEL_631e5ee44ccc4e4aaec456af90df7965",
            "value": "Downloading (…)solve/main/bpe.codes: 100%"
          }
        },
        "a6878fe333e243dc81ec0d36bb0dce19": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_662fd517bf0540ed9c607d3ac8dda0af",
            "max": 1078931,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_04d90c9c251a4fae8db96d711568cd64",
            "value": 1078931
          }
        },
        "d1efe95c3d55440386db28242a8f38d1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_7bd831ca2d354ee7a75c22c700963ea8",
            "placeholder": "​",
            "style": "IPY_MODEL_3de7e32fefd54df6a6d3596205ba0338",
            "value": " 1.08M/1.08M [00:01&lt;00:00, 961kB/s]"
          }
        },
        "ecfb1bf7ae8a4328ac982234127e5c2b": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "4847c36ed46d49429e37f4b9b7862a57": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "631e5ee44ccc4e4aaec456af90df7965": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "662fd517bf0540ed9c607d3ac8dda0af": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "04d90c9c251a4fae8db96d711568cd64": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "7bd831ca2d354ee7a75c22c700963ea8": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "3de7e32fefd54df6a6d3596205ba0338": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "ddbbcddaba34416881ebb023864b0f2d": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_d0fc35ec11634f8691bf5eefa0505cb6",
              "IPY_MODEL_598a025b08dd4469a88bd9ebe85ebb94",
              "IPY_MODEL_9bc45acff2e049d59eef1e8374f03a93"
            ],
            "layout": "IPY_MODEL_5a627f08513443e09dac036f3ff2fa1f"
          }
        },
        "d0fc35ec11634f8691bf5eefa0505cb6": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_397d77d019e748f2b4178bb78ff2d471",
            "placeholder": "​",
            "style": "IPY_MODEL_590cb10ff4164d309063faf6baedaf1f",
            "value": "Downloading (…)in/added_tokens.json: 100%"
          }
        },
        "598a025b08dd4469a88bd9ebe85ebb94": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_9d5f595908964edba6b6f7e8a6fba5eb",
            "max": 22,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_0e05ee7e2e934ca983d70addd1c9ce92",
            "value": 22
          }
        },
        "9bc45acff2e049d59eef1e8374f03a93": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_9c56fb6a76414f558c665608a21d39cd",
            "placeholder": "​",
            "style": "IPY_MODEL_399aa39b85c14ff8b9cd887343af36be",
            "value": " 22.0/22.0 [00:00&lt;00:00, 1.66kB/s]"
          }
        },
        "5a627f08513443e09dac036f3ff2fa1f": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "397d77d019e748f2b4178bb78ff2d471": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "590cb10ff4164d309063faf6baedaf1f": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "9d5f595908964edba6b6f7e8a6fba5eb": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "0e05ee7e2e934ca983d70addd1c9ce92": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "9c56fb6a76414f558c665608a21d39cd": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "399aa39b85c14ff8b9cd887343af36be": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "d4712217939b44f59e58a0f23a8d4df7": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_c4fa3386d2234473ad7a8b6dae7a4d95",
              "IPY_MODEL_97c0a2c0b079410496ab497f8fa8bd12",
              "IPY_MODEL_565f60bb19d443f6aacc9b08ff68c7ff"
            ],
            "layout": "IPY_MODEL_5556b66847b04b4aabe0aaa029dc3e64"
          }
        },
        "c4fa3386d2234473ad7a8b6dae7a4d95": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_fe8f0c618d104deaaed3ea138acdc230",
            "placeholder": "​",
            "style": "IPY_MODEL_52b516dd11a747099e9e0e808a168e55",
            "value": "Downloading (…)cial_tokens_map.json: 100%"
          }
        },
        "97c0a2c0b079410496ab497f8fa8bd12": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_8696eb98489f409a848684b77b2353ac",
            "max": 167,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_679b6d995fad4a649e055bee8bfe83d1",
            "value": 167
          }
        },
        "565f60bb19d443f6aacc9b08ff68c7ff": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_eff4e772ccd4445ba4b6f624ea3b7e8f",
            "placeholder": "​",
            "style": "IPY_MODEL_c7570cd91b564389b59c98a694ab7771",
            "value": " 167/167 [00:00&lt;00:00, 13.0kB/s]"
          }
        },
        "5556b66847b04b4aabe0aaa029dc3e64": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "fe8f0c618d104deaaed3ea138acdc230": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "52b516dd11a747099e9e0e808a168e55": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "8696eb98489f409a848684b77b2353ac": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "679b6d995fad4a649e055bee8bfe83d1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "eff4e772ccd4445ba4b6f624ea3b7e8f": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "c7570cd91b564389b59c98a694ab7771": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        }
      }
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sanjeevanahilan/nanoChatGPT/blob/master/chatgpt_dev_teaching.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's train models for generating Shakespeare with different sentiments.\n",
        "\n",
        "Happy Shakespeare will say things like: \n",
        "\n",
        "*“Nay, thanks, then, I do meet change, this Romeo. \n",
        "The pleasure of his hair!”*\n",
        "\n",
        "Sad Shakespear will say things like:\n",
        "\n",
        "\n",
        "*“The senators's dead, of their world:\n",
        "Be not for your friends.”*\n",
        "\n"
      ],
      "metadata": {
        "id": "9-LuRS2wAaBS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Notebook created by Sanjeevan Ahilan\n",
        "# GPT implementation was borrowed from Andrej Karpathy \n",
        "# https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing\n",
        "# My repo: https://github.com/sanjeevanahilan/nanoChatGPT\n",
        "# My twitter: https://twitter.com/sanjeevanahilan\n",
        "\n",
        "import torch\n",
        "import torch.nn as nn\n",
        "from torch.nn import functional as F\n",
        "\n",
        "# hyperparameters\n",
        "batch_size = 16 # how many independent sequences will we process in parallel?\n",
        "block_size = 32 # what is the maximum context length for predictions?\n",
        "max_iters = 3000\n",
        "eval_interval = 100\n",
        "learning_rate = 1e-3\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "eval_iters = 200\n",
        "n_embd = 64\n",
        "n_head = 4\n",
        "n_layer = 4\n",
        "dropout = 0.0\n",
        "# ------------\n",
        "\n",
        "torch.manual_seed(1337)\n"
      ],
      "metadata": {
        "id": "aZGtyx5lAOzu",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "67a0cd77-355e-4c15-f07d-ef3758a87312"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<torch._C.Generator at 0x7fa55d231410>"
            ]
          },
          "metadata": {},
          "execution_count": 1
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Download Shakespeare\n",
        "%time\n",
        "!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\n",
        "with open('input.txt', 'r', encoding='utf-8') as f:\n",
        "    text = f.read()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "DA-Gox1YiM6l",
        "outputId": "e04b1540-6e70-41dc-d32f-a8e234a8c87d"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs\n",
            "Wall time: 7.15 µs\n",
            "--2023-03-18 11:49:44--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1115394 (1.1M) [text/plain]\n",
            "Saving to: ‘input.txt’\n",
            "\n",
            "input.txt           100%[===================>]   1.06M  --.-KB/s    in 0.03s   \n",
            "\n",
            "2023-03-18 11:49:45 (40.0 MB/s) - ‘input.txt’ saved [1115394/1115394]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Rather than use character level tokenizer lets use tiktoken; Openai's implementation of a Byte Pair Encoding (BPE) tokenizer"
      ],
      "metadata": {
        "id": "SyyJfXbdAr81"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install tiktoken\n",
        "import tiktoken"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "y7LBbGC9onTu",
        "outputId": "6a9ef1dd-2e22-4468-d797-d41cb0bc1de2"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting tiktoken\n",
            "  Downloading tiktoken-0.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m29.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.9/dist-packages (from tiktoken) (2022.10.31)\n",
            "Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.9/dist-packages (from tiktoken) (2.27.1)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests>=2.26.0->tiktoken) (1.26.15)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests>=2.26.0->tiktoken) (2022.12.7)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests>=2.26.0->tiktoken) (3.4)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests>=2.26.0->tiktoken) (2.0.12)\n",
            "Installing collected packages: tiktoken\n",
            "Successfully installed tiktoken-0.3.2\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "enc = tiktoken.get_encoding(\"gpt2\")\n",
        "vocab_size = 50257"
      ],
      "metadata": {
        "id": "btWJmnfapKhn"
      },
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# You can see that the sequence of characters 'the' has been encoded as a single\n",
        "# number because it is commonly reoccuring whereas 'thh' requires two numbers to\n",
        "# encode 'th' and 'h'\n",
        "\n",
        "print(enc.encode('the'))\n",
        "print(enc.encode('thh'))\n",
        "print(enc.decode([400]))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "8f9l4JdQCfSd",
        "outputId": "42ac95f9-96cb-4fd4-e110-a339c0e45fcf"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[1169]\n",
            "[400, 71]\n",
            "th\n"
          ]
        }
      ]
    },
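    {
      "cell_type": "markdown",
      "source": [
        "As a toy sketch of the idea behind BPE (an added illustration, not tiktoken's actual algorithm, which operates on bytes and uses a trained merge table): repeatedly merge the most frequent adjacent pair of tokens into a single new token, so that common character sequences like 'the' end up as one token."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Toy byte-pair-merge sketch (added illustration only)\n",
        "from collections import Counter\n",
        "\n",
        "def bpe_merge_once(tokens):\n",
        "    # merge the single most frequent adjacent pair into one token\n",
        "    pairs = Counter(zip(tokens, tokens[1:]))\n",
        "    if not pairs:\n",
        "        return tokens\n",
        "    (a, b), _ = pairs.most_common(1)[0]\n",
        "    merged, i = [], 0\n",
        "    while i < len(tokens):\n",
        "        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:\n",
        "            merged.append(a + b)\n",
        "            i += 2\n",
        "        else:\n",
        "            merged.append(tokens[i])\n",
        "            i += 1\n",
        "    return merged\n",
        "\n",
        "tokens = list('the theme then')\n",
        "for _ in range(3):\n",
        "    tokens = bpe_merge_once(tokens)\n",
        "print(tokens)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },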
    {
      "cell_type": "code",
      "source": [
        "# Train and test splits\n",
        "data = torch.tensor(enc.encode(text), dtype=torch.long)\n",
        "n = int(0.9*len(data)) # first 90% will be train, rest val\n",
        "train_data = data[:n]\n",
        "val_data = data[n:]\n",
        "\n",
        "# data loading\n",
        "def get_batch(split):\n",
        "    # generate a small batch of data of inputs x and targets y\n",
        "    data = train_data if split == 'train' else val_data\n",
        "    ix = torch.randint(len(data) - block_size, (batch_size,))\n",
        "    x = torch.stack([data[i:i+block_size] for i in ix])\n",
        "    y = torch.stack([data[i+1:i+block_size+1] for i in ix])\n",
        "    x, y = x.to(device), y.to(device)\n",
        "    return x, y\n",
        "\n",
        "@torch.no_grad()\n",
        "def estimate_loss():\n",
        "    out = {}\n",
        "    model.eval()\n",
        "    for split in ['train', 'val']:\n",
        "        losses = torch.zeros(eval_iters)\n",
        "        for k in range(eval_iters):\n",
        "            X, Y = get_batch(split)\n",
        "            logits, loss = model(X, Y)\n",
        "            losses[k] = loss.item()\n",
        "        out[split] = losses.mean()\n",
        "    model.train()\n",
        "    return out\n",
        "\n",
        "class Head(nn.Module):\n",
        "    \"\"\" one head of self-attention \"\"\"\n",
        "\n",
        "    def __init__(self, head_size):\n",
        "        super().__init__()\n",
        "        self.key = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.query = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.value = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))\n",
        "\n",
        "        self.dropout = nn.Dropout(dropout)\n",
        "\n",
        "    def forward(self, x):\n",
        "        B,T,C = x.shape\n",
        "        k = self.key(x)   # (B,T,C)\n",
        "        q = self.query(x) # (B,T,C)\n",
        "        # compute attention scores (\"affinities\")\n",
        "        wei = q @ k.transpose(-2,-1) * C**-0.5 # (B, T, C) @ (B, C, T) -> (B, T, T)\n",
        "        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)\n",
        "        wei = F.softmax(wei, dim=-1) # (B, T, T)\n",
        "        wei = self.dropout(wei)\n",
        "        # perform the weighted aggregation of the values\n",
        "        v = self.value(x) # (B,T,C)\n",
        "        out = wei @ v # (B, T, T) @ (B, T, C) -> (B, T, C)\n",
        "        return out\n",
        "\n",
        "class MultiHeadAttention(nn.Module):\n",
        "    \"\"\" multiple heads of self-attention in parallel \"\"\"\n",
        "\n",
        "    def __init__(self, num_heads, head_size):\n",
        "        super().__init__()\n",
        "        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])\n",
        "        self.proj = nn.Linear(n_embd, n_embd)\n",
        "        self.dropout = nn.Dropout(dropout)\n",
        "\n",
        "    def forward(self, x):\n",
        "        out = torch.cat([h(x) for h in self.heads], dim=-1)\n",
        "        out = self.dropout(self.proj(out))\n",
        "        return out\n",
        "\n",
        "class FeedFoward(nn.Module):\n",
        "    \"\"\" a simple linear layer followed by a non-linearity \"\"\"\n",
        "\n",
        "    def __init__(self, n_embd):\n",
        "        super().__init__()\n",
        "        self.net = nn.Sequential(\n",
        "            nn.Linear(n_embd, 4 * n_embd),\n",
        "            nn.ReLU(),\n",
        "            nn.Linear(4 * n_embd, n_embd),\n",
        "            nn.Dropout(dropout),\n",
        "        )\n",
        "\n",
        "    def forward(self, x):\n",
        "        return self.net(x)\n",
        "\n",
        "class Block(nn.Module):\n",
        "    \"\"\" Transformer block: communication followed by computation \"\"\"\n",
        "\n",
        "    def __init__(self, n_embd, n_head):\n",
        "        # n_embd: embedding dimension, n_head: the number of heads we'd like\n",
        "        super().__init__()\n",
        "        head_size = n_embd // n_head\n",
        "        self.sa = MultiHeadAttention(n_head, head_size)\n",
        "        self.ffwd = FeedFoward(n_embd)\n",
        "        self.ln1 = nn.LayerNorm(n_embd)\n",
        "        self.ln2 = nn.LayerNorm(n_embd)\n",
        "\n",
        "    def forward(self, x):\n",
        "        x = x + self.sa(self.ln1(x))\n",
        "        x = x + self.ffwd(self.ln2(x))\n",
        "        return x\n",
        "\n",
        "class GPT(nn.Module):\n",
        "\n",
        "    def __init__(self):\n",
        "        super().__init__()\n",
        "        # each token directly reads off the logits for the next token from a lookup table\n",
        "        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)\n",
        "        self.position_embedding_table = nn.Embedding(block_size, n_embd)\n",
        "        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])\n",
        "        self.ln_f = nn.LayerNorm(n_embd) # final layer norm\n",
        "        self.lm_head = nn.Linear(n_embd, vocab_size)\n",
        "\n",
        "    def forward(self, idx, targets=None):\n",
        "        B, T = idx.shape\n",
        "\n",
        "        # idx and targets are both (B,T) tensor of integers\n",
        "        tok_emb = self.token_embedding_table(idx) # (B,T,C)\n",
        "        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)\n",
        "        x = tok_emb + pos_emb # (B,T,C)\n",
        "        x = self.blocks(x) # (B,T,C)\n",
        "        x = self.ln_f(x) # (B,T,C)\n",
        "        logits = self.lm_head(x) # (B,T,vocab_size)\n",
        "\n",
        "        if targets is None:\n",
        "            loss = None\n",
        "        else:\n",
        "            B, T, C = logits.shape\n",
        "            logits = logits.view(B*T, C)\n",
        "            targets = targets.view(B*T)\n",
        "            loss = F.cross_entropy(logits, targets)\n",
        "\n",
        "        return logits, loss\n",
        "\n",
        "    def generate(self, idx, max_new_tokens):\n",
        "        # idx is (B, T) array of indices in the current context\n",
        "        for _ in range(max_new_tokens):\n",
        "            # crop idx to the last block_size tokens\n",
        "            idx_cond = idx[:, -block_size:]\n",
        "            # get the predictions\n",
        "            logits, loss = self(idx_cond)\n",
        "            # focus only on the last time step\n",
        "            logits = logits[:, -1, :] # becomes (B, C)\n",
        "            # apply softmax to get probabilities\n",
        "            probs = F.softmax(logits, dim=-1) # (B, C)\n",
        "            # sample from the distribution\n",
        "            idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)\n",
        "            # append sampled index to the running sequence\n",
        "            idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)\n",
        "        return idx\n",
        "\n",
        "model = GPT()\n",
        "m = model.to(device)\n",
        "# print the number of parameters in the model\n",
        "print(sum(p.numel() for p in m.parameters())/1e6, 'M parameters')\n",
        "\n",
        "# create a PyTorch optimizer\n",
        "optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)\n",
        "\n",
        "for iter in range(max_iters):\n",
        "\n",
        "    # every once in a while evaluate the loss on train and val sets\n",
        "    if iter % eval_interval == 0 or iter == max_iters - 1:\n",
        "        losses = estimate_loss()\n",
        "        print(f\"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}\")\n",
        "\n",
        "    # sample a batch of data\n",
        "    xb, yb = get_batch('train')\n",
        "\n",
        "    # evaluate the loss\n",
        "    logits, loss = model(xb, yb)\n",
        "    optimizer.zero_grad(set_to_none=True)\n",
        "    loss.backward()\n",
        "    optimizer.step()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ngS-jQschyJc",
        "outputId": "8dedfa53-bdf7-446c-d335-887bf740efab"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "6.684497 M parameters\n",
            "step 0: train loss 10.9982, val loss 11.0105\n",
            "step 100: train loss 6.3955, val loss 6.4657\n",
            "step 200: train loss 6.0255, val loss 6.1584\n",
            "step 300: train loss 5.7608, val loss 5.9578\n",
            "step 400: train loss 5.4809, val loss 5.7327\n",
            "step 500: train loss 5.2543, val loss 5.5400\n",
            "step 600: train loss 5.1017, val loss 5.4008\n",
            "step 700: train loss 4.9718, val loss 5.3382\n",
            "step 800: train loss 4.8850, val loss 5.2539\n",
            "step 900: train loss 4.8084, val loss 5.1674\n",
            "step 1000: train loss 4.7187, val loss 5.0770\n",
            "step 1100: train loss 4.6621, val loss 5.1166\n",
            "step 1200: train loss 4.5916, val loss 5.0159\n",
            "step 1300: train loss 4.5599, val loss 5.0297\n",
            "step 1400: train loss 4.4998, val loss 4.9929\n",
            "step 1500: train loss 4.4481, val loss 4.9933\n",
            "step 1600: train loss 4.4182, val loss 4.9204\n",
            "step 1700: train loss 4.4202, val loss 4.9445\n",
            "step 1800: train loss 4.3644, val loss 4.9217\n",
            "step 1900: train loss 4.3479, val loss 4.9221\n",
            "step 2000: train loss 4.2925, val loss 4.8603\n",
            "step 2100: train loss 4.2446, val loss 4.8765\n",
            "step 2200: train loss 4.2491, val loss 4.8753\n",
            "step 2300: train loss 4.1900, val loss 4.8271\n",
            "step 2400: train loss 4.1705, val loss 4.8660\n",
            "step 2500: train loss 4.1608, val loss 4.8778\n",
            "step 2600: train loss 4.1379, val loss 4.8898\n",
            "step 2700: train loss 4.1021, val loss 4.8513\n",
            "step 2800: train loss 4.1099, val loss 4.8742\n",
            "step 2900: train loss 4.0874, val loss 4.8534\n",
            "step 2999: train loss 4.0720, val loss 4.8296\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# generate from the model\n",
        "context = torch.zeros((1, 1), dtype=torch.long, device=device)\n",
        "print(enc.decode(m.generate(context, max_new_tokens=100)[0].tolist()))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9C93Lk1biRLE",
        "outputId": "94a58a5d-a9fb-438f-ec29-ab231b145951"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "! what any consul\n",
            "matter they have we be for so dishonour\n",
            "Redpt'd in our courtesy,\n",
            "Or his highness i' the embracements, nor feeds had an err:\n",
            "I have the hallissign daughter to pawn'd.\n",
            "and stout see me: but God they forget\n",
            "And meet the tribranch man,\n",
            "Of a brace art'd; I'll plant show't\n",
            "From whom I a joyful with a chair access with callat.\n",
            "\n",
            "HENRY\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install transformers\n",
        "from transformers import pipeline\n",
        "sentiment_pipeline = pipeline(model=\"finiteautomata/bertweet-base-sentiment-analysis\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 675,
          "referenced_widgets": [
            "49506f92d9d042d4a6111f3a1b9f79e1",
            "95f4423f202b44f9b5f6fbc250b5a300",
            "e78ecc50321f42328d47999c01244651",
            "cc422e07c3ce4ba99c84bda8b7d7c596",
            "8be555a8bb6e4068a42e4312e75a30c2",
            "b7f7646e496849b8bb3239b64b93e193",
            "b728662de8ac41888e838e47e67d6aa9",
            "70a9186eb9e64d9ea5b456f43622d979",
            "7a5dedee422f42b49ecac65d3138c167",
            "c2e28b80efc74a549af7bdf3eee46eda",
            "057de1e97cad4a2297ad3db02c2481d6",
            "daabe15b3af34e1098a7e2f620ae1f36",
            "c8fd1744cb744200b0bcc8629d173bf1",
            "7a52e4742e594e25a28feb18c417c076",
            "513d02e2c28440d9897d1aa097d1e743",
            "d147cf4c15ff449598d6b429c90db0ae",
            "c74d25684a4d4eafbd3546dcff921cb0",
            "aa57d59560114d55bede331307257f6b",
            "e5c0bc2de75347c39dc9ca23a8129650",
            "a18d35c8ff2649e397ed983a6f06e959",
            "5e3d4cc0ec9d4c0ca0d0136e97aca2ce",
            "3d4f1a5c8128451884bd21fef234b89b",
            "9645d7c59b0546159890f042ca420632",
            "8893c2570f464d9697dda12cc42023d5",
            "0c71d29e93f24885802ad5fa6d240675",
            "7b859eccf54e41698cf179d68bb109d0",
            "725944a581244ab6a2a4b61afa29c1c0",
            "e536fea041f7441aa8cc93a448492e33",
            "8e5f96a8ec714284a7f000af96300f67",
            "04984bbb402c4bd4afad349e629e2452",
            "4a07648ab35047c8b61e32e3a61d970b",
            "bd5178cf064049678d4efac17e47a5da",
            "ae235b9add2d41ee9f808a2b4d43568b",
            "8fb0a31c3ecf4a4880803e81e1bc581d",
            "8bc84935b3964e338d00ea1bb7c3c59f",
            "d729ac6a87174fbda03175c21859ea82",
            "b80bd9149ae34878bfeeb4fca815231c",
            "40007a4900824bd88d3a180fd0a8325a",
            "ce6627b477bc430d968fd186443b34e3",
            "2f2cf4fd7a434da5be62bd088360a6ba",
            "31aa09c7a2ea443b8a177c4f91a7d427",
            "f17f1682313e49df97e399508c15716a",
            "dde9d315de174921844ebb72865d1d3f",
            "386eb97816cd41e188dbfa1fcd299b16",
            "d5af658c67874c67b08bd8ef72411ea1",
            "aba6fb6586534c1f86ce65b7c8bde51f",
            "a6878fe333e243dc81ec0d36bb0dce19",
            "d1efe95c3d55440386db28242a8f38d1",
            "ecfb1bf7ae8a4328ac982234127e5c2b",
            "4847c36ed46d49429e37f4b9b7862a57",
            "631e5ee44ccc4e4aaec456af90df7965",
            "662fd517bf0540ed9c607d3ac8dda0af",
            "04d90c9c251a4fae8db96d711568cd64",
            "7bd831ca2d354ee7a75c22c700963ea8",
            "3de7e32fefd54df6a6d3596205ba0338",
            "ddbbcddaba34416881ebb023864b0f2d",
            "d0fc35ec11634f8691bf5eefa0505cb6",
            "598a025b08dd4469a88bd9ebe85ebb94",
            "9bc45acff2e049d59eef1e8374f03a93",
            "5a627f08513443e09dac036f3ff2fa1f",
            "397d77d019e748f2b4178bb78ff2d471",
            "590cb10ff4164d309063faf6baedaf1f",
            "9d5f595908964edba6b6f7e8a6fba5eb",
            "0e05ee7e2e934ca983d70addd1c9ce92",
            "9c56fb6a76414f558c665608a21d39cd",
            "399aa39b85c14ff8b9cd887343af36be",
            "d4712217939b44f59e58a0f23a8d4df7",
            "c4fa3386d2234473ad7a8b6dae7a4d95",
            "97c0a2c0b079410496ab497f8fa8bd12",
            "565f60bb19d443f6aacc9b08ff68c7ff",
            "5556b66847b04b4aabe0aaa029dc3e64",
            "fe8f0c618d104deaaed3ea138acdc230",
            "52b516dd11a747099e9e0e808a168e55",
            "8696eb98489f409a848684b77b2353ac",
            "679b6d995fad4a649e055bee8bfe83d1",
            "eff4e772ccd4445ba4b6f624ea3b7e8f",
            "c7570cd91b564389b59c98a694ab7771"
          ]
        },
        "id": "f93j9jAXuK8S",
        "outputId": "d585128a-3fad-4415-e564-fe9d338c4726"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting transformers\n",
            "  Downloading transformers-4.27.1-py3-none-any.whl (6.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.7/6.7 MB\u001b[0m \u001b[31m90.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.10.31)\n",
            "Collecting huggingface-hub<1.0,>=0.11.0\n",
            "  Downloading huggingface_hub-0.13.2-py3-none-any.whl (199 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m199.2/199.2 KB\u001b[0m \u001b[31m23.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.22.4)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.27.1)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.10.0)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.0)\n",
            "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
            "  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m93.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.0.12)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.15)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (3.4)\n",
            "Installing collected packages: tokenizers, huggingface-hub, transformers\n",
            "Successfully installed huggingface-hub-0.13.2 tokenizers-0.13.2 transformers-4.27.1\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)lve/main/config.json:   0%|          | 0.00/949 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "49506f92d9d042d4a6111f3a1b9f79e1"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading pytorch_model.bin:   0%|          | 0.00/540M [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "daabe15b3af34e1098a7e2f620ae1f36"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)okenizer_config.json:   0%|          | 0.00/338 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "9645d7c59b0546159890f042ca420632"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/843k [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "8fb0a31c3ecf4a4880803e81e1bc581d"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)solve/main/bpe.codes:   0%|          | 0.00/1.08M [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "d5af658c67874c67b08bd8ef72411ea1"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)in/added_tokens.json:   0%|          | 0.00/22.0 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "ddbbcddaba34416881ebb023864b0f2d"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading (…)cial_tokens_map.json:   0%|          | 0.00/167 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "d4712217939b44f59e58a0f23a8d4df7"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "inps = [\"I'm the king of the world!\", \n",
        "        \"I'll be back.\",\n",
        "        \"The cake is a lie\", \n",
        "        \"To be forgotten is worse than death\",\n",
        "        \"All happy families are alike; each unhappy family is unhappy in its own way.\",\n",
        "        \"You don't need a reason to help people\",\n",
        "        ]\n",
        "res = sentiment_pipeline(inps)\n",
        "\n",
        "for i in range(len(inps)):\n",
        "  res[i]['text'] = inps[i]\n",
        "  print(res[i])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "v-fryBdOyKSD",
        "outputId": "671ed206-a9eb-4793-992b-da9378535088"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'label': 'POS', 'score': 0.9771729707717896, 'text': \"I'm the king of the world!\"}\n",
            "{'label': 'POS', 'score': 0.5481611490249634, 'text': \"I'll be back.\"}\n",
            "{'label': 'NEG', 'score': 0.7581188678741455, 'text': 'The cake is a lie'}\n",
            "{'label': 'NEG', 'score': 0.8209365606307983, 'text': 'To be forgotten is worse than death'}\n",
            "{'label': 'NEU', 'score': 0.7874237895011902, 'text': 'All happy families are alike; each unhappy family is unhappy in its own way.'}\n",
            "{'label': 'NEU', 'score': 0.8731082081794739, 'text': \"You don't need a reason to help people\"}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "def get_reward(text, mode):\n",
        "  sent = sentiment_pipeline(text)\n",
        "  if mode == '+ve':\n",
        "    labels = torch.tensor([a['label']=='POS' for a in sent],dtype=torch.float16).unsqueeze(-1).to(device)\n",
        "  elif mode == '-ve':\n",
        "    labels = torch.tensor([a['label']=='NEG' for a in sent],dtype=torch.float16).unsqueeze(-1).to(device)\n",
        "  else:\n",
        "    raise ValueError('Unknown Mode')\n",
        "  \n",
        "  weights = torch.tensor([a['score'] for a in sent],dtype=torch.float32).unsqueeze(-1).to(device)\n",
        "  \n",
        "  rewards = labels * weights # (B, 1)\n",
        "\n",
        "  return rewards"
      ],
      "metadata": {
        "id": "St_giJWTZ19y"
      },
      "execution_count": 10,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "def flatten(l):\n",
        "    return [item for sublist in l for item in sublist]\n",
        "print('Rewards in +ve mode')\n",
        "list(zip(inps, flatten(get_reward(inps, '+ve').tolist())))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CmjSzmslbZZc",
        "outputId": "52846692-ffce-4659-f6cb-f0d9040b6f9d"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Rewards in +ve mode\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[(\"I'm the king of the world!\", 0.9771729707717896),\n",
              " (\"I'll be back.\", 0.5481611490249634),\n",
              " ('The cake is a lie', 0.0),\n",
              " ('To be forgotten is worse than death', 0.0),\n",
              " ('All happy families are alike; each unhappy family is unhappy in its own way.',\n",
              "  0.0),\n",
              " (\"You don't need a reason to help people\", 0.0)]"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print('Rewards in -ve mode')\n",
        "list(zip(inps, flatten(get_reward(inps, '-ve').tolist())))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OzUDgNwGcfAH",
        "outputId": "2b0ba9e3-41e9-4450-b5b0-d6cee2eb4a00"
      },
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Rewards in -ve mode\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[(\"I'm the king of the world!\", 0.0),\n",
              " (\"I'll be back.\", 0.0),\n",
              " ('The cake is a lie', 0.7581188678741455),\n",
              " ('To be forgotten is worse than death', 0.8209365606307983),\n",
              " ('All happy families are alike; each unhappy family is unhappy in its own way.',\n",
              "  0.0),\n",
              " (\"You don't need a reason to help people\", 0.0)]"
            ]
          },
          "metadata": {},
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "eval_interval_rlhf = 20\n",
        "max_iters_rlhf = 1000\n"
      ],
      "metadata": {
        "id": "37tezEm-CFXR"
      },
      "execution_count": 13,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from torch.distributions import Categorical\n",
        "class RLHF(nn.Module):\n",
        "    def __init__(self, model):\n",
        "        super().__init__()\n",
        "        self.model = model\n",
        "\n",
        "    def forward(self, idx, targets=None):\n",
        "        return self.model(idx, targets)\n",
        "     \n",
        "    def generate(self, idx, max_new_tokens, block_size, ref_model=None):\n",
        "        # idx is (B, T) array of indices in the current context\n",
        "        log_probs = torch.tensor([]).to(device)\n",
        "        log_probs_ref = torch.tensor([]).to(device)\n",
        "        \n",
        "        for i in range(max_new_tokens):\n",
        "            # crop idx to the last block_size tokens\n",
        "            idx_cond = idx[:, -block_size:]\n",
        "\n",
        "            # get the predictions\n",
        "            logits, loss = self(idx_cond)\n",
        "\n",
        "            # focus only on the last time step\n",
        "            logits = logits[:, -1, :] # becomes (B, C)\n",
        "            \n",
        "            # logits define instance of Iategorical class\n",
        "            m = Categorical(logits=logits)\n",
        "            \n",
        "            # sample from the distribution\n",
        "            idx_next = m.sample()\n",
        "            \n",
        "            # get the log probability and append to running sequence\n",
        "            log_probs_idx_next = m.log_prob(idx_next)    \n",
        "            log_probs = torch.cat((log_probs, log_probs_idx_next.view(-1,1)), dim=1)\n",
        "            \n",
        "            if ref_model is not None:\n",
        "              # get log probability of sample idx_next under the reference model\n",
        "              logits_ref, _ = ref_model(idx_cond)\n",
        "              logits_ref = logits_ref[:, -1, :] # becomes (B, C)\n",
        "            \n",
        "              m_ref = Categorical(logits=logits_ref)\n",
        "              log_probs_ref_idx_next = m_ref.log_prob(idx_next)    \n",
        "              log_probs_ref = torch.cat((log_probs_ref, log_probs_ref_idx_next.view(-1,1)), dim=1)\n",
        "\n",
        "            # append sampled index to the running sequence\n",
        "            idx = torch.cat((idx, idx_next.view(-1,1)), dim=1) # (B, T+1)\n",
        "\n",
        "        return idx, log_probs, log_probs_ref"
      ],
      "metadata": {
        "id": "Rp4iqC2RWm6n"
      },
      "execution_count": 14,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import copy\n",
        "ref_model = copy.deepcopy(model)"
      ],
      "metadata": {
        "id": "4Z_-ygQYYhal"
      },
      "execution_count": 15,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "import numpy as np\n",
        "\n",
        "RLHFmodel = RLHF(model)\n",
        "RLHFmodel.to(device)\n",
        "\n",
        "ref_model.to(device)\n",
        "\n",
        "actor_optimizer = torch.optim.AdamW(RLHFmodel.parameters(), lr=1e-3)\n",
        "X, Y = get_batch('train') # fetch the very first batch\n",
        "X = torch.ones((X.shape[0], 1), dtype=torch.long).to(device) # for now there is no prompt\n",
        "X = X*enc.encode('The')[0] # start with ''The'\n",
        "t0  = time.time()\n",
        "max_new_tokens = block_size\n",
        "rews_all = []\n",
        "actor_loss_all = []\n",
        "mode = '+ve'\n",
        "ref_coef = 0.2\n",
        "e_coef = 0.1\n",
        "for iter in range(max_iters_rlhf):\n",
        "\n",
        "  states, log_probs, log_probs_ref = RLHFmodel.generate(\n",
        "      X, max_new_tokens, block_size, ref_model=ref_model)\n",
        "\n",
        "  states = states[:,-max_new_tokens:]\n",
        "  log_probs = log_probs[:,-max_new_tokens:] # (B, max_new_tokens)\n",
        "  if ref_model is not None:\n",
        "    log_probs_ref = log_probs_ref[:,-max_new_tokens:] # (B, max_new_tokens)\n",
        "  \n",
        "  rewards = get_reward([enc.decode(s.tolist()) for s in states], mode)\n",
        "  \n",
        "  pg = (rewards+ref_coef*log_probs_ref-e_coef*log_probs)* log_probs.squeeze()\n",
        "  \n",
        "  # log(1) = 0\n",
        "  # -log(1/N) = log(N)\n",
        "\n",
        "  # when ref_coef=e_coef this is equivalent to penalising for KL divergence\n",
        "  # pg = (rewards-ref_coef*(log_probs-log_probs_ref)* log_probs.squeeze() \n",
        "  \n",
        "  actor_loss = -pg.sum()\n",
        "\n",
        "  actor_optimizer.zero_grad(set_to_none=True)\n",
        "  actor_loss.backward()\n",
        "  actor_optimizer.step()\n",
        "\n",
        "  rews_all.append(rewards.mean().detach().cpu().numpy())\n",
        "  actor_loss_all.append(actor_loss.detach().cpu().numpy())\n",
        "\n",
        "  if iter % eval_interval_rlhf == 0:\n",
        "      t1 = time.time()\n",
        "      print('\\n')\n",
        "      print(f'iter: {iter}, time: {t1-t0}')\n",
        "      print(f'Actor loss: {np.mean(actor_loss_all[-eval_interval_rlhf:])}')\n",
        "      print(f'rets: {np.mean(rews_all[-eval_interval_rlhf:])}')\n",
        "\n",
        "      textRLHF = RLHFmodel.generate(X, 2*max_new_tokens, block_size, ref_model=None)[0]\n",
        "      for i in range(1):\n",
        "          text_i = textRLHF[i,:]\n",
        "          print(enc.decode(text_i.tolist()))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iZ7HG0qAahgo",
        "outputId": "0c6b1d4a-c4a8-4c63-ae51-3f399f5c343b"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "\n",
            "iter: 0, time: 2.0925092697143555\n",
            "Actor loss: -1174.6953125\n",
            "rets: 0.15557555854320526\n",
            "The nobles\n",
            "withoveseed\n",
            "Of our dimina to the narrow, of mine,\n",
            "For this earth brawling some fathers, thou shalt\n",
            "Con exceeds bow an ostows that were I\n",
            "actioning Roe that word;\n",
            "For what 'twas, likeers Lartft,\n",
            "And whose secret maid\n",
            "\n",
            "\n",
            "iter: 20, time: 43.471943855285645\n",
            "Actor loss: -1276.6146240234375\n",
            "rets: 0.08284272998571396\n",
            "The deceived!\n",
            "\n",
            "SICINs:\n",
            "The very faults of Norfolk is the cousin of sword;\n",
            "And, by me how so, seldom the last of that wounds\n",
            "Or much power home his clish'd utterhire, wine of that dame'd\n",
            "Of my life as the loss of death, be\n",
            "\n",
            "\n",
            "iter: 40, time: 83.07254672050476\n",
            "Actor loss: -1171.9931640625\n",
            "rets: 0.12373824417591095\n",
            "The bride, wives are\n",
            "Mis kneel to repent him for heaven and in debt.\n",
            "Look, thou fly, makes your beauty is his rash, I will not\n",
            "Say a side that taught their robbery.\n",
            "Mis up the demands Hastings.\n",
            "\n",
            "JULNot SAL:\n",
            "Your brother is drawn, and in\n",
            "\n",
            "\n",
            "iter: 60, time: 123.09890699386597\n",
            "Actor loss: -1200.064208984375\n",
            "rets: 0.1355905532836914\n",
            "The clouds in the blood'd age'd fearful's throne,\n",
            "To the priestred of me.\n",
            "\n",
            "HERMAMILLO:\n",
            "She are Imen, dear out your grace\n",
            "Whichose on your mother of the Duke of prince young\n",
            "Of blood of foul sword! I love upon your feet\n",
            "and ban\n",
            "\n",
            "\n",
            "iter: 80, time: 163.42521929740906\n",
            "Actor loss: -1015.2078247070312\n",
            "rets: 0.19954833388328552\n",
            "The loss ofitiveness.\n",
            "\n",
            "PAULINA:\n",
            "I am about the noble house of this presence,\n",
            " Rouro too words of England,\n",
            "And nothing, then all so, weeping yet\n",
            "Your grace comesstrance ouride the proudous thoughts!\n",
            "\n",
            "alt! Rome, pastUMNIA:\n",
            "\n",
            "\n",
            "iter: 100, time: 203.27433586120605\n",
            "Actor loss: -904.3199462890625\n",
            "rets: 0.2031252384185791\n",
            "The brother.\n",
            "\n",
            "MMeasure about the loved,\n",
            "Our letters of your worthy Gaunt lady:\n",
            "But four a town is worth'd by him,\n",
            "To rigence yield hand, there I am anyhee,\n",
            "I' love that steed our brother\n",
            "Than in bount about to joy,\n",
            "And\n",
            "\n",
            "\n",
            "iter: 120, time: 243.294597864151\n",
            "Actor loss: -1021.115234375\n",
            "rets: 0.16891352832317352\n",
            "The thrustied\n",
            "Against his majesty to- idol and entence?\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "Is that was too late that when love.\n",
            "\n",
            "JOHN OF GAUNT:\n",
            "Bend, fear of the dukedom.\n",
            "\n",
            "First Lord:\n",
            "I along, p stopp The\n",
            "\n",
            "\n",
            "iter: 140, time: 283.161762714386\n",
            "Actor loss: -1081.0517578125\n",
            "rets: 0.16179659962654114\n",
            "The set-antied live,\n",
            "So bed change with the sovereign.\n",
            "\n",
            "CAMILLIUS:\n",
            "At me, sir. arise, welcome my lords.\n",
            "\n",
            "First Citizen:\n",
            "What someever,\n",
            "I know it again so little, that I have\n",
            "say that done sweat prisoner to Hermione,\n",
            "\n",
            "\n",
            "iter: 160, time: 323.04486894607544\n",
            "Actor loss: -994.4212036132812\n",
            "rets: 0.18603841960430145\n",
            "The get hath a true,\n",
            "If our land, like every laws;\n",
            "Some common pomp--\n",
            "\n",
            "NOROLYCUS:\n",
            "For fit.\n",
            "\n",
            "NORTHUMBERLAND:\n",
            "'Tis a man that I think one mine too:\n",
            "Her consent breathing night.\n",
            "\n",
            "QUEEN MARG\n",
            "\n",
            "\n",
            "iter: 180, time: 362.9912130832672\n",
            "Actor loss: -1087.4278564453125\n",
            "rets: 0.16612616181373596\n",
            "The seven; that else,\n",
            "If very Lancaster, prisoners hope, I'll sit,\n",
            "By our honouredraidign,\n",
            "Whilst endiniusial earued for that and meet,\n",
            "Who took mean you what thou fool nort\n",
            "More than they-rate moleer therefore from thygood course:\n",
            "\n",
            "\n",
            "\n",
            "iter: 200, time: 402.95860266685486\n",
            "Actor loss: -878.0130004882812\n",
            "rets: 0.22536030411720276\n",
            "The day,\n",
            "Had she in theEWway of all!\n",
            "DUCHESS OF lord, I will not sound in theirAL.\n",
            "\n",
            "RICHARD:\n",
            "\n",
            "Purs III:\n",
            "Your Mow me, pale!\n",
            "\n",
            "LUCIO:\n",
            "\n",
            "ClABETH:\n",
            "O me,\n",
            "\n",
            "\n",
            "iter: 220, time: 442.6643555164337\n",
            "Actor loss: -704.8284301757812\n",
            "rets: 0.2918173372745514\n",
            "Thebleness'd\n",
            " liberties us. Thouce statue,\n",
            "The good you had a jest of a name?\n",
            "\n",
            "TYBOW:\n",
            "This I love,\n",
            "So you in oneseech you, as but\n",
            " bloody fre gentleman's well-day? marry much tortures,\n",
            "Of your grace,\n",
            "\n",
            "\n",
            "iter: 240, time: 482.6602349281311\n",
            "Actor loss: -1022.29150390625\n",
            "rets: 0.1844368577003479\n",
            "The sons?\n",
            "Lord ROSS cloy that our maid, where there with honour\n",
            "That I sleep to the soldiers.\n",
            "\n",
            "RUKE VINCENTIO:\n",
            "J the boy; but, to giving my go.\n",
            "\n",
            "HASTINGS:\n",
            "The better of too great little vanity.\n",
            "\n",
            "Second\n",
            "\n",
            "\n",
            "iter: 260, time: 522.2385444641113\n",
            "Actor loss: -835.5398559570312\n",
            "rets: 0.2348378598690033\n",
            "The absence of you must the\n",
            "the heard you have full this 'em.\n",
            "\n",
            "QUEEN MARGARET:\n",
            " punishment is love, g hand of you or\n",
            "More than thidalines.\n",
            "\n",
            "PARIS:\n",
            "Is aATES affails morely then, I have\n",
            "k Thomas mercy.\n",
            "\n",
            "\n",
            "iter: 280, time: 562.4093163013458\n",
            "Actor loss: -739.087890625\n",
            "rets: 0.2898559868335724\n",
            "The suspicious: nor shall come a cause.\n",
            "\n",
            "BENVOLUMNIA:\n",
            "I give a king.\n",
            "\n",
            "POLIXENES:\n",
            "' they are kind,Pray follows as I look,\n",
            "Is last.\n",
            "\n",
            "CAMILLd.\n",
            "\n",
            "KING EDWARD IV:\n",
            "Talk,\n",
            "\n",
            "\n",
            "iter: 300, time: 601.8023648262024\n",
            "Actor loss: -852.7369384765625\n",
            "rets: 0.20052771270275116\n",
            "The soldiers.\n",
            "b moved how and things, give me not despise.\n",
            "\n",
            "ISABELLA:\n",
            "Ay, sir, further, were my lord;\n",
            "Behee the child', for King your high temples.\n",
            "Come, let us this prince!\n",
            "I must make him too younger whither of his\n",
            "\n",
            "\n",
            "iter: 320, time: 641.1348361968994\n",
            "Actor loss: -746.7435913085938\n",
            "rets: 0.2848814129829407\n",
            "The nobles of your hand?\n",
            "\n",
            "deepT:\n",
            "And these inferior shall writ thee here? What that shall I were we better and mine\n",
            "Now do for gentle learning to ours.\n",
            "\n",
            "SUNLORIZEL:\n",
            "Be patient time to their eyes to up thy brother of rose:\n",
            "Tell him,\n",
            "\n",
            "\n",
            "iter: 340, time: 680.8197565078735\n",
            "Actor loss: -876.5494995117188\n",
            "rets: 0.2481001913547516\n",
            "The mile live?\n",
            "If shall I will dwell round what love.\n",
            "Marry, take what I sit: give me?\n",
            "\n",
            "MENENIUS:\n",
            "And Warwick's mad?\n",
            "O shrewakret please you here;\n",
            "And all that I'll not.\n",
            "3 KING EDWARD IV:\n",
            "E\n",
            "\n",
            "\n",
            "iter: 360, time: 720.4318358898163\n",
            "Actor loss: -697.50244140625\n",
            "rets: 0.3108132779598236\n",
            "The king will bear the royal, like a ones,\n",
            "And gains day.\n",
            "\n",
            "CAMILLO:\n",
            "I dare this, I thank them, on a made, or else\n",
            "Is stay at the morning are think about, by eyes\n",
            "inish moreinateers.' Thisush's.\n",
            "\n",
            "KING ED\n",
            "\n",
            "\n",
            "iter: 380, time: 760.4552597999573\n",
            "Actor loss: -825.5812377929688\n",
            "rets: 0.2534932792186737\n",
            "The stroke of Norfolk.\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "The statue stay, I they are withdraw to your king,\n",
            "They hold your majestyire to the found it that\n",
            "for lead the villain to stir, all love, till hear many\n",
            "will what I you have fall'd with you.\n",
            "\n",
            "\n",
            "\n",
            "iter: 400, time: 799.7820234298706\n",
            "Actor loss: -680.2034301757812\n",
            "rets: 0.306072473526001\n",
            "The must fight and their friends that,\n",
            "It and letcentio. I love, come, good lords,\n",
            " Hark, what furthercing him them.\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "You were YourTo give thee--\n",
            "It is Thomas in Henry forsweness\n",
            "That many these days\n",
            "\n",
            "\n",
            "iter: 420, time: 839.0483448505402\n",
            "Actor loss: -846.8916015625\n",
            "rets: 0.25812074542045593\n",
            "The other of Warwick,\n",
            "With though it honours, for the See of set\n",
            "As thou stands by this R weary had won,\n",
            "Let his heads and glst delight.\n",
            "\n",
            "KING HENVOLIO:\n",
            "I thank thou not ables--My lord; I pray more, the\n",
            "Sh design,\n",
            "\n",
            "\n",
            "iter: 440, time: 878.8783130645752\n",
            "Actor loss: -863.8115234375\n",
            "rets: 0.2483484447002411\n",
            "The grace are be now,\n",
            "And encounters now going married.\n",
            "\n",
            "AONTESC:\n",
            "AndSo one we king, for heaven thought;\n",
            "SomeENT give that a honesty neighbour following thine.\n",
            "\n",
            "HASTINGS:\n",
            "If such map with thou hemaniful.\n",
            "\n",
            "LADY\n",
            "\n",
            "\n",
            "iter: 460, time: 918.7519881725311\n",
            "Actor loss: -773.82421875\n",
            "rets: 0.3029620051383972\n",
            "The victory body's wish you well Warwick.\n",
            "\n",
            "First Servant::\n",
            "My lord, noble nobleself for my holpge!\n",
            "Served, bloody is this grave; but my study\n",
            " toweratingond degree unto be drawn appear,\n",
            "The better.\n",
            "\n",
            "KING EDWARD IV:\n",
            "I reason\n",
            "\n",
            "\n",
            "iter: 480, time: 957.8695306777954\n",
            "Actor loss: -790.4373168945312\n",
            "rets: 0.2682585120201111\n",
            "The point of this. You sun,\n",
            "How fares the vault I, is; itad, seem sm indirectutio:\n",
            "Heio,\n",
            "I will one to thy sake to make your tongue.\n",
            "\n",
            "HENRY BOLARET:\n",
            "God you, dead: tell him came?\n",
            "\n",
            "S\n",
            "\n",
            "\n",
            "iter: 500, time: 997.3224573135376\n",
            "Actor loss: -603.6134033203125\n",
            "rets: 0.3471847474575043\n",
            "The ground is best with all subjects\n",
            "inn shed, I come itself, such content coats,--\n",
            "Or, come, a present in things,\n",
            "And by the morning.\n",
            "\n",
            "KING EDWARD IV:\n",
            "I thank thee, and a word as they have you joy,\n",
            "have you, to show.\n",
            "\n",
            "\n",
            "\n",
            "iter: 520, time: 1036.6428000926971\n",
            "Actor loss: -643.1966552734375\n",
            "rets: 0.3397030532360077\n",
            "The house of mine hand's anger\n",
            "In an day, my gracious lord.\n",
            "Come it, so did I hunt to breed friends,\n",
            "Who unh prominous faults, then of Rome time\n",
            "Shall reg us, with the tombity witness\n",
            "In sorrow, here: here are more bed\n",
            "Be frowns love\n",
            "\n",
            "\n",
            "iter: 540, time: 1076.3727550506592\n",
            "Actor loss: -708.031005859375\n",
            "rets: 0.3134646415710449\n",
            "The edge!\n",
            "\n",
            "\n",
            "LUCIO:\n",
            "Ah, acting you untime,\n",
            "That I say 'twas: pray they are it\n",
            "In wo; to carryeech you richer;\n",
            "Like beasts what fellow? Pray, dear you shall fly you\n",
            "Yourself to have no such a and true\n",
            "\n",
            "\n",
            "iter: 560, time: 1115.688039779663\n",
            "Actor loss: -682.0416259765625\n",
            "rets: 0.3094523847103119\n",
            "The place, deputy\n",
            "Of England.\n",
            "\n",
            "DUKE OF AUMLE:\n",
            "Where you what stands it? Pray? I'll give you\n",
            "Even with the churchous daughter of us!\n",
            "\n",
            "NORTHUMBERLAND:\n",
            "Nay, come to our wlockish'd hand,\n",
            "The\n",
            "\n",
            "\n",
            "iter: 580, time: 1155.1704773902893\n",
            "Actor loss: -584.12646484375\n",
            "rets: 0.36164161562919617\n",
            "The price;\n",
            "And so shall I loveing ignorant have\n",
            "To husband. send myself are\n",
            "Sinceives? THR modestyible,\n",
            "On neither in the fire have done.\n",
            "\n",
            "BUCKINGHAM:\n",
            "Let you your love. I\n",
            "\n",
            "ISABELLA:\n",
            "Good brother, good mad\n",
            "\n",
            "\n",
            "iter: 600, time: 1194.873331785202\n",
            "Actor loss: -517.8790283203125\n",
            "rets: 0.39595964550971985\n",
            "The king.\n",
            "\n",
            "HENRY PER:\n",
            "Well, the in York, love.\n",
            "\n",
            "VOLUMNIA:\n",
            "I'll cutign defend my lord.\n",
            "\n",
            "MENENIUS:\n",
            "Go, be prosperous so:\n",
            "O will-morrow in the father's love.\n",
            "\n",
            "ROMEO:\n",
            "\n",
            "\n",
            "iter: 620, time: 1234.6517779827118\n",
            "Actor loss: -632.740966796875\n",
            "rets: 0.3639276921749115\n",
            "The nature man would respect drunk:\n",
            "business my mind receives and fear of his way\n",
            "As to their proceedings.\n",
            "\n",
            "VOLUMNIA:\n",
            "I would I love for this cap, a tear'd,\n",
            "But, love him should be not that hath\n",
            "ighthART you undertake, to go.\n",
            "\n",
            "CLA\n",
            "\n",
            "\n",
            "iter: 640, time: 1273.8108813762665\n",
            "Actor loss: -603.2329711914062\n",
            "rets: 0.37746334075927734\n",
            "The earth,\n",
            "to not worse set his knife, a Clarence apprehend,\n",
            "To be me; while him and Romeo so.\n",
            "\n",
            "DUKE OF YORK:\n",
            "I doeech thee the pleasureier:\n",
            "Didishedief make Sweet loved him my knowledge!\n",
            "\n",
            "DUKE OF YORK:\n",
            "\n",
            "\n",
            "\n",
            "iter: 660, time: 1313.7660591602325\n",
            "Actor loss: -787.6739501953125\n",
            "rets: 0.288984090089798\n",
            "The second,\n",
            "First Senator itself, when I will come there have his\n",
            "With the princely like clouds.\n",
            "\n",
            "DUKE OF YORK:\n",
            "CAPULETis, we dream I love a doth thy retire.\n",
            "\n",
            "RICHHENRY BWiltIO:\n",
            "O, I come to\n",
            "\n",
            "\n",
            "iter: 680, time: 1353.1964864730835\n",
            "Actor loss: -592.0076293945312\n",
            "rets: 0.36385422945022583\n",
            "The point and least'd in casting for our king;\n",
            "I shall have where he, too enrich,--\n",
            "Withape: what is 'not there by the state; and all love,\n",
            "I say he may sit.\n",
            "\n",
            "KING RICHARD III:\n",
            "If she is your longbs. He out\n",
            "\n",
            "\n",
            "\n",
            "iter: 700, time: 1392.60701918602\n",
            "Actor loss: -641.7145385742188\n",
            "rets: 0.3566742539405823\n",
            "The course?\n",
            "\n",
            "MENENIUS:\n",
            "Your gracious lord.\n",
            "3 KING HENRY VI.\n",
            "\n",
            "First Senator:\n",
            "O, lady'st, he is that you?\n",
            "\n",
            "GLOUCESTER:\n",
            "Well, Romeo, where is my lord, I\n",
            "ilt desire you. spirit\n",
            "\n",
            "\n",
            "iter: 720, time: 1432.1645839214325\n",
            "Actor loss: -631.0171508789062\n",
            "rets: 0.33802512288093567\n",
            "The sweet tears.\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "I know if woe?\n",
            "\n",
            "DU informIA:\n",
            "My gracious brother, as protest;\n",
            "A slave clouds, I hath i'ld. O bird.\n",
            "\n",
            "NORTHUMBERLAND:\n",
            "Be length's child mercy,\n",
            "\n",
            "\n",
            "iter: 740, time: 1472.0193173885345\n",
            "Actor loss: -770.1849365234375\n",
            "rets: 0.306596577167511\n",
            "The provost: I thank you, be little man.\n",
            "\n",
            "HORTENSIO:\n",
            "Thou hast my son, speak.\n",
            "\n",
            "DUKE OF YORK:\n",
            "No, it is a happy\n",
            "ears my nime so writ;\n",
            "And nowcreator entertainment to behold.\n",
            "\n",
            "KING EDWARD IV\n",
            "\n",
            "\n",
            "iter: 760, time: 1511.5322704315186\n",
            "Actor loss: -654.1215209960938\n",
            "rets: 0.34393274784088135\n",
            "Theou is A spirit.\n",
            "\n",
            "KING EDWARD IV:\n",
            "Come, I do know it so, most greatness,\n",
            "Tread, I will proud scene him. My a throne\n",
            "Well, then, sweet for a debt on war,\n",
            "Thus Barnards and himself to happy one hark.\n",
            "\n",
            "\n",
            "\n",
            "\n",
            "iter: 780, time: 1550.8467333316803\n",
            "Actor loss: -635.3596801757812\n",
            "rets: 0.32344189286231995\n",
            "The bed of the face?\n",
            "\n",
            "HENRY BOLINGBROKE:\n",
            "HereORDoler thou odd!\n",
            "Of it not too, prefer.\n",
            "\n",
            "ARD III:\n",
            "I come,\n",
            "Upon them.\n",
            "\n",
            "HERGDUCHESS OF AUM:\n",
            "\n",
            "NORTHUMBERLAND\n",
            "\n",
            "\n",
            "iter: 800, time: 1590.659930229187\n",
            "Actor loss: -659.7162475585938\n",
            "rets: 0.3370627164840698\n",
            "The noble\n",
            " Pleaseing, for grace him as with the excellent of their\n",
            "To frame out onte.\n",
            "\n",
            "MERBIONDELL:\n",
            "I know for the orderly spirit to know\n",
            "WhomENTIO.\n",
            "Why'll may have free\n",
            "Of your stronger law of smiles of another give:\n",
            "O\n",
            "\n",
            "\n",
            "iter: 820, time: 1629.6801323890686\n",
            "Actor loss: -540.1078491210938\n",
            "rets: 0.3763672113418579\n",
            "The common of thee.\n",
            "\n",
            "QUEEN ELIZABETH:\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "God hold me well I am this noble soul we.\n",
            "\n",
            "KING EDWARD IV:\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "Come, I'll blELL you, subjects enough\n",
            "\n",
            "\n",
            "iter: 840, time: 1669.0213844776154\n",
            "Actor loss: -715.7509765625\n",
            "rets: 0.3233412206172943\n",
            "The sleep,\n",
            "And being, doublere our forbear, loving friends,\n",
            "A prett hath thou grow patient;\n",
            "Whom thou hastest? scalares,--\n",
            "\n",
            "PAULINA:\n",
            "Will he did your pleasure would please him down.\n",
            "\n",
            "LUCIO:\n",
            " marry much, good lord,\n",
            "\n",
            "\n",
            "iter: 860, time: 1708.6020300388336\n",
            "Actor loss: -698.3497924804688\n",
            "rets: 0.3224356472492218\n",
            "The trick can shed. Who'\n",
            "KATHARENCE:\n",
            "Darest thou did prove I never usur heart, our unroble spent;\n",
            "We hope, to rise, love you shall kill enver.\n",
            " You will them more, I will not speak thee.\n",
            "\n",
            "KING EDWARD IV:\n",
            "A\n",
            "\n",
            "\n",
            "iter: 880, time: 1748.3153533935547\n",
            "Actor loss: -658.4968872070312\n",
            "rets: 0.31735074520111084\n",
            "The unt in my lner.\n",
            "\n",
            "ComeUMNIA:\n",
            "He cannot good queen of his\n",
            "Than it in our exnely days, 'em,\n",
            "As not too.\n",
            "\n",
            "MERCALUS:\n",
            "'Tis you have peril'd.\n",
            "There's rose out of the matter snow\n",
            "\n",
            "\n",
            "iter: 900, time: 1787.5656242370605\n",
            "Actor loss: -619.747314453125\n",
            "rets: 0.3222334086894989\n",
            "The heads to do thou\n",
            "To free!Glad the gates is do?\n",
            "SICINCE of Norfolk to the world to London and mother.\n",
            "Alearch.\n",
            "\n",
            "Second Citizen:\n",
            "That she of the old bright:\n",
            "The gates of the house?\n",
            "\n",
            "GLOUCESTER:\n",
            "All\n",
            "\n",
            "\n",
            "iter: 920, time: 1827.115246772766\n",
            "Actor loss: -655.8629760742188\n",
            "rets: 0.3531707227230072\n",
            "The gageeth man.\n",
            "\n",
            "WARD IV:\n",
            "And therefore, wife!\n",
            "\n",
            "ELBRAY, a slaveies fear is\n",
            "Do I playraw you to-likely:\n",
            "I have I love else thy breast?\n",
            "Hear what is made me: thou shalt stand in 'tis happy tallprovided\n",
            "\n",
            "\n",
            "iter: 940, time: 1866.3652894496918\n",
            "Actor loss: -479.79656982421875\n",
            "rets: 0.40305882692337036\n",
            "The worth, that my gracious lords.\n",
            "\n",
            "May sheANUS:\n",
            "Your love, like that will prove.\n",
            "\n",
            "PARIS:\n",
            "Why, for you have pin, luck was born,\n",
            "To bring you the common, what an blood of the\n",
            "Even are starkly the gods: I will have it\n",
            "\n",
            "\n",
            "iter: 960, time: 1905.8332304954529\n",
            "Actor loss: -573.760986328125\n",
            "rets: 0.36464911699295044\n",
            "The sorrow, yet they will gates.\n",
            "\n",
            "RIARINA:\n",
            "She! This is Lord Hastings\n",
            "Shall be a well up heaven,\n",
            "If this sin in himself.\n",
            "\n",
            "KING EDWARD IV:\n",
            "\n",
            "CLARENCE:\n",
            "My Lord?\n",
            "\n",
            "AUFIDIUS:\n",
            "What pain\n",
            "\n",
            "\n",
            "iter: 980, time: 1945.2791361808777\n",
            "Actor loss: -630.5237426757812\n",
            "rets: 0.34373268485069275\n",
            "The law, I love his\n",
            "To win them: I seem cannot be found; 'em me no.\n",
            "\n",
            "ISABHCFICHARDINE:\n",
            "Go, peace, misay, than we must you.\n",
            "\n",
            "PARians!\n",
            "A:\n",
            "\n",
            "KING RICHARD III:\n",
            "I thank\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# generate from the model\n",
        "context = torch.zeros((1, 1), dtype=torch.long, device=device)\n",
        "print(enc.decode(m.generate(context, max_new_tokens=200)[0].tolist()))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Mjv63OeQ0bUI",
        "outputId": "af51f8a4-a8fc-4861-a005-22ae9eb93695"
      },
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "!\n",
            "You be right,\n",
            "And triumph; am so, my good\n",
            "to of general; if thou husband\n",
            "As never sunder best against my blood.\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "Then like a Romansers, art speed at the queen,\n",
            "Fromagenetings, my tears. But\n",
            "\n",
            "ulet:\n",
            "Which was what I break scarce; where then.\n",
            "For this we'll bear the dewance of man!\n",
            "\n",
            "POLIXENES:\n",
            "I'll swear soon I will say out a happyker you.\n",
            "Hear thee.\n",
            "\n",
            "DUKE VINCENTIO:\n",
            "But since our life, Lord, Titus me,\n",
            "For I love this wed is the battle your man.\n",
            "\n",
            "CATESBY:\n",
            "Farewell you, why.\n",
            "\n",
            "CLAUDIO:\n",
            " if you hear that live Not gone, sent, let our headsarer\n",
            "wixt the XIant cannot talkings\n",
            "Should is cl\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "1. We trained a GPT model to reproduce Shakespeare\n",
        "2. We built a reward model by repurposing a Huggingface sentiment classifier\n",
        "3. We fine tuned the GPT model using reinforcement learning. \n",
        "4. The model over-optimised the reward so we penalised it for moving too far from a reference model\n",
        "5. We found it to be far too repetitive and so we added in an entropy bonus to encourage diverse outputs\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "6Na4mH46sFpx"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "PJrQpMzU9gWJ"
      },
      "execution_count": 17,
      "outputs": []
    }
  ]
}

================================================
FILE: config/config.yaml
================================================
IO:
  out_dir: out
  eval_interval: 2000
  log_interval: 1
  eval_iters: 200
  eval_only: False # if True, script exits right after the first eval
  always_save_checkpoint: True # if True, always save a checkpoint after each eval
  init_from: scratch # 'scratch' or 'resume' or 'gpt2*'
wandb:
  wandb_log: False # disabled by default
  wandb_project: rlhf # 'gpt2'
  wandb_run_name: gpt2 # 'run' + str(time.time())
data:
  dataset: shakespeare # 'openwebtext', 'shakespeare', 'openai_summarize_tldr'
  gradient_accumulation_steps: 1 # used to simulate larger batch sizes
  batch_size: 12 # if gradient_accumulation_steps > 1, this is the micro-batch size
  block_size: 32
model:
  n_layer: 2
  n_head: 2
  n_embd: 32
  dropout: 0.0 # for pretraining 0 is good, for finetuning try 0.1+
  bias: False # do we use bias inside LayerNorm and Linear layers?
optimizer: # adamw
  learning_rate: 6.0e-4 # max learning rate
  max_iters: 600000 # total number of training iterations
  weight_decay: 1.0e-2
  beta1: 0.9
  beta2: 0.95
  grad_clip: 1.0 # clip gradients at this value, or disable if == 0.0
  decay_lr: True # whether to decay the learning rate
  warmup_iters: 2000 # how many steps to warm up for
  lr_decay_iters: 600000 # should be ~= max_iters per Chinchilla
  min_lr: 6.0e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
DDP:
  backend: nccl # 'nccl', 'gloo', etc.
system:
  device: cuda # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
  dtype: float16 # 'float32', 'bfloat16', or 'float16'; the latter automatically uses a GradScaler
  compile: False # use PyTorch 2.0 to compile the model to be faster

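The optimizer section above specifies a linear warmup followed by a cosine decay of the learning rate down to `min_lr` over `lr_decay_iters` steps. A minimal sketch of that schedule, using the values from this config (the function name `get_lr` mirrors nanoGPT's convention but is an assumption here, not code quoted from this repo):

```python
import math

def get_lr(it, learning_rate=6e-4, min_lr=6e-5,
           warmup_iters=2000, lr_decay_iters=600000):
    # 1) linear warmup from 0 up to learning_rate
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) past the decay horizon, hold at the floor
    if it > lr_decay_iters:
        return min_lr
    # 3) cosine decay from learning_rate down to min_lr
    ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)
```

With these defaults the rate rises to 6.0e-4 at iteration 2000 and reaches 6.0e-5 (learning_rate/10, per the Chinchilla comment) at iteration 600000.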

================================================
FILE: config/config_reward.yaml
================================================
IO:
  out_dir: out
  eval_interval: 500
  log_interval: 1
  eval_iters: 100
  eval_only: False # if True, script exits right after the first eval
  always_save_checkpoint: True # if True, always save a checkpoint after each eval
  init_from: resume # 'scratch' or 'resume' or 'gpt2*'
  init_multihead_from: scratch
  out_dir_multihead: out_reward # used if restoring multihead
wandb:
  wandb_log: True # wandb logging (enabled here, unlike the other configs)
  wandb_project: rlhf # 'gpt2'
  wandb_run_name: gpt2 # 'run' + str(time.time())
data:
  dataset: 'shakespeare' # 'openwebtext', 'shakespeare', 'openai_summarize_tldr'
  gradient_accumulation_steps: 1 # used to simulate larger batch sizes
  batch_size: 64 # if gradient_accumulation_steps > 1, this is the micro-batch size
  block_size: 32
model:
  n_layer: 2
  n_head: 2
  n_embd: 32
  dropout: 0.0 # for pretraining 0 is good, for finetuning try 0.1+
  bias: False # do we use bias inside LayerNorm and Linear layers?
optimizer: # adamw
  learning_rate: 6.0e-4 # max learning rate
  max_iters: 600000 # total number of training iterations
  weight_decay: 1.0e-2
  beta1: 0.9
  beta2: 0.95
  grad_clip: 1.0 # clip gradients at this value, or disable if == 0.0
  decay_lr: True # whether to decay the learning rate
  warmup_iters: 2000 # how many steps to warm up for
  lr_decay_iters: 600000 # should be ~= max_iters per Chinchilla
  min_lr: 6.0e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
DDP:
  backend: nccl # 'nccl', 'gloo', etc.
system:
  device: cuda # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
  dtype: float16 # 'float32', 'bfloat16', or 'float16'; the latter automatically uses a GradScaler
  compile: False # use PyTorch 2.0 to compile the model to be faster


================================================
FILE: config/config_rl.yaml
================================================
algorithm:
  method: gumbel # pg or gumbel
  hard_code_reward: False # use a learned reward model or a hard-coded reward (the latter does not work with Gumbel)
  separate_reward_model: True # when using a reward model, instantiate it separately rather than share params with LM
  discrete_reward: True # if True, the reward is a 0/1 sample; otherwise the reward is continuous
  episode_length: 32
IO:
  out_dir: out
  eval_interval: 100
  log_interval: 1
  eval_iters: 200
  eval_only: False # if True, script exits right after the first eval
  always_save_checkpoint: True # if True, always save a checkpoint after each eval
  init_from: scratch # 'scratch' or 'resume' or 'gpt2*'
  init_multihead_from: scratch
  out_dir_multihead: out_reward # used if restoring multihead
wandb:
  wandb_log: False # disabled by default
  wandb_project: rlhf # 'gpt2'
  wandb_run_name: gpt2 # 'run' + str(time.time())
data:
  dataset: shakespeare # 'openwebtext', 'shakespeare', 'openai_summarize_tldr'
  gradient_accumulation_steps: 1 # used to simulate larger batch sizes
  batch_size: 12 # if gradient_accumulation_steps > 1, this is the micro-batch size
  block_size: 32
model:
  n_layer: 2
  n_head: 2
  n_embd: 32
  dropout: 0.0 # for pretraining 0 is good, for finetuning try 0.1+
  bias: False # do we use bias inside LayerNorm and Linear layers?
optimizer: # adamw
  learning_rate: 6.0e-4 # max learning rate
  max_iters: 600000 # total number of training iterations
  weight_decay: 1.0e-2
  beta1: 0.9
  beta2: 0.95
  grad_clip: 1.0 # clip gradients at this value, or disable if == 0.0
  decay_lr: True # whether to decay the learning rate
  warmup_iters: 2000 # how many steps to warm up for
  lr_decay_iters: 600000 # should be ~= max_iters per Chinchilla
  min_lr: 6.0e-5 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
DDP:
  backend: nccl # 'nccl', 'gloo', etc.
system:
  device: cuda # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
  dtype: float16 # 'float32', 'bfloat16', or 'float16'; the latter automatically uses a GradScaler
  compile: False # use PyTorch 2.0 to compile the model to be faster

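The RL config's `method: gumbel` refers to Gumbel-based sampling through the policy. As an illustration of the underlying idea (not this repo's implementation), the Gumbel-max trick draws an exact sample from a softmax distribution by adding standard Gumbel noise to the logits and taking the argmax; replacing the argmax with a temperature-controlled softmax gives the differentiable Gumbel-softmax relaxation used to backpropagate through sampling. A stdlib-only sketch:

```python
import math
import random

def gumbel_max_sample(logits, rng=random):
    # G = -log(-log(U)) with U ~ Uniform(0, 1) is standard Gumbel noise;
    # argmax(logits + G) is distributed as softmax(logits)
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(noisy)), key=lambda i: noisy[i])
```

Sampling repeatedly with skewed logits reproduces the softmax frequencies: with logits `[2.0, 0.0, -2.0]`, category 0 is drawn roughly 87% of the time.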

================================================
FILE: config/eval_gpt2.py
================================================
# evaluate the base gpt2
# n_layer=12, n_head=12, n_embd=768
# 124M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2'


================================================
FILE: config/eval_gpt2_large.py
================================================
# evaluate the base gpt2
# n_layer=36, n_head=20, n_embd=1280
# 774M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2-large'


================================================
FILE: config/eval_gpt2_medium.py
================================================
# evaluate the base gpt2
# n_layer=24, n_head=16, n_embd=1024
# 350M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2-medium'


================================================
FILE: config/eval_gpt2_xl.py
================================================
# evaluate the base gpt2
# n_layer=48, n_head=25, n_embd=1600
# 1558M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2-xl'


================================================
FILE: config/finetune_shakespeare.py
================================================
import time

out_dir = 'out-shakespeare'
eval_interval = 5
eval_iters = 40
wandb_log = False # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-xl' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = False

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 20

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False


================================================
FILE: config/train_gpt2.py
================================================
# config for training GPT-2 (124M) down to very nice loss of ~2.85 on 1 node of 8X A100 40GB
# launch as the following (e.g. in a screen session) and wait ~5 days:
# $ torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py

wandb_log = True
wandb_project = 'owt'
wandb_run_name='gpt2-124M'

# these make the total batch size be ~0.5M
# 12 batch size * 1024 block size * 5 gradaccum * 8 GPUs = 491,520
batch_size = 12
block_size = 1024
gradient_accumulation_steps = 5

# this makes total number of tokens be 300B
max_iters = 600000
lr_decay_iters = 600000

# eval stuff
eval_interval = 1000
eval_iters = 200
log_interval = 10

# weight decay
weight_decay = 1e-1

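The batch-size arithmetic in the comments above can be checked directly (the variable names below are illustrative, not identifiers from train.py):

```python
# tokens processed per optimizer step, per the "~0.5M" comment
batch_size, block_size, grad_accum, n_gpus = 12, 1024, 5, 8
tokens_per_iter = batch_size * block_size * grad_accum * n_gpus
# 12 * 1024 * 5 * 8 = 491,520 tokens per iteration

# total tokens over the full run, per the "300B" comment
max_iters = 600_000
total_tokens = tokens_per_iter * max_iters
# 491,520 * 600,000 = 294,912,000,000 -- roughly 300B tokens
```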

================================================
FILE: config/train_shakespeare_char.py
================================================
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model
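The learning-rate settings above (`learning_rate`, `min_lr`, `warmup_iters`, `lr_decay_iters`) drive a warmup-then-cosine schedule in nanoGPT-style trainers. This is a sketch of that schedule's shape, not necessarily this repo's exact trainer code:

```python
import math

# Sketch of the linear-warmup + cosine-decay schedule these settings drive.
# Mirrors the usual nanoGPT get_lr() shape; the repo's trainer may differ.
learning_rate = 1e-3
min_lr = 1e-4
warmup_iters = 100
lr_decay_iters = 5000

def get_lr(it):
    if it < warmup_iters:                 # 1) linear warmup
        return learning_rate * it / warmup_iters
    if it > lr_decay_iters:               # 3) after decay, hold at the floor
        return min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # 2) cosine: 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)

print(get_lr(50), get_lr(100), get_lr(5000))  # mid-warmup, peak, floor
```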


================================================
FILE: configurator.py
================================================
"""
Poor Man's Configurator. Probably a terrible idea. Example usage:
$ python train.py config/override_file.py --batch_size=32
this will first run config/override_file.py, then override batch_size to 32

The code in this file will be run as follows from e.g. train.py:
>>> exec(open('configurator.py').read())

So it's not a Python module, it's just shuttling this code away from train.py
The code in this script then overrides the globals()

I know people are not going to love this, I just really dislike configuration
complexity and having to prepend config. to every single variable. If someone
comes up with a better simple Python solution I am all ears.
"""

import sys
from ast import literal_eval

for arg in sys.argv[1:]:
    if '=' not in arg:
        # assume it's the name of a config file
        assert not arg.startswith('--')
        config_file = arg
        print(f"Overriding config with {config_file}:")
        with open(config_file) as f:
            print(f.read())
        exec(open(config_file).read())
    else:
        # assume it's a --key=value argument
        assert arg.startswith('--')
        key, val = arg.split('=', 1) # split once so values may contain '='
        key = key[2:]
        if key in globals():
            try:
                # attempt to eval it (e.g. if bool, number, etc.)
                attempt = literal_eval(val)
            except (SyntaxError, ValueError):
                # if that goes wrong, just use the string
                attempt = val
            # ensure the types match ok
            assert type(attempt) == type(globals()[key])
            # cross fingers
            print(f"Overriding: {key} = {attempt}")
            globals()[key] = attempt
        else:
            raise ValueError(f"Unknown config key: {key}")
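The `--key=value` branch above is easy to exercise in isolation. Below is a minimal stand-alone re-creation using a plain dict in place of `globals()`; the config keys and values here are illustrative, not taken from any particular config file:

```python
from ast import literal_eval

# Stand-alone re-creation of configurator.py's --key=value override logic,
# with a plain dict in place of globals(). Keys/values here are made up.
config = {'batch_size': 12, 'wandb_log': False, 'dataset': 'openwebtext'}

def override(config, arg):
    assert arg.startswith('--')
    key, val = arg[2:].split('=', 1)      # split once so values may contain '='
    if key not in config:
        raise ValueError(f"Unknown config key: {key}")
    try:
        attempt = literal_eval(val)       # '32' -> 32, 'True' -> True, ...
    except (SyntaxError, ValueError):
        attempt = val                     # fall back to the raw string
    assert type(attempt) == type(config[key])  # same type guard as the original
    config[key] = attempt

override(config, '--batch_size=32')
override(config, '--wandb_log=True')
print(config)  # {'batch_size': 32, 'wandb_log': True, 'dataset': 'openwebtext'}
```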


================================================
FILE: data/openai_summarize_tldr/prepare.py
================================================
# saves the CarperAI openai_summarize_tldr dataset to binary files for training. following was helpful:
# https://github.com/HazyResearch/flash-attention/blob/main/training/src/datamodules/language_modeling_hf.py

import os
from tqdm import tqdm
import numpy as np
import tiktoken
from datasets import load_dataset # huggingface datasets

# number of workers in .map() call
# a good number to use is roughly the number of cpu cores // 2
num_proc = 16

# the CarperAI TL;DR summarization dataset (train/valid/test splits with 'prompt' and 'label' columns)
dataset = load_dataset("CarperAI/openai_summarize_tldr")



# class TLDRDataset(Dataset):
#     def __init__(self, split):
#         self.text = []
#         dataset = load_dataset(train_path, split=split)
#         for sample in dataset:
#             self.text.append(sample["prompt"] + sample["label"])
#         # if "valid" in train_path:
#         #     self.post_list = self.post_list[0:2000]
#         # self.tokenizer = tokenizer
#         # self.max_length = max_length
#         # self.input_ids = []
#         # self.attn_masks = []

#     def __len__(self):
#         return len(self.text)

#     def __getitem__(self, idx):
#         txt = self.text[idx]
#         # encodings_dict = self.tokenizer(txt, truncation=True, max_length=self.max_length, padding="max_length")
#         # input_ids = torch.tensor(encodings_dict["input_ids"])
#         # attn_masks = torch.tensor(encodings_dict["attention_mask"])

#         return {
#             # "input_ids": input_ids,
#             # "attention_mask": attn_masks,
#             # "labels": input_ids,
#         }

# dataset = TLDRDataset(split="train")

train_text_list = []
for sample in dataset['train']:
    train_text_list.append(sample['prompt'] + sample['label'])
dataset['train'] = dataset['train'].add_column('text', train_text_list) # add the text column to the train dataset

dataset['val'] = dataset.pop('valid') # rename the valid dataset to val

val_text_list = []
for sample in dataset['val']:
    val_text_list.append(sample['prompt'] + sample['label'])
dataset['val'] = dataset['val'].add_column('text', val_text_list) # add the text column to the val dataset

dataset.pop('test') # remove the test dataset

split_dataset = dataset

# this results in:
# >>> split_dataset
# DatasetDict({
#     train: Dataset({
#         features: ['prompt', 'label', 'text'],
#         num_rows: 116722
#     })
#     val: Dataset({
#         features: ['prompt', 'label', 'text'],
#         num_rows: 6447
#     })
# })

# we now want to tokenize the dataset. first define the encoding function (gpt2 bpe)
enc = tiktoken.get_encoding("gpt2")
def process(example):
    ids = enc.encode_ordinary(example['text']) # encode_ordinary ignores any special tokens
    ids.append(enc.eot_token) # add the end of text token, e.g. 50256 for gpt2 bpe
    # note: I think eot should be prepended not appended... hmm. it's called "eot" though...
    out = {'ids': ids, 'len': len(ids)}
    return out

# tokenize the dataset
tokenized = split_dataset.map(
    process,
    remove_columns=['text','prompt','label'],
    desc="tokenizing the splits",
    num_proc=num_proc,
)

# concatenate all the ids in each dataset into one large file we can use for training
for split, dset in tokenized.items():
    arr_len = np.sum(dset['len'])
    filename = os.path.join(os.path.dirname(__file__), f'{split}.bin')
    dtype = np.uint16 # (can do since enc.max_token_value == 50256 is < 2**16)
    arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(arr_len,))

    print(f"writing {filename}...")
    idx = 0
    for example in tqdm(dset):
        arr[idx : idx + example['len']] = example['ids']
        idx += example['len']
    arr.flush()

# note: train has 116,722 examples and val 6,447 (see split_dataset above),
# so the resulting .bin files are far smaller than openwebtext's

# to read the bin files later, e.g. with numpy:
# m = np.memmap('train.bin', dtype=np.uint16, mode='r')
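The memmap comment above is how a training loop typically consumes these files. Here is a small self-contained sketch of nanoGPT-style (x, y) batch sampling, using a tiny synthetic file in place of the real train.bin:

```python
import numpy as np

# Sample (input, target) windows from a .bin token file, nanoGPT-style.
# A tiny synthetic file stands in for the real train.bin produced above.
np.arange(100, dtype=np.uint16).tofile('demo_train.bin')

data = np.memmap('demo_train.bin', dtype=np.uint16, mode='r')
block_size, batch_size = 8, 4
ix = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
x = np.stack([data[i : i + block_size].astype(np.int64) for i in ix])
y = np.stack([data[i + 1 : i + 1 + block_size].astype(np.int64) for i in ix])
print(x.shape, y.shape)  # (4, 8) (4, 8) -- y holds the next-token targets for x
```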


================================================
FILE: data/openwebtext/prepare.py
================================================
# saves the openwebtext dataset to a binary file for training. following was helpful:
# https://github.com/HazyResearch/flash-attention/blob/main/training/src/datamodules/language_modeling_hf.py

import os
from tqdm import tqdm
import numpy as np
import tiktoken
from datasets import load_dataset # huggingface datasets

# number of workers in .map() call
# a good number to use is roughly the number of cpu cores // 2
num_proc = 8

# takes 54GB in huggingface .cache dir, about 8M documents (8,013,769)
dataset = load_dataset("openwebtext")

# owt by default only contains the 'train' split, so create a test split
split_dataset = dataset["train"].train_test_split(test_size=0.0005, seed=2357, shuffle=True)
split_dataset['val'] = split_dataset.pop('test') # rename the test split to val

# this results in:
# >>> split_dataset
# DatasetDict({
#     train: Dataset({
#         features: ['text'],
#         num_rows: 8009762
#     })
#     val: Dataset({
#         features: ['text'],
#         num_rows: 4007
#     })
# })

# we now want to tokenize the dataset. first define the encoding function (gpt2 bpe)
enc = tiktoken.get_encoding("gpt2")
def process(example):
    ids = enc.encode_ordinary(example['text']) # encode_ordinary ignores any special tokens
    ids.append(enc.eot_token) # add the end of text token, e.g. 50256 for gpt2 bpe
    # note: I think eot should be prepended not appended... hmm. it's called "eot" though...
    out = {'ids': ids, 'len': len(ids)}
    return out

# tokenize the dataset
tokenized = split_dataset.map(
    process,
    remove_columns=['text'],
    desc="tokenizing the splits",
    num_proc=num_proc,
)

# concatenate all the ids in each dataset into one large file we can use for training
for split, dset in tokenized.items():
    arr_len = np.sum(dset['len'])
    filename = os.path.join(os.path.dirname(__file__), f'{split}.bin')
    dtype = np.uint16 # (can do since enc.max_token_value == 50256 is < 2**16)
    arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(arr_len,))

    print(f"writing {filename}...")
    idx = 0
    for example in tqdm(dset):
        arr[idx : idx + example['len']] = example['ids']
        idx += example['len']
    arr.flush()

# train.bin is ~17GB, val.bin ~8.5MB
# train has ~9B tokens (9,035,582,198)
# val has ~4M tokens (4,434,897)

# to read the bin files later, e.g. with numpy:
# m = np.memmap('train.bin', dtype=np.uint16, mode='r')


================================================
FILE: data/openwebtext/readme.md
================================================

## openwebtext dataset

after running `prepare.py` (preprocess) we get:

- train.bin is ~17GB, val.bin ~8.5MB
- train has ~9B tokens (9,035,582,198)
- val has ~4M tokens (4,434,897)

this came from 8,013,769 documents in total.
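A quick arithmetic check on the numbers above, just to make the scale concrete:

```python
# Average GPT-2 BPE tokens per openwebtext document, from the stats above.
train_tokens = 9_035_582_198
val_tokens = 4_434_897
documents = 8_013_769

avg_tokens = (train_tokens + val_tokens) / documents
print(round(avg_tokens))  # ~1128 tokens per document
```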

references:

- OpenAI's WebText dataset is dis
SYMBOL INDEX (78 symbols across 10 files)

FILE: bench.py
  function get_batch (line 37) | def get_batch(split):

FILE: data/openai_summarize_tldr/prepare.py
  function process (line 80) | def process(example):

FILE: data/openwebtext/prepare.py
  function process (line 36) | def process(example):

FILE: data/shakespeare_char/prepare.py
  function encode (line 32) | def encode(s):
  function decode (line 34) | def decode(l):

FILE: model.py
  function new_gelu (line 19) | def new_gelu(x):
  class LayerNorm (line 26) | class LayerNorm(nn.Module):
    method __init__ (line 29) | def __init__(self, ndim, bias):
    method forward (line 34) | def forward(self, input):
  class CausalSelfAttention (line 37) | class CausalSelfAttention(nn.Module):
    method __init__ (line 39) | def __init__(self, config):
    method forward (line 60) | def forward(self, x):
  class MLP (line 86) | class MLP(nn.Module):
    method __init__ (line 88) | def __init__(self, config):
    method forward (line 94) | def forward(self, x):
  class Block (line 101) | class Block(nn.Module):
    method __init__ (line 103) | def __init__(self, config):
    method forward (line 110) | def forward(self, x):
  class GPTConfig (line 116) | class GPTConfig:
  class GPT (line 125) | class GPT(nn.Module):
    method __init__ (line 127) | def __init__(self, config):
    method get_num_params (line 157) | def get_num_params(self, non_embedding=True):
    method _init_weights (line 169) | def _init_weights(self, module):
    method forward (line 177) | def forward(self, idx, targets=None):
    method crop_block_size (line 202) | def crop_block_size(self, block_size):
    method from_pretrained (line 213) | def from_pretrained(cls, model_type, override_args=None):
    method configure_optimizers (line 269) | def configure_optimizers(self, weight_decay, learning_rate, betas, dev...
    method estimate_mfu (line 327) | def estimate_mfu(self, fwdbwd_per_iter, dt):
    method generate (line 344) | def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
  class RLHF (line 370) | class RLHF(nn.Module):
    method __init__ (line 371) | def __init__(self, model, mode, discrete_reward=False):
    method forward_reward (line 387) | def forward_reward(self, idx, targets=None):
    method forward (line 414) | def forward(self, idx, targets=None):
    method generate (line 420) | def generate(self, idx, max_new_tokens, device, block_size, use_refere...
    method generate_gumbel (line 501) | def generate_gumbel(self, idx, max_new_tokens, device, block_size, rew...
    method sample_gumbel (line 534) | def sample_gumbel(self, shape, eps=1e-20):
    method gumbel_softmax_sample (line 540) | def gumbel_softmax_sample(self, logits, tau, device, dim=1):
    method gumbel_softmax (line 545) | def gumbel_softmax(self, logits, tau, device):
    method forward_reward_gumbel (line 558) | def forward_reward_gumbel(self, onehots, idx=None, targets=None):

FILE: train_reward_model.py
  function create_comparison_dataset (line 23) | def create_comparison_dataset(path="CarperAI/openai_summarize_comparison...
  class PairwiseDataset (line 41) | class PairwiseDataset(Dataset):
    method __init__ (line 42) | def __init__(self, pairs, max_length):
    method __len__ (line 56) | def __len__(self):
    method __getitem__ (line 59) | def __getitem__(self, idx):
  class DataCollatorReward (line 65) | class DataCollatorReward:
    method __call__ (line 66) | def __call__(self, data):
  function collate_fn (line 72) | def collate_fn(data):

FILE: trainers/reward_trainer.py
  class RewardModelTrainer (line 11) | class RewardModelTrainer(Trainer):
    method __init__ (line 12) | def __init__(self, config, train_data, val_data, collate_fn):
    method get_batch (line 24) | def get_batch(self, split):
    method estimate_loss (line 37) | def estimate_loss(self, model, ctx):
    method evaluate (line 53) | def evaluate(self, model, ctx):
    method train (line 80) | def train(self):
  class ProbRewardModelTrainer (line 171) | class ProbRewardModelTrainer(Trainer):
    method __init__ (line 172) | def __init__(self, config, discrete_reward=False):
    method get_batch (line 179) | def get_batch(self, split):
    method reward (line 194) | def reward(self, sequence, t='and'):
    method evaluate (line 201) | def evaluate(self, model, ctx, X, lr):
    method train (line 256) | def train(self):

FILE: trainers/rl_trainer.py
  class PolicyGradientTrainer (line 10) | class PolicyGradientTrainer(Trainer):
    method __init__ (line 11) | def __init__(self, config):
    method train (line 17) | def train(self):
  class GumbelTrainer (line 105) | class GumbelTrainer(Trainer):
    method __init__ (line 106) | def __init__(self, config):
    method train (line 112) | def train(self):

FILE: trainers/trainer.py
  class Trainer (line 20) | class Trainer():
    method __init__ (line 21) | def __init__(self, config):
    method from_config (line 31) | def from_config(self, config):
    method setup_ddp (line 83) | def setup_ddp(self):
    method get_batch (line 102) | def get_batch(self, split):
    method get_lr (line 114) | def get_lr(self, it):
    method init_model (line 128) | def init_model(self):
    method setup (line 176) | def setup(self):
    method setup_model (line 205) | def setup_model(self, model):
    method evaluate (line 218) | def evaluate(self, model, ctx, lr):
    method train (line 244) | def train(self):
    method estimate_loss (line 336) | def estimate_loss(self, model, ctx):

FILE: utils.py
  class dotdict (line 3) | class dotdict(dict):

About this extraction


Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
