Repository: HandsOnLLM/Hands-On-Large-Language-Models
Branch: main
Commit: 6042517b4463
Files: 41
Total size: 4.2 MB

Directory structure:
gitextract_kmx38cwi/

├── .setup/
│   ├── README.md
│   └── conda/
│       ├── README.md
│       └── common_issues.md
├── LICENSE
├── README.md
├── bonus/
│   ├── 2_deeplearningai.md
│   ├── 3_quantization.md
│   ├── 4_mamba.md
│   ├── 5_mixture_of_experts.md
│   ├── 6_stable_diffusion.md
│   ├── 7_reasoning_llms.md
│   ├── 8_deepseek_r1.md
│   ├── 9_agents.md
│   └── README.md
├── chapter01/
│   ├── Chapter 1 - Introduction to Language Models.ipynb
│   └── README.md
├── chapter02/
│   ├── Chapter 2 - Tokens and Token Embeddings.ipynb
│   └── README.md
├── chapter03/
│   ├── Chapter 3 - Looking Inside LLMs.ipynb
│   └── README.md
├── chapter04/
│   ├── Chapter 4 - Text Classification.ipynb
│   └── README.md
├── chapter05/
│   ├── Chapter 5 - Text Clustering and Topic Modeling.ipynb
│   └── README.md
├── chapter06/
│   ├── Chapter 6 - Prompt Engineering.ipynb
│   └── README.md
├── chapter07/
│   ├── Chapter 7 - Advanced Text Generation Techniques and Tools.ipynb
│   └── README.md
├── chapter08/
│   ├── Chapter 8 - Semantic Search.ipynb
│   └── README.md
├── chapter09/
│   ├── Chapter 9 - Multimodal Large Language Models.ipynb
│   └── README.md
├── chapter10/
│   ├── Chapter 10 - Creating Text Embedding Models.ipynb
│   └── README.md
├── chapter11/
│   ├── Chapter 11 - Fine-Tuning BERT.ipynb
│   └── README.md
├── chapter12/
│   ├── Chapter 12 - Fine-tuning Generation Models.ipynb
│   └── README.md
├── environment.yml
├── requirements.txt
└── requirements_min.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: .setup/README.md
================================================
# Setup Instructions

Here you will find several methods of running the code found in the book. There are two preferred methods, one for local usage and one for cloud-based:

* **Local**: Using a [Conda](../.setup/conda) environment
* **Cloud**: Using [Google Colab Notebooks](https://github.com/HandsOnLLM/Hands-On-Large-Language-Models/tree/main?tab=readme-ov-file#table-of-contents)

## Quick start

If you already have a local Python (3.10.*) environment and [Microsoft Visual C++ 14.0](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or greater installed, you can install the complete environment as follows at the root of this repository:

```bash
pip install -r requirements.txt
```
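
Before running the command above, it can help to confirm that the active interpreter belongs to the Python 3.10 series this guide assumes. The snippet below is a minimal sketch and not part of the repository; the helper name `is_supported_python` is hypothetical.

```python
import sys

# Hypothetical helper (not part of the repository): returns True when the
# given version tuple belongs to the Python 3.10 series named in this guide.
def is_supported_python(version=sys.version_info):
    major, minor = version[0], version[1]
    return (major, minor) == (3, 10)

if __name__ == "__main__":
    status = "OK" if is_supported_python() else "not 3.10.*"
    print(f"Python {sys.version.split()[0]}: {status}")
```

If the check reports a different version, creating the conda environment described below is likely the simpler route.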

> [!WARNING]
> If you get the error `error: Microsoft Visual C++ 14.0 or greater is required.`, you will need to install the Microsoft C++ Build Tools first.
> Follow the installation guide [here](../.setup/conda/common_issues.md) before you install your environment.

If you have conda installed (this **does not** require an additional installation of C++), you can also install the complete environment as follows at the root of this repository:

```bash
conda env create -f environment.yml
```

> [!TIP]
> After preparing your environment, it is recommended to install the GPU version of PyTorch following the instructions [here](https://pytorch.org/) as most examples will require a GPU.

For a complete tutorial on setting up your environment, visit the [conda example](../.setup/conda).


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2024 Jay Alammar & Maarten Grootendorst

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Hands-On Large Language Models

<a href="https://www.linkedin.com/in/jalammar/"><img src="https://img.shields.io/badge/Follow%20Jay-blue.svg?logo=linkedin"></a>
<a href="https://www.linkedin.com/in/mgrootendorst/"><img src="https://img.shields.io/badge/Follow%20Maarten-blue.svg?logo=linkedin"></a>
<a href="https://www.deeplearning.ai/short-courses/how-transformer-llms-work/?utm_campaign=handsonllm-launch&utm_medium=partner"><img src="https://img.shields.io/badge/DeepLearning.AI%20Course-NEW!-&labelColor=black&color=red.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAuMDAwMzY1MjgxIC0wLjAwMDE0MDE0MiAzMy4yOSAzMy4xNSI+Cgk8cGF0aCBkPSJNMTYuNjQzIDMzLjE0NWMtMy4yOTIgMC02LjUxLS45NzItOS4yNDYtMi43OTNhMTYuNTg4IDE2LjU4OCAwIDAxLTYuMTMtNy40MzhBMTYuNTA3IDE2LjUwNyAwIDAxLjMyIDEzLjM0YTE2LjU1IDE2LjU1IDAgMDE0LjU1NS04LjQ4NUExNi42NjUgMTYuNjY1IDAgMDExMy4zOTYuMzE4YTE2LjcxIDE2LjcxIDAgMDE5LjYxNi45NDQgMTYuNjI4IDE2LjYyOCAwIDAxNy40NyA2LjEwMyAxNi41MjIgMTYuNTIyIDAgMDEyLjgwNCA5LjIwN2MwIDQuMzk2LTEuNzUzIDguNjEtNC44NzQgMTEuNzE5YTE2LjY4IDE2LjY4IDAgMDEtMTEuNzY5IDQuODU0em0uMTI1LTYuNjI4YzYuOTA2IDAgMTIuNTE3LTUuNjk4IDEyLjUxNy0xMi43MyAwLTcuMDMtNS42MS0xMi43MjUtMTIuNTE3LTEyLjcyNS02LjkwNiAwLTEyLjUxNyA1LjY5OC0xMi41MTcgMTIuNzI1IDAgNy4wMjcgNS42MTEgMTIuNzMgMTIuNTE3IDEyLjczem0tLjEyNS0yLjkxOGMtNi4yODkgMC0xMS4zODYtNC45MjUtMTEuMzg2LTExLjAwMkM1LjI1NyA2LjUyIDEwLjM2IDEuNTkgMTYuNjQzIDEuNTljNi4yODQgMCAxMS4zODYgNC45MyAxMS4zODYgMTEuMDA3cy01LjA5NyAxMS4wMDItMTEuMzg2IDExLjAwMnptLS4yNDItNC41MDhjNC43NyAwIDguNjMzLTMuNjc5IDguNjMzLTguMjE4IDAtNC41MzgtMy44ODUtOC4yMjEtOC42MzMtOC4yMjEtNC43NDcgMC04LjYzMiAzLjY3OS04LjYzMiA4LjIyMSAwIDQuNTQzIDMuODg1IDguMjE4IDguNjMyIDguMjE4eiIgZmlsbD0iI0ZENEE2MSIvPgo8L3N2Zz4="></a>

Welcome! In this repository you will find the code for all examples throughout the book [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) written by [Jay Alammar](https://www.linkedin.com/in/jalammar/) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/) which we playfully dubbed: <br> 

<p align="center"><b><i>"The Illustrated LLM Book"</i></b></p>

Through the visually educational nature of this book and with **almost 300 custom-made figures**, learn the practical tools and concepts you need to use Large Language Models today!

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="images/book_cover.png" width="50%" ></a>

<br>

The book is available on:

* [Amazon](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961)
* [Shroff Publishers (India)](https://www.shroffpublishers.com/books/computer-science/large-language-models/9789355425522/)
* [O'Reilly](https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/)
* [Kindle](https://www.amazon.com/Hands-Large-Language-Models-Alammar-ebook/dp/B0DGZ46G88/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr=)
* [Barnes and Noble](https://www.barnesandnoble.com/w/hands-on-large-language-models-jay-alammar/1145185960)
* [Goodreads](https://www.goodreads.com/book/show/210408850-hands-on-large-language-models)

## Table of Contents

We advise running all examples through Google Colab for the easiest setup. Google Colab allows you to use a T4 GPU with 16GB of VRAM for free. All examples were mainly built and tested using Google Colab, so it should be the most stable platform. However, any other cloud provider should work.

| Chapter  | Notebook  |
|---|---|
| Chapter 1: Introduction to Language Models  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)   |
| Chapter 2: Tokens and Embeddings  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter02/Chapter%202%20-%20Tokens%20and%20Token%20Embeddings.ipynb)  |
| Chapter 3: Looking Inside Transformer LLMs  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter03/Chapter%203%20-%20Looking%20Inside%20LLMs.ipynb)  |
| Chapter 4: Text Classification  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter04/Chapter%204%20-%20Text%20Classification.ipynb)  |
| Chapter 5: Text Clustering and Topic Modeling  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter05/Chapter%205%20-%20Text%20Clustering%20and%20Topic%20Modeling.ipynb)  |
| Chapter 6: Prompt Engineering  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter06/Chapter%206%20-%20Prompt%20Engineering.ipynb)  |
| Chapter 7: Advanced Text Generation Techniques and Tools  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter07/Chapter%207%20-%20Advanced%20Text%20Generation%20Techniques%20and%20Tools.ipynb)  |
| Chapter 8: Semantic Search and Retrieval-Augmented Generation  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter08/Chapter%208%20-%20Semantic%20Search.ipynb)  |
| Chapter 9: Multimodal Large Language Models  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter09/Chapter%209%20-%20Multimodal%20Large%20Language%20Models.ipynb)  |
| Chapter 10: Creating Text Embedding Models  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter10/Chapter%2010%20-%20Creating%20Text%20Embedding%20Models.ipynb)  |
| Chapter 11: Fine-tuning Representation Models for Classification  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter11/Chapter%2011%20-%20Fine-Tuning%20BERT.ipynb)  |
| Chapter 12: Fine-tuning Generation Models  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter12/Chapter%2012%20-%20Fine-tuning%20Generation%20Models.ipynb)  |

> [!TIP]
> You can check the [setup](.setup/) folder for a quick-start guide to install all packages locally and you can check the [conda](.setup/conda/) folder for a complete guide on how to set up your environment, including conda and PyTorch installation.
> Note that depending on your OS, Python version, and dependencies, your results might differ slightly. However, they
> should be similar to the examples in the book.


## Reviews

> "*Jay and Maarten have continued their tradition of providing beautifully illustrated and insightful descriptions of complex topics in their new book. Bolstered with working code, timelines, and references to key papers, their book is a valuable resource for anyone looking to understand the main techniques behind how Large Language Models are built.*"
>    
> **Andrew Ng** - founder of [DeepLearning.AI](https://www.deeplearning.ai/)

---

> "*This is an exceptional guide to the world of language models and their practical applications in industry. Its highly-visual coverage of generative, representational, and retrieval applications of language models empowers readers to quickly understand, use, and refine LLMs. Highly recommended!*"
>
> **Nils Reimers** - Director of Machine Learning at Cohere | creator of [sentence-transformers](https://github.com/UKPLab/sentence-transformers)

---

> "*I can’t think of another book that is more important to read right now. On every single page, I learned something that is critical to success in this era of language models.*"
> 
> **Josh Starmer** - [StatQuest](https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw)

---

> "*If you’re looking to get up to speed in everything regarding LLMs, look no further! In this wonderful book, Jay and Maarten will take you from zero to expert in the history and latest advances in large language models. With very intuitive explanations, great real-life examples, clear illustrations, and comprehensive code labs, this book lifts the curtain on the complexities of transformer models, tokenizers, semantic search, RAG, and many other cutting-edge technologies. A must read for anyone interested in the latest AI technology!*"
> 
> **Luis Serrano, PhD** - Founder and CEO of [Serrano Academy](https://www.youtube.com/@SerranoAcademy)

---

> "*Hands-On Large Language Models brings clarity and practical examples to cut through the hype of AI. It provides a wealth of great diagrams and visual aids to supplement the clear explanations. The worked examples and code make concrete what other books leave abstract. The book starts with simple introductory beginnings, and steadily builds in scope. By the final chapters, you will be fine-tuning and building your own large language models with confidence.*"
>
> **Leland McInnes** - Researcher at the Tutte Institute for Mathematics and Computing | creator of [UMAP](https://github.com/lmcinnes/umap) and [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan)

---

## [Bonus content!](bonus/)

We attempted to put as much information as possible into the book without it being overwhelming. However, even with a 400-page book there is still much to discover!

We continue to create more guides that complement the book and go more in-depth into new and [exciting topics](bonus/):

| [A Visual Guide to Mamba](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state)             |  [A Visual Guide to Quantization](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization) | [The Illustrated Stable Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/) |
:-------------------------:|:-------------------------:|:-------------------------:
![](images/mamba.png)  |  ![](images/quant.png) |  ![](images/diffusion.png)
**[A Visual Guide to Mixture of Experts](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts)**  | **[A Visual Guide to Reasoning LLMs](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-reasoning-llms)**  |  **[The Illustrated DeepSeek-R1](https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1)**
![](images/moe.png)  |  ![](images/reasoning.png) |  ![](images/deepseek.png)

## Citation

Please consider citing the book if you consider it useful for your research:

```bibtex
@book{hands-on-llms-book,
  author       = {Jay Alammar and Maarten Grootendorst},
  title        = {Hands-On Large Language Models},
  publisher    = {O'Reilly},
  year         = {2024},
  isbn         = {978-1098150969},
  url          = {https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/},
  github       = {https://github.com/HandsOnLLM/Hands-On-Large-Language-Models}
}
```


================================================
FILE: bonus/2_deeplearningai.md
================================================
# [How Transformer LLMs Work](https://www.deeplearning.ai/short-courses/how-transformer-llms-work/?utm_campaign=handsonllm-launch&utm_medium=partner)

In this course, we take the content from the book and enhance it through a short course on the main components of a Transformer LLM. This **highly animated** course will further strengthen the intuition you get from the book's content.
 
<p align="center">
    <a href="https://www.deeplearning.ai/short-courses/how-transformer-llms-work/?utm_campaign=handsonllm-launch&utm_medium=partner"><img src="../images/dlai.png" width="50%" ></a>
</p>

## What you'll learn

* Gain an understanding of the key components of transformers, including tokenization, embeddings, self-attention, and transformer blocks, to build a strong technical foundation.
* Understand recent transformer improvements to the attention mechanism such as KV cache, multi-query attention, grouped query attention, and sparse attention.
* Compare tokenization strategies used in modern LLMs and explore transformers in the Hugging Face Transformers library.



================================================
FILE: bonus/README.md
================================================
# Bonus Material

With the incredible growth of Language AI in recent years, capturing everything in a single book (even with 400 pages!)
is nearly impossible. That does not mean one should not try to come up with a creative solution to this problem of size and growth.

We decided to cover the fundamentals of LLMs in the book, which left us with an interesting opportunity.
Using the book as a starting point, we could continue creating visual, illustrative content that explores
certain topics in more depth than was possible in the book.

<p align="center"><i>All bonus materials enhance the book through the same <b>visual</b> and <b>illustrative</b> style of the book</i></p>

After reading the book, you are ready to go through these more complex topics through highly visual, detailed, and in-depth guides. Because you are already familiar with our illustrative styles, reading through these advanced concepts should be a breeze!

<p align="center">
<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="../images/bonus_overview.png" width="100%"></a>
</p>

To get a feel for each piece of additional content, click any of the markdown files in the folder for a short description of what it is about, or go directly to these visual, educational articles:

1. [**Hands-On Large Language Models**](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961)
2. [How **Transformer LLMs** Work](https://www.deeplearning.ai/short-courses/how-transformer-llms-work/?utm_campaign=handsonllm-launch&utm_medium=partner)
3. [A Visual Guide to **Quantization**](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization)
4. [A Visual Guide to **Mamba and State Space Models**](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state)
5. [A Visual Guide to **Mixture of Experts** (MoE)](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts)
6. [The Illustrated **Stable Diffusion**](https://jalammar.github.io/illustrated-stable-diffusion/)
7. [A Visual Guide to **Reasoning LLMs**](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-reasoning-llms)
8. [The Illustrated **DeepSeek-R1**](https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1)
9. [A Visual Guide to **LLM Agents**](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-llm-agents)


================================================
FILE: chapter01/Chapter 1 - Introduction to Language Models.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EDe7DsPWmEBV"
   },
   "source": [
    "<h1>Chapter 1 - Introduction to Language Models</h1>\n",
    "<i>Exploring the exciting field of Language AI</i>\n",
    "\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
    "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
    "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "This notebook is for Chapter 1 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
    "\n",
    "---\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
    "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
    "\n",
    "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to\n",
    "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install transformers==4.41.2 accelerate==0.31.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hXp09JFsFBXi"
   },
   "source": [
    "# Phi-3\n",
    "\n",
    "The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RSNalRXZyTTk"
   },
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "# Load model and tokenizer\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"microsoft/Phi-3-mini-4k-instruct\",\n",
    "    device_map=\"cuda\",\n",
    "    torch_dtype=\"auto\",\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qdyYYS0E5fEU"
   },
   "source": [
    "Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "DiUi4Wu1FCyN"
   },
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "# Create a pipeline\n",
    "generator = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    return_full_text=False,\n",
    "    max_new_tokens=500,\n",
    "    do_sample=False\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mD49kysT5mMY"
   },
   "source": [
    "Finally, we create our prompt as a user and give it to the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "hkR7LBmiyXmY"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Why did the chicken join the band? Because it had the drumsticks!\n"
     ]
    }
   ],
   "source": [
    "# The prompt (user input / query)\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Create a funny joke about chickens.\"}\n",
    "]\n",
    "\n",
    "# Generate output\n",
    "output = generator(messages)\n",
    "print(output[0][\"generated_text\"])"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "authorship_tag": "ABX9TyPCWg08aO4e8NWQuYCK5ppF",
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}


================================================
FILE: chapter02/Chapter 2 - Tokens and Token Embeddings.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "g_a9QvUFVCUR"
   },
   "source": [
    "<h1>Chapter 2 - Tokens and Token Embeddings</h1>\n",
    "<i>Exploring tokens and embeddings as an integral part of building LLMs</i>\n",
    "\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
    "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
    "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter02/Chapter%202%20-%20Tokens%20and%20Token%20Embeddings.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "This notebook is for Chapter 2 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
    "\n",
    "---\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
    "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
    "\n",
    "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:\n",
    "\n",
    "---\n",
    "\n",
    "💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to\n",
    "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install --upgrade transformers==4.41.2 sentence-transformers==3.0.1 gensim==4.3.2 scikit-learn==1.5.0 accelerate==0.31.0 peft==0.11.1 scipy==1.10.1 numpy==1.26.4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oQHfpqT_t9-K"
   },
   "source": [
    "# Downloading and Running An LLM\n",
    "\n",
    "The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately and keep them separate so that we can explore each on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 753,
     "referenced_widgets": [
      "851b6e59cc2e4eb8961cb5fa4906c47c",
      "fd5b6ec0a82c493a92a2235635ea5ac0",
      "2bc5005713ba4b61a392935e1d83994c",
      "607bb5b3f27d463dbdb5ad04583a1c4f",
      "0d0c6c47fbb34090a73ed8bf20597ee5",
      "083149fa68934cce90cd03b6c8f90dd4",
      "73263751695843ada347c6c694afd0cd",
      "79c1152a61dd4a87a6a4e700096d92b4",
      "27fb45533f134c308cc0c27797c127f3",
      "78eb9f6c02e841ad9ecf8ed809c724bb",
      "1ade569c5cec4764af86e7eec6bc64ed",
      "dbf8268e613e4e1498133a42a48a58f8",
      "66c3d79c9411447eabae228b5d125c09",
      "46ba964cc4c84bbb87eeccc817a3354c",
      "150625fb48b740f6bce66fd4919357a9",
      "974be84551a54deb82051b947ff013af",
      "22e2cbe577e6475aa80dd94f4b704f1b",
      "83daaa4a1a5d437e8345d4740c3625cd",
      "6e1c0e41b31b4b0b83aa3736e386aa49",
      "bf1d39ed8ee84a6b95e68a11962032b4",
      "7476eb73c9544435b07256f028168a11",
      "f33bc31f0cb842388057b33dd107f2e7",
      "9b9f8a8f0eb14dd9a2c82461f22c4636",
      "214eb5ac9a5048c586b3418452017991",
      "dd1b7306ad4641799d4947a94aa0f088",
      "d58e84330cff499c87e03827d9727d19",
      "2809ac7769af482a86d5e763f5f69211",
      "746054d4ac0b442e85b75959b66ceb34",
      "9b77017c3e764c97b44c56fb5ad3bc77",
      "3dc073c6257143748849311e1c08bdbe",
      "3117fa7446394cb6b6b56427be0a3290",
      "5c3c4612f65f40808f8ef1d0e31f9836",
      "f66983763d454728b86d881621948b8d",
      "a855727c923648ea8d6c7b9ec117c94f",
      "6c9350120fec4f4e9b80e063279b459c",
      "28283e52317e437c8e118d052d93419c",
      "5556b8b2f22548109153146ef66702df",
      "dc1457b4d1d344c9863fbb6b0d38e2cf",
      "029a23c145604f209b12044a6b367802",
      "252ccf08bd52433a888d2bac9ed1b64c",
      "893af8dae5874b55a582f180652e36e3",
      "27e14bb1e4aa467c97ac0c6a6eb44108",
      "87812873ff2d4514aca02c64a3509bd9",
      "13ecb25d0d6541b1a4f2b7dea92dee61",
      "2107a7e7d40f416ca4bc46c90725e0ca",
      "1cfc631c1d7d4fb087753e9e34fd1aa2",
      "238f094ef6744b31ba4444d47c17558c",
      "b11bf972b98645e79f98cdec2c1440f3",
      "baa7c53a02cd42688d768a4595611f64",
      "6a63559431934a4aae4ce79e8b2e76c0",
      "81652b39152145748580c9667b2554de",
      "4dc8d9b1a2d248f9a30bc8e985db4568",
      "adfcbd50473c4ea9bb414e33926cc33b",
      "52a9a5aed89a4294882c8c55b2078b84",
      "83fb238d4eba47a390233ecb5e870ed4",
      "8da9bab6ee214504a187e5b3c9bf1b80",
      "77922a918c1b4bb08f5304a34044e9ba",
      "ac36ee2d4df04399b11f57bb56930a52",
      "df2f73dc04d64bf6adb4df0eae207c27",
      "3cbc759c16b24d63971ba5ef53ff9ba3",
      "ea5d84bf35754e71a628755c0cff49be",
      "3ed30ef053324b3f9a5d3258d2a88a86",
      "f8bae10ac5b54f1b99f7b595f123be4d",
      "dfb5b5a357ca41f88efa53288ebeb5b3",
      "b917827c07344e018543fcbb9c7a6fe8",
      "bad199037f6548749f4dfe12724e3e62",
      "0e30dc616d9c4c50adb9cd292fc3d89e",
      "570a9d836e48456d8c301184d670c50e",
      "471cdd69adc24003b14680e8dbe7b183",
      "c11552c4f423434aa2c41dfa22b8f227",
      "687c61c78b044690a91e34affe570613",
      "82dcc51143ac46349c13d2f5bee380da",
      "a9c8a9b099aa416d9015ee48006db324",
      "1fcc7e4cd65449c49956981ec1b46843",
      "5fc8310895a64815b15af53e3583f60d",
      "ffa0950ea21a4689b5804b0285049ccd",
      "da9ec5fd7acd43b7a11ac860771732df",
      "961d4e8f3a1845e393e25d21de3cd6d3",
      "e7506426cc424dfdb28fa6c35cb74c24",
      "d9b26784fc7c4e6b8abe9490caf53afc",
      "e96d58589fcc4bd7a469c4281baa01ee",
      "dd8fa47d73774a7684cda73e6675c0bd",
      "41843fb3c92c425b9f39cb124b74e3ad",
      "2ab54f26cc504dc09db592039c02c210",
      "cab531e67dd3452f9a4fcd17e324d76a",
      "58de29a54c91401d92cc73a96721a3e9",
      "0eabbb5ac0784a1c89ad58cca37a38fe",
      "362724ce1c5944bd9335b66a14f89844",
      "bd2749d8ba7240408be56e26c796e00a",
      "b54e7ab6e22345bdae3484cab6dab985",
      "77d3a6cc940a4dd7a38c47dfc2b7041b",
      "13071c70f17144b891a1f0fde6d9b21e",
      "205628f7d62d4e54add8ee90b418ebdf",
      "037c74818af74266b2bce5a454b99ab8",
      "bba33c0c581d415bba86aebfc8642196",
      "294ee2f27b4c4ee1ad6ca2a121a9e66d",
      "5bdf1b7758624e379c71da5296929973",
      "6cae5555dce346dc9867edd130c840c9",
      "0e37d9ed7d3142a3b80f188ddee54a4d",
      "d077b2c0e6a142cb905e38a7c7855997",
      "2a47328eace84efbb8fcf6644d4e469d",
      "c1634b24795243f5b8384957ce11b9c0",
      "b8285e7e6db348c385694d2fd63b514e",
      "da9d77e8fa0641dc97f7610aa59374a2",
      "7209a02303ad46f3880799f0b92171f0",
      "770f06c5ff714c87af1d0d7536da12c9",
      "f59c537df5684195b1b5c44754ed2d07",
      "6dd8dd0a99bd4cbbb7401a3fa3893128",
      "f8c4e2a7f11f47fcbd574ceaf828aa66",
      "193e7cde826d43bd894362498a217888",
      "febedf9dd6d248baa984ac215697968d",
      "0bf43b40190141d5bf8e0be3fdb1e529",
      "4070e2510e1549c3879e2458e42bb269",
      "9ba44d309f684c98991889b992767ab3",
      "b47b04d69b33402f80ea75ed472d3c29",
      "19a091b24a5d44b9a98569a7e124c6f8",
      "0d1cea03253c426b84814936c93e5279",
      "7c1b3811b42549569cc83423bbeb2163",
      "55bc2055b3c5426f8ba9afc9f80ccd58",
      "61098d22dd294891afdd34b6208906ba",
      "0a0db951df42424084b531ebbd6cfa98",
      "3f1dc764fb2c48bd9fd8797a86b6c59f",
      "95146eb05e964e32bfa6f5078a66fab7",
      "8c7e09509c524cf29782df69988eef6d",
      "5237587cfb9c4e86a82aaf72cc9db39d",
      "22fd14ac3ccf4e1d8e4782d35e7ec91c",
      "138a3cb6ba5b494c8a237af0c6306a4f",
      "e88a1d50d7f940d793f4dcd76a5f5929",
      "4182011b06304005a6e8922d338c65fb",
      "16ae1f6d9d844d60b55da9e36a5792cd",
      "56ea52f3a1ea485ca52eebaf4f97c3a3",
      "c81c6fe3a8ed4466bf95c53dfe415fba",
      "157b7540b5a3404e82d7e86b19906c7a",
      "48d180eca04341d692d5fb83a8e0f76d",
      "c84b966b98144e22884183cd2a03e2a8",
      "9a6496f9037149289e64e66a0f46cbd3",
      "ecd15644266b43678dbe7b067b553833",
      "43a6de1c960341eda9e154d47e0142c9",
      "6e8fa3e12a8b4262bbca73c57c7f4eb9",
      "86445e2d62b44847a4be7419a6000305",
      "3837338134c240caa0c51501fae402fb",
      "5e2d4a77710e428ebef6bae0854c8527",
      "63d81ace73634bcdaef6f5a103c35dcf",
      "dd1af9490bc24d5ea2804adefd442fdb",
      "cf18ae8dd5bb4d378f017f9eaff967b4",
      "461fde63ac0a41ed8ad9aab2d60769a7",
      "3c5bb3c7c77f4ca393dc05b12fb69dd0",
      "bd707eacc9914dc69a1fe4e35320a9a4",
      "d7cb5720fbbb4ae0858dd3070c712615",
      "bf390e3a7172415089b2cce2223dd8b6",
      "72ae6f3dcf3d4567a4a60f627a51f1f4",
      "026141ba5e404e4e86c1d1c083ab186d",
      "fcd5f49ec07f401fb8a2b403516bbd17",
      "22d378959eb941c5bbfe77702fb426e6"
     ]
    },
    "executionInfo": {
     "elapsed": 95520,
     "status": "ok",
     "timestamp": 1723034396041,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": -60
    },
    "id": "jjU8NBHnwA4j",
    "outputId": "286bdccb-f25d-4b0e-bda3-44d2a4be45cd"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.\n",
      "Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dd45dd8837f94b38ae6f4ffd205d9ea6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "# Load model and tokenizer\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"microsoft/Phi-3-mini-4k-instruct\",\n",
    "    device_map=\"cuda\",\n",
    "    torch_dtype=\"auto\",\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 5750,
     "status": "ok",
     "timestamp": 1719641447389,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "_iVl5yePuq3B",
    "outputId": "4ce629bf-3897-4ab0-8cf1-8f55e2040155"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<s> Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|> Subject: My Sincere Apologies for the Gardening Mishap\n",
      "\n",
      "Dear\n"
     ]
    }
   ],
   "source": [
    "prompt = \"Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|>\"\n",
    "\n",
    "# Tokenize the input prompt\n",
    "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids.to(\"cuda\")\n",
    "\n",
    "# Generate the text\n",
    "generation_output = model.generate(\n",
    "  input_ids=input_ids,\n",
    "  max_new_tokens=20\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(tokenizer.decode(generation_output[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 4,
     "status": "ok",
     "timestamp": 1719641447389,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "JmzgbbdKuvHt",
    "outputId": "82511d5b-7949-49a0-e3a6-c128564575c8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[    1, 14350,   385,  4876, 27746,  5281,   304, 19235,   363,   278,\n",
      "         25305,   293, 16423,   292,   286,   728,   481, 29889, 12027,  7420,\n",
      "           920,   372,  9559, 29889, 32001]], device='cuda:0')\n"
     ]
    }
   ],
   "source": [
    "print(input_ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3,
     "status": "ok",
     "timestamp": 1719641447389,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "W4vsjbxwu1K1",
    "outputId": "506f32d1-f058-4cfd-a9cd-13c4dabe80e6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<s>\n",
      "Write\n",
      "an\n",
      "email\n",
      "apolog\n",
      "izing\n",
      "to\n",
      "Sarah\n",
      "for\n",
      "the\n",
      "trag\n",
      "ic\n",
      "garden\n",
      "ing\n",
      "m\n",
      "ish\n",
      "ap\n",
      ".\n",
      "Exp\n",
      "lain\n",
      "how\n",
      "it\n",
      "happened\n",
      ".\n",
      "<|assistant|>\n"
     ]
    }
   ],
   "source": [
    "for token_id in input_ids[0]:\n",
    "    print(tokenizer.decode(token_id))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3,
     "status": "ok",
     "timestamp": 1719641447389,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "A9wRZ3J3u4z1",
    "outputId": "7efaa49c-7a5a-41d7-f000-7aace16007e5"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[    1, 14350,   385,  4876, 27746,  5281,   304, 19235,   363,   278,\n",
       "         25305,   293, 16423,   292,   286,   728,   481, 29889, 12027,  7420,\n",
       "           920,   372,  9559, 29889, 32001,  3323,   622, 29901,  1619,   317,\n",
       "          3742,   406,  6225, 11763,   363,   278, 19906,   292,   341,   728,\n",
       "           481,    13,    13, 29928,   799]], device='cuda:0')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generation_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 275,
     "status": "ok",
     "timestamp": 1723034447362,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": -60
    },
    "id": "7QlHLof3u8A3",
    "outputId": "c2315e1b-91b4-4a1b-9bcc-084f16ac8db1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sub\n",
      "ject\n",
      "Subject\n",
      ":\n"
     ]
    }
   ],
   "source": [
    "print(tokenizer.decode(3323))\n",
    "print(tokenizer.decode(622))\n",
    "print(tokenizer.decode([3323, 622]))\n",
    "print(tokenizer.decode(29901))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "T9nRducW48bd"
   },
   "source": [
    "# Comparing Trained LLM Tokenizers\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7W0xFIVo5A0S"
   },
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "# RGB background colors, cycled through so adjacent tokens are easy to tell apart\n",
    "colors_list = [\n",
    "    '102;194;165', '252;141;98', '141;160;203',\n",
    "    '231;138;195', '166;216;84', '255;217;47'\n",
    "]\n",
    "\n",
    "def show_tokens(sentence, tokenizer_name):\n",
    "    \"\"\"Tokenize `sentence` with the named tokenizer and print each token on a colored background.\"\"\"\n",
    "    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)\n",
    "    token_ids = tokenizer(sentence).input_ids\n",
    "    for idx, t in enumerate(token_ids):\n",
    "        # ANSI escape: black text on a 24-bit background color, reset after each token\n",
    "        print(\n",
    "            f'\\x1b[0;30;48;2;{colors_list[idx % len(colors_list)]}m' +\n",
    "            tokenizer.decode(t) +\n",
    "            '\\x1b[0m',\n",
    "            end=' '\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Gcc3JjwX5DK-"
   },
   "outputs": [],
   "source": [
    "text = \"\"\"\n",
    "English and CAPITALIZATION\n",
    "🎵 鸟\n",
    "show_tokens False None elif == >= else: two tabs:\"    \" Three tabs: \"       \"\n",
    "12.0*50=600\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 354,
     "status": "ok",
     "timestamp": 1725544666773,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "fCDGSXP75Hv-",
    "outputId": "f2c26835-a857-41db-ff2d-d930d06e512e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m[CLS]\u001b[0m \u001b[0;30;48;2;252;141;98menglish\u001b[0m \u001b[0;30;48;2;141;160;203mand\u001b[0m \u001b[0;30;48;2;231;138;195mcapital\u001b[0m \u001b[0;30;48;2;166;216;84m##ization\u001b[0m \u001b[0;30;48;2;255;217;47m[UNK]\u001b[0m \u001b[0;30;48;2;102;194;165m[UNK]\u001b[0m \u001b[0;30;48;2;252;141;98mshow\u001b[0m \u001b[0;30;48;2;141;160;203m_\u001b[0m \u001b[0;30;48;2;231;138;195mtoken\u001b[0m \u001b[0;30;48;2;166;216;84m##s\u001b[0m \u001b[0;30;48;2;255;217;47mfalse\u001b[0m \u001b[0;30;48;2;102;194;165mnone\u001b[0m \u001b[0;30;48;2;252;141;98meli\u001b[0m \u001b[0;30;48;2;141;160;203m##f\u001b[0m \u001b[0;30;48;2;231;138;195m=\u001b[0m \u001b[0;30;48;2;166;216;84m=\u001b[0m \u001b[0;30;48;2;255;217;47m>\u001b[0m \u001b[0;30;48;2;102;194;165m=\u001b[0m \u001b[0;30;48;2;252;141;98melse\u001b[0m \u001b[0;30;48;2;141;160;203m:\u001b[0m \u001b[0;30;48;2;231;138;195mtwo\u001b[0m \u001b[0;30;48;2;166;216;84mtab\u001b[0m \u001b[0;30;48;2;255;217;47m##s\u001b[0m \u001b[0;30;48;2;102;194;165m:\u001b[0m \u001b[0;30;48;2;252;141;98m\"\u001b[0m \u001b[0;30;48;2;141;160;203m/\u001b[0m \u001b[0;30;48;2;231;138;195mt\u001b[0m \u001b[0;30;48;2;166;216;84m/\u001b[0m \u001b[0;30;48;2;255;217;47mt\u001b[0m \u001b[0;30;48;2;102;194;165m\"\u001b[0m \u001b[0;30;48;2;252;141;98mthree\u001b[0m \u001b[0;30;48;2;141;160;203mtab\u001b[0m \u001b[0;30;48;2;231;138;195m##s\u001b[0m \u001b[0;30;48;2;166;216;84m:\u001b[0m \u001b[0;30;48;2;255;217;47m\"\u001b[0m \u001b[0;30;48;2;102;194;165m\"\u001b[0m \u001b[0;30;48;2;252;141;98m12\u001b[0m \u001b[0;30;48;2;141;160;203m.\u001b[0m \u001b[0;30;48;2;231;138;195m0\u001b[0m \u001b[0;30;48;2;166;216;84m*\u001b[0m \u001b[0;30;48;2;255;217;47m50\u001b[0m \u001b[0;30;48;2;102;194;165m=\u001b[0m \u001b[0;30;48;2;252;141;98m600\u001b[0m \u001b[0;30;48;2;141;160;203m[SEP]\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"bert-base-uncased\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 219,
     "referenced_widgets": [
      "76ff072348d0471abaa566d7de6b8e93",
      "5323bca1afe64a2a9bc9c14ac39ad230",
      "bf9fbc21d76a424dab0c8687bd0874c9",
      "f67d099d1e0343d9a822523223d21a75",
      "1dc884bbe68e4bf4ad1e14ca44b5c2fd",
      "b57a5357a7334d47b39f14e07e2d5708",
      "e608cc17958347c19615577b73926f45",
      "1695867ecb3f402ea8e10920adf651ee",
      "9fb081ecc7374f73b32cb203c7ed042d",
      "2fdb8c6afd2647b695f249b4f8122e52",
      "6e12c295923644ab930dcb51de932fd9",
      "d3316f8df2804c2ba34504b196aed6be",
      "893c6036041346ab9486cb7bde06ad0d",
      "42f7a33f1b554c289a434300bcc19f70",
      "d75a253a2ef648e88e35d765be5d4c34",
      "3b5a28c840ee4ba78676b4c3dbcd7af6",
      "e1175aef0e2e4b17afdb6a62f68887a7",
      "8774144d2cbc44f4bb39305e20cd5093",
      "139f7c4f547f4d08a6126897617989d5",
      "61bafb93125042f5bb7cc1195b459d45",
      "8053c516a8df40498085843ff07a2884",
      "81aa45723ec44d18a0e37844b9c70c4e",
      "33541a7c0d664fa2bd104fc9bc91f1bd",
      "580f25a7f04943e6a5100bdf584f8c97",
      "1848a0b868254c848115fc59b2cdd639",
      "47e387c2bcdc43329354304e5358c224",
      "883b562fa1c7409fbf43d2ac90c29955",
      "9798a6e28f56466f9582a40064ad4c4f",
      "4f31fd12d7c04233856206304b2a1bc7",
      "1460bc4aca764def9120218a963ae183",
      "3933809fb01c43bca62dc220cc94f217",
      "e4caf964309345bf82ad65411c1a7f3f",
      "ea6b6957c38d469abbe07bb20c811d2d",
      "eabd5498553646f18eada3542254cb0b",
      "206f377162614c72a74e73557fece973",
      "deabe5ef1f1f473f81403df9d8923846",
      "4096283e990a47a7bd4c00416fa71788",
      "3598675602e34600ae5c719d67778d24",
      "9ca5334888b249b2bd318be5a97fae7e",
      "7cce748a6f264cb69921fc97d4b8c946",
      "449aaf8bc9a64f6483eff88cc4678f6c",
      "e983503abea646a68dfe45652f7e78d1",
      "06b9fe5f3e644ddab47a17a3276e1a67",
      "4128b1818d30497f9a3d6869e02addaa"
     ]
    },
    "executionInfo": {
     "elapsed": 1520,
     "status": "ok",
     "timestamp": 1719589575187,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "0Ay_NX3K5HyP",
    "outputId": "4a32ab93-75f2-4b70-a55b-b643283c8270"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "76ff072348d0471abaa566d7de6b8e93",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d3316f8df2804c2ba34504b196aed6be",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "33541a7c0d664fa2bd104fc9bc91f1bd",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "eabd5498553646f18eada3542254cb0b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m[CLS]\u001b[0m \u001b[0;30;48;2;252;141;98mEnglish\u001b[0m \u001b[0;30;48;2;141;160;203mand\u001b[0m \u001b[0;30;48;2;231;138;195mCA\u001b[0m \u001b[0;30;48;2;166;216;84m##PI\u001b[0m \u001b[0;30;48;2;255;217;47m##TA\u001b[0m \u001b[0;30;48;2;102;194;165m##L\u001b[0m \u001b[0;30;48;2;252;141;98m##I\u001b[0m \u001b[0;30;48;2;141;160;203m##Z\u001b[0m \u001b[0;30;48;2;231;138;195m##AT\u001b[0m \u001b[0;30;48;2;166;216;84m##ION\u001b[0m \u001b[0;30;48;2;255;217;47m[UNK]\u001b[0m \u001b[0;30;48;2;102;194;165m[UNK]\u001b[0m \u001b[0;30;48;2;252;141;98mshow\u001b[0m \u001b[0;30;48;2;141;160;203m_\u001b[0m \u001b[0;30;48;2;231;138;195mtoken\u001b[0m \u001b[0;30;48;2;166;216;84m##s\u001b[0m \u001b[0;30;48;2;255;217;47mF\u001b[0m \u001b[0;30;48;2;102;194;165m##als\u001b[0m \u001b[0;30;48;2;252;141;98m##e\u001b[0m \u001b[0;30;48;2;141;160;203mNone\u001b[0m \u001b[0;30;48;2;231;138;195mel\u001b[0m \u001b[0;30;48;2;166;216;84m##if\u001b[0m \u001b[0;30;48;2;255;217;47m=\u001b[0m \u001b[0;30;48;2;102;194;165m=\u001b[0m \u001b[0;30;48;2;252;141;98m>\u001b[0m \u001b[0;30;48;2;141;160;203m=\u001b[0m \u001b[0;30;48;2;231;138;195melse\u001b[0m \u001b[0;30;48;2;166;216;84m:\u001b[0m \u001b[0;30;48;2;255;217;47mtwo\u001b[0m \u001b[0;30;48;2;102;194;165mta\u001b[0m \u001b[0;30;48;2;252;141;98m##bs\u001b[0m \u001b[0;30;48;2;141;160;203m:\u001b[0m \u001b[0;30;48;2;231;138;195m\"\u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47mThree\u001b[0m \u001b[0;30;48;2;102;194;165mta\u001b[0m \u001b[0;30;48;2;252;141;98m##bs\u001b[0m \u001b[0;30;48;2;141;160;203m:\u001b[0m \u001b[0;30;48;2;231;138;195m\"\u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m12\u001b[0m \u001b[0;30;48;2;102;194;165m.\u001b[0m \u001b[0;30;48;2;252;141;98m0\u001b[0m \u001b[0;30;48;2;141;160;203m*\u001b[0m \u001b[0;30;48;2;231;138;195m50\u001b[0m \u001b[0;30;48;2;166;216;84m=\u001b[0m \u001b[0;30;48;2;255;217;47m600\u001b[0m \u001b[0;30;48;2;102;194;165m[SEP]\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"bert-base-cased\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 284,
     "referenced_widgets": [
      "77d1608ca87f4bc1b7a731c896d86db9",
      "6e2c83d85af0419c81df10e85e31d29d",
      "5d3e86f8d3f949aeacabace1f7640d81",
      "6fd2baf1fc1244d38fc41cde100d7b6e",
      "41965c378ba243339352ad3926d48862",
      "873384e3c478450ea4a5a9e061c87133",
      "dcce05965068457fb764ae1c04066d88",
      "41f2087fd27a4c018087063e8e7629d3",
      "905ac86fe3294497b1722e955d63ed4c",
      "131c47105d324eddaea5c241829e878a",
      "22f955573c2845cfba5c314b59d26739",
      "41301a0547754ecfbd6044f6eacb0b8d",
      "eff43edf235a4b92b0e26d5bc21fc909",
      "bbdc2b4c70a0426aa3d2043d0b91b839",
      "405476ddfe634ad793e28474dfe30ecc",
      "8575a84785714069921bbfdc13fb957e",
      "bad64205077a496f96e6d03d927140ba",
      "ca152f8ec99e48b39ee5267269eeaca0",
      "461e3c04697641359924ae0902b13db0",
      "f15e240b2d01488698caa3275e0bacc1",
      "4e2491afa9fc4d65b95e9471af782e4d",
      "ff1f8b630ceb449e910ca34d969fdafd",
      "fa60038dc7c547b8b1d9c54f88fd6b39",
      "e69fe73aa3d44de0bddbe1711269bd8e",
      "5ff700af61664f0eaebe580c8a49a910",
      "e901ed75738f4e41b79651dc012003a6",
      "b115c5c5193f489f87209bb5c6d788f9",
      "3bce3be7198c4917a6cb2183e1344e2c",
      "dedac5ecd5844ee29e346f465074d3fd",
      "4521cc909b2942de889a37dbec1f0277",
      "645a2e3dba2f49fb9ce9c0e2b2a8e73f",
      "198833f7fa2f4ff8ab064b4671461830",
      "66acf835e274473f84ccd486a99e71d2",
      "db2934af14274fe78ffc85f7d03fd1c8",
      "9be5d1e096934134a00974cf8e3fa63c",
      "fc2c07f1eeee43e3aad438206929f5df",
      "08d6a11cebf840748261e0ba6970092b",
      "f8912b0da8aa4f7499ad3f4e5ccca84b",
      "c87ad6d49a054fc8850bebf87c444ee3",
      "17b6d632e333476b99f4315fe737d359",
      "2661e810b7084f93a4dcd454ea7665a0",
      "042557f8b84b422882c651f910a9fce0",
      "8f95639d6dc946f18f14d8a16e73b4a4",
      "89224986b13645d9a9dbedb038e795bb",
      "2fcd1b9c380f422291096e57a6c7f85e",
      "142133a41f664fdf82a1b16d87a68ae5",
      "d1c2b3aac5cc4f3fad1413f8cfdc04e3",
      "bce267a75b9946ae8b0db42dd7f925d1",
      "3b46d5e1b7fe4427909d1c82debd7ba7",
      "3dc1a66c56fb428aad53f5221ed1ae18",
      "23cda173696645b6955515990b6834ec",
      "2141c20003154d8dab2855deb44d3aad",
      "2213d4aa55eb4ef384eaf879552dcd7d",
      "56009a7bf6fc4d7c8300fc9dc4d6ad14",
      "551cdd6ea1d94ba8a6cce7b00798c63b"
     ]
    },
    "executionInfo": {
     "elapsed": 2010,
     "status": "ok",
     "timestamp": 1719589579935,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "K_k5QduY5H0u",
    "outputId": "2e844f23-3dee-4078-8d51-4c250d2c2f3e"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "77d1608ca87f4bc1b7a731c896d86db9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "41301a0547754ecfbd6044f6eacb0b8d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fa60038dc7c547b8b1d9c54f88fd6b39",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "db2934af14274fe78ffc85f7d03fd1c8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2fcd1b9c380f422291096e57a6c7f85e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mEnglish\u001b[0m \u001b[0;30;48;2;141;160;203m and\u001b[0m \u001b[0;30;48;2;231;138;195m CAP\u001b[0m \u001b[0;30;48;2;166;216;84mITAL\u001b[0m \u001b[0;30;48;2;255;217;47mIZ\u001b[0m \u001b[0;30;48;2;102;194;165mATION\u001b[0m \u001b[0;30;48;2;252;141;98m\n",
      "\u001b[0m \u001b[0;30;48;2;141;160;203m�\u001b[0m \u001b[0;30;48;2;231;138;195m�\u001b[0m \u001b[0;30;48;2;166;216;84m�\u001b[0m \u001b[0;30;48;2;255;217;47m �\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m\n",
      "\u001b[0m \u001b[0;30;48;2;231;138;195mshow\u001b[0m \u001b[0;30;48;2;166;216;84m_\u001b[0m \u001b[0;30;48;2;255;217;47mt\u001b[0m \u001b[0;30;48;2;102;194;165mok\u001b[0m \u001b[0;30;48;2;252;141;98mens\u001b[0m \u001b[0;30;48;2;141;160;203m False\u001b[0m \u001b[0;30;48;2;231;138;195m None\u001b[0m \u001b[0;30;48;2;166;216;84m el\u001b[0m \u001b[0;30;48;2;255;217;47mif\u001b[0m \u001b[0;30;48;2;102;194;165m ==\u001b[0m \u001b[0;30;48;2;252;141;98m >=\u001b[0m \u001b[0;30;48;2;141;160;203m else\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m two\u001b[0m \u001b[0;30;48;2;255;217;47m tabs\u001b[0m \u001b[0;30;48;2;102;194;165m:\"\u001b[0m \u001b[0;30;48;2;252;141;98m \u001b[0m \u001b[0;30;48;2;141;160;203m \u001b[0m \u001b[0;30;48;2;231;138;195m \u001b[0m \u001b[0;30;48;2;166;216;84m \"\u001b[0m \u001b[0;30;48;2;255;217;47m Three\u001b[0m \u001b[0;30;48;2;102;194;165m tabs\u001b[0m \u001b[0;30;48;2;252;141;98m:\u001b[0m \u001b[0;30;48;2;141;160;203m \"\u001b[0m \u001b[0;30;48;2;231;138;195m \u001b[0m \u001b[0;30;48;2;166;216;84m \u001b[0m \u001b[0;30;48;2;255;217;47m \u001b[0m \u001b[0;30;48;2;102;194;165m \u001b[0m \u001b[0;30;48;2;252;141;98m \u001b[0m \u001b[0;30;48;2;141;160;203m \u001b[0m \u001b[0;30;48;2;231;138;195m \"\u001b[0m \u001b[0;30;48;2;166;216;84m\n",
      "\u001b[0m \u001b[0;30;48;2;255;217;47m12\u001b[0m \u001b[0;30;48;2;102;194;165m.\u001b[0m \u001b[0;30;48;2;252;141;98m0\u001b[0m \u001b[0;30;48;2;141;160;203m*\u001b[0m \u001b[0;30;48;2;231;138;195m50\u001b[0m \u001b[0;30;48;2;166;216;84m=\u001b[0m \u001b[0;30;48;2;255;217;47m600\u001b[0m \u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"gpt2\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 183,
     "referenced_widgets": [
      "63772475e9234672994f2a8edf89b192",
      "fa45fdb364444208b2693760809e3c60",
      "5a92b1072a834d69ad33060ae1b0fdf0",
      "b8a3d22de4964f369ecabc31b6cdca57",
      "274983c981654d1eb25d630d8d5e47e3",
      "eebaba8a83ed48f8afe2ca129f3a73c9",
      "2d7a3c68a2e246059a93fea280f4c2c8",
      "22983afeb04d4a06b8b01527d869585e",
      "f7492077d2ba4ccf8df039473057321d",
      "fb255528c7754047aeb29863dd642b19",
      "cc7f7f3ef40042458ac7565b070af032",
      "bc67bb86f76e482ab703f3f403f4cc76",
      "a7605351b83941698348ee84cd99f955",
      "88af753262344cfe9a88b133540bcaf7",
      "8f92316b17a24623b8646747f5ecc7d6",
      "8153c7e3f21f44f09a3222da6312137d",
      "22bee52a95d243868b40b3f8ce5ca7d9",
      "f09f8925fc34454dbb970c31d5d82707",
      "354c2db5dbd34284baf62a7529537b8b",
      "028db37ce29c45939adeed5ae311583c",
      "dd534b3f89c64de9b9aa4f7a95a05f34",
      "30ed305df62c45329f24a2f64499d490",
      "36e3e6b45fca44c8b01a729189b1bdab",
      "4727da03ed724b8dafff24856652fe95",
      "b580d17fa1134425a837a48aba06dbf4",
      "e71172ffd8d74f189cea18e4898c4c2d",
      "ce0031117cf347f48d027cd70e87193b",
      "9ee79c669c6641d887fa284286435f57",
      "9885aaa3e0be4052af4848b92c642cdd",
      "ce6eee6121334479a72e023b252124d2",
      "3adda22d2bca47a886da662621ec9a9d",
      "6a1a34996ae24a14972fd425be48dd7c",
      "2c323975d4454113a863e3ec0b56f4fb",
      "4b9fab6416924d509bbb6361f63797e1",
      "a721b2fba975474c8a3d9384cf998228",
      "1f76260c0f6c46598bee13eb4a0f8b65",
      "fd95f752cc684613b0f6c6db12af874f",
      "3c0a4cbec1bf4c7886a1a9271c1b0832",
      "2489776fa74e4dccb4154368f3861623",
      "c8506a9393604d119ff71264e27b6734",
      "93187864ee6d4d41b49df1c84f35e6e2",
      "d7cbd635afec493ebcc4973f8d98c58b",
      "d09ad050869c47fb9fbc558f8b6d47d7",
      "b4e35a7edeea4b089182f4e9b15dc12e"
     ]
    },
    "executionInfo": {
     "elapsed": 1618,
     "status": "ok",
     "timestamp": 1719589589160,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "EJn5nf3c5H2_",
    "outputId": "607c38ff-9425-4371-f5e0-1f8ee9449eee"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "63772475e9234672994f2a8edf89b192",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bc67bb86f76e482ab703f3f403f4cc76",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "36e3e6b45fca44c8b01a729189b1bdab",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4b9fab6416924d509bbb6361f63797e1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165mEnglish\u001b[0m \u001b[0;30;48;2;252;141;98mand\u001b[0m \u001b[0;30;48;2;141;160;203mCA\u001b[0m \u001b[0;30;48;2;231;138;195mPI\u001b[0m \u001b[0;30;48;2;166;216;84mTAL\u001b[0m \u001b[0;30;48;2;255;217;47mIZ\u001b[0m \u001b[0;30;48;2;102;194;165mATION\u001b[0m \u001b[0;30;48;2;252;141;98m\u001b[0m \u001b[0;30;48;2;141;160;203m<unk>\u001b[0m \u001b[0;30;48;2;231;138;195m\u001b[0m \u001b[0;30;48;2;166;216;84m<unk>\u001b[0m \u001b[0;30;48;2;255;217;47mshow\u001b[0m \u001b[0;30;48;2;102;194;165m_\u001b[0m \u001b[0;30;48;2;252;141;98mto\u001b[0m \u001b[0;30;48;2;141;160;203mken\u001b[0m \u001b[0;30;48;2;231;138;195ms\u001b[0m \u001b[0;30;48;2;166;216;84mFal\u001b[0m \u001b[0;30;48;2;255;217;47ms\u001b[0m \u001b[0;30;48;2;102;194;165me\u001b[0m \u001b[0;30;48;2;252;141;98mNone\u001b[0m \u001b[0;30;48;2;141;160;203m\u001b[0m \u001b[0;30;48;2;231;138;195me\u001b[0m \u001b[0;30;48;2;166;216;84ml\u001b[0m \u001b[0;30;48;2;255;217;47mif\u001b[0m \u001b[0;30;48;2;102;194;165m=\u001b[0m \u001b[0;30;48;2;252;141;98m=\u001b[0m \u001b[0;30;48;2;141;160;203m>\u001b[0m \u001b[0;30;48;2;231;138;195m=\u001b[0m \u001b[0;30;48;2;166;216;84melse\u001b[0m \u001b[0;30;48;2;255;217;47m:\u001b[0m \u001b[0;30;48;2;102;194;165mtwo\u001b[0m \u001b[0;30;48;2;252;141;98mtab\u001b[0m \u001b[0;30;48;2;141;160;203ms\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m\"\u001b[0m \u001b[0;30;48;2;102;194;165mThree\u001b[0m \u001b[0;30;48;2;252;141;98mtab\u001b[0m \u001b[0;30;48;2;141;160;203ms\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m\"\u001b[0m \u001b[0;30;48;2;102;194;165m12.\u001b[0m \u001b[0;30;48;2;252;141;98m0\u001b[0m \u001b[0;30;48;2;141;160;203m*\u001b[0m \u001b[0;30;48;2;231;138;195m50\u001b[0m \u001b[0;30;48;2;166;216;84m=\u001b[0m \u001b[0;30;48;2;255;217;47m600\u001b[0m \u001b[0;30;48;2;102;194;165m\u001b[0m 
\u001b[0;30;48;2;252;141;98m</s>\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"google/flan-t5-small\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 714,
     "status": "ok",
     "timestamp": 1723035784494,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": -60
    },
    "id": "1ymhAsTg5H5e",
    "outputId": "7827a535-4f33-4620-f4e7-4a2b622a78c2"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mEnglish\u001b[0m \u001b[0;30;48;2;141;160;203m and\u001b[0m \u001b[0;30;48;2;231;138;195m CAPITAL\u001b[0m \u001b[0;30;48;2;166;216;84mIZATION\u001b[0m \u001b[0;30;48;2;255;217;47m\n",
      "\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m�\u001b[0m \u001b[0;30;48;2;231;138;195m �\u001b[0m \u001b[0;30;48;2;166;216;84m�\u001b[0m \u001b[0;30;48;2;255;217;47m�\u001b[0m \u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mshow\u001b[0m \u001b[0;30;48;2;141;160;203m_tokens\u001b[0m \u001b[0;30;48;2;231;138;195m False\u001b[0m \u001b[0;30;48;2;166;216;84m None\u001b[0m \u001b[0;30;48;2;255;217;47m elif\u001b[0m \u001b[0;30;48;2;102;194;165m ==\u001b[0m \u001b[0;30;48;2;252;141;98m >=\u001b[0m \u001b[0;30;48;2;141;160;203m else\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m two\u001b[0m \u001b[0;30;48;2;255;217;47m tabs\u001b[0m \u001b[0;30;48;2;102;194;165m:\"\u001b[0m \u001b[0;30;48;2;252;141;98m   \u001b[0m \u001b[0;30;48;2;141;160;203m \"\u001b[0m \u001b[0;30;48;2;231;138;195m Three\u001b[0m \u001b[0;30;48;2;166;216;84m tabs\u001b[0m \u001b[0;30;48;2;255;217;47m:\u001b[0m \u001b[0;30;48;2;102;194;165m \"\u001b[0m \u001b[0;30;48;2;252;141;98m      \u001b[0m \u001b[0;30;48;2;141;160;203m \"\n",
      "\u001b[0m \u001b[0;30;48;2;231;138;195m12\u001b[0m \u001b[0;30;48;2;166;216;84m.\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m*\u001b[0m \u001b[0;30;48;2;252;141;98m50\u001b[0m \u001b[0;30;48;2;141;160;203m=\u001b[0m \u001b[0;30;48;2;231;138;195m600\u001b[0m \u001b[0;30;48;2;166;216;84m\n",
      "\u001b[0m "
     ]
    }
   ],
   "source": [
    "# The official tokenizer is `tiktoken`, but this is the same tokenizer hosted on the HF platform\n",
    "show_tokens(text, \"Xenova/gpt-4\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 284,
     "referenced_widgets": [
      "770da5f1b8b24972b2018e1cadd3ec8a",
      "e2f255083a1b4d8f9992f27b4d21c676",
      "4fac840948a048d6b1ca7c0dc4f4c5d5",
      "c2ac158eb3f0469ca90f7247c546c70f",
      "4a8cfc4637124995866810ef1b750fe4",
      "409169ca000a484ca4472750cfe63f30",
      "8663795c551e457dafc93d02cf0026c3",
      "5639fe0c03e7451db316579356290e3d",
      "46e73023abe1465382771e9af87f36fc",
      "5feaeba2e88a4a7fb83a85f3200e2639",
      "76a8030af7794356b5c2daa891d789e2",
      "37c46ab78fc64eb98923b24d6a0de37e",
      "3bb4b5235ef74c7d89489f8fa8cded17",
      "a720bb387fca45968c75352398935382",
      "683b85afadb744e4bd7164c51f01d3f9",
      "00dd050102674a1ab3fd8d8f9caec4b0",
      "cf6a7c6ada024f8e9f106428d506b078",
      "33744d7c827e4784a91955159a47e337",
      "f6dce141c94d4c8494f75a7387b65331",
      "48738fb1cf8e4f0fb70b74a5896669cd",
      "1b10141545cb489fa3a58d4939cc4d9b",
      "99b9c874e58c4db9b596b6ca1699e666",
      "07bf43728198472997c8b59b9343adfe",
      "74d33e70d8af43148fd3a618b5d3c5dd",
      "8008a03780b24639abce64498b1d832e",
      "82ad72412e1343b983679e625c85f47d",
      "0d3aa270949048a5886de118b1a3b1f1",
      "cc568e7a8ca84810ab878e601fae557a",
      "cb57c7f3455f4a34b59ae39d0b599b8b",
      "577a22cb6c7549ff96f367bd6f4f8b12",
      "d0d4c92c9a0f4bd29255d8ff47d18c11",
      "e6d9b96a5cb9487d90136d097e716a5a",
      "e878edbee8ea48178b424e56417b7fa5",
      "e227f5f6bb3b4580b0ea4304d34ad556",
      "36863dd97aa04c48831d1fb455557adc",
      "ece59919873646f9bbf41c7547e802a3",
      "7ff1f54520324b3e9462062ecd87ce69",
      "2e8d55afb3fc4e2fa6b0b887a09b7ca9",
      "e527a040f0be4d43830ae6d4335771b0",
      "32f18ab0328146b5aac38b4c7ef8029d",
      "5d5d4e02a6724861aa36d9af5ea70ea7",
      "c8637134a894493093654456f2a9763b",
      "8a1023f076f34f34ab0b091d7f62c172",
      "3537e0361ab5475282b7f34adbfc70dc",
      "12d4df348cd34dc2b3d7ffc41f0561e8",
      "65326149d62e4404ba49a8d2d505adac",
      "b84c1bff2fcc48ed8fee636e1bdb16f9",
      "a8ae6ec72f2744b4991de0a961e1b142",
      "98182c58a56343e482ed44935be4fd31",
      "15719135708047a49a19887989dac12d",
      "a43ccd6d114644ae86c3129786f8105a",
      "90cc27a829e54b16850552e04f718bc3",
      "2be39bb4eacf41e381432b37febfd788",
      "df4dd594480d446da1523bd2d016c1cb",
      "11ea292f268d481396192b158814e6b1"
     ]
    },
    "executionInfo": {
     "elapsed": 9948,
     "status": "ok",
     "timestamp": 1719590292199,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "3_vAyeTy5H7_",
    "outputId": "ad3f759f-19b7-4880-cbf8-9ed7cb25d627"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "770da5f1b8b24972b2018e1cadd3ec8a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/7.88k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "37c46ab78fc64eb98923b24d6a0de37e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.json:   0%|          | 0.00/777k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "07bf43728198472997c8b59b9343adfe",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "merges.txt:   0%|          | 0.00/442k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e227f5f6bb3b4580b0ea4304d34ad556",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/2.06M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "12d4df348cd34dc2b3d7ffc41f0561e8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/958 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mEnglish\u001b[0m \u001b[0;30;48;2;141;160;203m and\u001b[0m \u001b[0;30;48;2;231;138;195m CAPITAL\u001b[0m \u001b[0;30;48;2;166;216;84mIZATION\u001b[0m \u001b[0;30;48;2;255;217;47m\n",
      "\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m�\u001b[0m \u001b[0;30;48;2;231;138;195m \u001b[0m \u001b[0;30;48;2;166;216;84m�\u001b[0m \u001b[0;30;48;2;255;217;47m�\u001b[0m \u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mshow\u001b[0m \u001b[0;30;48;2;141;160;203m_\u001b[0m \u001b[0;30;48;2;231;138;195mtokens\u001b[0m \u001b[0;30;48;2;166;216;84m False\u001b[0m \u001b[0;30;48;2;255;217;47m None\u001b[0m \u001b[0;30;48;2;102;194;165m elif\u001b[0m \u001b[0;30;48;2;252;141;98m ==\u001b[0m \u001b[0;30;48;2;141;160;203m >=\u001b[0m \u001b[0;30;48;2;231;138;195m else\u001b[0m \u001b[0;30;48;2;166;216;84m:\u001b[0m \u001b[0;30;48;2;255;217;47m two\u001b[0m \u001b[0;30;48;2;102;194;165m tabs\u001b[0m \u001b[0;30;48;2;252;141;98m:\"\u001b[0m \u001b[0;30;48;2;141;160;203m   \u001b[0m \u001b[0;30;48;2;231;138;195m \"\u001b[0m \u001b[0;30;48;2;166;216;84m Three\u001b[0m \u001b[0;30;48;2;255;217;47m tabs\u001b[0m \u001b[0;30;48;2;102;194;165m:\u001b[0m \u001b[0;30;48;2;252;141;98m \"\u001b[0m \u001b[0;30;48;2;141;160;203m      \u001b[0m \u001b[0;30;48;2;231;138;195m \"\u001b[0m \u001b[0;30;48;2;166;216;84m\n",
      "\u001b[0m \u001b[0;30;48;2;255;217;47m1\u001b[0m \u001b[0;30;48;2;102;194;165m2\u001b[0m \u001b[0;30;48;2;252;141;98m.\u001b[0m \u001b[0;30;48;2;141;160;203m0\u001b[0m \u001b[0;30;48;2;231;138;195m*\u001b[0m \u001b[0;30;48;2;166;216;84m5\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m=\u001b[0m \u001b[0;30;48;2;252;141;98m6\u001b[0m \u001b[0;30;48;2;141;160;203m0\u001b[0m \u001b[0;30;48;2;231;138;195m0\u001b[0m \u001b[0;30;48;2;166;216;84m\n",
      "\u001b[0m "
     ]
    }
   ],
   "source": [
    "# You need to request access before you can use this tokenizer\n",
    "show_tokens(text, \"bigcode/starcoder2-15b\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 220,
     "referenced_widgets": [
      "6a6efb1d66ea423a9b5ff4b2f1f1194c",
      "1283ee793a40405aa9763e1b88d6d7a3",
      "11df3b517fd94524be18cff070e273a8",
      "86b01695df4b42d3a6602d704243b6ee",
      "b0b1039168a74b8fa79271592f29f0b7",
      "877ff6d25f524779a10df09be5fc6093",
      "276ec5fb636b49bcb933a4ce96cb900e",
      "ac88f9025a0e44b38fc14720d810b5ab",
      "4ceb8ce8b67a44b4b7505cd7e589dec1",
      "b0d45aec56fd4219b9224dbe31fad3a3",
      "bf3a8980a70547f5b853390235a37592",
      "f11120105af24fe1b40b6490897e2e2e",
      "8109bb974a3e4b12bd7534b62be20940",
      "f1d6a31870da4e27bc482bb84953b165",
      "1128f56169ac4376ab5fbb46d44b01dc",
      "3da38a2294a145bf86124d0fda8b3255",
      "85113d3b53fb47bb8593c3a21e37142c",
      "1e4a17f723d14694b5aeb673db7394cc",
      "87a4011d2d0a4076b027f7068b244dda",
      "48ca9047fc7d424f99b40c54e6d732f4",
      "3b169b44c1814ed5a7feba7bab0f3ce6",
      "dff4ee8d0bd74822a0adf204e21521b8",
      "5d3f3b08ec5044e3acf4414703e579d9",
      "900ddbabea1846a3a0dfd8380668bec1",
      "7da6e29f0349438494ff83975d48f02a",
      "01f1433d221f437eb0692d25948ce080",
      "9f189b7e32c94a3a84ffd40beed7d1fd",
      "3e5f406442df4b848d324ff584eea75c",
      "575b63bdd98047a4934d556423dd9ce6",
      "c9662398b8ad4d4fa1a59d44f6205769",
      "37379e486478437f9fa2f8eac7f9fd60",
      "2b83154cf934484da54fe0d0b08fe3d3",
      "759faaea712c4a8abb0ece5217e1c470"
     ]
    },
    "executionInfo": {
     "elapsed": 1388,
     "status": "ok",
     "timestamp": 1719589605088,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "KeWcUdxY6I3u",
    "outputId": "f39c8f56-1e71-44bb-bade-75bfb33b581c"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6a6efb1d66ea423a9b5ff4b2f1f1194c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/166 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f11120105af24fe1b40b6490897e2e2e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/2.14M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5d3f3b08ec5044e3acf4414703e579d9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/3.00 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98mEnglish\u001b[0m \u001b[0;30;48;2;141;160;203m and\u001b[0m \u001b[0;30;48;2;231;138;195m CAP\u001b[0m \u001b[0;30;48;2;166;216;84mITAL\u001b[0m \u001b[0;30;48;2;255;217;47mIZATION\u001b[0m \u001b[0;30;48;2;102;194;165m\n",
      "\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m�\u001b[0m \u001b[0;30;48;2;231;138;195m�\u001b[0m \u001b[0;30;48;2;166;216;84m�\u001b[0m \u001b[0;30;48;2;255;217;47m �\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m\n",
      "\u001b[0m \u001b[0;30;48;2;231;138;195mshow\u001b[0m \u001b[0;30;48;2;166;216;84m_\u001b[0m \u001b[0;30;48;2;255;217;47mtokens\u001b[0m \u001b[0;30;48;2;102;194;165m False\u001b[0m \u001b[0;30;48;2;252;141;98m None\u001b[0m \u001b[0;30;48;2;141;160;203m elif\u001b[0m \u001b[0;30;48;2;231;138;195m \u001b[0m \u001b[0;30;48;2;166;216;84m==\u001b[0m \u001b[0;30;48;2;255;217;47m \u001b[0m \u001b[0;30;48;2;102;194;165m>\u001b[0m \u001b[0;30;48;2;252;141;98m=\u001b[0m \u001b[0;30;48;2;141;160;203m else\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m two\u001b[0m \u001b[0;30;48;2;255;217;47m t\u001b[0m \u001b[0;30;48;2;102;194;165mabs\u001b[0m \u001b[0;30;48;2;252;141;98m:\u001b[0m \u001b[0;30;48;2;141;160;203m\"\u001b[0m \u001b[0;30;48;2;231;138;195m    \u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m Three\u001b[0m \u001b[0;30;48;2;102;194;165m t\u001b[0m \u001b[0;30;48;2;252;141;98mabs\u001b[0m \u001b[0;30;48;2;141;160;203m:\u001b[0m \u001b[0;30;48;2;231;138;195m \u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m       \u001b[0m \u001b[0;30;48;2;102;194;165m\"\u001b[0m \u001b[0;30;48;2;252;141;98m\n",
      "\u001b[0m \u001b[0;30;48;2;141;160;203m1\u001b[0m \u001b[0;30;48;2;231;138;195m2\u001b[0m \u001b[0;30;48;2;166;216;84m.\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m*\u001b[0m \u001b[0;30;48;2;252;141;98m5\u001b[0m \u001b[0;30;48;2;141;160;203m0\u001b[0m \u001b[0;30;48;2;231;138;195m=\u001b[0m \u001b[0;30;48;2;166;216;84m6\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m0\u001b[0m \u001b[0;30;48;2;252;141;98m\n",
      "\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"facebook/galactica-1.3b\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 374,
     "status": "ok",
     "timestamp": 1719589632350,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "__QNj2Cohzz2",
    "outputId": "17ffab73-b07c-44a9-c482-64ab9f4c45a4"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;30;48;2;102;194;165m<s>\u001b[0m \u001b[0;30;48;2;252;141;98m\u001b[0m \u001b[0;30;48;2;141;160;203m\n",
      "\u001b[0m \u001b[0;30;48;2;231;138;195mEnglish\u001b[0m \u001b[0;30;48;2;166;216;84mand\u001b[0m \u001b[0;30;48;2;255;217;47mC\u001b[0m \u001b[0;30;48;2;102;194;165mAP\u001b[0m \u001b[0;30;48;2;252;141;98mIT\u001b[0m \u001b[0;30;48;2;141;160;203mAL\u001b[0m \u001b[0;30;48;2;231;138;195mIZ\u001b[0m \u001b[0;30;48;2;166;216;84mATION\u001b[0m \u001b[0;30;48;2;255;217;47m\n",
      "\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m�\u001b[0m \u001b[0;30;48;2;231;138;195m�\u001b[0m \u001b[0;30;48;2;166;216;84m\u001b[0m \u001b[0;30;48;2;255;217;47m�\u001b[0m \u001b[0;30;48;2;102;194;165m�\u001b[0m \u001b[0;30;48;2;252;141;98m�\u001b[0m \u001b[0;30;48;2;141;160;203m\n",
      "\u001b[0m \u001b[0;30;48;2;231;138;195mshow\u001b[0m \u001b[0;30;48;2;166;216;84m_\u001b[0m \u001b[0;30;48;2;255;217;47mto\u001b[0m \u001b[0;30;48;2;102;194;165mkens\u001b[0m \u001b[0;30;48;2;252;141;98mFalse\u001b[0m \u001b[0;30;48;2;141;160;203mNone\u001b[0m \u001b[0;30;48;2;231;138;195melif\u001b[0m \u001b[0;30;48;2;166;216;84m==\u001b[0m \u001b[0;30;48;2;255;217;47m>=\u001b[0m \u001b[0;30;48;2;102;194;165melse\u001b[0m \u001b[0;30;48;2;252;141;98m:\u001b[0m \u001b[0;30;48;2;141;160;203mtwo\u001b[0m \u001b[0;30;48;2;231;138;195mtabs\u001b[0m \u001b[0;30;48;2;166;216;84m:\"\u001b[0m \u001b[0;30;48;2;255;217;47m  \u001b[0m \u001b[0;30;48;2;102;194;165m\"\u001b[0m \u001b[0;30;48;2;252;141;98mThree\u001b[0m \u001b[0;30;48;2;141;160;203mtabs\u001b[0m \u001b[0;30;48;2;231;138;195m:\u001b[0m \u001b[0;30;48;2;166;216;84m\"\u001b[0m \u001b[0;30;48;2;255;217;47m     \u001b[0m \u001b[0;30;48;2;102;194;165m\"\u001b[0m \u001b[0;30;48;2;252;141;98m\n",
      "\u001b[0m \u001b[0;30;48;2;141;160;203m1\u001b[0m \u001b[0;30;48;2;231;138;195m2\u001b[0m \u001b[0;30;48;2;166;216;84m.\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m*\u001b[0m \u001b[0;30;48;2;252;141;98m5\u001b[0m \u001b[0;30;48;2;141;160;203m0\u001b[0m \u001b[0;30;48;2;231;138;195m=\u001b[0m \u001b[0;30;48;2;166;216;84m6\u001b[0m \u001b[0;30;48;2;255;217;47m0\u001b[0m \u001b[0;30;48;2;102;194;165m0\u001b[0m \u001b[0;30;48;2;252;141;98m\n",
      "\u001b[0m "
     ]
    }
   ],
   "source": [
    "show_tokens(text, \"microsoft/Phi-3-mini-4k-instruct\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9Tu7OY4HvBEm"
   },
   "source": [
    "# Contextualized Word Embeddings From a Language Model (Like BERT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 265,
     "referenced_widgets": [
      "761c3e6c7f26453bba2f463f39f3ae73",
      "a5c45eeafcf4456bbb0fe1bb43ef2497",
      "1f35082ee7ec425ea801321112d48db1",
      "618c1c5ae3ea4650a412a3e009d9fb49",
      "a158991ec46e4587aac2111e02153f4c",
      "51af2c28e34245ed83f02b424a6a640a",
      "56a95848a2814195b334c31f0a961cd8",
      "c1e09c1869f7410ba328e6efe56a0460",
      "17777d8eae4742aaa78d7854a52e102f",
      "5916af8bb7ed4efcb05c8f1cdf826149",
      "b1778299eac04302ac136872dfb0a359",
      "8ee98d9d017542609881a1e16a8f393f",
      "c4a8f08da3f64fb287f7a940c4a9408f",
      "d66ee44b7f5b4c6eac1536235a627441",
      "c44feccdd01949c1af40fa69f6e03dd0",
      "84682eb9cfff444da7633f4ca9360f77",
      "332810f458df4e23bc034852706bcc6f",
      "4999e7d8e2384bf4adfbf1777587a65f",
      "a8d4f19ff4554165a5e78d3783928fe1",
      "29d9e7a799ea402fb5a28b2817156838",
      "222d489bf1664763babf0e377e45f4d8",
      "2ae57535c39541fe98d6a8ae22bcd7d4",
      "fd134a05028c447a994166eccc557806",
      "69101d935ae841e59aa7f30e40789496",
      "3ef63f93e424409192edd1b1364aba48",
      "1d204aedfeb14df5ae7e27eb88a87018",
      "e74b57abfaa2487aa8e369103be5d00d",
      "94609e349c8b43b5b74d2c059623f9f0",
      "febce6a7c96e4a42a9c6faa0bf1763c0",
      "18a4603fa6a040c0acd792243510562a",
      "c04ce575bf624bd1a80113d1eff1ae94",
      "6a9de4aaed054608b800820831aec87f",
      "46cd179c1c09474a80dc4cea39b759d5",
      "20331d1d457143719fe732325e79877e",
      "2a5055c8fc03457eb29390312c555ec5",
      "6de0874e33c146bba06131aa452c403f",
      "0f1d2e4c312d4ab38e359f96c9a760b7",
      "f946c53a81f34d64b25e331fc4b4c7a1",
      "2dc553ef192c4002b818de6736367fe0",
      "665b4085199a4ec5890dba773f07d4b7",
      "de74a5af1ef9462e82e5622232346a79",
      "34df4fe808174f80a89a765f6ce2f28f",
      "56c082332310434fa5ec791b728fe82e",
      "9ef54c7a15d1400f91598b367ac6552e",
      "ed19532a8eda4925a4014a3f11517ff9",
      "3859f0311ade4d278190924f0107ba0e",
      "6ea963b64fd642b29021110d77827021",
      "e3efc5b43bf2417faaea2e44a306fa75",
      "4975826de21547fca434e1fb492a216a",
      "19b547738c8e45bda7879dc527229019",
      "6469ef3a7ba6465c911a22060d54ff95",
      "657b6ce5f0804bc78d64f9b7b6a27777",
      "301363b755a24bc9bf413fc3b0ffd8b2",
      "50fc50f975294a23a1d5bccf64efc872",
      "3c722f92f6c2479a91cf957d1d18fbee",
      "7570614d79184ab2b44700df2342b294",
      "543ae93b2c5f4339979b8c64eff33f62",
      "72b54fe7c88540488adee31c15595c89",
      "3fbdaadec9f545b99057f7c55c6a6df1",
      "7656d1b978554f4383160b93a80ee7c6",
      "4d55d194e62942df8f1c32b9bb244e9b",
      "403f03c3a2434fd2ac6f143c1972c62e",
      "cc68d46e46e7487484d366f77ea863ca",
      "3dee319ce0aa4ce589ce84033bba8d9d",
      "a280161408504bb894ab78526f67750b",
      "72d18d21ca2a45ac868d390ead3ac086"
     ]
    },
    "executionInfo": {
     "elapsed": 5049,
     "status": "ok",
     "timestamp": 1719641476949,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "nsjz-VsYu9bB",
    "outputId": "03ea124b-c6de-449d-ea6f-f5e5b84c2c97"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "761c3e6c7f26453bba2f463f39f3ae73",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "8ee98d9d017542609881a1e16a8f393f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/474 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fd134a05028c447a994166eccc557806",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "20331d1d457143719fe732325e79877e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ed19532a8eda4925a4014a3f11517ff9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/578 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7570614d79184ab2b44700df2342b294",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "pytorch_model.bin:   0%|          | 0.00/241M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from transformers import AutoModel, AutoTokenizer\n",
    "\n",
    "# Load a tokenizer (note: this checkpoint differs from the model loaded below)\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/deberta-base\")\n",
    "\n",
    "# Load a language model\n",
    "model = AutoModel.from_pretrained(\"microsoft/deberta-v3-xsmall\")\n",
    "\n",
    "# Tokenize the sentence\n",
    "tokens = tokenizer('Hello world', return_tensors='pt')\n",
    "\n",
    "# Run the tokens through the model; index 0 is the last hidden state,\n",
    "# which holds one contextualized vector per token\n",
    "output = model(**tokens)[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 567,
     "status": "ok",
     "timestamp": 1719641482036,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "lQly_KcbvDce",
    "outputId": "fe2cc467-2a5a-4111-8d23-4da9aa799b79"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([1, 4, 384])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1719641482353,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "8GcRrpPV0kVj",
    "outputId": "93766ff1-1ae5-4e90-dba0-286d9e721c3d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[CLS]\n",
      "Hello\n",
      " world\n",
      "[SEP]\n"
     ]
    }
   ],
   "source": [
    "for token in tokens['input_ids'][0]:\n",
    "    print(tokenizer.decode(token))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1,
     "status": "ok",
     "timestamp": 1719641482353,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "e8oHVC7B0lkk",
    "outputId": "f7dd1e0c-a2db-4ae4-8ccb-c97fa150071a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[-3.4816,  0.0861, -0.1819,  ..., -0.0612, -0.3911,  0.3017],\n",
       "         [ 0.1898,  0.3208, -0.2315,  ...,  0.3714,  0.2478,  0.8048],\n",
       "         [ 0.2071,  0.5036, -0.0485,  ...,  1.2175, -0.2292,  0.8582],\n",
       "         [-3.4278,  0.0645, -0.1427,  ...,  0.0658, -0.4367,  0.3834]]],\n",
       "       grad_fn=<NativeLayerNormBackward0>)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DdEDuLWa0r4L"
   },
   "source": [
    "# Text Embeddings (For Sentences and Whole Documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 425,
     "referenced_widgets": [
      "a50156f59d8548b683982af06d5bda09",
      "7cbfb80418f14068bb4327a3823140fb",
      "ef1e2b9a1e694eaa8d5b371caa66277e",
      "ebf8ef569e374c17aa15472ee6ea98d8",
      "75f5174b66c14f5480960c79a44283ad",
      "05504faf760d43ea9082ff5a91ff82f7",
      "50c6cf2be63b4be29524188d79e8cd53",
      "9088f66d46f44030af46aee55005a939",
      "7af424dbfd864b8ba6f7661f2204302c",
      "44aae8dd70da4be98fd86a970373753b",
      "49dd6490203640d1821196fe28a08732",
      "c40e76c5e9bd42bea7903e811bce53a4",
      "8f7d3eea82614e4ca93482ad7cb637a6",
      "05eb94a4253544648654f24268e8b6da",
      "01b9535a72c64c95b482640b2bd3c5fa",
      "b954ab8edd66487cb9a5502879f0e1c5",
      "e6e103b2c53a4b5ca71f988b7804aeaf",
      "77f25a7ee7f843b0a24a1d360a1df211",
      "a0e30058096941dcaab8f24235ee1c49",
      "b26ac1747c884e3a913d90b8c05a991b",
      "e5db8326887a48ae97341cf216d35893",
      "c76f0b056c1f40f3ac37df24c449e9ec",
      "c4c7962b94674cd6898980cea6483595",
      "7fc0e3e3c75e47f88410a2774695081e",
      "6c96c43dd36a4b08ad94a9cad642b1ab",
      "73b403ec39fa4a16883ad08a50c7204b",
      "6feee2b015ef4fa6b94c905ec91f5146",
      "ab68f3b41d954ec992b9339a91def6a8",
      "fb9054b0d27c4a8a9b8241f9d5910e51",
      "78b6d1098f754eb491a2a729e00d2335",
      "e84f055fc0c04c4983b162d1d8c67147",
      "13455551484542ea93d4dbbd937288d5",
      "99db2aef29ba4717866072f20f1acf61",
      "10f9441d843c44e5b11cfb2e21b5d89e",
      "5c987bfb44d14b1ea822c99cb7dde071",
      "52ac1a973f7140e8b49dbf58dc0c8b21",
      "37117814cae9440c9c54f63def546c4b",
      "e019010a35ea4f3a9b9236a626b34760",
      "ba03fe6450b742c99e8b8836f585232c",
      "47e9e2ed70c84023ae3a1dbf0cb27328",
      "99e53690e1c940cea12f581b625c3b3d",
      "f53653a9050042649df9d91114ba39bb",
      "737c25a604ee4bcd982487f79450d3ea",
      "cea92b3b96f24087b494487bf3f4c0f9",
      "784748de51254ba18128af8df30b8a93",
      "11890f81eceb41f7be6a2d52c9a9e55e",
      "f7adb025a07a4aca8b7a3a174304666f",
      "7f7a5bfc6073495da65dcfd4b2d49309",
      "fa6483d07ae54fb2904fc117aa9a3d5b",
      "2cbcf7d0b1384b8cb320e1b30c124d71",
      "94ea21e19acf4eb7a9010510b226db86",
      "f399e1a91e7d4b7eaddfb910bd81d750",
      "f26297a84f224c4da8afd4316c8e7477",
      "e4bef8778ddc46e5a8d0756eb27c3e7c",
      "66943967c327428a9796fcd38c36b24d",
      "f0793746dff34da8858758fb55284b97",
      "1bc7b31eddc54588979f3fff14a0e12e",
      "75f54f7a8e5b4965b6f0ad28e5f3bf26",
      "3cb9178f0568448fa839d5cccc7973d7",
      "40bde45ac20f48ab93c4fd9e8284eac8",
      "09b701e83e3844fab97ff237b06a1238",
      "c1bbe572b8324ea48d42c40a5128bb8e",
      "a4bd81ed4d9d498a983c13ea79265819",
      "d8d65b5ac8914792b82460cf0bae980d",
      "136cd465bac246f2ac2454eea2f0484d",
      "59030392bbde468aae6c62aecddd499e",
      "ca7eb54b296c4a1fa91678c0e3d65f5d",
      "35521a33c1324a928fd2c9f7fce2ce69",
      "80c82a58f0924a578bcae9d3c6537c11",
      "589d24a92b3d4e5c99460cd609c8a230",
      "0edd8bfe5bba47d59c5b195841cc4228",
      "2aed08160bef4a7189821c560c01d6bb",
      "01ca9f66804048a9a75475ab9c49a24e",
      "880c7bdcf3174a78892aa7d0cd11dca7",
      "1d4f85ce80d841eab27b411b3e61e9be",
      "9960dd3bff70458abc148f1a153175ec",
      "0bdc012004ad445fa527fafee0ab55c2",
      "2895fad80e754b4e8158c6dd8db69058",
      "4a2c467901414bf0afc5b310aa959dae",
      "a465317d0f3b4106bbf8fd6c7a3caf6a",
      "7c93b5df16a64e1981e350459b05852d",
      "66d5c0087bd141b3baa502e3aa8bd408",
      "d26ec2f29c154da1ac7cb49cb9729113",
      "7385eea9ed2a438c8dae350fe2328162",
      "c861634df7524e7cbd7fdca030a0b663",
      "9904b80c37c44ed2a6b3c21786016e26",
      "df424fcea3e84c8084dbf8e146d1231f",
      "a1e116cf62d74d4e8e33c99379e924ed",
      "652ddbc085994a36b553ea04359943a1",
      "b340ba4de77043dcbedbcbdf6033d0c7",
      "10c89678b42b4cf0b8e95b74ad346fba",
      "fb22220311094fe2b3d245ff080ca4d7",
      "fc1358383bba4e5ebe2edfca57473002",
      "342adaf9c12548a4af25f5361ca869ca",
      "374287b3d21a427fabd82dc1e0710d62",
      "427e31cb6fcc4687937d807116e5e581",
      "8195fa9ad4e3487c90cbc0860361a336",
      "82bf559e6997425aba4245c44531f762",
      "6bc8887963d744a7a6c15930844f5513",
      "4688efe8ab954510b30df18f8daa74a5",
      "27c697f872b64fb5a43deb255e27fb50",
      "08f2dd1dd0a742eb8c9e11a41dbe69a3",
      "94b1fee85a034b49aba0c50fcdfc0fdb",
      "c21e510111f5425bad28afd9b723f9d1",
      "fea69d561fb94e8c98af4526f8d4b33e",
      "0cec3fa3672b4bbaa20e3d4aae6fd575",
      "2b09f8d112ea43a5b60c010f2bf0bbdb",
      "60ea33df89a449ad9e0f90a0bca672ad",
      "48230299285241a28e2c8e03db6fce4d",
      "3229d1aa3fe74299a1d4e3f917cc2ca6",
      "1f5849878621437397efe2de7b7a43fe",
      "2a554904b31c4ed8a4269d2c26ea4e91",
      "a29c289ad83d4c718d5854e5d3eff48a",
      "07584d1f220f4fb6b82a42090f3818f7",
      "04e248de88ac4a50ad20272d31549304",
      "57bb0b0d061d4e778493adb47482f234",
      "2a24003cc0d54e568706cc6fc77d2831",
      "849583682f034d3d8b8887cbafb3daaf",
      "8ab0464a810e4b208d4c0fc481c58b54",
      "cab50f86e61d43df90643acaf98670ab",
      "d4180478ac134757bcfe6c4f0ff4990e"
     ]
    },
    "executionInfo": {
     "elapsed": 7006,
     "status": "ok",
     "timestamp": 1719641491724,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "TQHWioIc0pQ8",
    "outputId": "87112ec7-bee0-4894-d850-8dd5e0f4e38c"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a50156f59d8548b683982af06d5bda09",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c40e76c5e9bd42bea7903e811bce53a4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c4c7962b94674cd6898980cea6483595",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "10f9441d843c44e5b11cfb2e21b5d89e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "784748de51254ba18128af8df30b8a93",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f0793746dff34da8858758fb55284b97",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ca7eb54b296c4a1fa91678c0e3d65f5d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2895fad80e754b4e8158c6dd8db69058",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "652ddbc085994a36b553ea04359943a1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4688efe8ab954510b30df18f8daa74a5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1f5849878621437397efe2de7b7a43fe",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "# Load model\n",
    "model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')\n",
    "\n",
    "# Convert the text to a single embedding vector\n",
    "vector = model.encode(\"Best movie ever!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1719641491724,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "PDwfmBiC0uER",
    "outputId": "db6755ce-92b2-45d1-85aa-9b53baee446e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(768,)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xnuGRjo80yKj"
   },
   "source": [
    "# Word Embeddings Beyond LLMs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 44634,
     "status": "ok",
     "timestamp": 1719641543423,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "sKgNdnwe0vfK",
    "outputId": "180bbb09-b030-4fa0-9198-085b0eb54c7b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[==================================================] 100.0% 66.0/66.0MB downloaded\n"
     ]
    }
   ],
   "source": [
    "import gensim.downloader as api\n",
    "\n",
    "# Download embeddings (66 MB, GloVe, trained on Wikipedia and Gigaword, vector size: 50)\n",
    "# Other options include \"word2vec-google-news-300\"\n",
    "# More options at https://github.com/RaRe-Technologies/gensim-data\n",
    "model = api.load(\"glove-wiki-gigaword-50\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1719641543423,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "u_vj5NVn01aD",
    "outputId": "73c3edd8-0185-494d-a842-d78cbe100642"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('king', 1.0000001192092896),\n",
       " ('prince', 0.8236179351806641),\n",
       " ('queen', 0.7839043140411377),\n",
       " ('ii', 0.7746230363845825),\n",
       " ('emperor', 0.7736247777938843),\n",
       " ('son', 0.766719400882721),\n",
       " ('uncle', 0.7627150416374207),\n",
       " ('kingdom', 0.7542161345481873),\n",
       " ('throne', 0.7539914846420288),\n",
       " ('brother', 0.7492411136627197),\n",
       " ('ruler', 0.7434253692626953)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.most_similar([model['king']], topn=11)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QMSgyKKS4xUx"
   },
   "source": [
    "# Recommending Songs by Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3dJdWzT67nDL"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from urllib import request\n",
    "\n",
    "# Get the playlist dataset file\n",
    "data = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/train.txt')\n",
    "\n",
    "# Parse the playlist dataset file. Skip the first two lines as\n",
    "# they only contain metadata\n",
    "lines = data.read().decode(\"utf-8\").split('\\n')[2:]\n",
    "\n",
    "# Split each line into song ids and drop playlists with only one song\n",
    "playlists = [s.rstrip().split() for s in lines if len(s.split()) > 1]\n",
    "\n",
    "# Load song metadata (id, title, artist), one tab-separated song per line\n",
    "songs_file = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/song_hash.txt')\n",
    "songs_file = songs_file.read().decode(\"utf-8\").split('\\n')\n",
    "songs = [s.rstrip().split('\\t') for s in songs_file]\n",
    "songs_df = pd.DataFrame(data=songs, columns=['id', 'title', 'artist'])\n",
    "songs_df = songs_df.set_index('id')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3,
     "status": "ok",
     "timestamp": 1724598630488,
     "user": {
      "displayName": "Jay Alammar جهاد العمار",
      "userId": "14617748739431919458"
     },
     "user_tz": 240
    },
    "id": "Q3zirG-lo3H8",
    "outputId": "e3b4269e-dd42-428e-8b28-46c27d0231af"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Playlist #1:\n",
      "  ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '2', '42', '43', '44', '45', '46', '47', '48', '20', '49', '8', '50', '51', '52', '53', '54', '55', '56', '57', '25', '58', '59', '60', '61', '62', '3', '63', '64', '65', '66', '46', '47', '67', '2', '48', '68', '69', '70', '57', '50', '71', '72', '53', '73', '25', '74', '59', '20', '46', '75', '76', '77', '59', '20', '43'] \n",
      "\n",
      "Playlist #2:\n",
      "  ['78', '79', '80', '3', '62', '81', '14', '82', '48', '83', '84', '17', '85', '86', '87', '88', '74', '89', '90', '91', '4', '73', '62', '92', '17', '53', '59', '93', '94', '51', '50', '27', '95', '48', '96', '97', '98', '99', '100', '57', '101', '102', '25', '103', '3', '104', '105', '106', '107', '47', '108', '109', '110', '111', '112', '113', '25', '63', '62', '114', '115', '84', '116', '117', '118', '119', '120', '121', '122', '123', '50', '70', '71', '124', '17', '85', '14', '82', '48', '125', '47', '46', '72', '53', '25', '73', '4', '126', '59', '74', '20', '43', '127', '128', '129', '13', '82', '48', '130', '131', '132', '133', '134', '135', '136', '137', '59', '46', '138', '43', '20', '139', '140', '73', '57', '70', '141', '3', '1', '74', '142', '143', '144', '145', '48', '13', '25', '146', '50', '147', '126', '59', '20', '148', '149', '150', '151', '152', '56', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '164', '165', '166', '167', '168', '169', '170', '171', '172', '173', '174', '175', '60', '176', '51', '177', '178', '179', '180', '181', '182', '183', '184', '185', '57', '186', '187', '188', '189', '190', '191', '46', '192', '193', '194', '195', '196', '197', '198', '25', '199', '200', '49', '201', '100', '202', '203', '204', '205', '206', '207', '32', '208', '209', '210']\n"
     ]
    }
   ],
   "source": [
    "print('Playlist #1:\\n ', playlists[0], '\\n')\n",
    "print('Playlist #2:\\n ', playlists[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "EaUz3E0P7sJs"
   },
   "outputs": [],
   "source": [
    "from gensim.models import Word2Vec\n",
    "\n",
    "# Train a Word2Vec model, treating each playlist as a sentence and each song id as a word\n",
    "model = Word2Vec(\n",
    "    playlists, vector_size=32, window=20, negative=50, min_count=1, workers=4\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 314,
     "status": "ok",
     "timestamp": 1719642095066,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "9EFGWesO8rOJ",
    "outputId": "1e46ce56-7b14-4268-a38a-c328e0f52943"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('2849', 0.9979680776596069),\n",
       " ('2640', 0.9964019060134888),\n",
       " ('3167', 0.9963980317115784),\n",
       " ('5549', 0.9959008693695068),\n",
       " ('2715', 0.9958351850509644),\n",
       " ('3117', 0.9954560995101929),\n",
       " ('2987', 0.9953479766845703),\n",
       " ('2881', 0.9951083660125732),\n",
       " ('2886', 0.9950577616691589),\n",
       " ('3094', 0.994985044002533)]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "song_id = 2172\n",
    "\n",
    "# Ask the model for songs similar to song #2172\n",
    "model.wv.most_similar(positive=str(song_id))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 321,
     "status": "ok",
     "timestamp": 1719642762615,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "AMiY6isXqKk4",
    "outputId": "0f465f20-ada8-4fa8-92d6-f72966d03aa4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "title     Fade To Black\n",
      "artist        Metallica\n",
      "Name: 2172 , dtype: object\n"
     ]
    }
   ],
   "source": [
    "# Look up the song by row position (here the position coincides with the song id)\n",
    "print(songs_df.iloc[2172])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 237
    },
    "executionInfo": {
     "elapsed": 556,
     "status": "ok",
     "timestamp": 1719642918281,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "aOzWENxr2Fl3",
    "outputId": "0b1ac29a-14f7-4e30-e153-e8f35ca97d7e"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"print_recommendations(2172)\",\n  \"rows\": 5,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"2640 \",\n          \"2715 \",\n          \"3167 \"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"title\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Red Barchetta\",\n          \"Rainbow In The Dark\",\n          \"Unchained\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"artist\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Rush\",\n          \"Dio\",\n          \"Van Halen\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-94b64d84-06f0-49f5-a721-a51ab661e5c4\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2849</th>\n",
       "      <td>Run To The Hills</td>\n",
       "      <td>Iron Maiden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2640</th>\n",
       "      <td>Red Barchetta</td>\n",
       "      <td>Rush</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3167</th>\n",
       "      <td>Unchained</td>\n",
       "      <td>Van Halen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5549</th>\n",
       "      <td>November Rain</td>\n",
       "      <td>Guns N' Roses</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2715</th>\n",
       "      <td>Rainbow In The Dark</td>\n",
       "      <td>Dio</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-94b64d84-06f0-49f5-a721-a51ab661e5c4')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-94b64d84-06f0-49f5-a721-a51ab661e5c4 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-94b64d84-06f0-49f5-a721-a51ab661e5c4');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-66b2f5cc-45c9-44ee-b044-69f0e59b123a\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-66b2f5cc-45c9-44ee-b044-69f0e59b123a')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-66b2f5cc-45c9-44ee-b044-69f0e59b123a button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "                     title         artist\n",
       "id                                       \n",
       "2849      Run To The Hills    Iron Maiden\n",
       "2640         Red Barchetta           Rush\n",
       "3167             Unchained      Van Halen\n",
       "5549         November Rain  Guns N' Roses\n",
       "2715   Rainbow In The Dark            Dio"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def print_recommendations(song_id):\n",
    "    similar_songs = np.array(\n",
    "        model.wv.most_similar(positive=str(song_id),topn=5)\n",
    "    )[:,0]\n",
    "    return  songs_df.iloc[similar_songs]\n",
    "\n",
    "# Extract recommendations\n",
    "print_recommendations(2172)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 310
    },
    "executionInfo": {
     "elapsed": 681,
     "status": "ok",
     "timestamp": 1719642181255,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "xqrzQQ-m1EJ5",
    "outputId": "3cf4967d-f510-4772-cb11-4166d16c6956"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "title     Fade To Black\n",
      "artist        Metallica\n",
      "Name: 2172 , dtype: object\n",
      "['2849' '2640' '3167' '5549' '2715']\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"print_recommendations(2172)\",\n  \"rows\": 5,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"2640 \",\n          \"2715 \",\n          \"3167 \"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"title\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Red Barchetta\",\n          \"Rainbow In The Dark\",\n          \"Unchained\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"artist\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Rush\",\n          \"Dio\",\n          \"Van Halen\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-c38e0eb4-9c39-45f5-aa32-dbd65ad89576\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2849</th>\n",
       "      <td>Run To The Hills</td>\n",
       "      <td>Iron Maiden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2640</th>\n",
       "      <td>Red Barchetta</td>\n",
       "      <td>Rush</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3167</th>\n",
       "      <td>Unchained</td>\n",
       "      <td>Van Halen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5549</th>\n",
       "      <td>November Rain</td>\n",
       "      <td>Guns N' Roses</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2715</th>\n",
       "      <td>Rainbow In The Dark</td>\n",
       "      <td>Dio</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c38e0eb4-9c39-45f5-aa32-dbd65ad89576')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-c38e0eb4-9c39-45f5-aa32-dbd65ad89576 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-c38e0eb4-9c39-45f5-aa32-dbd65ad89576');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-dbb90c85-6dc6-4ec9-a4c5-ebcbdb0a5897\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-dbb90c85-6dc6-4ec9-a4c5-ebcbdb0a5897')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-dbb90c85-6dc6-4ec9-a4c5-ebcbdb0a5897 button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "                     title         artist\n",
       "id                                       \n",
       "2849      Run To The Hills    Iron Maiden\n",
       "2640         Red Barchetta           Rush\n",
       "3167             Unchained      Van Halen\n",
       "5549         November Rain  Guns N' Roses\n",
       "2715   Rainbow In The Dark            Dio"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print_recommendations(2172)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 310
    },
    "executionInfo": {
     "elapsed": 316,
     "status": "ok",
     "timestamp": 1719642205517,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "TIHiN62g1NMi",
    "outputId": "c548f528-6e2e-4a46-89e0-6599395d6419"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "title     California Love (w\\/ Dr. Dre & Roger Troutman)\n",
      "artist                                              2Pac\n",
      "Name: 842 , dtype: object\n",
      "['5668' '413' '5661' '330' '886']\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"print_recommendations(842)\",\n  \"rows\": 5,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"413 \",\n          \"886 \",\n          \"5661 \"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"title\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"If I Ruled The World (Imagine That) (w\\\\/ Lauryn Hill)\",\n          \"Heartless\",\n          \"Sweet Dreams\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"artist\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          \"Nas\",\n          \"Kanye West\",\n          \"The Game\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-1afa899d-2db1-434a-a095-9b7ade3d2589\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5668</th>\n",
       "      <td>How We Do (w\\/ 50 Cent)</td>\n",
       "      <td>The Game</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>413</th>\n",
       "      <td>If I Ruled The World (Imagine That) (w\\/ Laury...</td>\n",
       "      <td>Nas</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5661</th>\n",
       "      <td>Sweet Dreams</td>\n",
       "      <td>Beyonce</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>330</th>\n",
       "      <td>Hate It Or Love It (w\\/ 50 Cent)</td>\n",
       "      <td>The Game</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>886</th>\n",
       "      <td>Heartless</td>\n",
       "      <td>Kanye West</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-1afa899d-2db1-434a-a095-9b7ade3d2589')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-1afa899d-2db1-434a-a095-9b7ade3d2589 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-1afa899d-2db1-434a-a095-9b7ade3d2589');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-a8ceaf3a-b291-4c01-adfc-895ceccda974\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-a8ceaf3a-b291-4c01-adfc-895ceccda974')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-a8ceaf3a-b291-4c01-adfc-895ceccda974 button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "                                                   title      artist\n",
       "id                                                                  \n",
       "5668                             How We Do (w\\/ 50 Cent)    The Game\n",
       "413    If I Ruled The World (Imagine That) (w\\/ Laury...         Nas\n",
       "5661                                        Sweet Dreams     Beyonce\n",
       "330                     Hate It Or Love It (w\\/ 50 Cent)    The Game\n",
       "886                                            Heartless  Kanye West"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print_recommendations(842)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}


================================================
FILE: chapter03/Chapter 3 - Looking Inside LLMs.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "adFzzFsB-Ofl"
   },
   "source": [
    "<h1>Chapter 3 - Looking Inside Transformer LLMs</h1>\n",
    "<i>An extensive look into the transformer architecture of generative LLMs</i>\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
    "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
    "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter03/Chapter%203%20-%20Looking%20Inside%20LLMs.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "This notebook is for Chapter 3 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
    "\n",
    "---\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
    "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
    "\n",
    "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following codeblock to install the dependencies for this chapter:\n",
    "\n",
    "---\n",
    "\n",
    "💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to\n",
    "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install \"transformers>=4.41.2\" \"accelerate>=0.31.0\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "W_23Z_do-faF"
   },
   "source": [
    "# Loading the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 759,
     "referenced_widgets": [
      "5ecafba0f8f04685a56d2b1495baea24",
      "53945d03a26044878ddb7fc6eadfd8db",
      "99db408011ea43c79adbd4a880839484",
      "fd181746067945febc55f94e2dcf6f67",
      "c5acd3e22bbe4a1897f7a12051e8eae9",
      "446d084fde5f422a8c0525e8c5b47f93",
      "6a0dabe02c874ecabd257c5da5f4a7c5",
      "1ee8996717864f79af6cf0314cd27c59",
      "7c719ce3694845b384ad4ae7207d31cc",
      "989fc06dfb25420eaf155bfc0174c692",
      "443850bc37d94a90aa037e73a77f9369",
      "184bbd6daf424ddeba7f8cbe0b5b34d9",
      "a6ee07658a234a54b38d849ff2017d6c",
      "bf63e0a23fa147778dd88b48075611e3",
      "acf8b6868c484885871964cc538359c6",
      "32cc3b668de84e8a9f8cde25d3846822",
      "b1115e5097c4404694e5a7c228e25234",
      "6ca20198bc144510a14b795aaf813940",
      "982a35743f51448cba583e54e7d3987d",
      "17250e2791ba42099c50efa594e229ce",
      "a4c438b7029f47d9a54d5a0ce16541c0",
      "fafd3d7714d4466ea00f354b20dc954a",
      "0a18c83d2645496797d74aef5e84dafa",
      "bfbbea9912104cb1803c7c827c1cea7d",
      "18c23f29824e4cb58e461bf89811a32a",
      "74105769e6a541098769921b84d1f1cd",
      "52109f76852e4388b6df4e5441c43a48",
      "4f39386aea7c4ac9abf2d66291cbcb4e",
      "4893546c2d6f4eec9cb5b7f6853013f0",
      "88616761593a4b2b9365294427a3136d",
      "d3ab3ad192f54d5b9f9e08db62884f0c",
      "6be54d3b31864134bc36d3b9d997530a",
      "27a1df02460948e19cbd40b69a89bec6",
      "6b81bad9c639454980de2a67b414b988",
      "1e8224a73a724058b28a13db2c2197c2",
      "3cbbd1ae4e4d4bf1a34c9d960ee7d26d",
      "e9c0309e4af44726b180c977692e2469",
      "97421a19ce25438e846a84b45321f9d0",
      "c6aa0ad56ce44b1eab1356d3bc56706b",
      "87d0758fdaaa4d90b21b1291f4d20039",
      "8c85660c9e9f4027bd64d596160a7d7f",
      "fce8b8595211452f8a759a6e98410f6a",
      "4249df3055d4427493e2a2e775dc1a93",
      "e42e39002e66410b9d714656a632adea",
      "3cf79ac9541c4b7bbf686b604ff73b81",
      "b709537463c3485dbbcef93e3636af2d",
      "cf7c24d7629b4362ae0905f8e4bbd997",
      "d9f9f98cb9dd465daed1da06d7d4084b",
      "cb20a4ccd9ea4548bc7f694068bb30f6",
      "b91d534a7a5c4863a9e858cedbda9fd5",
      "a65601c4918c4c2cb2020837cd1e1f85",
      "99505d538b894a359bc3791cd95423d5",
      "1a48245eddfc4923a5f267cd799ba9c7",
      "6c50ddb227b4421da5cd391e4d6ec94c",
      "7d62b5c9c90644378b0ca96cee430419",
      "9ec56d025f8446a08184b055fe11598e",
      "54894d44acce4fe6b0ef749d5c02e3cd",
      "e2dd956536a0407fb9b9c4a01c01ba9c",
      "54370883565644c5a2529e092db7f259",
      "7ec24f1bff5f4ca78d7c36a637cfa294",
      "36ff492b5c7c4ff3bcf72cc574d03a38",
      "8279cee867884166bb09df4e02635e2a",
      "a2644dee82b14cfbab069a484f3841e2",
      "e761a9afe1b847579fb51eb0eddd4488",
      "c59d9e04e5964a8891cc3ddce33d6f86",
      "a9007e7552ab4634ac44577279b242ac",
      "2cff15834149418c81eee5239a5e275a",
      "6623f077d95f4faf883fa2ca4397169d",
      "98529c941229460da8563b94d3419c37",
      "3bef1bf002c94340a3323592d616e7ac",
      "47c8fd6b7c9844d983014e07f1999cd2",
      "aa24ef9c6e1d49708b7bb4755a9adb78",
      "28261270cd0449f8ac85dc4b0efdac57",
      "a231962de8584f69a6a107c275d1cda5",
      "2d35dfe75a744987ab210f5fb0118301",
      "1cbaa3bcf20b4af099f6d4310dd071dd",
      "4f7088db853e47e6b6a6ccd654e379dc",
      "7e0710dc5b5c4002b4bcc189bf5514cf",
      "66dd13ca8234409eab15ecbf7a009914",
      "8f021e2bfb7246248baa49b08f4d3358",
      "0a91788829fc49ef95a69efd3256a8e3",
      "b42a56f9b2e44b6992413e023ed44b0e",
      "f9fc98e3d8ed4338bd763d152f8cc5f9",
      "f786e0117562476697092ef828ceb1b2",
      "07fed042c0894ca5aebe717eca6f3018",
      "3bea08fef3ef456e9f180d8a23ded5bc",
      "35f3727f37c44640986d8416141f8069",
      "0dd1ce8a2306431c9b0412e5992f4f84",
      "dcea31213c9f421abc7ffabc3499ecb9",
      "61643fa2fcb54ebaad8da5106dec9ea0",
      "0c4f8c58d213493494120c070d86ac76",
      "e15e9978db794891b9fa0d8ce096c983",
      "b15098d69b5f418da3f81aba8fb79de0",
      "f7cf40042a2e4c8cb8b87444893d8ec1",
      "6dcbc6ec46d44a55a1b38267201926b8",
      "6de4dd6e36b444638b2f65e0ca80bc9a",
      "60febb04a12447c192e1b8eb2aa5ba28",
      "6d200e6b1bdc4e918093670df8e37dc8",
      "934e950a3c0c48f08802d551ee1bd429",
      "34058533e3cf46a88db8927372102b9f",
      "2a7ba0f87814436386e66f4ef7f1111d",
      "d3e41286c5b747a8bb5cf326f9f80ad3",
      "fe92dc2d5e8c449bbb36dafbd6c9935f",
      "615d4ec0fb194688a392b71b99cf3621",
      "e2056e3dba884242807a98b9b3837843",
      "6ca50fb814b64fccaf0e1c6c11d8f4d8",
      "9ec1d173921748b2af19a3a21df9ed40",
      "f8c3d566e98d47239ef2b823544b75a5",
      "3618dcec35a740c485ecafa5589e0c91",
      "f66dd730f3364e35974a3918d12ff51d",
      "dcb9240335394bfa8d3949ef1cdbcdf8",
      "2df88b2fd5e242eca8b1d3f6cea1349b",
      "1b103c69baf74fceb551ebcb5a0ac5e8",
      "6847b7b6b3854d6e9b72f40040f84c8f",
      "ac3e67f03f604883ab4787930cd316f3",
      "6d7e012a7fbf4d788054ead5020e9314",
      "a432e2e32c7c4e56b138ebaefad76c93",
      "352fe4ea215240149d73478e34cd9b66",
      "bf9bc31d7f99477bb391f69df41b8dbe",
      "41b8e463309e4013a50268677f44d4b5",
      "989326b74cd146e3b5b2f2d5f19bdf41",
      "36b7269d8eb849b084694fd1f3b177b9",
      "2d09f9ed15fa452ba8d9ce9aba9f61ac",
      "53818082d62749a28552a1eebd304d88",
      "766f67cb6a8a4a58965edb671ce624e8",
      "a86d22308dbf4c11bf3d6f6515aef561",
      "e9e7d944715b402ab149f86862b92259",
      "4aaa732bb1b94b4895ca3f00f93cd762",
      "9fb02b3bbe79434a93f32291c208aaad",
      "dd9b3a5e84ba44cb9717ded470c258b4",
      "fa89524d446b480aa50d203d01ec7bb7",
      "c8b77256d5fc436fbfdcc150843a6b5b",
      "f5f5b592768048169676e09cca453645",
      "77f40b8bf30c437ba987b71178d0e9f6",
      "3f39ec300bd84852a2388dadaafd8c4b",
      "250cac43e6da47dd8d732ea57d8c50ec",
      "f929d12aad68458b98c21e0669da3d8e",
      "44ddbcadcc4c477c80daf278122de46d",
      "eb606db4125e4eb097d5b7d3cdb90976",
      "b6c177de60b54edd887d1ca983ea7546",
      "6db56d7c52244a3984a0638e060a81cb",
      "11d17ae63dd44ecc8813d482ee17dd95",
      "ff3733c6a1f34580b037e296e3abed6b",
      "2d31d51641e945f695f7315b68e0ad2e",
      "1b694930328e46bd9e0d61063b9141d4",
      "b544b6f2c2bb4f36bf9a983005a8bdb8",
      "afb8b0e602b649fcb92634c8aab4caf7",
      "24fc703c916f43aab5900288b8aa5aca",
      "16cffa93ab234718a8ede1044596e8b2",
      "2d7999217424413d988cd29d41ed5ace",
      "0d1546581c90418fa1cfc37491339134",
      "0e233853a3b74211a0b65dcdd001feed",
      "a1d2163af40a4aaab8a765b148845807",
      "2aaa722b303b4e1f823cee828fc7958c"
     ]
    },
    "executionInfo": {
     "elapsed": 130259,
     "status": "ok",
     "timestamp": 1718959891215,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "-5RLd6dI-Ytm",
    "outputId": "fb085ff7-e06f-4142-8e95-5ff98b212e37"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5ecafba0f8f04685a56d2b1495baea24",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "184bbd6daf424ddeba7f8cbe0b5b34d9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0a18c83d2645496797d74aef5e84dafa",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6b81bad9c639454980de2a67b414b988",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3cf79ac9541c4b7bbf686b604ff73b81",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9ec56d025f8446a08184b055fe11598e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/931 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2cff15834149418c81eee5239a5e275a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "configuration_phi3.py:   0%|          | 0.00/10.4k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
      "- configuration_phi3.py\n",
      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7e0710dc5b5c4002b4bcc189bf5514cf",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
      "- modeling_phi3.py\n",
      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.\n",
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dcea31213c9f421abc7ffabc3499ecb9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "34058533e3cf46a88db8927372102b9f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dcb9240335394bfa8d3949ef1cdbcdf8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "36b7269d8eb849b084694fd1f3b177b9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f5f5b592768048169676e09cca453645",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2d31d51641e945f695f7315b68e0ad2e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
    "\n",
    "# Load model and tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"microsoft/Phi-3-mini-4k-instruct\",\n",
    "    device_map=\"cuda\",\n",
    "    torch_dtype=\"auto\",\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "\n",
    "# Create a pipeline\n",
    "generator = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    return_full_text=False,\n",
    "    max_new_tokens=50,\n",
    "    do_sample=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "REqcz-ID_XgV"
   },
   "source": [
    "# The Inputs and Outputs of a Trained Transformer LLM\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 4955,
     "status": "ok",
     "timestamp": 1718959896168,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "17h6TPHluJ-i",
    "outputId": "18727eeb-ccd6-40f8-aab1-25c8d9a03cbe"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\n",
      "Solution 1:\n",
      "\n",
      "Subject: My Sincere Apologies for the Gardening Mishap\n",
      "\n",
      "\n",
      "Dear Sarah,\n",
      "\n",
      "\n",
      "I hope this message finds you well. I am writing to express my deep\n"
     ]
    }
   ],
   "source": [
    "prompt = \"Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
    "\n",
    "output = generator(prompt)\n",
    "\n",
    "print(output[0]['generated_text'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1,
     "status": "ok",
     "timestamp": 1718959898745,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "eoFkdTd6_g5o",
    "outputId": "bdcfde9f-28b7-4f43-ec0c-32c16677a776"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Phi3ForCausalLM(\n",
      "  (model): Phi3Model(\n",
      "    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)\n",
      "    (embed_dropout): Dropout(p=0.0, inplace=False)\n",
      "    (layers): ModuleList(\n",
      "      (0-31): 32 x Phi3DecoderLayer(\n",
      "        (self_attn): Phi3Attention(\n",
      "          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)\n",
      "          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)\n",
      "          (rotary_emb): Phi3RotaryEmbedding()\n",
      "        )\n",
      "        (mlp): Phi3MLP(\n",
      "          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)\n",
      "          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)\n",
      "          (activation_fn): SiLU()\n",
      "        )\n",
      "        (input_layernorm): Phi3RMSNorm()\n",
      "        (resid_attn_dropout): Dropout(p=0.0, inplace=False)\n",
      "        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)\n",
      "        (post_attention_layernorm): Phi3RMSNorm()\n",
      "      )\n",
      "    )\n",
      "    (norm): Phi3RMSNorm()\n",
      "  )\n",
      "  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RTrwzB67BYVY"
   },
   "source": [
    "# Choosing a single token from the probability distribution (sampling / decoding)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sEcxYgJxBYbJ"
   },
   "outputs": [],
   "source": [
    "prompt = \"The capital of France is\"\n",
    "\n",
    "# Tokenize the input prompt\n",
    "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
    "\n",
    "# Move the input IDs to the GPU, where the model was loaded\n",
    "input_ids = input_ids.to(\"cuda\")\n",
    "\n",
    "# Get the output of the model before the lm_head\n",
    "model_output = model.model(input_ids)\n",
    "\n",
    "# Get the output of the lm_head\n",
    "lm_head_output = model.lm_head(model_output[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 36
    },
    "executionInfo": {
     "elapsed": 421,
     "status": "ok",
     "timestamp": 1718960391623,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "68YUSS4GBf9Q",
    "outputId": "2dc25e8d-03b6-4bca-b46c-fec3e3a4a492"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'Paris'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Greedy decoding: pick the highest-scoring token at the last position\n",
    "token_id = lm_head_output[0, -1].argmax(-1)\n",
    "# Convert the token ID back to text\n",
    "tokenizer.decode(token_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 901,
     "status": "ok",
     "timestamp": 1718960415287,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "cWWrfC5oBjwp",
    "outputId": "c2fdeab7-e787-466f-88f4-988cd5f939a6"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([1, 6, 3072])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_output[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1079,
     "status": "ok",
     "timestamp": 1718960424560,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "nC1PdOnTBnxZ",
    "outputId": "1fd5f482-7046-4536-b745-4e681d6ecdaf"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([1, 6, 32064])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lm_head_output.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Of2_rP4QBqrZ"
   },
   "source": [
    "# Speeding up generation by caching keys and values\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "B0n6JhNHBrin"
   },
   "outputs": [],
   "source": [
    "prompt = \"Write a very long email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
    "\n",
    "# Tokenize the input prompt\n",
    "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
    "input_ids = input_ids.to(\"cuda\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 47155,
     "status": "ok",
     "timestamp": 1718960517928,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "BwIvt6jSByAF",
    "outputId": "e71c4141-2ca3-488a-fdfb-8d9357af0125"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6.66 s ± 2.22 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1\n",
    "# Generate the text, reusing the cached keys and values\n",
    "generation_output = model.generate(\n",
    "  input_ids=input_ids,\n",
    "  max_new_tokens=100,\n",
    "  use_cache=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 152674,
     "status": "ok",
     "timestamp": 1718960670601,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "dFb1dcvJByCW",
    "outputId": "0aba6a01-9bc7-40b7-e2e1-e064f13b4c88"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "21.9 s ± 94.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1\n",
    "# Generate the text, recomputing keys and values at every step\n",
    "generation_output = model.generate(\n",
    "  input_ids=input_ids,\n",
    "  max_new_tokens=100,\n",
    "  use_cache=False\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}


================================================
FILE: chapter04/Chapter 4 - Text Classification.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "g_a9QvUFVCUR"
   },
   "source": [
    "<h1>Chapter 4 - Text Classification</h1>\n",
    "<i>Classifying text with both representation and generative models</i>\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
    "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
    "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter04/Chapter%204%20-%20Text%20Classification.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "This notebook is for Chapter 4 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
    "\n",
    "---\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
    "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
    "\n",
    "\n",
    "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:\n",
    "\n",
    "---\n",
    "\n",
    "💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to\n",
    "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install transformers sentence-transformers openai\n",
    "# !pip install -U datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UBeVnXxQWy7-"
   },
   "source": [
    "# **Data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 9938,
     "status": "ok",
     "timestamp": 1709737297789,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -60
    },
    "id": "5phRS_z2U_3T",
    "outputId": "27f79175-2ec3-4922-e0ba-5bffd56c82cd"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    train: Dataset({\n",
       "        features: ['text', 'label'],\n",
       "        num_rows: 8530\n",
       "    })\n",
       "    validation: Dataset({\n",
       "        features: ['text', 'label'],\n",
       "        num_rows: 1066\n",
       "    })\n",
       "    test: Dataset({\n",
       "        features: ['text', 'label'],\n",
       "        num_rows: 1066\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "# Load our data\n",
    "data = load_dataset(\"rotten_tomatoes\")\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 15,
     "status": "ok",
     "timestamp": 1709737297790,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -60
    },
    "id": "xJJmaJzHDLZv",
    "outputId": "04501032-aed3-425c-8d70-b069a34c280b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'text': ['the rock is destined to be the 21st century\\'s new \" conan \" and that he\\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',\n",
       "  'things really get weird , though not particularly scary : the movie is all portent and no content .'],\n",
       " 'label': [1, 0]}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[\"train\"][0, -1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xya5dfmVoR1R"
   },
   "source": [
    "# **Text Classification with Representation Models**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "co68g-Eloknf"
   },
   "source": [
    "## **Using a Task-specific Model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 17052,
     "status": "ok",
     "timestamp": 1709737314828,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -60
    },
    "id": "ph-3T3XJopdN",
    "outputId": "62abf01e-ba0f-42fc-a8f3-fb467b02583b"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
      "  return self.fget.__get__(instance, owner)()\n",
      "Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']\n",
      "- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/text_classification.py:104: UserWarning: `return_all_scores` is now deprecated,  if want a similar functionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "# Path to our HF model\n",
    "model_path = \"cardiffnlp/twitter-roberta-base-sentiment-latest\"\n",
    "\n",
    "# Load model into pipeline\n",
    "pipe = pipeline(\n",
    "    model=model_path,\n",
    "    tokenizer=model_path,\n",
    "    return_all_scores=True,\n",
    "    device=\"cuda:0\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 37634,
     "status": "ok",
     "timestamp": 1709737352458,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -60
    },
    "id": "B2gbnL5Q69Y5",
    "outputId": "11c80fd6-0609-429f-caa4-443ee7298bbe"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1066/1066 [00:37<00:00, 28.25it/s]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
   
