[
  {
    "path": ".gitignore",
    "content": "# Python-generated files\n__pycache__/\n*.py[oc]\nbuild/\ndist/\nwheels/\n*.egg-info\n\n# Virtual environments\n.venv\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2025 Will Kurt\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Token Explorer\n\nToken Explorer allows you to interactively explore the token generation process of an LLM, using a \"video game\" style interface. You can use either arrow keys or vim-style navigation (h/j/k/l) along with WASD keys.\n\nToken explore allows you to:\n- Choose a starting prompt, or provide your own text file.\n- Step through generation one token at a time using either:\n  * Arrow keys to navigate, pop and append tokens\n  * Vim-style keys: h/l to pop/append, j/k to move up/down\n- View the token probabilities and entropies.\n- Add a copy of your current prompt to the list of prompts.\n- Cycle through the prompts by pressing `w` and `s`.\n- Add and remove prompts from the list with `a` and `d`.\n- Automatically uses the best available device (CUDA > MPS > CPU).\n\n\n## Running the app\n\nToken Explore uses `uv` for project management. Please see the [uv docs](https://docs.astral.sh/uv/getting-started/installation/) for more information.\n\nOnce you have `uv` installed, you can install the dependencies and run the app with:\n\n```bash \nuv run main.py\n```\n\nIn the model has a default prompt, but you can provide any text file as an argument to the app.\n\n```bash\nuv run main.py --input <path_to_file>\n```\n\nIf you're using regex structs, you can precompile them with:\n\n```bash\nuv run main.py --precompile\n```\n\n## Usage\n\nWhen you start the app you will see your prompt as well as a table of the top 30 tokens and their probabilities.\n\n![Starting prompt](./imgs/starting_prompt.png)\n\nThe idea of Token Explorer is to make it very easy to explore the space of possible prompts and token generations from an LLM. To use Token Explorer it's best to treat your keyboard like you're playing a video game: put your left hand on WASD and your right hand on the arrow keys.\n\n### Basic Usage\n\nUse the up and down arrow keys to navigate the table. 
The 'k'/'j' keys also move the selection up and down, so you can highlight the token you want to append next.\n\n![Navigating the table](./imgs/navigating_table.png)\n\nHere you can see I've highlighted the token \"very\" which has a probability of 0.03. Pressing the right arrow key or 'l' will append this token to the end of your prompt. Then it will display the next set of tokens.\n\n![Appending a token](./imgs/appending_token.png)\n\nIf you want to go back and reselect the token, you can use the left arrow key or 'h' to pop the token back off the end of the prompt.\n\nTo **quit** the app, you can press `ctrl+q`.\n\nYou can also save your current prompt by pressing `x`. This will save the prompt to the `prompts` folder.\n\n### Adding prompts\n\nOne of the goals of Token Explorer is to make it easy to play around with alternate methods of prompting. To facilitate this, Token Explorer allows you to duplicate your current prompt and add it to the list of prompts by pressing 'd'. In the image below we've added a copy of our current prompt to the list of prompts and are now at prompt 2 of 2:\n\n![Adding a prompt](./imgs/add_a_prompt.png)\n\nYou can cycle through the prompts by pressing 'w' and 's', making it easy to try out different possible paths for your prompt, all while acting like you are the model's sampler!\n\nIf you want to experiment with dramatically different prompts, you should write these out in a text file and pass them as an argument to the app.\n\n## Visualization Layers\n\nToken Explorer has a few different visualization layers that you can toggle on and off.\n\n### Token Probabilities\n\nIt can be very helpful to see the probabilities of each token when generated, in part so we can see where our model might be going wrong. You can press `e` to toggle the probability view.\n\n![Probabilities](./imgs/probabilities.png)\n\nIn the image above we've used the entire output of an LLM as our prompt. 
This allows us to understand better what the model was reasoning about when it generated the output. Notice for example that the model was basically certain the answer was 72.\n\n### Token Entropies\n\nAnother way to understand the model's reasoning is to look at the entropy of the token distribution. Entropy represents how uncertain the model is about the next token. The highest (normalized) entropy is 1 (which means every token in the vocabulary is equally likely). The lowest is 0 (which means the model is certain about the next token).\n\nYou can simply press `e` again to enable the entropy view.\n\n![Entropy](./imgs/entropies.png)\n\nPressing `e` again will return you to the default view.\n\n## Example Workflow\n\nLet's try to understand our GSM8K prompt a bit better. The plaintext prompt is:\n\n```\nQuestion: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\nReasoning: Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\nAnswer: 72\n```\n\nFirst let's understand the model's answer. We'll start by loading the prompt into Token Explorer, and then back up using the `left` arrow key until we get to the answer token.\n\n![Backing up to the answer](./imgs/backing_up_to_answer.png)\n\nHere we can see that the model was basically certain about the answer, which makes sense given that the prompt is a simple arithmetic problem. As we can see, the model assigns a probability of essentially 1 to the answer starting with '7'. Recall that we could also see this visually by looking at the 'probabilities' layer.\n\nIt looks like our model is doing great, but let's go back to the entropy layer to see if we can find places to explore. 
Notice that the token 'Natalia' has higher entropy than the other tokens, which means the model is more uncertain about which token to choose next.\n\n![Entropy](./imgs/natalia.png)\n\nI'm curious what's happening there. I want to back up, but at the same time, don't want to lose my place in the prompt. I can use `d` to copy my prompt as a new prompt.\n\n![Swapping prompts](./imgs/workflow1.png)\n\nNow I can rewind until the token 'Natalia' and see if we can understand what's happening there, while still preserving my place in the prompt.\n\nWhen we look at the probabilities for the next token, we can see that the model is confused between a range of choices.\n\n![Probabilities](./imgs/workflow2.png)\n\nI'm curious if this part was important for our getting the correct answer. To explore we'll:\n\n- Create a copy of this point in the prompt with 'd'.\n- Use the 'right' arrow to fill until the end.\n\nHere's the probability view for the new prompt:\n\n![Exploring](./imgs/workflow3.png)\n\nYou can see both that we *did* still get the correct answer, and that the path I chose was fairly low probability for a bit. So we've learned something interesting! Even if we perturb the prompt to a low-probability path in the middle of its reasoning, we still get the correct answer!\n\n## Experimental Support For Structured Outputs\n\nToken Explorer now supports structured outputs!\n\nCurrently Token Explorer comes with 5 precompiled regex structs:\n- eu_date\n- iso_date\n- us_date\n- month_name\n- name\n\nYou'll see your current struct at the bottom of the prompt like so:\n\n![Struct](./imgs/current_struct.png)\n\nBy pressing `Shift+r` you can cycle through the structs.\n\n![Toggle struct](./imgs/toggle_struct.png)\n\nWhen you press `r` the structure will be activated. 
This will change its color to green and you will only see tokens that adhere to that structure.\n\n![Struct active](./imgs/struct_active.png)\n\nThe structure will eventually reach a final state where the structure is satisfied (though there may be multiple ways to satisfy the structure). In this case the named structure will be highlighted in red.\n\n![Struct final](./imgs/struct_final.png)\n\nFinally, if you complete the defined structure, the structure will automatically be toggled off, allowing you to choose from among the unstructured tokens.\n\n![Struct complete](./imgs/struct_complete.png)\n\n### Adding structure\n\nYou can add your own structure by putting a .txt file in the `struct` folder. The first line of the file should contain the regex pattern written as a Python string literal (it is parsed with `ast.literal_eval`).\n\n**Warning:** This is an experimental feature and regex patterns can currently take a loooong time to compile (if they're sophisticated). Once compiled they take up very little memory, and are fast to run. To precompile your structs, you can use the `--precompile` flag.\n\nThe precompiled structs are stored in `src/.cache/` as `.pkl` files. 5 of them are precompiled in the repo for easy demoing.\n\n\n## Configuration\n\nThe configuration is done in the `config.toml` file. The only thing you might want to change is the `model` section, which defaults to `Qwen/Qwen2.5-0.5B`. However, Token Explorer is *far* from optimized for performance, so it's best to use a smaller model for now.\n"
  },
  {
    "path": "config.toml",
    "content": "# Model Configuration\n[model]\nname = \"Qwen/Qwen2.5-0.5B\"        # Model identifier\n\n# Prompt Settings\n[prompt]\nexample_prompt = \"Once upon a time, there was a\"\nmax_prompts = 9\n\n# Display Settings\n[display]\ntokens_to_show = 30          # Number of tokens to display in preview\n"
  },
  {
    "path": "demo_prompt.txt",
    "content": "This is a story aobut"
  },
  {
    "path": "main.py",
    "content": "from ast import literal_eval\nfrom itertools import cycle\nfrom src.explorer import Explorer\nfrom src.utils import entropy_to_color, probability_to_color\nfrom textual.app import App, ComposeResult, Binding\nfrom textual.containers import VerticalScroll\nfrom textual.reactive import reactive\nfrom textual.widgets import Footer, Header, Static, DataTable\nfrom textwrap import dedent\nimport sys\nimport os\nimport argparse\nimport tomli \nfrom datetime import datetime\n\n\ndef load_config():\n    try:\n        with open(\"config.toml\", \"rb\") as f:\n            return tomli.load(f)\n    except FileNotFoundError:\n        print(\"Config file not found, using default values\")\n        return {\n            \"model\": \"Qwen/Qwen2.5-0.5B\",\n            \"example_prompt\": \"Once upon a time, there was a\",\n            \"tokens_to_show\": 30,\n            \"max_prompts\": 9\n        }\n\nconfig = load_config()\nMODEL_NAME = config[\"model\"][\"name\"]\nEXAMPLE_PROMPT = config[\"prompt\"][\"example_prompt\"]\nTOKENS_TO_SHOW = config[\"display\"][\"tokens_to_show\"]\nMAX_PROMPTS = config[\"prompt\"][\"max_prompts\"]\n\nclass TokenExplorer(App):\n    \"\"\"Main application class.\"\"\"\n\n    display_modes = cycle([\"prompt\", \"prob\", \"entropy\"])\n    display_mode = reactive(next(display_modes))\n\n    BINDINGS = [(\"e\", \"change_display_mode\", \"Mode\"),\n                (\"left,h\", \"pop_token\", \"Back\"),\n                (\"right,l\", \"append_token\", \"Add\"),\n                (\"d\", \"add_prompt\", \"New\"),\n                (\"a\", \"remove_prompt\", \"Del\"),\n                (\"w\", \"increment_prompt\", \"Next\"),\n                (\"s\", \"decrement_prompt\", \"Prev\"),\n                (\"x\", \"save_prompt\", \"Save\"),\n                (\"j\", \"select_next\", \"Down\"),\n                (\"k\", \"select_prev\", \"Up\"),\n                (\"r\", \"toggle_struct\", \"Toggle struct\"),\n                (\"R\", \"next_struct\", 
\"Next struct\")\n\n                ]\n    \n    \n    def __init__(self, prompt=EXAMPLE_PROMPT, precompile=False):\n        super().__init__()\n        # Add support for multiple prompts.\n        self.prompts = [prompt]\n        self.prompt_index = 0\n        self.explorer = Explorer(MODEL_NAME)\n        self.explorer.set_prompt(prompt)\n        self.rows = self._top_tokens_to_rows(\n            self.explorer.get_top_n_tokens(n=TOKENS_TO_SHOW)\n            )\n        self.selected_row = 0  # Track currently selected token row\n        self.regex_structs = self._get_regex_structs()\n        # this is the position of the stuct in the prompt\n        self.struct_index = None\n        # this is the position of the struct in the regex_structs list\n        self.current_struct_index = 0\n        if precompile:\n            self.precompile_regex_structs()\n\n    def precompile_regex_structs(self):\n        print(\"Precompiling regex structs, this may take a while...\")\n        for name, regex in self.regex_structs:\n            print(name)\n            self.explorer.set_guide(regex)\n            self.explorer.clear_guide()\n        self.explorer.clear_guide()\n\n    def _get_regex_structs(self):\n        try:\n            struct_files = []\n            # Get all files in struct directory\n            for file in os.listdir(\"struct\"):\n                if file.endswith(\".txt\"):\n                    file_path = os.path.join(\"struct\", file)\n                    try:\n                        with open(file_path, \"r\") as f:\n                            # Get first line and strip whitespace\n                            regex = f.readline().strip()\n                            # Remove file extension and add tuple\n                            name = os.path.splitext(file)[0]\n                            struct_files.append((name, str(literal_eval(regex))))\n                    except:\n                        # Skip files that can't be read\n                        
continue\n            return struct_files\n        except FileNotFoundError:\n            return []\n\n    def _top_tokens_to_rows(self, tokens):\n        return [(\"token_id\", \"token\", \"prob\")] + [\n            (token[\"token_id\"], token[\"token\"], token[\"probability\"])\n            for token in tokens\n        ]\n        \n    def compose(self) -> ComposeResult:\n        yield Header()\n        with VerticalScroll():\n            yield Static(id=\"results\")\n            yield DataTable(id=\"table\")\n        yield Footer()\n\n    def _refresh_table(self):\n        table = self.query_one(DataTable)\n        self.rows = self._top_tokens_to_rows(\n            self.explorer.get_top_n_tokens(n=TOKENS_TO_SHOW)\n            )\n        table.clear()\n        table.add_rows(self.rows[1:])\n        # Reset cursor to top\n        self.selected_row = 0\n        table.move_cursor(row=self.selected_row)\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n    \n\n    def _render_structure_section(self):\n        struct_section = \"\"\n        if self.explorer.guide_is_finished():\n            struct_section = f\"[on red]{self.regex_structs[self.current_struct_index][0]}[/on]\"\n        elif self.struct_index is not None:\n            struct_section = f\"[on green]{self.regex_structs[self.current_struct_index][0]}[/on]\"\n\n        else:\n            struct_section = f\"[on grey]{self.regex_structs[self.current_struct_index][0]}[/on]\"\n        return struct_section\n    \n    def _render_prompt(self):\n        if self.display_mode == \"entropy\":\n            entropy_legend = \"\".join([\n                f\"[on {entropy_to_color(i/10)}] {i/10:.2f} [/on]\"\n                for i in range(11)\n                ])\n            prompt_legend = f\"[bold]Token entropy:[/bold]{entropy_legend}\"\n            token_entropies = self.explorer.get_prompt_token_normalized_entropies()\n            token_strings = self.explorer.get_prompt_tokens_strings()\n   
         prompt_text = \"\".join(f\"[on {entropy_to_color(entropy)}]{token}[/on]\" for token, entropy in zip(token_strings, token_entropies))\n        elif self.display_mode == \"prob\":\n            prob_legend = \"\".join([\n                f\"[on {probability_to_color(i/10)}] {i/10:.2f} [/on]\"\n                for i in range(11)\n                ])\n            prompt_legend = f\"[bold]Token prob:[/bold]{prob_legend}\"\n            token_probs = self.explorer.get_prompt_token_probabilities()\n            token_strings = self.explorer.get_prompt_tokens_strings()\n            prompt_text = \"\".join(f\"[on {probability_to_color(prob)}]{token}[/on]\" for token, prob in zip(token_strings, token_probs))\n        else:\n            prompt_text = self.explorer.get_prompt()\n            prompt_legend = \"\"\n        return dedent(f\"\"\"\n{prompt_text}\n\n\n\n\n\n{prompt_legend}\n[bold]Prompt[/bold] {self.prompt_index+1}/{len(self.prompts)} tokens: {len(self.explorer.prompt_tokens)}\n[bold]Struct[/bold] {self._render_structure_section()}\n\"\"\")\n    \n    def on_mount(self) -> None:\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n        table = self.query_one(DataTable)\n        table.add_columns(*self.rows[0])\n        table.add_rows(self.rows[1:])\n        table.cursor_type = \"row\"\n\n    def action_next_struct(self):\n        self.current_struct_index = (self.current_struct_index + 1) % len(self.regex_structs)\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n\n    def action_toggle_struct(self):\n        if self.struct_index is None:\n            # this is the theoretical index of the first\n            # structure token when structured gen is activated\n            # even though that token *doesn't* exist yet.\n            # this track to help with backtracking.\n            self.struct_index = len(self.explorer.get_prompt_tokens())\n            
self.explorer.set_guide(self.regex_structs[self.current_struct_index][1])\n        else:\n            self.struct_index = None\n            self.explorer.clear_guide()\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n        self._refresh_table()\n        \n    def action_add_prompt(self):\n        if len(self.prompts) < MAX_PROMPTS:\n            self.prompts.append(self.explorer.get_prompt())\n            self.prompt_index = (self.prompt_index + 1) % len(self.prompts)\n            self.explorer.set_prompt(self.prompts[self.prompt_index])\n            self.query_one(\"#results\", Static).update(self._render_prompt())\n            self._refresh_table()\n\n    def action_remove_prompt(self):\n        if len(self.prompts) > 1:\n            self.prompts.pop(self.prompt_index)\n            self.prompt_index = (self.prompt_index - 1) % len(self.prompts)\n            self.explorer.set_prompt(self.prompts[self.prompt_index])\n            self.query_one(\"#results\", Static).update(self._render_prompt())\n            self._refresh_table()\n    \n    def action_increment_prompt(self):\n        self.prompt_index = (self.prompt_index + 1) % len(self.prompts)\n        self.explorer.set_prompt(self.prompts[self.prompt_index])\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n        self._refresh_table()\n\n    def action_decrement_prompt(self):\n        self.prompt_index = (self.prompt_index - 1) % len(self.prompts)\n        self.explorer.set_prompt(self.prompts[self.prompt_index])\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n        self._refresh_table()\n\n    def action_change_display_mode(self):\n        self.display_mode = next(self.display_modes)\n        self.query_one(\"#results\", Static).update(self._render_prompt())\n\n    def action_save_prompt(self):\n        with open(f\"prompts/prompt_{self.prompt_index}_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.txt\", \"w\") as f:\n       
     f.write(self.explorer.get_prompt())\n\n    def action_select_next(self):\n        \"\"\"Move selection down one row\"\"\"\n        if self.selected_row < len(self.rows) - 2:  # -2 for header row\n            self.selected_row += 1\n            table = self.query_one(DataTable)\n            table.move_cursor(row=self.selected_row)\n            \n    def action_select_prev(self):\n        \"\"\"Move selection up one row\"\"\"\n        if self.selected_row > 0:\n            self.selected_row -= 1\n            table = self.query_one(DataTable)\n            table.move_cursor(row=self.selected_row)\n\n    def action_append_token(self):\n        \"\"\"Append currently selected token\"\"\"\n        # TODO: here we need to distinguish between a dead and finished guide\n\n        table = self.query_one(DataTable)\n        if table.cursor_row is not None:\n            if len(self.rows) > (table.cursor_row+1):\n                self.explorer.append_token(self.rows[table.cursor_row+1][0])\n                if self.explorer.guide_is_dead():\n                    self.explorer.clear_guide()\n                    self.struct_index = None\n                self.prompts[self.prompt_index] = self.explorer.get_prompt()\n                self._refresh_table()  # This will reset cursor position\n\n    def action_pop_token(self):\n        if len(self.explorer.get_prompt_tokens()) > 1:\n            self.explorer.pop_token()\n            if self.explorer.guide is not None:\n                self.explorer.clear_guide()\n                #  need to add logic for backtracking the guide\n                self.explorer.set_guide(self.regex_structs[self.current_struct_index][1]\n                                        ,ff_from=self.struct_index)\n            self.prompts[self.prompt_index] = self.explorer.get_prompt()\n            self.query_one(\"#results\", Static).update(self._render_prompt())\n            self._refresh_table()\n\n\n\nif __name__ == \"__main__\":\n    parser = 
argparse.ArgumentParser(description='Token Explorer Application')\n    parser.add_argument('--input', '-i', type=str, help='Path to input text file')\n    parser.add_argument('--precompile', '-p', action='store_true', help='Precompile regex structs')\n    args = parser.parse_args()\n\n    prompt = EXAMPLE_PROMPT\n    if args.input:\n        try:\n            with open(args.input, 'r') as f:\n                prompt = f.read()\n        except FileNotFoundError:\n            print(f\"Error: Could not find input file '{args.input}'\")\n            sys.exit(1)\n        except Exception as e:\n            print(f\"Error reading file: {e}\")\n            sys.exit(1)\n    app = TokenExplorer(prompt, args.precompile)\n\n    app.run()\n"
  },
  {
    "path": "prompts/about.txt",
    "content": "This is a directory where all of your quick save prompt end up, just in case you find a real gem!\n"
  },
  {
    "path": "prompts/date_demo.txt",
    "content": "Remember, the date of this this year's (2025) Fourth of July party is: \n"
  },
  {
    "path": "pyproject.toml",
    "content": "[project]\nname = \"token-explorer\"\nversion = \"0.1.0\"\ndescription = \"Add your description here\"\nreadme = \"README.md\"\nrequires-python = \">=3.12\"\ndependencies = [\n    \"greenery>=4.2.2\",\n    \"httpx>=0.28.1\",\n    \"jupyter>=1.1.1\",\n    \"pytest>=8.3.5\",\n    \"textual>=2.1.2\",\n    \"textual-dev>=1.7.0\",\n    \"tomli>=2.2.1\",\n    \"torch>=2.6.0\",\n    \"transformers>=4.49.0\",\n]\n"
  },
  {
    "path": "simple_prompt.txt",
    "content": "Once upon a time, there was a"
  },
  {
    "path": "src/__init__.py",
    "content": ""
  },
  {
    "path": "src/explorer.py",
    "content": "\"\"\"\nThis code is used to process an LLM one token at at time.\n\nThe Explorer class manages the prompt internally and handles all interactions with the LLM.\n\"\"\"\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\nimport numpy as np\nfrom src.simpleguide import SimpleGuide\nclass Explorer:\n    def __init__(self, model_name=\"Qwen/Qwen2.5-0.5B\"):\n        \"\"\"\n        Initialize the Explorer with a model name.\n        \n        Args:\n            model_name: Name of the model to load (default \"Qwen/Qwen2.5-0.5B\")\n        \"\"\"\n        self.model_name = model_name\n        self.tokenizer = AutoTokenizer.from_pretrained(model_name)\n        self.model = AutoModelForCausalLM.from_pretrained(model_name)\n        \n        # Auto select device (CUDA > MPS > CPU)\n        if torch.cuda.is_available():\n            self.device = torch.device(\"cuda\")\n        elif hasattr(torch.backends, \"mps\") and torch.backends.mps.is_available():\n            self.device = torch.device(\"mps\")\n        else:\n            self.device = torch.device(\"cpu\")\n        self.model = self.model.to(self.device)\n        self.guide = None\n        \n        # Initialize with empty promp\n        self.prompt_text = \"\"\n        self.prompt_tokens = []\n    \n\n    def clear_guide(self):\n        self.guide = None\n\n    def set_guide(self, regex_struct,ff_from=None):\n        self.clear_guide()\n        self.guide = SimpleGuide(regex_struct, self.tokenizer)\n        if ff_from is not None:\n            for token in self.prompt_tokens[ff_from:]:\n                self.guide.advance(token)\n\n\n    def set_prompt(self, prompt_text):\n        \"\"\"\n        Set the current prompt text and update the encoded tokens.\n        \n        Args:\n            prompt_text: The prompt text to set\n        \"\"\"\n        self.prompt_text = prompt_text\n        self.prompt_tokens = self.tokenizer.encode(prompt_text)\n        return self\n    
\n\n    def get_prompt_token_probabilities(self):\n        \"\"\"\n        Calculate the probability of each token in the sequence given its preceding context,\n        using a single forward pass.\n        \n        Args:\n            self: The Explorer object\n        Returns:\n            list: A list of probabilities for each token in the sequence\n        \"\"\"\n        # Convert token IDs to tensor and create input\n        input_ids = torch.tensor([self.prompt_tokens]).to(self.device)\n        \n        # Get the model's output in a single forward pass\n        with torch.no_grad():\n            outputs = self.model(input_ids)\n            logits = outputs.logits[0]  # Shape: [sequence_length, vocab_size]\n        \n        # Calculate probabilities for each position\n        token_probabilities = []\n        \n        # First token has no context, so we'll use None or some default\n        token_probabilities.append(0.5)\n        \n        # For each position after the first\n        for pos in range(len(self.prompt_tokens) - 1):\n            # The logits at position 'pos' predict the token at position 'pos+1'\n            position_logits = logits[pos]\n            position_probs = torch.softmax(position_logits, dim=-1)\n            \n            # Get probability of the actual next token\n            next_token_id = self.prompt_tokens[pos + 1]\n            next_token_prob = position_probs[next_token_id].item()\n            \n            token_probabilities.append(next_token_prob)\n        return token_probabilities\n    \n    def get_prompt_token_normalized_entropies(self):\n        # Convert token IDs to tensor and create input\n        input_ids = torch.tensor([self.prompt_tokens]).to(self.device)\n        \n        # Get the model's output in a single forward pass\n        with torch.no_grad():\n            outputs = self.model(input_ids)\n            logits = outputs.logits[0]  # Shape: [sequence_length, vocab_size]\n        \n        # Calculate 
normalized entropy for each position\n        normalized_entropies = []\n        \n        # First token has no context, so we use 0.5 as a neutral placeholder\n        normalized_entropies.append(0.5)\n        \n        # For each position after the first\n        for pos in range(len(self.prompt_tokens) - 1):\n            # The logits at position 'pos' predict the token at position 'pos+1'\n            position_logits = logits[pos]\n            position_probs = torch.softmax(position_logits, dim=-1)\n            \n            # Calculate entropy: -sum(p * log2(p))\n            # We filter out zeros to avoid log(0) issues\n            probs_np = position_probs.cpu().numpy()\n            non_zero_probs = probs_np[probs_np > 0]\n            entropy = -np.sum(non_zero_probs * np.log2(non_zero_probs))\n            \n            # Normalize by maximum possible entropy (log2 of vocabulary size)\n            max_entropy = np.log2(len(position_probs))\n            normalized_entropy = entropy / max_entropy\n            \n            normalized_entropies.append(normalized_entropy)\n        \n        return normalized_entropies\n\n\n    def get_prompt(self):\n        \"\"\"\n        Get the current prompt text.\n        \n        Returns:\n            The current prompt text\n        \"\"\"\n        return self.prompt_text\n    \n    def get_prompt_tokens(self):\n        \"\"\"\n        Get the current encoded prompt tokens.\n        \n        Returns:\n            List of token ids representing the current prompt\n        \"\"\"\n        return self.prompt_tokens\n    \n    def get_prompt_tokens_strings(self):\n        \"\"\"\n        Get the current prompt tokens as a list of decoded strings, one per token.\n        \"\"\"\n        return [self.tokenizer.decode(token) for token in self.prompt_tokens]\n    \n    def pop_token(self):\n        \"\"\"\n        NOTE: Need to handle the guide in this case.\n        Remove and return the last token from the prompt tokens.\n        If the prompt is empty, return 
None.\n        \n        Returns:\n            The removed token id, or None if prompt was empty\n        \"\"\"\n        if not self.prompt_tokens:\n            return None\n            \n        # Pop last token and update prompt text\n        last_token = self.prompt_tokens.pop()\n        self.prompt_text = self.tokenizer.decode(self.prompt_tokens)\n        \n        return last_token\n    \n    def append_token(self, token_id):\n        \"\"\"\n        Append a token to the current prompt tokens and update prompt text.\n        \n        Args:\n            token_id: The token id to append\n        \"\"\"\n        # Add token to prompt tokens\n        self.prompt_tokens.append(token_id)\n        \n        # Update prompt text to match new tokens\n        self.prompt_text = self.tokenizer.decode(self.prompt_tokens)\n        if self.guide is not None:\n            self.guide.advance(token_id)\n\n        \n        return self\n    \n    def guide_is_finished(self):\n        if self.guide is not None:\n            return self.guide.is_finished()\n        return False\n    \n    def guide_is_dead(self):\n        if self.guide is not None:\n            return self.guide.is_dead()\n        return False\n    \n    def get_top_n_tokens(self, n=5, search=\"\"):\n        \"\"\"\n        Get the top n most likely next tokens given the current prompt.\n        Optionally filter tokens by a search string.\n        \n        Args:\n            n: Number of top tokens to return (default 5)\n            search: Optional string to filter tokens (default \"\")\n            \n        Returns:\n            List of dicts containing token info and probabilities, sorted by probability\n        \"\"\"\n        # Get model output for the encoded prompt\n        with torch.no_grad():\n            outputs = self.model(torch.tensor([self.prompt_tokens]).to(self.device))\n            \n        # Get logits for the next token\n        next_token_logits = outputs.logits[0, -1, :]\n        \n   
     # Get probabilities using softmax\n        next_token_probs = torch.nn.functional.softmax(next_token_logits, dim=0)\n\n        if self.guide is not None:\n            allowed_tokens = self.guide.get_tokens()\n            allowed_tokens_mask = torch.zeros(len(next_token_probs), device=next_token_logits.device)\n            allowed_tokens_mask[allowed_tokens] = 1.0\n            next_token_probs =  next_token_probs * allowed_tokens_mask\n            # renormalize the probabilities\n            next_token_probs = next_token_probs / next_token_probs.sum()\n        if search:\n            # Filter tokens that contain the search string\n            matching_tokens = []\n            for idx, prob in enumerate(next_token_probs):\n                token = self.tokenizer.decode(idx)\n                if search.lower() in token.lower():\n                    matching_tokens.append({\n                        \"token_id\": idx,\n                        \"token\": token,\n                        \"probability\": prob.item()\n                    })\n            \n            # Sort by probability and take top n\n            matching_tokens.sort(key=lambda x: x[\"probability\"], reverse=True)\n            if self.guide is not None:\n                # make sure that the token id is in the allowed tokens\n                matching_tokens = [token for token in matching_tokens if token[\"token_id\"] in allowed_tokens]\n            return matching_tokens[:n]\n        else:\n            # Original behavior for no search string\n            top_probs, top_indices = torch.topk(next_token_probs, n)\n            \n            results = []\n            for prob, idx in zip(top_probs, top_indices):\n                token = self.tokenizer.decode(idx)\n                results.append({\n                    \"token\": token,\n                    \"token_id\": idx.item(),\n                    \"probability\": prob.item()\n                })\n            if self.guide is not None:\n                
# make sure that the token id is in the allowed tokens\n                results = [token for token in results if token[\"token_id\"] in allowed_tokens]\n            return results\n\n\"\"\"\nAttempting to replicate the basic api of outlines-core, but\nwe're going to try to reduce the memory footprint and make it more efficient.\n\n\"\"\"\n\n# Example usage\nif __name__ == \"__main__\":\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")  \n    # test the RegexGuide\n    guide = RegexGuide(r'a{1,5}', tokenizer)\n    print(\"Tokens:\", guide.get_tokens())\n    guide.advance('a')\n    print(\"Tokens:\", guide.get_tokens())\n    guide.advance('a')\n    print(\"Tokens:\", guide.get_tokens())\n    explorer = Explorer()\n    explorer.set_prompt(\"Once upon a time, there was a\")\n    \n    print(\"Prompt:\", explorer.get_prompt())\n    print(\"Encoded prompt:\", explorer.get_prompt_tokens())\n    print(\"-----\")\n    print(\"Top tokens:\", explorer.get_top_n_tokens())\n    print(\"-----\")\n    print(\"Filtered tokens:\", explorer.get_top_n_tokens(search=\"man\"))\n    print(\"-----\")\n    print(\"Appending token:\", explorer.get_top_n_tokens(search=\"man\")[0])\n    explorer.append_token(explorer.get_top_n_tokens(search=\"man\")[0][\"token_id\"])\n    print(\"-----\")\n    print(\"Prompt:\", explorer.get_prompt())\n    print(\"Encoded prompt:\", explorer.get_prompt_tokens())\n    print(\"-----\")\n    print(\"Popping token:\", explorer.pop_token())\n    print(\"-----\")\n    print(\"Prompt:\", explorer.get_prompt()) \n    print(\"Token probabilities:\", explorer.get_prompt_token_probabilities())\n    print(\"-----\")\n    print(\"Token entropies:\", explorer.get_prompt_token_normalized_entropies())\n    explorer.set_guide(r'a{1,5}')\n    print(\"-----\")\n    print(\"Top tokens:\", explorer.get_top_n_tokens())\n    print(\"-----\")\n    print(\"Guide is finished:\", explorer.guide.is_finished())\n\n"
  },
  {
    "path": "src/simpleguide.py",
    "content": "from transformers import AutoTokenizer\nimport re\nimport time\nfrom greenery import parse\nimport pickle\nimport hashlib\nimport os\n\n\n\nclass TrieNode:\n    def __init__(self):\n        self.children = {}\n        self.token_id = None\n        \nclass TokenTrie:\n    def __init__(self):\n        self.root = TrieNode()\n        \n    def insert(self, token_meta):\n        node = self.root\n        word = token_meta['str']\n        token_id = token_meta['id']\n        for char in word:\n            if char not in node.children:\n                node.children[char] = TrieNode()\n            node = node.children[char]\n        node.token_id = token_id\n\n    def collect_valid_tokens(self, state, fsm):\n        node_state_stack = [(self.root, state)]\n        valid_tokens = []\n        while node_state_stack:\n            node, s = node_state_stack.pop()\n            for c, next_node in node.children.items():\n                for cc, next_state in fsm.map[s].items():\n                    if cc.accepts(c) and fsm.islive(next_state):\n                        # Compare against None so that token id 0 is not skipped\n                        if next_node.token_id is not None:\n                            valid_tokens.append(next_node.token_id)\n                        node_state_stack.append((next_node, next_state))\n        return valid_tokens\n\nclass SimpleGuide:\n    \"\"\"\n    A minimal guide for structured generation, based on the greenery library for regex parsing.\n    \"\"\"\n    def __init__(self, regex_struct, tokenizer, no_cache=False, verbose=False):\n        self.regex_struct = re.compile(regex_struct)\n        start_time = time.time()\n        self.pattern = parse(regex_struct)\n        end_time = time.time()\n        if verbose:\n            print(f\"Regex parsing time: {(end_time - start_time) * 1000:.2f}ms\")\n        start_time = time.time()\n        self.fsm = self.pattern.to_fsm()\n        end_time = time.time()\n        if verbose:\n            print(f\"FSM construction time: {(end_time - start_time) * 1000:.2f}ms\")\n       
 self.tokenizer = tokenizer\n        # get the string representation of the tokenizer vocabulary\n        self.vocab = tokenizer.get_vocab()\n        self.vocab_list = [{\n            'id': value,\n            'str': tokenizer.decode(value)\n            } for value in self.vocab.values()]\n        self.string_so_far = \"\"\n        self.tokens_so_far = []\n        self.finished = False\n        self.build_token_str_trie()\n        self.build_state_token_map(no_cache, verbose)\n    \n    def build_token_str_trie(self):\n        self.token_str_trie = TokenTrie()\n        for item in self.vocab_list:\n            self.token_str_trie.insert(item)\n\n    def build_state_token_map(self, no_cache=False, verbose=False):\n        # Create a unique hash for this regex and tokenizer combination\n        cache_key = hashlib.md5(\n            f\"{self.regex_struct}_{self.tokenizer.name_or_path}\".encode()\n        ).hexdigest()\n        \n        cache_dir = os.path.join(os.path.dirname(__file__), \".cache\")\n        cache_file = os.path.join(cache_dir, f\"state_token_map_{cache_key}.pkl\")\n        \n        # Try to load from cache first\n        if os.path.exists(cache_file) and not no_cache:\n            try:\n                with open(cache_file, 'rb') as f:\n                    self.state_token_map = pickle.load(f)\n                return\n            except Exception as e:\n                print(f\"Cache load failed: {e}\")\n        \n        # If cache doesn't exist or fails, build the map\n        start_time = time.time()\n        self.state_token_map = {}\n        for state in self.fsm.states:\n            self.state_token_map[state] = self.token_str_trie.collect_valid_tokens(state, self.fsm)\n        end_time = time.time()\n        if verbose:\n            print(f\"State token map construction time: {(end_time - start_time) * 1000:.2f}ms\")\n        # Save to cache\n        os.makedirs(cache_dir, exist_ok=True)\n        if not no_cache:\n            try:\n           
     with open(cache_file, 'wb') as f:\n                    pickle.dump(self.state_token_map, f)\n            except Exception as e:\n                print(f\"Cache save failed: {e}\")\n\n    def get_current_state(self, candidate, state=None):\n        if state is None:\n            state = self.fsm.initial\n        for c in candidate:\n            valid_next = [next_state for cc, next_state in self.fsm.map[state].items() if cc.accepts(c) and self.fsm.islive(next_state)]\n            if len(valid_next) == 0:\n                return None\n            state = valid_next[0]\n        return state\n    \n    def is_potential_prefix(self, candidate):\n        state = self.get_current_state(candidate)\n        return state is not None\n    \n    def get_tokens(self):\n        \"\"\"\n        Returns the ids of the tokens that can extend string_so_far while still matching the regex.\n        Once the guide has finished (EOS was consumed), only the EOS token id is returned.\n        \"\"\"\n        # This is where we could distinguish between a finished state and a dead one.\n        if self.finished:\n            return [self.tokenizer.eos_token_id]\n        matching_tokens = self.state_token_map[self.get_current_state(self.string_so_far)]\n        return matching_tokens \n\n    def advance(self, token_id):\n        if token_id == self.tokenizer.eos_token_id:\n            self.finished = True\n            return self\n        self.string_so_far += self.tokenizer.decode(token_id)\n        self.tokens_so_far.append(token_id)\n        return self\n\n    def is_finished(self):\n        # Finished if the current state is accepting or is no longer live.\n        finished=self.get_current_state(self.string_so_far) in self.fsm.finals\n        dead=not self.fsm.islive(self.get_current_state(self.string_so_far))\n        return finished or dead\n    \n    def is_dead(self):\n        current_state = self.get_current_state(self.string_so_far)\n        live_states = [val for val in self.fsm.map[current_state].values() if self.fsm.islive(val)]\n        return 
len(live_states) == 0\n\ndef test_guide_loading():\n    import sys\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    print(\"Vocab size:\", len(tokenizer.get_vocab()))\n    start_time = time.time()\n    #guide = SimpleGuide(r'(0?[1-9]|[12]\\d|3[01])/(0?[1-9]|1[0-2])/\\d{4}', tokenizer)\n    #regex = r'\\w{5} \\w{5} \\w{5}\\n'\n    regex = r'\\s?(January|February|March|April|May|June|July|August|September|October|November|December)\\s+(0?[1-9]|[12]\\d|3[01]),\\s+\\d{4}'\n    guide = SimpleGuide(regex, tokenizer, no_cache=True, verbose=True)\n    end_time = time.time()\n    loading_time = (end_time - start_time) * 1000  # Convert to milliseconds\n    print(f\"Guide loading time: {loading_time:.2f}ms\")\n\n    start_time = time.time()\n    for _ in range(10000):\n        guide.get_current_state(\"1\")\n    end_time = time.time()\n    print(f\"Current state time: {(end_time - start_time) * 1000 / 10000:.2f}ms\")\n    start_time = time.time()\n    tokens = guide.get_tokens()\n    end_time = time.time()\n    #print([tokenizer.decode(token) for token in tokens])\n    print(f\"Tokens time: {(end_time - start_time) * 1000:.2f}ms\")\n    # get the size of the pickled guide\n    print(f\"SimpleGuide size: {sys.getsizeof(pickle.dumps(guide)) / 1024 / 1024:.2f}MB\")\n\nif __name__ == \"__main__\":\n    test_guide_loading()\n"
  },
  {
    "path": "src/utils.py",
    "content": "def probability_to_color(probability, alpha=1.0):\n    \"\"\"\n    Maps a probability value (0.0-1.0) to a color on a blue-red scale.\n    Blue represents high probability (1.0)\n    Red represents low probability (0.0)\n    \n    Args:\n        probability (float): Probability value between 0.0 and 1.0\n        alpha (float, optional): Alpha/opacity value between 0.0 and 1.0. Defaults to 1.0.\n    \n    Returns:\n        str: RGBA color string (format: 'rgba(r, g, b, a)')\n    \"\"\"\n    # Ensure probability is in valid range\n    probability = max(0, min(1, probability))\n    \n    # Red component (high when probability is low)\n    red = int(255 * (1 - probability))\n    \n    # Blue component (high when probability is high)\n    blue = int(255 * probability)\n    \n    # Green component (kept at 0 for a cleaner red-blue gradient)\n    green = 0\n    \n    # Return rgba string\n    return f\"rgba({red}, {green}, {blue}, {alpha})\"\n\ndef entropy_to_color(entropy, alpha=1.0):\n    \"\"\"\n    Maps a normalized entropy value (0.0-1.0) to a grayscale color.\n    White (255,255,255) represents highest entropy (1.0)\n    Black (0,0,0) represents lowest entropy (0.0)\n    \n    Args:\n        entropy (float): Normalized entropy value between 0.0 and 1.0\n        alpha (float, optional): Alpha/opacity value between 0.0 and 1.0. Defaults to 1.0.\n    \n    Returns:\n        str: RGBA color string (format: 'rgba(r, g, b, a)')\n    \"\"\"\n    # Ensure entropy is in valid range\n    entropy = max(0, min(1, entropy))\n    \n    # For grayscale, all RGB components have the same value\n    # Higher entropy = lighter color (closer to white)\n    value = int(255 * entropy)\n    \n    # Return rgba string\n    return f\"rgba({value}, {value}, {value}, {alpha})\""
  },
  {
    "path": "struct/eu_date.txt",
    "content": "r'\\s?(0?[1-9]|[12]\\d|3[01])/(0?[1-9]|1[0-2])/\\d{4}'"
  },
  {
    "path": "struct/iso_date.txt",
    "content": "r'\\s?\\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\\d|3[01])'\n"
  },
  {
    "path": "struct/month_name.txt",
    "content": "r'\\s?(January|February|March|April|May|June|July|August|September|October|November|December)\\s+(0?[1-9]|[12]\\d|3[01]),\\s+\\d{4}'"
  },
  {
    "path": "struct/name.txt",
    "content": "r'\\s?[A-Z][a-z]{3,10}\\s[A-Z][a-z]{4,16}'"
  },
  {
    "path": "struct/us_date.txt",
    "content": "r'\\s?(0?[1-9]|1[0-2])/(0?[1-9]|[12]\\d|3[01])/\\d{4}'"
  },
  {
    "path": "tests/__init__.py",
    "content": ""
  },
  {
    "path": "tests/test_explorer.py",
    "content": "from src.explorer import Explorer\n\ndef test_get_prompt_token_probabilities():\n    explorer = Explorer()\n    explorer.set_prompt(\"Hello, world\")\n    probabilities = explorer.get_prompt_token_probabilities()\n    assert probabilities[0] == 0.5 # first token has no context\n    assert len(probabilities) == len(explorer.prompt_tokens)\n\ndef test_get_top_n_tokens():\n    explorer = Explorer()\n    explorer.set_prompt(\"Hello, world\")\n    tokens = explorer.get_top_n_tokens(n=5)\n    assert len(tokens) == 5\n    assert tokens[0][\"token\"] == \"!\"\n\ndef test_guide():\n    explorer = Explorer()\n    explorer.set_prompt(\"Hello, world\")\n    explorer.set_guide(\"ba+\")\n    tokens = explorer.get_top_n_tokens(n=5)\n    assert len(tokens) == 2 # actually only 2 valid tokens\n    assert tokens[0][\"token\"][0] == \"b\"\n\ndef test_guide_append_token():\n    explorer = Explorer()\n    tokenizer = explorer.tokenizer\n    explorer.set_prompt(\"Hello, world\")\n    explorer.set_guide(\"abc\")\n    tokens = explorer.get_top_n_tokens(n=5)\n    assert tokens[0][\"token\"][0] == \"a\"\n    explorer.append_token(tokenizer.encode(\"a\")[0])\n    tokens = explorer.get_top_n_tokens(n=5)\n    assert tokens[0][\"token\"][0] == \"b\""
  },
  {
    "path": "tests/test_simpleguide.py",
    "content": "from src.simpleguide import SimpleGuide\nfrom transformers import AutoTokenizer\n\ndef test_get_tokens():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    # test a basic regex\n    guide = SimpleGuide(\"a+\", tokenizer)\n    decoded_tokens = [tokenizer.decode(token) for token in guide.get_tokens()]\n    for token in decoded_tokens:\n        for i in range(len(token)):\n            assert token[i] == \"a\"\n\ndef test_advance():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\"abbbc\", tokenizer)\n    guide.advance(tokenizer.encode(\"a\")[0])\n    decoded_tokens = [tokenizer.decode(token) for token in guide.get_tokens()]\n    for token in decoded_tokens:\n        assert token[0] == \"b\"\n\ndef test_is_finished_single_finish():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\"abc\", tokenizer)\n    guide.advance(tokenizer.encode(\"a\")[0])\n    assert not guide.is_finished()\n    guide.advance(tokenizer.encode(\"b\")[0])\n    assert not guide.is_finished()\n    guide.advance(tokenizer.encode(\"c\")[0])\n    assert guide.is_finished()\n\ndef test_is_finished_multiple_finish():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\"abc{1,2}\", tokenizer)\n    guide.advance(tokenizer.encode(\"a\")[0])\n    assert not guide.is_finished()\n    guide.advance(tokenizer.encode(\"b\")[0])\n    assert not guide.is_finished()\n    guide.advance(tokenizer.encode(\"c\")[0])\n    assert guide.is_finished()\n    guide.advance(tokenizer.encode(\"c\")[0])\n    assert guide.is_finished()\n\ndef test_is_dead():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\"abc\", tokenizer)\n    guide.advance(tokenizer.encode(\"a\")[0])\n    assert not guide.is_dead()\n    guide.advance(tokenizer.encode(\"b\")[0])\n    assert not guide.is_dead()\n    
guide.advance(tokenizer.encode(\"c\")[0])\n    assert guide.is_dead()\n\ndef test_is_dead_multiple_finish():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\"abc{1,2}\", tokenizer)\n    guide.advance(tokenizer.encode(\"a\")[0])\n    assert not guide.is_dead()\n    guide.advance(tokenizer.encode(\"b\")[0])\n    assert not guide.is_dead()\n    guide.advance(tokenizer.encode(\"c\")[0])\n    assert not guide.is_dead() and guide.is_finished()\n    guide.advance(tokenizer.encode(\"c\")[0])\n    assert guide.is_dead()\n\ndef test_spaces():\n    tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B\")\n    guide = SimpleGuide(\" ab\", tokenizer, no_cache=True)\n    assert any([cc.accepts(' ') for cc in guide.fsm.map[0].keys()])\n    assert tokenizer.encode(\" \")[0] in guide.state_token_map[0]\n\n    \n"
  }
]